I've been playing with assembly on the PowerBook, but I've been a bit lost since the migration to Intel processors. Most of the information available on Intel assembler uses the Intel syntax, which looks quite different from the AT&T syntax of the GNU tools I was using on the PowerPC.
Intel assembler on Mac OS X
The Intel line of Macs comes with the Netwide Assembler (nasm), an assembler that uses the Intel-style syntax. You will probably have to install the developer tools to get nasm.
This article walks through a short assembly program that prints "Hello World", swaps every two neighboring characters in the string, prints the result and then exits back to the shell.
The code below makes some assumptions. The most potentially damaging one is that the data section and the code section are in the same segment. I'm going to ignore that for now: should the address be computed wrongly, the code will segfault nicely.
The message
First things first: defining a message to print.
We define a message as a sequence of bytes in the data section.
SECTION .data ; data section
msg: db "Hello World",10 ; the string to print, 10=new line
len: equ $-msg ; "$" means "here"
The length of the message, called "len", is computed as the difference between the current position "$" (the byte immediately following the string) and the beginning of the string (the "msg" label). For this message that is 12 bytes: 11 characters plus the newline.
Printing via syscall
Well, we print using the write syscall.
Write sends a buffer of a given length to a previously opened file descriptor.
We run in the console, where file descriptor 1 (stdout) is already open.
cristi:~ diciu$ cat /usr/include/sys/syscall.h | grep write
#define SYS_write 4
[..]
I haven't found the prototype of SYS_write anywhere, but the BSD prototype of the write function is:
ssize_t
write(int d, const void *buf, size_t nbytes);
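To have the kernel execute a system call, you push the system call's arguments on the stack in reverse order (this is the C convention too), load the system call number into EAX and then trigger interrupt 0x80 with the int instruction. The kernel also expects one extra dword on top of the arguments. That slot is where a return address would sit if the call had gone through a libc stub, so the code pushes EAX once more as filler.
SECTION .text ; code section
global _main ; make label available to linker
_main: ; entry point called by the C runtime, _ because gcc prepends it to C symbol names
push dword len ; length of the string
lea eax, [msg] ; load effective address - address of message
push dword eax ; push eax on the stack and thus the address of message
push dword 1 ; push the file descriptor of stdout on the stack
mov eax, 4 ; load the syscall id (SYS_write) in eax
push dword eax ; extra dword the kernel skips (the return address slot)
int 0x80 ; call the syscall for write - this should print "Hello World"
add esp, 16 ; pop the four dwords pushed above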
Our piece of code then swaps every two neighboring bytes (so that "He" becomes "eH").
We do this with the loop instruction, which decrements the counter register and jumps back to the label until the counter reaches zero. In 32-bit code that counter is the whole ECX register, not just its low half CX, so all of ECX has to be initialized.
So we first set ECX to the number of pairs, half the length of the message, and then iterate through the string two bytes at a time. Each iteration handles two bytes, so starting from the full length would walk past the end of the message.
lea esi, [msg] ; load the message address into ESI (the 16-bit SI would truncate it)
mov ecx, len/2 ; number of pairs; ECX is the counter used by the loop instruction
again: mov ax, [esi] ; loop label, load AX with the two bytes ESI points to
xchg al, ah ; exchange AL with AH, swapping the neighboring bytes (He becomes eH)
mov [esi], ax ; write the swapped pair back to the memory location ESI points to
add esi, 2 ; advance the pointer to the next pair
loop again ; decrement ECX and jump back to "again" while it is not zero
push dword len ; print the swapped string, same sequence as the first write
lea eax, [msg]
push dword eax
push dword 1
mov eax, 4
push dword eax
int 0x80
add esp, 16 ; pop the write arguments again
push dword 0 ; return code 0 means everything was ok thank you very much
mov eax, 1 ; syscall exit (SYS_exit)
push dword eax ; filler dword again; the exit status is the 0 pushed above
int 0x80 ; call syscall exit and thus exit
Assembling the assembler code
This is how you call nasm to assemble the source and then gcc to link the object file made by nasm:
nasm -f macho testp.s
gcc -mmacosx-version-min=10.3 -o testp testp.o
Side note: yes, nasm's name for the Mach-O object format is "macho".
Update: "-mmacosx-version-min=10.3" is required on Leopard because printf has been replaced by printf$2003 for UNIX compliance purposes.
When running the binary you should get:
cristian-draghicis-computer:~/Programming/assembler diciu$ ./testp
Hello World
eHll ooWlr
d
References
There are very few pages on the web about nasm, or even about assembly in general, on the Intel Mac.
A slightly different approach, using AT&T syntax, that explains how to compute the effective address with a CALL instruction (aka the magical __i686.get_pc_thunk.bx):