Here are the notes I made before teaching Intel assembly to the ACM in
Fall '97.  I revised the notes and added more information (starting with
"segments").
        - Phil (kooderi@jhu.edu)

                ------------------
                | Intel Assembly |
                ------------------

What is assembly?
    Lowest-level code, dependent on the hardware
    There's a 1:1 correspondence between assembly code and machine
      instructions (usually)
      eg, if you add "mov ax,bx" to your code, the compiled program
          will be 2 bytes larger :)
          (actually we say "assembled" instead of "compiled")

Why Assembly?
    The assembled code is fast and tiny
    Assembly allows you to do things you can't do in other programming
      languages
    If your computer can do something, you can write assembly code to do it

How do we use it?
    Often used as "inline code" to speed up programs in higher-level
      languages
      (EXCEPT code compiled with optimizing C compilers may outperform
       unoptimized asm)
    We can also make full programs in assembly

Intel Assemblers -> TASM, MASM, NASM, a86 are the most common
    (We're going to use MASM/TASM syntax.  The rest are similar)

Assembly is NOT case-sensitive (in MASM and TASM, by default)

Names of data sizes:
    bit            -  1 bit
    nybble/nibble  -  4 bits
    byte           -  8 bits
    word           - 16 bits
    dword          - 32 bits (double word)
    qword          - 64 bits (quad word)
    tbyte          - 80 bits (tbyte = ten bytes, used for floating point stuff)
  We usually ONLY use: bit, byte, word, and dword

Most numbers we use are hex (1234h), decimal (123 or 123d), or
binary (10110011b).

General Purpose Registers: (like hardware variables)
    AX: AH, AL - accumulator    (word: byte,byte)
    BX: BH, BL - base register
    CX: CH, CL - counter
    DX: DH, DL - data reg
    (EAX,EBX,ECX,EDX) - in 32-bit programs (dwords)
    SI - source index (aka pointer) (word)
    DI - destination index (aka pointer) (word)

How 16-bit regs work:
    ah is always equal to the high byte of ax
    al is the low byte of ax

    mov ax,1234h        ;this is a comment.
                        ;This line is like "AX=0x1234;" in C
                        ;after the command:
                        ;  ax = 1234h
                        ;  ah = 12h
                        ;  al = 34h
    mov ax,0            ;numbers are in decimal (base 10), unless appended 'h' for hex
    mov ah,1            ;they can also be binary ('b') or octal ('o')
                        ;ax = 0100h
                        ;ah = 01h
                        ;al = 00h
  (32-bit)
    mov eax,12345678h   ;sets ax & upper 16 bits of eax
                        ;no name for upper 16 bits of eax

Instructions - one per line, no semicolons at the end
                        (a semicolon starts a comment)
    MOV reg,reg
    MOV reg,mem
    MOV reg,immed       ;immediate = constant (eg, 3)
    MOV mem,reg
    MOV mem,immed
    ADD, SUB, OR (bitwise), AND (bitwise), XOR (bitwise) take the same
      operand forms as MOV
    INC mem             ;increment
    INC reg
    DEC mem             ;decrement
    DEC reg

  examples:
    add bx,cx           ;bx = bx + cx
    xor ax,ax           ;ax = 0  (usually better than mov ax,0 --> fewer bytes, faster)
                        ;        (however, this changes the flags--you'll learn later)
    mov MyVariable,4

  MUL:
    mul mem8            ;mem8 = 8-bit memory
    mul reg8            ;don't use AL,AH
                        ;these will do AX = AL * operand8
    mul mem16
    mul reg16           ;don't use AX,DX
                        ;these will do DX:AX = AX * operand16

  DIV:
    div mem8
    div reg8            ;don't use AL,AH
                        ;these will do AL = AX / operand8, AH = AX % operand8 (remainder)
    div mem16
    div reg16           ;don't use AX,DX
                        ;these will do AX = DX:AX / operand16, DX = DX:AX % operand16

Other registers:
    BP - base pointer (usually used to access the stack)
    SP - stack pointer (don't mess with this unless you know what you're doing!)
    IP - instruction pointer (doesn't really have a name.. your assembler
         doesn't know what "IP" is.  You can modify it with jumps (goto's)
         and such.  It points to the next instruction that's going to execute)

FLAGS:
    ZF - zero
    CF - carry
    OF - overflow
    SF - sign (1 = negative)
    IF - interrupts (1 = enabled)

  Flags usually aren't changed directly.
  Instead you use instructions like CMP and TEST

  CMP: "compare" - sets the flags (like SUB, but the result is thrown away)
    cmp ax,bx           ;test to see if AX=BX

  TEST: tests if bits are set
    test ax,11b         ;tests the lowest 2 bits in ax
                        ;operates like AND (but the result is thrown away)
                        ;  --> 0101 AND 0011 = 0001 (not zero, so zf not set)

JUMPS: "goto"s
    jmp      - unconditional jump (ie, a goto)
    jz = je  - jump if zero/equal
    jnz=jne  - jump if not zero/equal
    jc, jnc  - jump on (not) carry
    ja, jb   - unsigned comparisons (above, below)
    jg, jl   - signed comparisons (greater than, less than)
    jo, jno  - overflow
    js, jns  - sign (set if negative)

  eg:
    sub ax,3
    jz AXwas3           ;jump if ZF is set (ie, jump if last result was zero)
  eg:
    cmp ax,bx
    je AXequalsBX       ;jz would also work

More instructions:
    LOOP MyLabel        ;same as:  DEC CX
                        ;          JNZ MyLabel
                        ;(except that LOOP leaves the flags alone)

  example of a for loop (print out AAAAAAAAAA):
            mov cx,10
    MyLabel:
            mov ah,2    ;these 3 lines print out "A"
            mov dl,'A'  ;
            int 21h     ;
            loop MyLabel

Segments
    SS - stack segment
    CS - code seg
    DS - data
    ES - extra (or "extended")
    (FS, GS) - added on the 386 and up (ran out of names)

  Segment registers have some restrictions.  eg, "mov ds,10" is illegal
  They work with other registers (usually si,di) to access memory

  Real mode DOS memory is set up like this:
    physical address = segment*16 + offset
    So 1234h:0005h = 1234h*10h + 5h = 12345h
  DS:SI accesses the element at segment DS and offset SI
    ie, if you set DS to some value, you can access 64k of memory with SI

  example of memory access:
    .data
    var     db 10           ;db = "define byte".. declares a variable
    pointer dw OFFSET var   ;dw = "define word"
    .code
    mov bx,SEG var          ;do this since we can't do "mov ds,seg var"
    mov ds,bx
    mov si,pointer          ;si = offset var
    mov al,ds:[si]          ;al now equals 10

  example of string copy:
    .data
    string  db "hello",0
    buffer  db 100 dup (0)  ;like 0,0,0,0...
    .code
            mov ax,SEG string       ;set ds:si = pointer to String
            mov ds,ax
            mov si,OFFSET string
            mov es,ax               ;es:di = pointer to Buffer
            mov di,OFFSET buffer
    copy:
            mov al,ds:[si]          ;get byte from String
            mov es:[di],al          ;copy byte to Buffer
            inc si                  ;increment pointers
            inc di                  ;
            cmp al,0                ;did we copy the last byte?
            jne copy                ;if not, goto Copy

  "lodsb" is an instruction that's equivalent to:
            mov al,ds:[si]
            inc si
  (as long as the direction flag is clear, which "cld" guarantees)
  It is less typing and uses 1 byte instead of 3-4.  However, on some
  processors it runs slower.

  Likewise, "stosb" is equivalent to:
            mov es:[di],al
            inc di
  and "movsb" is equivalent to:
            lodsb
            stosb
  (except movsb doesn't change the contents of al)

  So four lines in the above code could be compressed down to "movsb"
  (one byte).

  "lodsw" is the word version, and "lodsd" (32-bit only) is the dword version

  "rep lodsb" means run lodsb the number of times in cx.  So, with ds:si and
  es:di set up as above, we could have done:
    copy:
            cld                     ;make sure si and di count upward
            mov cx,3                ;'hello',0 = 6 bytes = 3 words
            rep movsw               ;copy it using words to increase speed

  Note that cx=0 after 'loop' falls through and after 'rep' finishes.

Interrupts
  Here's the fun stuff.  These are "built-in" routines.  Each interrupt has
  a different purpose with lots of routines (selected by ah).  Both the
  interrupt number and ah are given in hex here.
  Here are some important ones:

  int 16h - BIOS Keyboard interrupt
    ah=0    reads key from keyboard without echo
            returns al=ascii, ah=scan code
    ah=1    finds out if a key was hit
            if no keys were pressed:
                zf set (ie, jz NoKey)
                ax=0
            if a key was pressed:
                zf clear (not set -> jnz KeyReady)
                ah=scan code
                al=ascii

  int 21h - DOS functions
    ah=1    reads key from keyboard with echo
            returns al=ascii
    ah=2    prints a character (dl=ascii)
            returns nothing
    ah=5    prints character to printer (dl=ascii)
            returns nothing
    ah=9    prints a string ending in $ (ds:dx=pointer to string)
            returns nothing
    ah=0ah  buffered keyboard input (ds:dx=pointer to buffer)
            The buffer is in this form:
              1 byte    The max number of characters to read (including enter)
              1 byte    The number of characters read (it gets filled in,
                        and doesn't count the enter)
              n bytes   The buffer (n must be at least 1)
            Enter (ascii 13: carriage return) is also stored.
    ah=4ch  Exit the program (al=error code)

  int 10h - BIOS Video interrupt
    ah=0    set video mode (al=3 -> text, al=12h,13h -> vga)
            returns nothing
    ah=2    set cursor position (bh=0, dh=row, dl=column)
            dh and dl are zero-based
            returns nothing
    ah=0ch  display graphics pixel (al=color, bh=0, cx=column, dx=row)
            cx and dx are zero-based
            returns nothing

  int 19h - Bootstrap loader
    Will reboot your computer to the drive specified in dl (eg, 0=drive a:)
    Some computers warm boot, some cold boot, and others do nothing
    register ah doesn't matter here

here's a full program:

    .model small        ;memory model.. usually small or tiny
    .stack              ;explained in the next section
    .data
        ;note that this is just a continuous block of data in memory
        ;it will be: (ascii 10),?, '$$$$$$$$$$$Enter word: $'
        ;you'll see this if you look at the assembled program in a text editor
    buffer  label byte
            db 10,?             ;? means we don't care what it is
    string  db 11 dup ('$')     ;you can use single or double quotes
    prompt  db "Enter word: $"
    .code
    ProgramStartsHere:
            mov cx,@data        ;This works in MASM and TASM..
                                ;same as SEG Buffer
            mov ds,cx

            mov ah,9            ;print the Prompt
            mov dx,offset prompt    ;now ds:dx = pointer to Prompt
            int 21h

            mov ah,0ah          ;let's do some buffered input!
            mov dx,offset buffer    ;ds:dx = pointer to buffer
            int 21h

            mov ah,2            ;print a newline after the enter (=13 =carriage return)
            mov dl,10           ;10=line feed (newline)
            int 21h

            mov ah,9            ;print their string back
            mov dx,offset String
            int 21h

            mov dl,2            ;reboot the computer
            int 19h
    end ProgramStartsHere

  The program might be more useful if you do this instead of rebooting:
            mov ax,4c00h        ;exit with error code 0
            int 21h

  To assemble with TASM it's simply:
    C:\>TASM program.asm
    C:\>TLINK program.obj
    C:\>program.exe
  The assembled program is 586 bytes.  (NOT kilobytes :])

The Stack
  All programs have a stack.  This allows the program to store information
  in memory and retrieve it easily.  It also has many other purposes, such
  as allowing arguments to be passed to procedures.
  The stack is very powerful but also very dangerous.  If you lose track of
  what's in your stack, your whole program could crash.  And large assembly
  programs aren't ones you want to debug...

  "push ax" will do just that, push 16-bit ax onto the stack
  "pop bx" will pop the last 16 bits and store them in bx
  "push al" won't work (it usually won't even assemble): there is no
    byte-sized push.  Always work with words (or dwords, or 3 words...)
    when dealing with the stack.

  If you enable 186 instructions (by putting ".186" anywhere before the
  push) you can push immediate values.  eg:
    .286                ;this will work, too.  Same for .386, .486, .586...
    push WORD PTR 10    ;if you enabled 32-bit instructions via
                        ;.386 or higher, you can also use DWORD PTR 10
  However you cannot pop immediates, as that makes no sense.

  swapping ax and bx:
    method 1 -->
            xchg bx,ax
    method 2 -->
            push ax     ;stack: old_ax
            push bx     ;stack: old_ax old_bx
            pop ax      ;stack: old_ax  (now ax=old_bx)
            pop bx      ;stack empty    (now bx=old_ax)
  Note that the second method is slower, requires more thinking, requires
  more typing, and will take up 4 bytes in the executable file while the
  first only takes 1.
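  To make the last-in, first-out order concrete, here is the push/pop swap
  again as a small trace with numbers filled in.  This is only a sketch: the
  starting values 1111h and 2222h are made up for illustration, and sp is
  shown relative to wherever it happened to start.

```asm
        mov  ax,1111h   ;made-up test values (an assumption, just for the trace)
        mov  bx,2222h
        push ax         ;sp = sp-2               stack: 1111h
        push bx         ;sp = sp-2               stack: 1111h 2222h
        pop  ax         ;ax = 2222h, sp = sp+2   stack: 1111h
        pop  bx         ;bx = 1111h, sp = sp+2   stack empty again
                        ;ax and bx are now swapped, and sp is back
                        ;where it started
```

  Each push moves sp down by 2 and each pop moves it back up by 2, so as
  long as the pushes and pops balance, sp ends where it began.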
  more stack examples:
            push ax     ;save ax
            sub ax,10
            pop ax      ;restore ax
    ------------------------------------------------------------
            push ax     ;save ax
            push bx     ;save bx
            push cx     ;save cx
            ..do stuff...
            pop cx      ;restore cx
            pop bx      ;restore bx
            add sp,2    ;manually remove 2 bytes from the stack
                        ;this will empty the stack without restoring ax
    (Note that you generally want to pop items in reverse order)
    ------------------------------------------------------------
            mov bp,sp   ;save sp (can't push it!)
            sub sp,4    ;give us 4 free bytes to play with
            LocalVar1 = word ptr [bp-2]
            LocalVar2 = word ptr [bp-4]
                        ;we now have aliases to where our reserved bytes are
            mov LocalVar1,25
            mov LocalVar2,1011b
            ...
            mov sp,bp   ;restore sp

And finally one last example:

    .model tiny         ;we'll make a COM file (must be tiny model)
    .code               ;no separate data or stack segments.
                        ;In tiny model, CS=DS=SS
    org 100h            ;never mind this.  It's needed for com files
    begin_here:
            mov ax,13h          ;set 320x200x256 vga mode
            int 10h

            push WORD PTR 10    ;x coordinate
            push WORD PTR 20    ;y coordinate
            push Green          ;color (see below)
            call Pixel

            xor ax,ax           ;wait for keypress (ah=0)
            int 16h

            mov ax,3            ;return to text mode and exit
            int 10h
            ret                 ;COM files can exit like this if the stack
                                ;is not messed up

    Green   dw 10               ;this could go after the procedure, too...

    Pixel PROC
            mov bp,sp           ;save sp
            X             = word ptr [bp+6]
            Y             = word ptr [bp+4]
            Color         = word ptr [bp+2]
            ReturnAddress = word ptr [bp]   ;in case you were curious

            sub sp,2            ;let's add a local variable to this mess
            Dummy         = word ptr [bp-2]

            mov ah,0ch          ;plot a pixel
            mov al,BYTE PTR color   ;typecast.. (only typecast more bytes
                                    ;to fewer bytes!)
            xor bx,bx           ;(if you're really curious, bh=page# here)
            mov cx,X
            mov dx,Y
            int 10h

            mov sp,bp           ;restore sp
            ret 6               ;remember to return!  The optional 6 means
                                ;pop 6 bytes off stack
    Pixel ENDP
    END begin_here

  The assembled COM file (via TASM prog, TLINK/t prog) is 69 bytes.

(EOF)