Implementation of an Assembler

Steps in Implementing an Assembler
• Lexical Analysis
• Syntax Analysis
• Convert Assembly Instructions to Machine Code
– Pass 1
– Pass 2

Example – Hello World!
A comment starts with a semicolon; the assembler reads up to the newline without processing it.

; Program: Hello World !
        ORG 100H
        MOV AH, 9
        MOV DX, OFFSET(MESSAGE)
        INT 21H                 ;call DOS
        INT 20H                 ;return to DOS
MESSAGE DB  'Hello, World!$'

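Lexical Analysis
• A token is the smallest unit of a programming language.
• Lexical analysis converts the source program into tokens, each written as (token type, token value):

MOV      AH     ,      9
(1,109)  (3,1)  (4,3)  (6,10)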
Syntax Analysis
• Groups the tokens into token groups (label, opcode, operand) and checks whether each instruction conforms to the grammar.
• Example 1
MESSAGE   DB       'Hello,World!$'
label     opcode   operand (literal)
• Example 2
opcode   operand   error

ASM Grammar
[label:]   opcode   [operand [, operand]]   [;comment]
• label: optional; labels are defined by the programmer.
• opcode: the instruction.
• operand: there may be 0, 1, or 2 operands; any other count is a syntax error.

Instruction Table 1
Entries referenced in the examples: 47 INT, 109 MOV

Pseudo and Extra Table 2
1 CODE    2 SEGMENT   3 PROC    4 NEAR    5 ASSUME   6 ORG
7 DB      8 DW        9 EQU     10 ENDP   11 ENDS    12 END
13 WORD   14 BYTE     15 PTR    16 DUP    17 OFFSET

Register Table 3
1 AH    6 BX    7 CH    8 CL    9 CX    10 DH   11 DL
12 DX   13 SP   14 BP   15 SI   16 DI   17 ES

Delimiter Table 4
3 ,   7 /   8 :   9 ;   10 ?   11 (   12 )   13 '

Symbol Table 5

Integer/Real Table 6

String Table 7
Hello,World!$
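
The lookup over the fixed tables (1 to 4) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the course's reference implementation: the names `FIXED_TABLES` and `lookup_fixed` are hypothetical, Tables 1 and 4 hold only the entries whose numbers appear in these slides, the numbering of Table 2 is read off the slide order, and register entries 2 to 5 are guessed from alphabetical order.

```python
# Fixed tables, keyed by table number. Entries are numbered from 1.
PSEUDO = ["CODE", "SEGMENT", "PROC", "NEAR", "ASSUME", "ORG", "DB", "DW",
          "EQU", "ENDP", "ENDS", "END", "WORD", "BYTE", "PTR", "DUP", "OFFSET"]

# Assumption: entries 2-5 (AL, AX, BH, BL) are guessed from alphabetical order.
REGISTERS = ["AH", "AL", "AX", "BH", "BL", "BX", "CH", "CL", "CX",
             "DH", "DL", "DX", "SP", "BP", "SI", "DI", "ES"]

FIXED_TABLES = {
    1: {"INT": 47, "MOV": 109},  # only the entries shown in the slides
    2: {name: i for i, name in enumerate(PSEUDO, start=1)},
    3: {name: i for i, name in enumerate(REGISTERS, start=1)},
    4: {",": 3, "/": 7, ":": 8, ";": 9, "?": 10, "(": 11, ")": 12, "'": 13},
}


def lookup_fixed(lexeme):
    """Return (table number, entry number), or None if programmer-defined."""
    key = lexeme.upper()
    for table_no in sorted(FIXED_TABLES):
        entry = FIXED_TABLES[table_no].get(key)
        if entry is not None:
            return (table_no, entry)
    return None


print(lookup_fixed("mov"), lookup_fixed("DX"),
      lookup_fixed("OFFSET"), lookup_fixed("MESSAGE"))
# prints: (1, 109) (3, 12) (2, 17) None
```

A `None` result means the lexeme goes on to the symbol, integer/real, or string table.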
Useless information for assembler
• Space/Tab
• Enter
– only used for determining the end of line
• Comment
– begin with semicolon (;)
• Comma
– used for dividing operand/literal
• Colon
– end of label (as language definition)

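If a lexeme is not found in any of the first four tables, it is programmer-defined. Next, check its ASCII codes: codes in the range 30h~39h are the digits 0~9. A string must be enclosed in single quotes (').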
Example 1 – Hello World! (output of the Lexical Analyzer)

MOV      AH     ,      9
(1,109)  (3,1)  (4,3)  (6,10)

MOV      DX      ,      OFFSET  (       MESSAGE  )
(1,109)  (3,12)  (4,3)  (2,17)  (4,11)  (5,1)    (4,12)

INT      21H
(1,47)

MESSAGE  DB     '       Hello,World!$  '
(5,1)    (2,7)  (4,13)  (7,1)          (4,13)

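On seeing white space, check whether the buffer holds accumulated characters.
• Space/Tab/Enter (white space)
• Delimiter
[ ] , + - …
Example:
MOV WORD PTR [ BP ] [ DI ] + 1234H
1. 2B3CH => hexadecimal
2. 2B3C => symbol
3. Note whether a literal is an integer, a real, or a string (for an integer, consider the radix).
4. A string has quotation marks before and after it.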
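
The boundary rules above can be sketched as a small scanner. This is a hedged Python sketch, not the slides' implementation: `split_lexemes` and `classify_literal` are hypothetical names, the delimiter set contains only characters listed in these slides, and requiring a number to start with a digit is an assumption.

```python
DELIMITERS = set("[],+-/:?()")  # delimiter characters listed in the slides
HEX_DIGITS = set("0123456789ABCDEFabcdef")


def split_lexemes(line):
    """Split one source line into lexemes at white space and delimiters."""
    lexemes, buffer = [], ""
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == ";":  # comment: ignore the rest of the line
            break
        if ch == "'":  # string literal: read up to the closing quote
            end = line.find("'", i + 1)
            if end == -1:
                raise ValueError("string literal has no closing quote")
            if buffer:
                lexemes.append(buffer)
                buffer = ""
            lexemes.extend(["'", line[i + 1:end], "'"])
            i = end + 1
            continue
        if ch.isspace() or ch in DELIMITERS:
            if buffer:  # flush what the buffer has accumulated
                lexemes.append(buffer)
                buffer = ""
            if ch in DELIMITERS:
                lexemes.append(ch)
        else:
            buffer += ch
        i += 1
    if buffer:
        lexemes.append(buffer)
    return lexemes


def classify_literal(lexeme):
    """Classify a lexeme that was not found in the fixed tables."""
    if lexeme[0].isdigit():  # assumption: numbers start with a digit
        if lexeme[-1] in "Hh" and all(c in HEX_DIGITS for c in lexeme[:-1]):
            return "integer (hex)"
        if lexeme.isdigit():
            return "integer (decimal)"
        try:
            float(lexeme)
            return "real"
        except ValueError:
            pass
    return "symbol"


print(split_lexemes("MOV WORD PTR [ BP ] [ DI ] + 1234H"))
print(classify_literal("2B3CH"), "/", classify_literal("2B3C"))
```

Keeping the splitting step separate from classification means the same scanner serves both white-space and delimiter boundaries.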

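Lexical Analysis Method
• Scan until a white space or a delimiter is found.
• On white space, look up the buffered text in each table to see whether it is a predefined instruction, symbol, and so on; if it is, create a token (this yields one token or none).
• On a delimiter, look it up in the tables and create tokens (this yields one or two tokens).
• If no table contains the lexeme, it is a symbol, an integer/real, or a string; use the hashing function to place it in the corresponding table.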

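Hash function
• Add up the ASCII codes of all characters in the identifier and take the remainder modulo 100.
• On a collision, step forward until an empty slot is found.
1. Fixed tables: entries are numbered from 1.
2. Hash tables: entries are numbered from 0, because the mod result may be 0.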
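
The hashing scheme described above can be sketched in a few lines of Python. The names `hash_identifier` and `insert` are hypothetical, and wrapping around from slot 99 to slot 0 while probing is an assumption the slides do not state.

```python
TABLE_SIZE = 100


def hash_identifier(name):
    """Sum of the ASCII codes of the characters, modulo 100."""
    return sum(ord(ch) for ch in name) % TABLE_SIZE


def insert(table, name):
    """Insert name using linear probing and return its slot (0-based).

    If the name is already stored, its existing slot is returned.
    Assumption: probing wraps around at the end of the table.
    """
    start = hash_identifier(name)
    for step in range(TABLE_SIZE):
        slot = (start + step) % TABLE_SIZE
        if table[slot] is None:
            table[slot] = name
            return slot
        if table[slot] == name:
            return slot
    raise RuntimeError("hash table is full")


symbol_table = [None] * TABLE_SIZE
print(insert(symbol_table, "MESSAGE"))  # ASCII sum is 517, so slot 17
print(insert(symbol_table, "EMSSAGE"))  # same sum, collides, goes to slot 18
```

Because the hash is only a character sum, any two anagrams collide, which is why the probing step matters even for small programs.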

Implementing Syntax Analysis
• Classify the tokens and, as the grammar requires, keep the label, opcode, and operand information.

Useful information for assembler
• opcode (a token that points into Table 1 is an instruction)
– MOV, ADD, JP, …
– can be stored in a table to access easily
• operand (MOV takes 2; ADD takes 2; JP takes 1)
– "AX,09h" "AL, Label+2" "dx, offset(A)" …
– have to be divided into several parts
• label (a token that points into Table 5 is a label)
– "A DB '1234$'" "A: MOV AX, BX" …
– recognize and store in a different table

Operand grammar
• 2-parameter operand
– REG, REG
– REG, address
– REG, number
• 1-parameter operand
– offset(something)
– …
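
Grouping lexemes into label, opcode, and operands, and rejecting a wrong operand count, might look like the sketch below. It is illustrative only: `parse_line` and `OPERAND_COUNT` are hypothetical names, the counts for MOV, ADD, and JP come from the slides, and the counts for INT and DB are assumptions added so the Hello World lines can be parsed.

```python
# Operand counts: MOV, ADD, JP are from the slides; INT and DB are assumptions.
OPERAND_COUNT = {"MOV": 2, "ADD": 2, "JP": 1, "INT": 1, "DB": 1}


def parse_line(lexemes):
    """Group lexemes into (label, opcode, operands).

    Raises SyntaxError when the line violates the grammar.
    """
    items = list(lexemes)
    label = None
    if len(items) >= 2 and items[1] == ":":  # "A: MOV AX, BX"
        label, items = items[0], items[2:]
    elif items and items[0].upper() not in OPERAND_COUNT:  # "A DB '1234$'"
        label, items = items[0], items[1:]
    if not items:
        return label, None, []

    opcode = items[0].upper()
    if opcode not in OPERAND_COUNT:
        raise SyntaxError(f"unknown opcode: {items[0]}")

    # Split the remaining lexemes at commas; each group is one operand.
    operands, current = [], []
    for lexeme in items[1:]:
        if lexeme == ",":
            operands.append(current)
            current = []
        else:
            current.append(lexeme)
    if current or operands:
        operands.append(current)

    if any(not operand for operand in operands):
        raise SyntaxError("empty operand")
    if len(operands) != OPERAND_COUNT[opcode]:
        raise SyntaxError(
            f"{opcode} expects {OPERAND_COUNT[opcode]} operand(s), "
            f"got {len(operands)}")
    return label, opcode, operands


print(parse_line(["MOV", "DX", ",", "OFFSET", "(", "MESSAGE", ")"]))
```

Each operand stays a list of lexemes, so a later step can match it against the REG / address / number / offset(...) patterns.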

Result after Lexical Analysis and Syntax Analysis

MOV      AH     ,      9
(1,109)  (3,1)  (4,3)  (6,10)
Token group: Immediate to register

MOV      DX      ,      OFFSET  (       MESSAGE  )
(1,109)  (3,12)  (4,3)  (2,17)  (4,11)  (5,1)    (4,12)
Token group: Immediate to register
In x86, OFFSET means that the address of MESSAGE is used as the value.

How to Translate to Machine Code
• A table that maps each instruction to its machine code
• Further considerations for the Symbol Table
• Other required tables

Example 1 – Hello World!
; Program: Hello World !
        ORG 100H
        MOV AH, 9
        MOV DX, OFFSET(MESSAGE)
        INT 21H                 ;call DOS
        INT 20H                 ;return to DOS
MESSAGE DB 'Hello, World!$'

Sample Output
LOC   OBJ            LINE  SOURCE
                     1     ; Program: Hello World !
0100                               ORG 100H
0100  B409                         MOV AH, 9
0102  BA0901                       MOV DX, OFFSET(MESSAGE)
0105  CD21                         INT 21H ;call DOS
0107  CD20                         INT 20H ;return to DOS
0109  48656C6C6F2C         MESSAGE DB 'Hello, World!$'
      20576F726C64
      2124

• org 100h
• MOV AH, 9 (Reg., immediate: Immediate to register)
– p.94, Fig.4.5 #3 (Machine Language Coding…)
– Byte 1 = OpCode, 1011.w.reg = 1011.0.100 = B4h
– Byte 2 = 09h
• 0100 B4
• 0101 09
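
The 1011.w.reg pattern can be checked with a short Python sketch. The function name `encode_mov_reg_imm` is hypothetical; the register numbers are the standard x86 reg-field codes.

```python
# Standard x86 register codes for the 3-bit reg field.
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}


def encode_mov_reg_imm(reg, value):
    """Encode MOV reg, imm as 1011.w.reg followed by the immediate."""
    reg = reg.upper()
    if reg in REG8:  # w = 0: one data byte
        if not 0 <= value <= 0xFF:
            raise ValueError("immediate does not fit in a byte")
        return bytes([0b1011_0000 | REG8[reg], value])
    if reg in REG16:  # w = 1: two data bytes, low byte first
        if not 0 <= value <= 0xFFFF:
            raise ValueError("immediate does not fit in a word")
        return bytes([0b1011_1000 | REG16[reg], value & 0xFF, value >> 8])
    raise ValueError(f"unknown register: {reg}")


print(encode_mov_reg_imm("AH", 9).hex())        # b409
print(encode_mov_reg_imm("DX", 0x0109).hex())   # ba0901
```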

In x86, an operand written with OFFSET is treated as a single value.
• MOV DX, OFFSET(MESSAGE) (Memory address: Immediate to register)
– Byte 1 = 1011.w.reg = 1011.1.010 = BAh
– Byte 2, 3 = Offset(Message)
– will be found in 2nd pass
• 0103 Message(Lo)
• 0104 Message(Hi)

• INT 21H
• INT 20H
– p.99, Fig.4.5 #1
– Byte 1 = OpCode = 11001101 = CDh
– Byte 2 = 21h/20h
• 0105 CD
• 0106 21
• 0107 CD
• 0108 20

• MESSAGE DB 'Hello, World!$'
• Start at 0109h
• 0109h~0116h = 48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21 24
• Fill 0103/0104h (2nd pass)
– 0103 Message(Lo) = 09h
– 0104 Message(Hi) = 01h
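
The two passes over this program can be sketched as follows. This is a deliberately tiny Python illustration, not a general assembler: the statement tuples and the operation names `MOV_AH_IMM` and `MOV_DX_OFFSET` are hypothetical stand-ins for already parsed instructions.

```python
ORIGIN = 0x100  # ORG 100H

# (label, operation, argument); operation names are hypothetical.
PROGRAM = [
    (None, "MOV_AH_IMM", 9),
    (None, "MOV_DX_OFFSET", "MESSAGE"),
    (None, "INT", 0x21),
    (None, "INT", 0x20),
    ("MESSAGE", "DB", "Hello, World!$"),
]


def size_of(op, arg):
    """Number of bytes each statement occupies."""
    if op in ("MOV_AH_IMM", "INT"):
        return 2
    if op == "MOV_DX_OFFSET":
        return 3
    if op == "DB":
        return len(arg)
    raise ValueError(f"unknown operation: {op}")


def pass1(program, origin):
    """Pass 1: assign addresses and record every label."""
    symbols, loc = {}, origin
    for label, op, arg in program:
        if label is not None:
            symbols[label] = loc
        loc += size_of(op, arg)
    return symbols


def pass2(program, symbols):
    """Pass 2: emit machine code, filling in symbol addresses."""
    out = bytearray()
    for _label, op, arg in program:
        if op == "MOV_AH_IMM":
            out += bytes([0xB4, arg])
        elif op == "MOV_DX_OFFSET":
            addr = symbols[arg]
            out += bytes([0xBA, addr & 0xFF, addr >> 8])  # low byte, high byte
        elif op == "INT":
            out += bytes([0xCD, arg])
        elif op == "DB":
            out += arg.encode("ascii")
        else:
            raise ValueError(f"unknown operation: {op}")
    return bytes(out)


symbols = pass1(PROGRAM, ORIGIN)
code = pass2(PROGRAM, symbols)
print(hex(symbols["MESSAGE"]))  # 0x109
print(code.hex())
```

MESSAGE is used before it is defined, so its address is only known once pass 1 has walked the whole program; that is the reason bytes 0103/0104 are filled in during the second pass.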

Example 2 – CLS
        mov ah,15
        int 10h
        mov bl,bh
        xor cx,cx
        mov dl,ah
        dec dl
        mov dh,24
        mov bh,7
        cmp al,4
        jb point
        cmp al,7
        je point
        mov bh,cl
point:  mov al,cl
        mov ah,6
        int 10h
        mov ah,2
        mov bh,bl
        mov dx,cx
        int 10h
        int 20h

• mov ah,15 (1000h) Immediate to reg.
– Byte 1 = 1011.0.100 = B4h
– Byte 2 = 15 = 0Fh
• int 10h (1002h)
– Byte 1 = 11001101 = CDh
– Byte 2 = 10h
• mov bl,bh (1004h) Reg. to reg.
– p.94, %1 #1
– Byte 1 = 100010.d.w = 100010.1.0 = 8Ah
– Byte 2 = mod.reg.r/m = 11.011.111 = DFh

• xor cx, cx (1006h)
– p.97 %4 #1
– Byte 1 = 001100.d.w = 001100.0.1 = 31h
– Byte 2 = mod.reg.r/m = 11.001.001 = C9h
• mov dl, ah (1008h) Reg. to reg.
– Byte 1 = 10001010 = 8Ah
– Byte 2 = 11.010.100 = D4h
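
The mod.reg.r/m byte used in these register-to-register encodings can be computed mechanically. The Python sketch below is illustrative; `modrm` and `encode_mov_reg_reg` are hypothetical names, and it follows the slides in choosing d = 1, so that the reg field names the destination.

```python
# Standard x86 register codes for the reg and r/m fields.
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}


def modrm(mod, reg, rm):
    """Pack the mod (2 bits), reg (3 bits), and r/m (3 bits) fields."""
    return (mod << 6) | (reg << 3) | rm


def encode_mov_reg_reg(dest, src):
    """Encode MOV dest, src for two registers of equal size."""
    dest, src = dest.upper(), src.upper()
    if dest in REG8 and src in REG8:
        w, regs = 0, REG8
    elif dest in REG16 and src in REG16:
        w, regs = 1, REG16
    else:
        raise ValueError("operands must be two registers of the same size")
    opcode = 0b1000_1000 | (1 << 1) | w  # 100010.d.w with d = 1
    return bytes([opcode, modrm(0b11, regs[dest], regs[src])])


for dest, src in [("bl", "bh"), ("dl", "ah"), ("dx", "cx")]:
    print(f"mov {dest},{src} ->", encode_mov_reg_reg(dest, src).hex())
```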

• dec dl (100Ah)
– Byte 1 = 01001.reg = 01001010 (X) (Why? This one-byte form encodes only 16-bit registers.)
– Byte 1 = 1111111.w = 11111110 = FEh
– Byte 2 = mod.001.r/m = 11.001.010 = CAh
• mov dh, 24 (100Ch) Immediate to reg.
– Byte 1 = 1011.0.110 = B6h
– Byte 2 = 24 = 18h

• mov bh, 7 (100Eh) Immediate to reg.
– Byte 1 = 1011.0.111 = B7h
– Byte 2 = 7 = 07h
• cmp al, 4 (1010h) Immediate to reg.
– p.96 %3 #3
– Byte 1 = 0011110.w = 0011110.0 = 3Ch
– Byte 2 = 4 = 04h

Jump instructions require computing a distance.
• jb point (1012h)
– p.98 %2 #8
– Byte 1 = 01110010 = 72h
– Byte 2 = shift(point)
– Compute the distance from the PC to point: 1014h -> 101Ah
• cmp al, 7 (1014h) Immediate to reg.
– Byte 1 = 0011110.w = 0011110.0 = 3Ch
– Byte 2 = 7 = 07h

• je point (1016h)
– p.98 %2 #5
– Byte 1 = 01110100 = 74h
– Byte 2 = shift(point)
– Compute the distance from the PC to point: 1018h -> 101Ah
• mov bh, cl (1018h) Reg. to reg.
– Byte 1 = 10001010 = 8Ah
– Byte 2 = 11.111.001 = F9h

• point: mov al, cl (101Ah) Reg. to reg.
– remember point address = 101Ah
– Byte 1 = 10001010 = 8Ah
– Byte 2 = 11.000.001 = C1h
• mov ah, 6 (101Ch) Immediate to reg.
– Byte 1 = 1011.0.100 = B4h
– Byte 2 = 6 = 06h

• int 10h (101Eh)
– Byte 1 = 11001101 = CDh
– Byte 2 = 10h
• mov ah, 2 (1020h) Immediate to reg.
– Byte 1 = 1011.0.100 = B4h
– Byte 2 = 2 = 02h
• mov bh, bl (1022h) Reg. to reg.
– Byte 1 = 10001010 = 8Ah
– Byte 2 = 11.111.011 = FBh

• mov dx, cx (1024h) Reg. to reg.
– Byte 1 = 10001011 = 8Bh
– Byte 2 = 11.010.001 = D1h
• int 10h (1026h)
– Byte 1 = 11001101 = CDh
– Byte 2 = 10h
• int 20h (1028h)
– Byte 1 = 11001101 = CDh
– Byte 2 = 20h
– next free address = 102Ah

Assembler (2nd pass)
• point: mov al, cl (101Ah)
– remember point address = 101Ah
• jb point (1012h)
– Byte 2 = shift(point) = 101A – 1014 = 06h
• je point (1016h)
– Byte 2 = shift(point) = 101A – 1018 = 02h
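
The displacement calculation for short jumps can be written out as a small Python sketch. The name `encode_short_jump` is hypothetical; the key point, as in the subtraction above, is that the distance is measured from the address of the next instruction, because the PC has already advanced past the 2-byte jump.

```python
def encode_short_jump(opcode, instr_addr, target_addr):
    """Encode a 2-byte short jump: opcode followed by a signed 8-bit shift."""
    next_pc = instr_addr + 2  # the PC already points past the jump
    shift = target_addr - next_pc
    if not -128 <= shift <= 127:
        raise ValueError("target is out of range for a short jump")
    return bytes([opcode, shift & 0xFF])  # two's complement for backward jumps


POINT = 0x101A
print(encode_short_jump(0x72, 0x1012, POINT).hex())  # jb point -> 7206
print(encode_short_jump(0x74, 0x1016, POINT).hex())  # je point -> 7402
```

A backward jump gives a negative shift, which the mask stores as a two's-complement byte.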

Microsoft (R) Macro Assembler Version 6.1a    07/30/99 09:20:50
test13.asm                                    Page 1 - 1

                          CODE    SEGMENT
                          Mycode  PROC
                                  ASSUME CS:CODE
0000  477265656E20        Msg     BYTE 'Green '
0006  477265656E20                BYTE 'Green '
000C  477261737320                BYTE 'Grass '
0012  486F6D65                    BYTE 'Home'
0016  0A0D24                      BYTE 0AH, 0DH, '$'
0019  2E: A1 0000 R               MOV AX,WORD PTR Msg
001D  8E D8                       MOV DS,AX
001F  2E: 8B 16 0000 R            MOV DX,WORD PTR Msg
0024  E8 0004                     CALL DispMsg
0027  B4 4C                       MOV AH,4CH
0029  CD 21                       INT 21H
                          Mycode  ENDP
                          DispMsg PROC NEAR
002B  B4 09                       MOV AH,09H
002D  CD 21                       INT 21H
                                  RET
                          DispMsg ENDP
                          CODE    ENDS
                                  END Mycode

ASMer Writing Techniques
Opcodes that follow the same pattern can be grouped and handled together.
AND  20  0020
CMP  38  0020
OR   08  0020
SBB  18  0020
SUB  28  0020
XOR  30  0020
D5  0001
D4  0001
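
One way to exploit such a grouping is a single table-driven routine for the whole family. The Python sketch below is an assumption-laden illustration: it reads the first hex column as the base opcode of each instruction, ignores the other column (its meaning is not recoverable from the slide), uses the hypothetical name `encode_alu_reg_reg`, and handles only the register-to-register form with d = 0.

```python
# Base opcodes taken from the first hex column of the table above.
ALU_BASE = {"AND": 0x20, "CMP": 0x38, "OR": 0x08,
            "SBB": 0x18, "SUB": 0x28, "XOR": 0x30}

# Standard x86 register codes.
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}


def encode_alu_reg_reg(mnemonic, dest, src):
    """Encode 'mnemonic dest, src' for two registers of the same size."""
    base = ALU_BASE[mnemonic.upper()]
    dest, src = dest.upper(), src.upper()
    if dest in REG8 and src in REG8:
        w, regs = 0, REG8
    elif dest in REG16 and src in REG16:
        w, regs = 1, REG16
    else:
        raise ValueError("operands must be two registers of the same size")
    opcode = base | w  # d = 0: reg holds the source, r/m the destination
    modrm = (0b11 << 6) | (regs[src] << 3) | regs[dest]
    return bytes([opcode, modrm])


print(encode_alu_reg_reg("xor", "cx", "cx").hex())  # 31c9
```

Only the base opcode differs between the six instructions, so adding another member of the family is a one-line table change.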

Encode MOV Instruction
• MOV instruction format (Partial)

MOV OP1, OP2
d = 1: reg <- r/m
d = 0: r/m <- reg

OP1   OP2   OPCode
REG   REG   100010
REG   MEM   100010
MEM   REG   100010
REG   IMM   110001 1
MEM   IMM   110001 1

Encoded fields: OPCode, d, w, mod, reg, r/m, data. mod = 11 when both operands are registers (no memory r/m); otherwise mod (??) depends on the addressing mode. An immediate form has no reg operand, so its reg field is 000.

Flowchart of Handle_MOV

Kinds of Handle_MOV
Kind  OPCode  d  w         mod  ModR/M fields
1.    100010  0  Test OP1  11   OP1  OP2
2.    100010  0  Test OP1  ??   OP1  OP2
3.    110001  1  Test OP1  11   000  OP1
4.    100010  1  Test OP1  ??   OP1  OP2
5.    110001  1  Test OP1  ??   000  OP1

Something about MOV
MOV AX, BL
AX: reg. -> Word
BL: reg. -> Byte
The syntax is correct, but the semantics are wrong.
• Check for Semantic Error
– Think about "data type" of operands.
– Check for type matching
  • Word
  • DWord
– Check for Destination operand
  • no literal
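
The two checks listed above (type matching and no literal destination) can be sketched as follows. The Python names `REG_SIZE`, `is_literal`, and `check_mov` are hypothetical, and the sketch covers register and literal operands only.

```python
# Operand sizes in bytes for the general-purpose registers.
REG_SIZE = {name: 1 for name in ("AL", "AH", "BL", "BH", "CL", "CH", "DL", "DH")}
REG_SIZE.update({name: 2 for name in ("AX", "BX", "CX", "DX", "SP", "BP", "SI", "DI")})


def is_literal(operand):
    """A literal starts with a digit (number) or a quote (string)."""
    return operand[0].isdigit() or operand[0] == "'"


def check_mov(dest, src):
    """Return a list of semantic errors for MOV dest, src (empty if none)."""
    errors = []
    if is_literal(dest):
        errors.append("destination operand cannot be a literal")
    dest_size = REG_SIZE.get(dest.upper())
    src_size = REG_SIZE.get(src.upper())
    if dest_size is not None and src_size is not None and dest_size != src_size:
        errors.append(
            f"type mismatch: {dest} is {dest_size} byte(s), "
            f"{src} is {src_size} byte(s)")
    return errors


print(check_mov("AX", "BL"))  # one type-mismatch error
print(check_mov("AX", "BX"))  # []
```

Returning a list rather than raising on the first problem lets the assembler report every semantic error on a line in one run.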

Instruction Table Lookup
Name    Operand#   Length#   OpCode
Add     2          x         …
Mov     2          x         …
Jmp     1          x         …
Nop     0          1         …
Start   0          0         …
End     0          0         …

Label/Symbol Table Lookup
Name   Start   End    Type
Msg    0000    0010   string
Num1   0011    0012   word
Num2   0013    0013   byte
…      …       …      …
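
A symbol table with these four columns could be held in a structure like the one below. This is a Python sketch with hypothetical names (`Symbol`, `SYMBOLS`, `find_by_address`), and it assumes the Start and End columns are hexadecimal addresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str
    start: int  # first address occupied (assumed hexadecimal in the table)
    end: int    # last address occupied
    type: str

    @property
    def size(self):
        """Number of bytes the symbol occupies."""
        return self.end - self.start + 1


SYMBOLS = {
    s.name: s
    for s in (
        Symbol("Msg", 0x0000, 0x0010, "string"),
        Symbol("Num1", 0x0011, 0x0012, "word"),
        Symbol("Num2", 0x0013, 0x0013, "byte"),
    )
}


def find_by_address(addr):
    """Return the symbol whose range contains addr, or None."""
    for symbol in SYMBOLS.values():
        if symbol.start <= addr <= symbol.end:
            return symbol
    return None


print(SYMBOLS["Num1"].size, find_by_address(0x0012).name)  # 2 Num1
```

Lookup by name serves the second pass; lookup by address is the direction a disassembler or a listing generator would need.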

How to Prove
• Assembler
• Dis-Assembler
• Binary Test using Debug
Test data (source) -> x86 assembler -> .obj
                   -> your own assembler -> .obj

• CPU Instruction Set: SIC and SIC/XE, x86
1. Executable Instructions
2. Pseudo Instructions
* START/END
* Define constant/storage (BYTE, WORD)
* LTORG (handles literals)
* USE BASE register

• Literals (constant values)
Including string, character, decimal, and hexadecimal.
• Error diagnostics, and report unresolved references
• Macro, Multiple Segments (only one control section; there is no need to separate the data segment from the code segment)

• First program: Assembler
• A System Design Document is required
• Program acceptance (demonstrated on a computer)
• Due date: one week after the midterm exam (the i-learning announcement takes precedence)

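System Design Document
• Which CPU is targeted? Which programming language is the assembler written in? On which computer does it run?
• Which pseudo instructions can it handle, and what does each pseudo instruction do?
• Data structure design (the key point)
Instruction Format, Instruction type, …
• Output Format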