Assembly

This section will be using x86 assembly.

Assembly language comprises instructions that are sent to a CPU. There are multiple "variations" of assembly, catering to different CPU architectures.

Assembler

A program that translates assembly code into machine code.

Disassembler

A program that translates machine code into assembly code.

For example, IDA Pro.

Register

The CPU's basic unit of storage. Registers are fast, but limited in size and number.

Control unit

The part of the CPU that directs the execution of instructions.

The control unit fetches the instructions to execute from RAM, using the address held in the Instruction Pointer.

Instruction Pointer

A register that points to the next instruction to be executed.

Memory

Memory can be split into different parts, namely Stack, Heap, Code and Data. These may seem contiguous in the diagram below, but they are actually not in order and scattered in memory.

[Figure: Main memory structure]

Tip

In memory diagrams such as the one above, addresses are typically drawn running from high to low, and the stack grows toward lower addresses.

Stack

The stack is a region of memory used to store temporary data (local variables and function parameters). It is also used to control program flow, since calls to functions, APIs and subroutines are tracked here.

Tip

Functions vs subroutines: a function is used when a value needs to be returned, while a subroutine is used when a task needs to be performed but no return value is needed.

Heap

The heap is a region of memory used to store dynamically allocated data. Its contents change frequently while the program runs, as new values are allocated and unwanted values are freed at run time.

Code

The code is a region of memory that is used to store instructions to be executed.

Data

The data is a region of memory that is used to store static data.

Operands

Immediate operands

Fixed values

Register operands

Registers (e.g. ecx)

Tip

EDX can be used for division

EAX can be used for multiplication

EAX can also hold return value for function call

ESP, EBP used for function call/return

ESI, EDI, ECX are used in repeat instructions

Index registers (ESI, EDI) may store memory addresses

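All of the registers above are backward compatible, because the x86 architecture extends the earlier 16-bit and 8-bit designs. For example, the low 16 bits of EAX are still accessible as AX, which in turn splits into the 8-bit halves AH and AL.

[Figure: Register breakdown]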
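The register breakdown can be modelled with bit masks. This is only a sketch in Python, not x86 code; the value 0x12345678 and the helper `write_al` are made up for illustration.

```python
# Model of how one 32-bit register exposes its 16-bit and 8-bit parts.
# The starting value is arbitrary (an assumption for this example).
eax = 0x12345678

ax = eax & 0xFFFF        # low 16 bits of EAX
ah = (ax >> 8) & 0xFF    # high byte of AX
al = ax & 0xFF           # low byte of AX

print(hex(ax), hex(ah), hex(al))   # 0x5678 0x56 0x78


def write_al(reg32, value):
    """Replace only the low byte, roughly what `mov al, value` does to EAX."""
    return (reg32 & 0xFFFFFF00) | (value & 0xFF)


eax = write_al(eax, 0xFF)
print(hex(eax))                    # 0x123456ff
```

Writing to AL leaves the upper 24 bits of EAX untouched, which is why the masks keep those bits in place.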

💫 The x64 architecture extends the x86 architecture covered here. It has 64-bit registers, named by replacing the 'e' with 'r': rax, rbx, rcx, etc.

Opcode

Opcode is the machine language equivalent of an assembly instruction

Memory address

A register or expression in square brackets refers to the value stored at that address, e.g. [ecx].

Status Flags

EFLAGS is a 32-bit register. Each bit is a flag with value 0 (clear) or 1 (set). Flags are used to control CPU operations or to indicate the results of operations. The important ones are listed here:

Flag: Description
Zero Flag (ZF): Set when the operation result is 0
Carry Flag (CF): Set when an unsigned operation result does not fit in the destination (e.g. it is out of the range of a byte for an 8-bit operation)
Sign Flag (SF): Set when the operation result is negative, i.e. when the MSB is set after an arithmetic operation
Trap Flag (TF): Set for debugging; causes the CPU to single-step
Overflow Flag (OF): Set when a signed operation result is too large or too small for the destination, making it invalid as a signed value
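To see how ZF, CF, SF and OF relate to each other, here is a small Python model of an 8-bit addition. It is a sketch of the flag rules above, not real CPU code, and the function name `add8` is made up.

```python
def add8(a, b):
    """Add two 8-bit values; return (result, flags) as an 8-bit ADD would set them."""
    a &= 0xFF
    b &= 0xFF
    raw = a + b
    result = raw & 0xFF

    def sign(v):
        return (v >> 7) & 1   # most significant bit of a byte

    flags = {
        "ZF": int(result == 0),
        "CF": int(raw > 0xFF),   # unsigned result did not fit in a byte
        "SF": sign(result),
        # Signed overflow: both operands share a sign that the result lacks.
        "OF": int(sign(a) == sign(b) and sign(result) != sign(a)),
    }
    return result, flags


print(add8(0xFF, 0x01))   # result 0, with ZF = 1 and CF = 1
print(add8(0x7F, 0x01))   # result 0x80, with SF = 1 and OF = 1
```

The second call shows why CF and OF are separate: 127 + 1 fits in an unsigned byte (no carry) but is invalid as a signed byte.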

Instruction Pointer, EIP

EIP is a 32-bit register. It stores the address of the next instruction to execute, so whoever controls EIP controls what the CPU executes. If attackers have malicious code or malware in memory, they can exploit a system by modifying EIP to point to that code.
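The role of the instruction pointer can be sketched with a toy fetch-and-execute loop. The three-instruction "machine" below (add, jmp, halt) is entirely made up for illustration and is not x86.

```python
# Toy program: each entry is (name, operand); "addresses" are list indexes.
program = [
    ("add", 5),     # address 0
    ("jmp", 3),     # address 1: overwrite the instruction pointer
    ("add", 100),   # address 2: skipped because of the jump
    ("add", 1),     # address 3
    ("halt", None), # address 4
]


def run(program):
    ip, acc = 0, 0
    while True:
        name, operand = program[ip]   # fetch the instruction IP points to
        ip += 1                       # IP now points to the next instruction
        if name == "add":
            acc += operand
        elif name == "jmp":
            ip = operand              # changing IP redirects execution
        elif name == "halt":
            return acc


print(run(program))   # 6
```

Whatever value ends up in `ip` decides which instruction runs next, which is the reason control over EIP is so valuable to an attacker.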

Data Allocation

[Figure: Directives reference]

Tip

Anything that follows a ';' is a comment and is ignored by the assembler

Multiple definitions can also be abbreviated into a single statement.

[Figure: Abbreviated definitions]

Stored values are referenced by name, optionally with a byte offset.

[Figure: Referencing]

When Z above is referenced, the value returned is 1, while Z + 4 (an offset of 4 bytes) gives 2.

DUP

DUP initializes an array with repeated copies of a specified value (integers or bytes).

(e.g. 10 DUP (0) initializes an array of 10 elements, all initialized to 0. The result looks like this: 0,0,0,0,0,0,0,0,0,0)
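The layout that these directives produce can be approximated in Python with the `struct` module. The definitions in the comments are assumptions (the original figures are not available): Z is taken to be two 4-byte values and ARR a byte array.

```python
import struct

# Assumed definitions, for illustration only:
#   Z    DD 1, 2         ; two 4-byte values
#   ARR  DB 10 DUP (0)   ; ten bytes, all zero
data = struct.pack("<2I", 1, 2) + bytes([0] * 10)

Z = 0      # byte offset of Z within this data block
ARR = 8    # ARR starts right after the two 4-byte values


def read_dword(mem, offset):
    """Read a little-endian 32-bit value starting at a byte offset."""
    return struct.unpack_from("<I", mem, offset)[0]


print(read_dword(data, Z))         # 1
print(read_dword(data, Z + 4))     # 2
print(list(data[ARR:ARR + 10]))    # ten zeros
```

The offset in Z + 4 counts bytes, not elements, so it lands exactly on the second 4-byte value.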

EQU

EQU assigns the result of an expression to a name. The expression is evaluated at assembly time (e.g. in the example below, the expression 50 is assigned to the name NUM_OF_ROWS).

[Figure: Example of EQU]

Correspondence to C data types

[Figure: C data types]

Program Layout

[Figure: Program layout]

Assembly Instructions

Move, mov

Copies a specified value, or the value stored at a specified address, into the destination.

[Figure: Move instructions]

Load Effective Address, lea

Computes the address of the source operand and copies that address, rather than the value stored there, into the destination.

[Figure: LEA instructions]
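The difference between mov and lea is easy to mix up, so here is a toy Python model. The memory contents, addresses and helper names are all made up for illustration.

```python
# Toy memory: address -> 32-bit value (contents are arbitrary).
memory = {0x1000: 42, 0x1004: 99}
regs = {"eax": 0, "ebx": 0x1000, "ecx": 0}


def mov_from_memory(dest, base, disp=0):
    """Model of `mov dest, [base + disp]`: load the value stored at the address."""
    regs[dest] = memory[regs[base] + disp]


def lea(dest, base, disp=0):
    """Model of `lea dest, [base + disp]`: load the computed address itself."""
    regs[dest] = (regs[base] + disp) & 0xFFFFFFFF


mov_from_memory("eax", "ebx", 4)   # eax receives the value at 0x1004
lea("ecx", "ebx", 4)               # ecx receives the address 0x1004

print(regs["eax"], hex(regs["ecx"]))   # 99 0x1004
```

Both calls use the same operand, [ebx + 4]; only mov actually touches memory.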

Arithmetic Instructions

[Figure: Assembly math, part 1]

[Figure: Assembly math, part 2]

Logical/Shifting Instructions

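[Figure: Logical and shifting instructions]

Each logical and shifting instruction has its own purpose. Generally, they are:

Instruction: Description
xor: Used to clear registers (e.g. xor eax, eax) and to flip selected bits
or: Used to set selected bits
sh (shift): Used for fast multiplication (shift left) and fast division (shift right) by powers of two
ro (rotate): Used to move bits around within an operand; bits shifted out of one end re-enter at the other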
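The uses listed above can be checked with a few lines of Python working on 32-bit values. The helper names `shl`, `shr` and `rol` mirror the x86 mnemonics but are plain Python functions written for this sketch.

```python
MASK32 = 0xFFFFFFFF


def shl(value, count):
    """Shift left: multiplies by 2**count (bits pushed past bit 31 are lost)."""
    return (value << count) & MASK32


def shr(value, count):
    """Logical shift right: unsigned division by 2**count."""
    return (value & MASK32) >> count


def rol(value, count):
    """Rotate left: bits leaving the top re-enter at the bottom."""
    count %= 32
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32


eax = 0xDEADBEEF
print(eax ^ eax)                  # 0: xor of a register with itself clears it
print(bin(0b0001 | 0b0100))       # 0b101: or sets bit 2
print(shl(5, 3))                  # 40 = 5 * 8
print(shr(40, 3))                 # 5 = 40 // 8
print(hex(rol(0x80000001, 1)))    # 0x3: the top bit wrapped around
```

The rotate example shows why rotation is not a division: no bits are discarded, they only change position.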

NOP and INT

[Figure: NOP and INT]

Conditionals

Program execution depends on the result of a comparison, which is recorded as changes in the status flags (specific bits are set or cleared). The following instructions, with the exception of NOT, affect the status flags.

AND

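Performs a bitwise AND between each pair of matching bits in two operands and stores the result in the destination. A result bit is 1 only when both input bits are 1.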

OR

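Performs a bitwise OR between each pair of matching bits in two operands and stores the result in the destination. A result bit is 1 when at least one input bit is 1.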

XOR

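Performs a bitwise exclusive OR between each pair of matching bits in two operands and stores the result in the destination. A result bit is 1 when the two input bits differ.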

NOT

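Inverts every bit of its single operand (one's complement). Unlike AND, OR and XOR, it does not affect any flags.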

test

Performs a nondestructive AND operation between each pair of matching bits in two operands. The result is discarded and only the flags change: ZF, SF and PF reflect the result, while CF and OF are cleared.

cmp

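Compares the destination and source operands by performing an implied subtraction. Neither operand is modified; only the flags are updated.

Tip

You can think of the CMP result as destination - source.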

cmp with unsigned integers


cmp with signed integers

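How the flags encode unsigned and signed comparison results can be sketched in Python. The function names are made up; the flag rules follow the "destination - source" view of cmp on 32-bit values.

```python
def cmp32(dest, src):
    """Flags set by `cmp dest, src` (an implied dest - src on 32-bit values)."""
    dest &= 0xFFFFFFFF
    src &= 0xFFFFFFFF
    result = (dest - src) & 0xFFFFFFFF

    def sign(v):
        return (v >> 31) & 1

    return {
        "ZF": int(result == 0),
        "CF": int(dest < src),   # unsigned borrow
        "SF": sign(result),
        # Signed overflow: operand signs differ and the result's sign
        # differs from the destination's.
        "OF": int(sign(dest) != sign(src) and sign(result) != sign(dest)),
    }


def unsigned_relation(flags):
    if flags["ZF"]:
        return "equal"
    return "below" if flags["CF"] else "above"


def signed_relation(flags):
    if flags["ZF"]:
        return "equal"
    return "less" if flags["SF"] != flags["OF"] else "greater"


flags = cmp32(0xFFFFFFFF, 1)
print(unsigned_relation(flags))   # above: 4294967295 > 1
print(signed_relation(flags))     # less:  -1 < 1
```

The same bit pattern compares differently depending on whether it is read as unsigned (ZF and CF) or signed (ZF, SF and OF).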

Conditional Jumps

Branches to a label when specific register or flag conditions are met. The conditions are based on specific flags, equality, or unsigned/signed comparisons.

Tip

The Parity Flag (PF) is used for error checking. It reflects the number of set bits (bits that are 1) in the lowest byte of the result: PF is set when the count is even and cleared when it is odd.

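The flag conditions behind a few common conditional jumps can be written down as a small Python table. Only a subset of the jumps is listed, and the helper names are made up for this sketch.

```python
def parity_flag(result):
    """PF = 1 when the low byte of the result has an even number of 1 bits."""
    return int(bin(result & 0xFF).count("1") % 2 == 0)


# Condition under which each jump is taken, expressed on a flags dictionary.
JUMP_CONDITIONS = {
    "je":  lambda f: f["ZF"] == 1,                     # equal
    "jne": lambda f: f["ZF"] == 0,                     # not equal
    "ja":  lambda f: f["CF"] == 0 and f["ZF"] == 0,    # unsigned above
    "jb":  lambda f: f["CF"] == 1,                     # unsigned below
    "jg":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],  # signed greater
    "jl":  lambda f: f["SF"] != f["OF"],               # signed less
    "jp":  lambda f: f["PF"] == 1,                     # parity even
}


def jump_taken(mnemonic, flags):
    return JUMP_CONDITIONS[mnemonic](flags)


# Flags as they would be after `cmp 1, 2` (result 0xFFFFFFFF).
flags = {"ZF": 0, "CF": 1, "SF": 1, "OF": 0, "PF": parity_flag(0xFFFFFFFF)}
print(jump_taken("jb", flags), jump_taken("jl", flags), jump_taken("je", flags))
# True True False
```

Note that ja/jb read the unsigned flags while jg/jl read the signed ones, so the right jump depends on how the compared values are meant to be interpreted.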

Repeat Instructions

Repeat instructions are used for processing multi-byte data such as byte arrays. They use the ESI (source index), EDI (destination index) and ECX (counter) registers, which must be properly initialized for repeat instructions to work.

ECX is decremented by 1 after each repetition.

Instruction: Description
rep: Repeat while ECX != 0, i.e. for the number of times stored in ECX
repe, repz: Repeat while ECX != 0 and ZF = 1 (stops when ECX reaches 0 or ZF becomes 0)
repne, repnz: Repeat while ECX != 0 and ZF = 0 (stops when ECX reaches 0 or ZF becomes 1)

Examples

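As a rough illustration, the Python functions below imitate rep movsb and repe cmpsb on a byte array. They assume the direction flag is clear (ESI and EDI count upward), and the function names and sample data are made up.

```python
def rep_movsb(memory, esi, edi, ecx):
    """Model of `rep movsb`: copy ECX bytes from [ESI] to [EDI]."""
    while ecx != 0:
        memory[edi] = memory[esi]
        esi += 1
        edi += 1
        ecx -= 1          # ECX drops by 1 after each repetition
    return esi, edi, ecx


def repe_cmpsb(memory, esi, edi, ecx, zf=0):
    """Model of `repe cmpsb`: compare bytes while ECX != 0 and they match."""
    while ecx != 0:
        zf = int(memory[esi] == memory[edi])
        esi += 1
        edi += 1
        ecx -= 1
        if zf == 0:       # a mismatch clears ZF and ends the repetition
            break
    return esi, edi, ecx, zf


buf = bytearray(b"hello") + bytearray(5)
print(rep_movsb(buf, 0, 5, 5), bytes(buf))   # (5, 10, 0) b'hellohello'

pair = bytearray(b"abcdabXd")
print(repe_cmpsb(pair, 0, 4, 4))             # (3, 7, 1, 0): stopped at 'c' vs 'X'
```

After repe cmpsb stops, ZF tells whether the loop ended because of a mismatch (ZF = 0) or because ECX ran out with every byte matching (ZF = 1).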

Stack

Stores data for functions, local variables and flow control. The stack grows downwards (toward lower addresses), and memory locations below esp should always be available unless the stack has overflowed.

Push

Decrements the stack pointer, esp, by 4 bytes (or by 2 bytes for a 16-bit operand), then copies the value into the location esp now points to.

Warning

Push only accepts 16-bit or 32-bit registers and memory operands, or 32-bit immediate operands (fixed values).

Pop

Copies the value at the location pointed to by the stack pointer into a register or variable, then increments the stack pointer by 2 or 4 bytes, depending on the size of the operand receiving the data (2 bytes for a DW, 4 bytes for a DD).

Warning

Pop only accepts 16-bit or 32-bit registers and memory operands.
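The order of operations in push and pop can be sketched with a small Python class. It models only 32-bit pushes and pops; the class name, stack size and values are made up for illustration.

```python
import struct


class Stack:
    """Toy model of a downward-growing stack of 32-bit values."""

    def __init__(self, size=64):
        self.memory = bytearray(size)
        self.esp = size   # empty stack: esp starts at the highest address

    def push(self, value):
        self.esp -= 4     # esp is decremented first...
        struct.pack_into("<I", self.memory, self.esp, value & 0xFFFFFFFF)  # ...then the value is stored

    def pop(self):
        value = struct.unpack_from("<I", self.memory, self.esp)[0]  # the value is read first...
        self.esp += 4     # ...then esp is incremented
        return value


stack = Stack()
stack.push(1)
stack.push(2)
print(stack.esp)                  # 56: two pushes moved esp down by 8 bytes
print(stack.pop(), stack.pop())   # 2 1: last in, first out
print(stack.esp)                  # 64: back where it started
```

Popping does not erase the old bytes; it only moves esp back up, which is why data below esp should be treated as free space.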

Basic constructs

Recognizing the main method


Example


If Else


Loops


For Loop


While Loop


Switch


Struct

A complex data type declaration that defines a physically grouped list of variables placed under one name in a block of memory. The different variables can be accessed via a single pointer, or via the declared struct name, which returns the same address.

Or simply: a collection of variables (possibly of different data types) under a single name.

Structure members are accessed relative to the struct's base address.
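Access through a base address can be illustrated in Python by treating each member as a fixed offset from the start of the struct. The struct definition, offsets and values below are hypothetical, chosen only for this sketch.

```python
import struct

# Hypothetical C struct, for illustration only:
#   struct point { int x; int y; char tag; };
# Member name -> (offset from the base address, struct format code).
LAYOUT = {"x": (0, "<i"), "y": (4, "<i"), "tag": (8, "<b")}


def read_member(memory, base, name):
    """Read one member at base address + member offset."""
    offset, fmt = LAYOUT[name]
    return struct.unpack_from(fmt, memory, base + offset)[0]


memory = bytearray(32)
base = 16   # arbitrary address of the struct inside this toy memory
struct.pack_into("<iib", memory, base, 10, -20, 65)

print(read_member(memory, base, "x"))     # 10
print(read_member(memory, base, "y"))     # -20
print(read_member(memory, base, "tag"))   # 65
```

In disassembly this shows up as accesses like [base], [base + 4] and [base + 8] off one register, which is a common sign that the register holds a pointer to a struct.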

Stack Frame



Last update: June 11, 2023
Created: June 11, 2023