Demystifying Assembly Language
Part I - Getting The Basics Ready
Hey, are you confused about where to start learning assembly language? Then you’re in the right place. I started learning assembly a few months ago too, mainly to get through reverse engineering and binary exploitation challenges. So, let’s get started!
What Are We Going to Work on?
We’ll be working with 32-bit x86 assembly written for Linux. In the later parts of this series, we’ll be doing a lot of reading from the Intel SDM (Software Developer’s Manual), which is a great source once you get the hang of extracting information from it.
What is Assembly Language?
Assembly language is a low-level programming language intended to communicate directly with the computer’s hardware. Unlike machine code, which is just raw bytes (usually shown as binary or hexadecimal numbers), assembly language is designed to be readable by humans. It is converted into machine code when we pass it through an assembler.
What Does Assembly Language Look Like?
This is how pretty it looks. Okay, it might not seem beautiful at first, but you’ll start loving it when you start to understand what each line of it means. And we’ll walk through that.
How To Program in Assembly Language?
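You don’t need to worry, we aren’t going to be writing binary or hex. That would be cruel. We will be writing the code as mnemonics (readable instruction names such as mov or add) and operands, which is way easier. The assembler turns each of them into the numeric opcode the processor understands.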
Getting our program ready for execution.
- We will use an assembler to assemble the code and produce an object file. This file holds raw machine code, so opening it in a text editor shows mostly unreadable characters, because most byte values don’t correspond to a printable ASCII character.
- Then the linker takes the object file generated by the assembler and turns it into an executable file.
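To see why an object file looks like garbage in a text editor, here is a small Python sketch. The seven bytes are a hand-picked illustration (not taken from a real object file): b8 01 00 00 00 encodes mov eax, 1 and cd 80 encodes int 0x80 on 32-bit x86. The helper name as_text is made up for this example.

```python
# Hypothetical machine-code bytes, similar to what an object file contains.
# b8 01 00 00 00 -> mov eax, 1
# cd 80          -> int 0x80
machine_code = bytes([0xB8, 0x01, 0x00, 0x00, 0x00, 0xCD, 0x80])


def as_text(data):
    """Show printable ASCII bytes as characters and everything else as '.'."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


print(machine_code.hex(" "))  # the bytes written as pairs of hex digits
print(as_text(machine_code))  # what a text viewer can meaningfully show
```

None of these bytes fall in the printable ASCII range, so the text view is all dots while the hex view stays perfectly readable.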
I’d suggest you read more on the compilation process of a C/C++ program. That’ll make things clearer for you, and it’s interesting (or maybe I just like nerdy stuff).
Learning about the basics :
Bit, Binary, Byte, Word, Hex :
Getting good at this stuff will clear up a lot of things as we move further.
Bit :
A bit is the smallest unit of data and can be either 0 or 1. Everything in the computer happens in 0s and 1s. The number system of 0s and 1s is called binary (base 2). Hex is short for hexadecimal (base 16), which we will see a lot. Disassemblers typically show the addresses and raw bytes of a compiled program in hex because it is far more compact than binary.
Hexadecimal :
Imagine writing 11111111 (binary) every time you want to represent the value 255 (decimal), which is ff in hexadecimal. ff is way easier, right? Don’t worry if you don’t get it now, you’ll get it when we’re disassembling files.
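If you want to check these values yourself, Python’s built-in conversion functions are a quick way to do it. This is just a convenience sketch for experimenting, nothing assembly-specific.

```python
value = 255

print(bin(value))            # binary text with a 0b prefix
print(hex(value))            # hexadecimal text with a 0x prefix
print(format(value, "08b"))  # 8 binary digits, no prefix
print(format(value, "02x"))  # 2 hex digits, no prefix

# Going the other way: parse text written in a given base back into a number.
print(int("11111111", 2))
print(int("ff", 16))
```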
Getting back to binary :
0010 1100 in binary is 0*2⁷+0*2⁶+1*2⁵+0*2⁴+1*2³+1*2²+0*2¹+0*2⁰ = 32 + 8 + 4 = 44 in decimal.
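The same positional sum can be written as a few lines of Python. The function name binary_to_decimal is just something I picked for this sketch; in practice int(text, 2) does the same job.

```python
def binary_to_decimal(bits):
    """Add up each digit times 2 raised to its position, counted from the right."""
    digits = bits.replace(" ", "")
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += int(digit) * 2 ** position
    return total


print(binary_to_decimal("0010 1100"))  # 32 + 8 + 4 = 44
```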
To convert the same value to hex, we split the binary digits into groups of 4, convert each group to decimal, and then map it to the matching hex digit (0-9 stay the same, and 10-15 become a-f). Always form the groups starting from the right, and pad the leftmost group with zeros if it has fewer than 4 digits.
For example, 0010 is 2 in decimal, which is the same in hex. For 1100, converting to decimal gives 1*2³+1*2²+0*2¹+0*2⁰ = 8 + 4 = 12, which maps to the hex digit c.
So the value 0010 1100 in hex is 2c. Easier to write in hex than in binary, right? Easier to read, too.
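Here is a rough Python sketch of that group-of-4 method, done by hand instead of with the built-ins. The names HEX_DIGITS and binary_to_hex are my own for this example. It pads with zeros on the left so that the groups always line up from the right-hand side.

```python
HEX_DIGITS = "0123456789abcdef"


def binary_to_hex(bits):
    """Convert a binary string to hex by mapping each group of 4 bits to one digit."""
    digits = bits.replace(" ", "")
    # Pad on the left so the total length is a multiple of 4.
    padded_length = (len(digits) + 3) // 4 * 4
    digits = digits.zfill(padded_length)

    result = ""
    for start in range(0, len(digits), 4):
        group = digits[start:start + 4]
        result += HEX_DIGITS[int(group, 2)]  # group value 0..15 picks the hex digit
    return result


print(binary_to_hex("0010 1100"))  # 2c
print(binary_to_hex("101100"))     # 2c (padded to 0010 1100 first)
```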
Getting to bytes :
A byte is a group of 8 bits. Since one hex digit needs just 4 binary digits, a byte is written as exactly 2 hexadecimal digits. On x86, a word is 2 bytes (16 bits) and a doubleword (dword) is 4 bytes (32 bits).
For example, 1011 0101 in binary is b5 in hexadecimal.
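One way to pull those two hex digits out of a byte is with a shift and a mask, which is also how you would typically do it in assembly. A minimal sketch, with split_byte being a name I made up:

```python
def split_byte(byte):
    """Return the high and low 4-bit halves of a byte."""
    high = (byte >> 4) & 0xF  # move the top 4 bits down, keep only 4 bits
    low = byte & 0xF          # keep only the bottom 4 bits
    return high, low


byte = 0b10110101
high, low = split_byte(byte)

print(format(high, "04b"), format(low, "04b"))  # 1011 0101
print(format(high, "x") + format(low, "x"))     # b5
```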
Registers :
Registers are small, high-speed storage locations inside the processor. We use them like variables to perform tasks (mainly arithmetic and logic). A processor has only a handful of registers, so the amount of data they can hold is very limited.
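EAX, EBX, ECX, EDX - General-purpose registers. They can be used for almost anything, although compilers use some of them for specific purposes by convention. When we write assembly by hand we are mostly free to pick, but keep in mind that some instructions and Linux system calls expect their values in particular registers.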
ESP - Stack Pointer (contains the address of the top of the stack)
EBP - Base Pointer (contains the address of the base of the current stack frame, not the bottom of the whole stack)
EIP - Instruction Pointer (contains the address of the next instruction to be executed)
EFLAGS - The flags register stores individual bits (flags), such as the zero flag and the carry flag, that record the outcome of earlier instructions. Conditional jumps read these flags to decide the control flow of the program.
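To get a feel for how flags record events, here is a toy Python model of a 32-bit subtraction. It is only a sketch under simplifying assumptions: the function sub32 is hypothetical, and it models just three flags (CF at bit 0, ZF at bit 6, SF at bit 7), while the real EFLAGS register has more.

```python
MASK_32 = 0xFFFFFFFF  # registers like EAX hold exactly 32 bits

CF = 1 << 0  # carry flag: set when an unsigned subtraction needs a borrow
ZF = 1 << 6  # zero flag: set when the result is zero
SF = 1 << 7  # sign flag: copy of the result's top bit


def sub32(a, b):
    """Return (result, flags) for a 32-bit a - b. Models only CF, ZF and SF."""
    a &= MASK_32
    b &= MASK_32
    result = (a - b) & MASK_32  # wrap around like a 32-bit register

    flags = 0
    if a < b:
        flags |= CF
    if result == 0:
        flags |= ZF
    if result & 0x80000000:
        flags |= SF
    return result, flags


result, flags = sub32(5, 5)
print(hex(result), bool(flags & ZF))  # 0x0 True

result, flags = sub32(0, 1)
print(hex(result), bool(flags & CF))  # 0xffffffff True
```

A conditional jump does essentially what the bool(flags & ZF) check does here: it looks at one flag bit and picks the next instruction based on it.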