How to Convert a C Program to Assembly

Abdul Mateen Feb 02, 2024
  1. The Assembly Language
  2. The C Language
  3. Convert a C Program to Assembly Language

This tutorial will discuss converting a C language program into assembly language code.

We will briefly discuss the fundamentals of the Assembly and C languages. Later, we will see how to convert a C program to Assembly code and how to disassemble object code back into Assembly code.

The Assembly Language

Assembly is a low-level language that is assembled, not interpreted. Generally, a statement written in assembly language is translated into a single machine-level instruction.

However, it is much more readable than machine language because it uses mnemonics. The mnemonics are English-like instructions or operation codes.

For example, the mnemonic ADD is used to add two numbers. Similarly, MOV is used to perform data movements.

Likewise, CMP compares two operands, and JMP transfers execution control to a specific label or location marker.

Assembly language is very close to the machine (hardware), so a carefully written assembly program can be very fast. However, the programmer needs much more hardware knowledge than a developer working in a high-level language.

Assembly language is typically used to write efficient system programs like device drivers, virus/anti-virus programs, embedded system software, and terminate-and-stay-resident (TSR) programs.

A program called an assembler translates an assembly language program into a machine language program that can run on the machine.

The C Language

C is a high-level machine-independent programming language. Usually, writing C programs requires only a little hardware knowledge.

C has high-level statements and requires a compiler program that translates each statement of C language into one or multiple assembly language statements. For example, a simple instruction in C language, c = a + b, is translated into the following assembly language statements:

mov edx, DWORD PTR -12[rbp]
mov eax, DWORD PTR -8[rbp]
add eax, edx
mov DWORD PTR -4[rbp], eax

Here, the first and second statements move the values of the variables from memory into registers. The third statement, add, adds the two register values.

In the fourth statement, the result is moved from the register back to a variable in memory.

The compiler has to do a lot of work, but this keeps the programmer’s job simple. The C language has a broad spectrum of applications, from high-level business applications to low-level utility programs.

Convert a C Program to Assembly Language

Typically, people use either a sophisticated integrated development environment to write, edit, compile, run, modify, and debug C language programs, or the gcc command to convert a C language program into an executable program.

These tools hide the steps involved in converting source code written in a high-level language like C into machine executable code. Typically, the following steps are performed in between:

  1. Pre-Processing - A pre-processor program does three tasks: it includes header files, expands macros, and removes comments from the source program.
  2. Compiler - In the second step, the compiler translates the pre-processed high-level language program into an assembly language program.
  3. Assembler - In the third step, the assembler program takes the assembly language program (produced by the compiler) and assembles it into machine code called object code.
  4. Linker - In the fourth step, a linker program combines the object code with compiled library code to produce an executable that can run on its own.

Commands to Convert C Code to an Assembly Equivalent

Typically, command line users type gcc program_name.c, which generates an executable file (if there are no errors). If the target file name is not given, the executable is named a.out on UNIX-family operating systems or a.exe on the Windows operating system.

The gcc command has a vast list of options to perform specific tasks. This tutorial will discuss only the -S and -c flags.

The -S flag generates an assembly language program from the C source code. Let’s understand this flag using the following example where we have Test.c as a source file:

// Test.c
int main() {
  int a = 2, b = 3, c;
  c = a + b;
  return 0;
}

The following command will generate the target Assembly language code in a file with the extension .s:

$ gcc -S Test.c
$ ls
Test.c Test.s

The command has not created machine language code; only the Assembly language code is generated. Let’s display the contents of this generated Assembly code using the cat command in Bash:

$ cat Test.s
    .file   "Test.c"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $2, -12(%rbp)
    movl    $3, -8(%rbp)
    movl    -12(%rbp), %edx
    movl    -8(%rbp), %eax
    addl    %edx, %eax
    movl    %eax, -4(%rbp)
    ...

By default, gcc writes x86 assembly in AT&T syntax, which may look unfamiliar to programmers who are used to writing assembly in Intel syntax.

If we want the Assembly code in Intel syntax, the following command will do this for us:

$ gcc -S -masm=intel Test.c

Again, the output will be generated in the Test.s file, which can be viewed using the cat command in the Bash terminal. In Windows, we can open it in some editor like Notepad or a better editor.

Let’s see the contents of the Assembly code generated by the above command:

$ cat Test.s
    .file   "Test.c"
    .intel_syntax noprefix
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    endbr64
    push    rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    mov rbp, rsp
    .cfi_def_cfa_register 6
    mov DWORD PTR -12[rbp], 2
    mov DWORD PTR -8[rbp], 3
    mov edx, DWORD PTR -12[rbp]
    mov eax, DWORD PTR -8[rbp]
    add eax, edx
    mov DWORD PTR -4[rbp], eax
    ...

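The instructions are the same, but the notation differs: the destination operand now comes first, registers have no % prefix, constants have no $ prefix, and the mov and add instructions match the form shown earlier for c = a + b.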

Disassemble Object Code

Besides converting a C language program into assembly language, one may want to disassemble binary code (machine code) to see the equivalent Assembly language code. We can use the objdump utility in Linux to do that.

Example:

Assume we execute the gcc -c Test.c command to compile the Test.c file in a Bash terminal. It creates an object file (machine language code) with the name Test.o.

Now, if we want to disassemble this object code back into the equivalent Assembly code, we can do that using the following Bash command:

$ objdump -d Test.o

Test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <main>:
   0:   f3 0f 1e fa             endbr64
   4:   55                      push   %rbp
   5:   48 89 e5                mov    %rsp,%rbp
   8:   c7 45 f4 02 00 00 00    movl   $0x2,-0xc(%rbp)
   f:   c7 45 f8 03 00 00 00    movl   $0x3,-0x8(%rbp)
  16:   8b 55 f4                mov    -0xc(%rbp),%edx
  19:   8b 45 f8                mov    -0x8(%rbp),%eax
  1c:   01 d0                   add    %edx,%eax
  1e:   89 45 fc                mov    %eax,-0x4(%rbp)
  21:   b8 00 00 00 00          mov    $0x0,%eax
  26:   5d                      pop    %rbp

In this output, the leftmost column is the offset of each instruction within the section, and the middle column is the machine code, shown as hexadecimal bytes. On the right-hand side, the equivalent assembly language code is shown in readable form.