How to Convert C++ to ARM Assembly

  1. Use the GCC Compiler to Convert C++ to ARM Assembly
  2. Create a MOD (Assembly-Time Modulus) Function to Convert C++ to ARM Assembly
  3. Use the arm-linux-gnueabi-gcc Command to Convert C++ to ARM Assembly
  4. Use the armclang Command in ARM Compiler for Linux to Convert C++ to ARM Assembly
  5. Use the __asm Keyword to Convert C++ to ARM Assembly

Interfacing C++ with ARM assembly is a straightforward process that lets C++ code access functions and variables defined in assembly language, and vice versa. This tutorial shows how to convert C++ code or functions to ARM assembly.

Programmers can link separate assembly modules with C++-compiled modules, use assembly variables and inline assembly embedded in C++, or modify the assembly code that the compiler produces.

Before converting C++ to ARM assembly, keep the following rules in mind: preserve any dedicated registers that a function modifies; make sure interrupt routines save all the registers they use; make sure functions return values according to their C++ declarations; do not use the .cinit section in an assembly module; let the compiler assign link names to all external objects; and declare every object or function that is accessed or called from C++ with the .def or .global directive in the assembly module.

Functions that are called from assembly language must be defined with C linkage (prototyped as extern "C") in the C++ file. Define variables in the .bss section or assign them a linker symbol so that you can later identify which ones require conversion.
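As a minimal sketch of the C linkage rule above, the following C++ file defines a function and a variable whose names are not mangled, so an assembly module could reference them by their plain symbol names. The names scale_sample and sample_count are hypothetical and chosen only for illustration.

```cpp
#include <cstdint>

// extern "C" disables C++ name mangling, so the linker symbol is
// simply "scale_sample" (a hypothetical name used for illustration).
extern "C" std::int32_t scale_sample(std::int32_t value, std::int32_t factor) {
    return value * factor;
}

// A variable with C linkage. Because it is zero-initialized,
// toolchains typically place it in the .bss section.
extern "C" {
std::int32_t sample_count = 0;
}
```

An assembly module would then refer to these objects by the unmangled names scale_sample and sample_count.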

Use the GCC Compiler to Convert C++ to ARM Assembly

GCC can show the intermediate output it produces while compiling C++ code. The -S option stops the compiler after the compilation stage, before the code is sent to the assembler, and writes the generated assembly to a file.

Its syntax is gcc -S your_program.cpp, so converting a simple C++ program takes a single command. Note that gcc emits assembly for the architecture it targets, so the result is ARM assembly only when the compiler targets ARM, for example a native gcc on an ARM machine. Although this is one of the simplest approaches, the output is verbose and can be hard to read, even for intermediate-level programmers.

GNN.cpp file:

#include <iostream>
using namespace std;
int main() {
  int i, u, div;
  i = 2;
  u = 10;
  div = i / u;
  cout << "Answer: " << div << endl;
}

Run this command with GCC (for example, from a command prompt on Microsoft Windows):

gcc -S GNN.cpp

Output:

(Screenshot of the gcc compiler output; the generated assembly is written to GNN.s.)
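Because the iostream header adds a lot of runtime-initialization code to the generated assembly, a smaller translation unit can be easier to read. The sketch below is one assumed way to isolate a single function; the file name divide.cpp and the function divide are hypothetical.

```cpp
// divide.cpp (hypothetical file name)
// Compile to assembly with: gcc -S divide.cpp   (writes divide.s)

// extern "C" keeps the label in the .s file unmangled ("divide"),
// which makes the function easy to find in the output.
extern "C" int divide(int numerator, int denominator) {
    // Guard against division by zero, which is undefined behavior.
    if (denominator == 0) {
        return 0;
    }
    return numerator / denominator;
}
```

With no iostream code in the file, the resulting assembly contains little more than the divide routine itself.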

You can use a single asm statement, or a series of them, to insert lines of assembly code into the assembly file that the compiler generates from your C++ program. Consecutive asm statements place their lines sequentially in the compiler output with no intervening code.

However, take care to maintain the C++ environment, because the compiler does not check or analyze the inserted instructions. Avoid inserting labels or jumps into C++ code, as they can confuse the register-tracking algorithms that the code generator uses and produce unpredictable results.

Furthermore, do not use asm statements to insert assembler directives that change the assembly environment, and avoid creating assembly macros in C++ code that is compiled with the symdebug:dwarf or -g option, because the debug information of the C++ environment and assembly macro expansion are not compatible.
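A minimal sketch of a single inline assembly statement, assuming a GCC-compatible compiler: the nop instruction is accepted for both ARM and x86 targets, changes no registers, and contains no labels or jumps, so it stays within the guidelines above. The function name count_with_nops is hypothetical.

```cpp
// Assumes a GCC-compatible compiler (GCC or Clang).
int count_with_nops(int iterations) {
    int done = 0;
    for (int i = 0; i < iterations; ++i) {
        // One assembly statement, emitted verbatim into the compiler output.
        // volatile asks the compiler not to remove it during optimization.
        __asm__ volatile("nop");
        ++done;
    }
    return done;
}
```

Compiling this file with -S shows the nop line placed inside the loop body of the generated assembly.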

Create a MOD (Assembly-Time Modulus) Function to Convert C++ to ARM Assembly

ARM assembly has no MOD instruction, so you can build a MOD function from subtraction (subs) or division instructions when converting C++ to ARM assembly. To work with a variable, first load its memory address with ldr reg, =var; to get the value itself, issue a second ldr through that register. For example, ldr r0, =carry followed by ldr r0, [r0] loads the value stored at the address of carry into r0.
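The two-step load can be pictured in C++ as taking an address and then dereferencing it. This is only an analogy, not compiler output; the variable carry matches the name used above, while its value and the function load_carry are hypothetical.

```cpp
int carry = 42;  // hypothetical value, for illustration only

int load_carry() {
    int* address = &carry;  // comparable to: ldr r0, =carry  (address of the variable)
    int value = *address;   // comparable to: ldr r0, [r0]    (value stored at that address)
    return value;
}
```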

Prefer a division instruction such as sdiv (or udiv for unsigned values), because it is much faster than a subtract loop except for very small inputs, where the loop runs only once or twice.

Concept:

;Precondition: R0 % R1 is the required computation
;Postcondition: R0 has the result of R0 % R1
;             : R2 has R0 / R1

; Example comments for 10 % 7
UDIV R2, R0, R1      ; 1 <- 10 / 7       ; R2 <- R0 / R1
MLS  R0, R1, R2, R0  ; 3 <- 10 - (7 * 1) ; R0 <- R0 - (R1 * R2 )

The same computation in C++:

#include <iostream>
using namespace std;

int main() {
  int R0 = 10;  // dividend
  int R1 = 7;   // divisor

  int R2 = R0 / R1;     // UDIV R2, R0, R1     ; quotient: 10 / 7 = 1
  R0 = R0 - (R1 * R2);  // MLS  R0, R1, R2, R0 ; remainder: 10 - (7 * 1) = 3

  cout << R2 << endl;
  cout << R0 << endl;
}

Output:

1
3
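To compare the two strategies discussed in this section, here is a minimal sketch of both in C++: a subtract loop and the divide, multiply, and subtract sequence that mirrors UDIV followed by MLS. The function names are hypothetical, and both functions assume a nonzero divisor.

```cpp
#include <cstdint>

// Subtract loop: repeatedly removes the divisor until less than one divisor remains.
// Precondition: divisor != 0 (otherwise the loop never ends).
std::uint32_t mod_subtract_loop(std::uint32_t dividend, std::uint32_t divisor) {
    while (dividend >= divisor) {
        dividend -= divisor;
    }
    return dividend;
}

// Division-based version, mirroring the UDIV + MLS pair.
// Precondition: divisor != 0.
std::uint32_t mod_div_mls(std::uint32_t dividend, std::uint32_t divisor) {
    std::uint32_t quotient = dividend / divisor;  // UDIV R2, R0, R1
    return dividend - divisor * quotient;         // MLS  R0, R1, R2, R0
}
```

The loop runs once per multiple of the divisor contained in the dividend, which is why it is competitive only for small quotients, while the division-based version takes the same two steps for any input.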

Use the arm-linux-gnueabi-gcc Command to Convert C++ to ARM Assembly

The arm-linux-gnueabi-gcc command is a cross-compiler that converts C++ to ARM assembly on x86 and x64 machines. The regular gcc on such machines has no ARM target available, so it cannot produce ARM assembly; only on an ARM system can you use the regular gcc instead.

In the complete command, arm-linux-gnueabi-gcc -S -O2 -march=armv8-a GNN.cpp, the -S option tells gcc to output assembly, and -O2 enables optimization, which also removes much of the clutter of unoptimized code from the result. The -O2 option is optional, while -march=armv8-a tells the compiler to use the ARMv8-A target while compiling.

You can change the ARM target by passing a different -march value. The ARMv8 family includes armv8-a, armv8.1-a through armv8.6-a, armv8-m.base, armv8-m.main, and armv8.1-m.main. Each one differs slightly, so compare them and select the one that suits your needs.

The GNN.cpp argument tells the compiler which file to compile. If you do not specify an output file, for example with -o output.asm, the assembly is written to a file with the same base name, GNN.s.

The arm-linux-gnueabi-gcc command is a good alternative to compiling on an ARM machine, where the regular gcc already produces ARM assembly.

GCC lets you specify the target architecture with -march=xxx, and you need to know which apt package provides the cross-compiler for your machine in order to install the right one.
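One way to confirm which architecture a compiler is actually targeting is to check its predefined macros. The sketch below assumes the macros that GCC and Clang commonly define (__aarch64__, __arm__, __x86_64__, __i386__) plus the MSVC equivalents; the function name target_architecture is hypothetical.

```cpp
#include <string>

// Reports the architecture this translation unit is being compiled for,
// based on commonly predefined compiler macros.
std::string target_architecture() {
#if defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";
#elif defined(__arm__) || defined(_M_ARM)
    return "arm";
#elif defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86";
#else
    return "unknown";
#endif
}
```

If this function is compiled with the cross-compiler and the result is not an ARM value, the assembly produced by -S will not be ARM assembly either.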

GNN.cpp file:

#include <iostream>
using namespace std;

int power(int x, int y) {
  if (x == 0) {
    return 0;
  } else if (y < 0) {
    return 0;
  } else if (y == 0) {
    return 1;
  } else {
    return x * power(x, y - 1);
  }
}

int main() {
  int x, y, sum;
  x = 2;
  y = 10;
  sum = power(x, y);
  cout << sum;
}

Run this command:

arm-linux-gnueabi-gcc -S -O2 -march=armv8-a GNN.cpp

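Output (note: the listing below uses x86-64 registers such as rbp and edi, so it was generated for an x86-64 target; an ARM target produces different instructions):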

power(int, int):
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     DWORD PTR [rbp-4], edi
        mov     DWORD PTR [rbp-8], esi
        cmp     DWORD PTR [rbp-4], 0
        jne     .L2
        mov     eax, 0
        jmp     .L3
.L2:
        cmp     DWORD PTR [rbp-8], 0
        jns     .L4
        mov     eax, 0
        jmp     .L3
.L4:
        cmp     DWORD PTR [rbp-8], 0
        jne     .L5
        mov     eax, 1
        jmp     .L3
.L5:
        mov     eax, DWORD PTR [rbp-8]
        lea     edx, [rax-1]
        mov     eax, DWORD PTR [rbp-4]
        mov     esi, edx
        mov     edi, eax
        call    power(int, int)
        imul    eax, DWORD PTR [rbp-4]
.L3:
        leave
        ret
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     DWORD PTR [rbp-4], 2
        mov     DWORD PTR [rbp-8], 10
        mov     edx, DWORD PTR [rbp-8]
        mov     eax, DWORD PTR [rbp-4]
        mov     esi, edx
        mov     edi, eax
        call    power(int, int)
        mov     DWORD PTR [rbp-12], eax
        mov     eax, DWORD PTR [rbp-12]
        mov     esi, eax
        mov     edi, OFFSET FLAT:_ZSt4cout
        call    std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
        mov     eax, 0
        leave
        ret
__static_initialization_and_destruction_0(int, int):
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     DWORD PTR [rbp-4], edi
        mov     DWORD PTR [rbp-8], esi
        cmp     DWORD PTR [rbp-4], 1
        jne     .L10
        cmp     DWORD PTR [rbp-8], 65535
        jne     .L10
        mov     edi, OFFSET FLAT:_ZStL8__ioinit
        call    std::ios_base::Init::Init() [complete object constructor]
        mov     edx, OFFSET FLAT:__dso_handle
        mov     esi, OFFSET FLAT:_ZStL8__ioinit
        mov     edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
        call    __cxa_atexit
.L10:
        nop
        leave
        ret
_GLOBAL__sub_I_power(int, int):
        push    rbp
        mov     rbp, rsp
        mov     esi, 65535
        mov     edi, 1
        call    __static_initialization_and_destruction_0(int, int)
        pop     rbp
        ret

Alternatively, you can install Arm Compiler for Linux and use its armclang command, as described in the next section.

Use the armclang Command in ARM Compiler for Linux to Convert C++ to ARM Assembly

You can produce annotated assembly code with the Arm C++ compiler, which is a first step toward learning how the compiler vectorizes loops. Arm Compiler for Linux is a prerequisite for generating assembly code from C++ this way.

Load the module for the Arm compiler by running module load arm<major-version>/<package-version>, where <package-version> is <major-version>.<minor-version>{.<patch-version>}; for example, module load arm21/21.0.

Compile your source code with the armclang -S <source>.cpp command, replacing <source>.cpp with the name of your source file. The -S option requests assembly output, which is written to <source>.s.

When optimization is enabled, the Arm compiler can vectorize code by using SIMD (Single Instruction, Multiple Data) instructions and registers.
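A loop is a candidate for vectorization when each iteration works on independent array elements. The following is a minimal sketch of such a loop, assuming a compiler that supports the __restrict extension (GCC, Clang, and MSVC do); the function name subtract_int_arrays and its parameters are hypothetical.

```cpp
#include <cstddef>

// Element-wise subtraction over arrays. __restrict promises the compiler that
// the three arrays do not overlap, which makes SIMD vectorization easier.
void subtract_int_arrays(int* __restrict result,
                         const int* __restrict lhs,
                         const int* __restrict rhs,
                         std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        result[i] = lhs[i] - rhs[i];  // each iteration is independent of the others
    }
}
```

Comparing the assembly generated for this function at different optimization levels is one way to see whether the loop was vectorized.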

GNN.cpp file:

#include <iostream>
using namespace std;

void subtract_arrays(int a, int b, int c) {
  int sum = 0;  // initialize the accumulator before the loop reads it
  for (int i = 0; i < 5; i++) {
    a = (b + c) - i;
    sum = sum + a;
  }
  cout << sum;
}
int main() {
  int a = 1;
  int b = 2;
  int c = 3;
  subtract_arrays(a, b, c);
}

Run this command:

armclang -O1 -S -o source_O1.s GNN.cpp

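Output (note: the listing below uses x86-64 registers such as rbp and edi, so it was generated for an x86-64 target; an ARM target produces different instructions):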

subtract_arrays(int, int, int):
        push    rbp
        mov     rbp, rsp
        sub     rsp, 32
        mov     DWORD PTR [rbp-20], edi
        mov     DWORD PTR [rbp-24], esi
        mov     DWORD PTR [rbp-28], edx
        mov     DWORD PTR [rbp-8], 0
        jmp     .L2
.L3:
        mov     edx, DWORD PTR [rbp-24]
        mov     eax, DWORD PTR [rbp-28]
        add     eax, edx
        sub     eax, DWORD PTR [rbp-8]
        mov     DWORD PTR [rbp-20], eax
        mov     eax, DWORD PTR [rbp-20]
        add     DWORD PTR [rbp-4], eax
        add     DWORD PTR [rbp-8], 1
.L2:
        cmp     DWORD PTR [rbp-8], 4
        jle     .L3
        mov     eax, DWORD PTR [rbp-4]
        mov     esi, eax
        mov     edi, OFFSET FLAT:_ZSt4cout
        call    std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
        nop
        leave
        ret
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     DWORD PTR [rbp-4], 1
        mov     DWORD PTR [rbp-8], 2
        mov     DWORD PTR [rbp-12], 3
        mov     edx, DWORD PTR [rbp-12]
        mov     ecx, DWORD PTR [rbp-8]
        mov     eax, DWORD PTR [rbp-4]
        mov     esi, ecx
        mov     edi, eax
        call    subtract_arrays(int, int, int)
        mov     eax, 0
        leave
        ret
__static_initialization_and_destruction_0(int, int):
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     DWORD PTR [rbp-4], edi
        mov     DWORD PTR [rbp-8], esi
        cmp     DWORD PTR [rbp-4], 1
        jne     .L8
        cmp     DWORD PTR [rbp-8], 65535
        jne     .L8
        mov     edi, OFFSET FLAT:_ZStL8__ioinit
        call    std::ios_base::Init::Init() [complete object constructor]
        mov     edx, OFFSET FLAT:__dso_handle
        mov     esi, OFFSET FLAT:_ZStL8__ioinit
        mov     edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
        call    __cxa_atexit
.L8:
        nop
        leave
        ret
_GLOBAL__sub_I_subtract_arrays(int, int, int):
        push    rbp
        mov     rbp, rsp
        mov     esi, 65535
        mov     edi, 1
        call    __static_initialization_and_destruction_0(int, int)
        pop     rbp
        ret

Use the __asm Keyword to Convert C++ to ARM Assembly

This is the most direct approach: the compiler provides an inline assembler that lets you write assembly code in your C++ source code and access features of the target processor that are not available from C++.

The __asm keyword lets you write inline assembly code in a function, using the GNU inline assembly syntax.

However, the inline assembler does not support legacy assembly code written in armasm syntax, so such code must first be migrated to GNU syntax.

The general form of a basic inline assembly statement is __asm [volatile] (code);. There is also an extended version of the inline assembly syntax, which the example code below uses.

The volatile qualifier is optional. Using it ensures that the compiler does not remove the assembly code block when compiling with -O1 or above. The drawback is that volatile might disable certain compiler optimizations.
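A minimal sketch of the volatile form, assuming a GCC-compatible compiler. The ARM instruction is guarded by the __arm__ macro so that the same file still compiles on other targets, where it falls back to plain C++; the function name add_volatile is hypothetical.

```cpp
int add_volatile(int x, int y) {
#if defined(__arm__)
    int sum;
    // volatile asks the compiler to keep this block even when optimizing.
    __asm volatile("ADD %[sum], %[lhs], %[rhs]"
                   : [sum] "=r"(sum)
                   : [lhs] "r"(x), [rhs] "r"(y));
    return sum;
#else
    // Fallback for non-ARM targets, where the ARM instruction is not valid.
    return x + y;
#endif
}
```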

#include <stdio.h>

int add(int x, int y) {
  int sum = 0;
  __asm("ADD %[_sum], %[input_x], %[input_y]"
        : [_sum] "=r"(sum)
        : [input_x] "r"(x), [input_y] "r"(y));
  return sum;
}

int main(void) {
  int x = 1;
  int y = 2;
  int z = 0;

  z = add(x, y);

  printf("Result of %d + %d = %d\n", x, y, z);
}

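Output (note: the listing below uses x86-64 registers such as rbp and eax, with the ADD template substituted into it, so it was generated for an x86-64 target; an ARM target produces different instructions):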

add(int, int):
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-20], edi
        mov     DWORD PTR [rbp-24], esi
        mov     DWORD PTR [rbp-4], 0
        mov     eax, DWORD PTR [rbp-20]
        mov     edx, DWORD PTR [rbp-24]
        ADD eax, eax, edx
        mov     DWORD PTR [rbp-4], eax
        mov     eax, DWORD PTR [rbp-4]
        pop     rbp
        ret
.LC0:
        .string "Result of %d + %d = %d\n"
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     DWORD PTR [rbp-4], 1
        mov     DWORD PTR [rbp-8], 2
        mov     DWORD PTR [rbp-12], 0
        mov     edx, DWORD PTR [rbp-8]
        mov     eax, DWORD PTR [rbp-4]
        mov     esi, edx
        mov     edi, eax
        call    add(int, int)
        mov     DWORD PTR [rbp-12], eax
        mov     ecx, DWORD PTR [rbp-12]
        mov     edx, DWORD PTR [rbp-8]
        mov     eax, DWORD PTR [rbp-4]
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     eax, 0
        leave
        ret

In the __asm statement, code is the assembly instruction itself, and code_template is a template for an assembly instruction. If you specify a code_template rather than code, you must specify the output_operand_list before the optional input_operand_list and clobbered_register_list.

The output_operand_list is a comma-separated list of output operands. Each operand consists of a symbolic name in square brackets, a constraint string, and a C++ expression in parentheses, in the format [result] "=r" (res).

You can also use inline assembly to define symbols, as in __asm (".global __use_no_semihosting\n\t");, or to define labels by placing a colon after the label name, as in __asm ("my_label:\n\t");.

Furthermore, you can write multiple instructions within the same __asm statement, and you can write embedded assembly by using the __attribute__((naked)) function attribute.
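The sketch below puts the operand lists and a multi-instruction template together, assuming a GCC-compatible compiler and a 32-bit ARM target. The scratch register r3 is an arbitrary choice made for illustration and is named in the clobbered_register_list; on other targets the function falls back to plain C++. The function name add_then_double is hypothetical.

```cpp
int add_then_double(int x, int y) {
#if defined(__arm__)
    int result;
    __asm volatile(
        "MOV r3, %[lhs]\n\t"                   // copy x into the scratch register r3
        "ADD %[result], r3, %[rhs]\n\t"        // result = x + y
        "ADD %[result], %[result], %[result]"  // result = result + result
        : [result] "=r"(result)                // output_operand_list
        : [lhs] "r"(x), [rhs] "r"(y)           // input_operand_list
        : "r3");                               // clobbered_register_list
    return result;
#else
    // Fallback for non-ARM targets.
    return (x + y) * 2;
#endif
}
```

Listing r3 as clobbered tells the compiler not to keep a live value in that register across the statement and not to assign it to any operand.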

The Microsoft C++ compiler (MSVC) can produce different results on the ARM architecture than on x86 or x64 for the same C++ source code, so you may encounter migration or conversion issues.

These issues typically arise from code that invokes undefined, implementation-defined, or unspecified behavior, or from hardware differences between ARM and the x86 or x64 architectures that interact with the C++ standard differently.
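One commonly cited example is shifting by a count at or beyond the operand width, which is undefined behavior in C++ and which ARM and x86 hardware handle differently. A minimal sketch of a helper that keeps the result well defined on every target follows; the name shift_left_defined is hypothetical.

```cpp
#include <cstdint>

// Left shift with a defined result for every count: counts of 32 or more
// yield 0 instead of relying on what the hardware happens to do.
std::uint32_t shift_left_defined(std::uint32_t value, unsigned count) {
    return count < 32 ? value << count : 0u;
}
```

Routing shifts with a variable count through a helper like this removes one source of results that differ between architectures.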

Syed Hassan Sabeeh Kazmi

Hassan is a Software Engineer with a well-developed set of programming skills. He uses his knowledge and writing capabilities to produce interesting-to-read technical articles.
