Merged
@@ -64,7 +64,7 @@ int main() {
$ rm -f (path filter *.o); gcc -O0 -c main.c; llvm-objdump -d --x86-asm-syntax=att main.o
```

-```bash
+```x86asm
main.o: file format elf64-x86-64

Disassembly of section .text:
@@ -90,11 +90,11 @@ text data bss dec hex filename
At `-O0`, the compiler generates a stack frame, which adds unnecessary instruction overhead.

###### Use `-O1` as optimization level
-```
+```bash
$ rm -f (path filter *.o); gcc -O1 -c main.c; llvm-objdump -d --x86-asm-syntax=att main.o
```

-```bash
+```x86asm
main.o: file format elf64-x86-64

Disassembly of section .text:
@@ -121,7 +121,7 @@ It reduces the output from six instructions to three by removing the stack frame
$ rm -f (path filter *.o); gcc -O2 -c main.c; llvm-objdump -d --x86-asm-syntax=att main.o
```

-```bash
+```x86asm
main.o: file format elf64-x86-64

Disassembly of section .text:
@@ -154,7 +154,9 @@ $ rm -f (path filter *.o); clang -O1 -c main.c; llvm-objdump -d --x86-asm-syntax

```bash
rm -f (path filter *.o); clang -O1 -c main.c; llvm-objdump -d --x86-asm-syntax=att main.o
```

```x86asm
main.o: file format elf64-x86-64

Disassembly of section .text:
@@ -183,7 +185,7 @@ long get_val() {
$ rm -f (path filter *.o); clang -O2 -c get_val.c; llvm-objdump -d --x86-asm-syntax=att get_val.o
```

-```bash
+```x86asm
get_val.o: file format elf64-x86-64

Disassembly of section .text:
@@ -278,7 +280,7 @@ $ clang -O2 -c main.c
$ llvm-objdump -d --disassemble-symbols=f --x86-asm-syntax=att main.o
```

-```bash
+```x86asm
main.o: file format elf64-x86-64

Disassembly of section .text:
@@ -52,7 +52,7 @@ int add(int x, int y) {
$ rm -f (path filter *.o); clang -O0 -c add.c; llvm-objdump -d --x86-asm-syntax=att add.o
```

-```bash
+```x86asm
add.o: file format elf64-x86-64

Disassembly of section .text:
@@ -78,7 +78,7 @@ the compiler cannot translate `a = b + c` directly to a single `add` instruction
value of `b` or `c` before the operation is executed, the compiler needs to use a `mov` instruction to
initialize the destination with one of the operands first:

-```bash
+```x86asm
movl -0x4(%rbp), %eax
addl -0x8(%rbp), %eax
```
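
To make the two-operand constraint concrete, here is a minimal C sketch of the same sequence. The function name `add_two_operand` is hypothetical and used only for illustration: the destination is first initialized with one operand, then the other operand is added in place, much like a two-operand `add` overwrites its destination.

```c
/* Hypothetical illustration of how `a = b + c` maps onto a
   two-operand instruction set such as x86-64. */
int add_two_operand(int b, int c) {
    int a = b; /* like movl: copy the first operand into the destination */
    a += c;    /* like addl: destination = destination + second operand */
    return a;
}
```

Neither `b` nor `c` is modified here, which is the property the extra `mov` preserves.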
@@ -90,7 +90,7 @@ Hence, the compiler needs to use two instructions to execute the addition at the
rm -f (path filter *.o); clang -O2 -c add.c; llvm-objdump -d --x86-asm-syntax=att add.o
```

-```bash
+```x86asm
add.o: file format elf64-x86-64

Disassembly of section .text:
@@ -115,7 +115,7 @@ and the other employing a single `lea` instruction.
$ nvim add.s
```

-```text
+```x86asm
.section .note.GNU-stack, "", @progbits

.section .rodata
@@ -76,7 +76,7 @@ $ qemu-aarch64 ./app.out
$ llvm-objdump -d --disassemble-symbols=add app.out
```

-```text
+```armasm
app.out: file format elf64-littleaarch64

Disassembly of section .text:
@@ -108,7 +108,7 @@ Disassembly of section .text:
```

#### Part 01: Function Prologue
-```text
+```armasm
4007f0: d10083ff sub sp, sp, #0x20 // Allocate 32 bytes on stack
4007f4: a9017bfd stp x29, x30, [sp, #0x10] // Save Frame Pointer (x29) and Link Register (x30)
4007f8: 910043fd add x29, sp, #0x10 // Set up new Frame Pointer
@@ -134,7 +134,7 @@ facilitate the Stack Unwinding process during the Function Epilogue.
```

#### Part 02: Parameter Storage
-```text
+```armasm
4007fc: b81fc3a0 stur w0, [x29, #-0x4] // Store 'x' (w0) into stack
400800: b9000be1 str w1, [sp, #0x8] // Store 'y' (w1) into stack
400804: b9400be8 ldr w8, [sp, #0x8] // Load 'y' from stack into w8
@@ -145,7 +145,7 @@ Since it is at the `-O0` optimization level,
an additional instruction is used to load `y` from stack memory back into a register (`w8`) for subsequent conditional evaluation.

#### Part 03: Branching
-```text
+```armasm
400808: 71000108 subs w8, w8, #0x0 // Compare w8 (y) with 0
40080c: 540000a8 b.hi 0x400820 <add+0x30> // If y > 0, jump to recursive case (400820)
400810: 14000001 b 0x400814 <add+0x24> // Else, branch to base case logic
@@ -158,7 +158,7 @@ if `y > 0`, the Program Counter (`PC`) jumps to the Recursive Case;
otherwise, it jumps to the Base Case.
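
The branch condition can be modeled in C as a rough sketch. The helper name `branch_hi_after_subs_zero` is hypothetical. It relies on two AArch64 facts: `subs` sets the carry flag when no borrow occurs, and `b.hi` is taken when carry is set and zero is clear. Subtracting `0` never borrows, so the branch reduces to an unsigned `y != 0` test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of `subs w8, w8, #0` followed by `b.hi`.
   Returns true when the branch to the recursive case would be taken. */
bool branch_hi_after_subs_zero(uint32_t y) {
    bool carry = true;     /* y - 0 never borrows, so C is always set */
    bool zero = (y == 0);  /* Z is set only when the result is zero */
    return carry && !zero; /* HI condition: C set and Z clear */
}
```

Because the comparison is unsigned, a bit pattern such as `0x80000000` also takes the recursive branch.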

#### Part 04: The Base Case: `y == 0`
-```text
+```armasm
400814: b85fc3a0 ldur w0, [x29, #-0x4] // [Base Case] Load 'x' into w0
400818: b90007e0 str w0, [sp, #0x4] // Store 'x' as the potential return value
40081c: 14000008 b 0x40083c <add+0x4c> // Jump to epilogue (return) (Part 06)
@@ -169,7 +169,7 @@ The compiler then executes a `store` operation from register `W0` to stack memor
return value.

#### Part 05: The Recursive Step: `add(x + 1, y - 1)`
-```text
+```armasm
400820: b85fc3a8 ldur w8, [x29, #-0x4] // [Recursive Case] Load 'x' into w8
400824: 11000500 add w0, w8, #0x1 // w0 = x + 1 (Preparing 1st argument)
400828: b9400be8 ldr w8, [sp, #0x8] // Load 'y' into w8
@@ -184,7 +184,7 @@ The `bl` (Branch with Link) instruction then executes the recursive call, redire
Once the recursive call returns, the resulting value in `w0` is stored into stack memory before jumping to the epilogue.
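
For orientation, the control flow walked through in Parts 03 to 05 corresponds roughly to the C sketch below. The source file is not part of this diff, so the name `add_recursive` and the unsigned parameter types are assumptions; the unsigned types are suggested by the `b.hi` condition.

```c
/* Hypothetical reconstruction of the function being disassembled. */
unsigned add_recursive(unsigned x, unsigned y) {
    if (y > 0) {
        /* Recursive case: prepare x + 1 and y - 1, then call again (bl). */
        return add_recursive(x + 1, y - 1);
    }
    /* Base case (y == 0): x is the return value. */
    return x;
}
```

At `-O0` each call keeps its own 32-byte frame, so the recursion depth equals the initial value of `y`.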

#### Part 06: Function Epilogue
-```text
+```armasm
40083c: b94007e0 ldr w0, [sp, #0x4] // Load the result from stack into w0
400840: a9417bfd ldp x29, x30, [sp, #0x10] // Restore Frame Pointer and Link Register
400844: 910083ff add sp, sp, #0x20 // Deallocate stack space
@@ -212,7 +212,7 @@ $ qemu-aarch64 ./app.out
$ llvm-objdump -d --disassemble-symbols=add app.out
```

-```text
+```armasm
app.out: file format elf64-littleaarch64

Disassembly of section .text:
@@ -54,7 +54,7 @@ int mul(int x) {
$ rm -f (path filter *.o); clang -O2 -c mul.c; llvm-objdump -d --x86-asm-syntax=att mul.o
```

-```text
+```x86asm
mul.o: file format elf64-x86-64

Disassembly of section .text:
@@ -88,7 +88,7 @@ int mul(int x) {
$ rm -f (path filter *.o); clang -O2 -c mul.c; llvm-objdump -d --x86-asm-syntax=att mul.o
```

-```text
+```x86asm
mul.o: file format elf64-x86-64

Disassembly of section .text:
@@ -116,7 +116,7 @@ int mul(int x) {
$ rm -f (path filter *.o); clang -O2 -c mul.c; llvm-objdump -d --x86-asm-syntax=att mul.o
```

-```text
+```x86asm
mul.o: file format elf64-x86-64

Disassembly of section .text:
@@ -145,7 +145,7 @@ int mul(int x) {
$ rm -f (path filter *.o); clang -O2 -c mul.c; llvm-objdump -d --x86-asm-syntax=att mul.o
```

-```text
+```x86asm
mul.o: file format elf64-x86-64

Disassembly of section .text:
@@ -186,7 +186,7 @@ int mul(int x) {
$ rm -f (path filter *.o); clang -O2 -c mul.c; llvm-objdump -d --x86-asm-syntax=att mul.o
```

-```text
+```x86asm
mul.o: file format elf64-x86-64

Disassembly of section .text:
@@ -215,7 +215,7 @@ int mul(int x) {
$ rm -f (path filter *.o); clang -O2 -c mul.c; llvm-objdump -d --x86-asm-syntax=att mul.o
```

-```text
+```x86asm
mul.o: file format elf64-x86-64

Disassembly of section .text:
@@ -245,7 +245,7 @@ int mul(int x) {
$ rm -f (path filter *.o); clang -O2 -c mul.c; llvm-objdump -d --x86-asm-syntax=att mul.o
```

-```text
+```x86asm
mul.o: file format elf64-x86-64

Disassembly of section .text:
@@ -275,7 +275,7 @@ int mul(int x) {
$ rm -f (path filter *.o); clang -O2 -c mul.c; llvm-objdump -d --x86-asm-syntax=att mul.o
```

-```text
+```x86asm
mul.o: file format elf64-x86-64

Disassembly of section .text:
@@ -309,7 +309,7 @@ int mul(int x) {
$ rm -f (path filter *.o); clang -O2 -c mul.c; llvm-objdump -d --x86-asm-syntax=att mul.o
```

-```text
+```x86asm
mul.o: file format elf64-x86-64

Disassembly of section .text:
@@ -340,7 +340,7 @@ int mul(int x) {
$ rm -f (path filter *.o); clang -O2 -c mul.c; llvm-objdump -d --x86-asm-syntax=att mul.o
```

-```text
+```x86asm
mul.o: file format elf64-x86-64

Disassembly of section .text:
@@ -56,7 +56,7 @@ int mul(int x) {
$ rm -f (path filter *.o); clang -O2 -target aarch64-linux-gnu -c mul.c; llvm-objdump -d mul.o
```

-```text
+```armasm
mul.o: file format elf64-littleaarch64

Disassembly of section .text:
@@ -88,7 +88,7 @@ int mul(int x) {
$ rm -f (path filter *.o); clang -O2 -target aarch64-linux-gnu -c mul.c; llvm-objdump -d mul.o
```

-```text
+```armasm
mul.o: file format elf64-littleaarch64

Disassembly of section .text:
@@ -120,7 +120,7 @@ int mul(int x) {
$ rm -f (path filter *.o); clang -O2 -target aarch64-linux-gnu -c mul.c; llvm-objdump -d mul.o
```

-```text
+```armasm
mul.o: file format elf64-littleaarch64

Disassembly of section .text:
@@ -132,8 +132,10 @@ Disassembly of section .text:
```

ARM Instructions:
-- `add <Rd>, <Rn>, <Rm>, lsl #<shift>`
-- `lsl <Rd>, <Rn>, #<shift>`
+```armasm
+add <Rd>, <Rn>, <Rm>, lsl #<shift>
+lsl <Rd>, <Rn>, #<shift>
+```

The multiplication `6x` is decomposed into two stages.
First, the compiler calculates `w8 = w0 + (w0 << 1) = w0 + 2 * w0 = 3 * w0`.
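
As a rough C sketch of this decomposition: the helper name `mul6_shift_add` is hypothetical, and the second stage (doubling `3x` with a left shift by one) is inferred from the arithmetic and the `lsl` form listed above, since the hunk ends before the original text describes it. Unsigned arithmetic is used so that shifts and wraparound are well defined in C.

```c
#include <stdint.h>

/* Hypothetical helper: computes 6 * x with one shifted add and one shift.
   Results wrap modulo 2^32, matching 32-bit register arithmetic. */
uint32_t mul6_shift_add(uint32_t x) {
    uint32_t t = x + (x << 1); /* stage 1: x + 2x = 3x (add ..., lsl #1) */
    return t << 1;             /* stage 2 (assumed): 3x * 2 = 6x (lsl #1) */
}
```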
@@ -155,7 +157,7 @@ int mul(int x) {
$ rm -f (path filter *.o); clang -O2 -target aarch64-linux-gnu -c mul.c; llvm-objdump -d mul.o
```

-```text
+```armasm
mul.o: file format elf64-littleaarch64

Disassembly of section .text:
@@ -167,8 +169,10 @@ Disassembly of section .text:
```

ARM Instructions:
-- `lsl <Rd>, <Rn>, #<shift>`
-- `sub <Rd>, <Rn>, <Rm>`
+```armasm
+lsl <Rd>, <Rn>, #<shift>
+sub <Rd>, <Rn>, <Rm>
+```

The compiler implements a shift-and-subtract strategy for constants near powers of two.
To compute `7x`, it first executes `w8 = w0 << 3 = 8 * w0`
@@ -190,7 +194,7 @@ int mul(int x) {
$ rm -f (path filter *.o); clang -O2 -target aarch64-linux-gnu -c mul.c; llvm-objdump -d mul.o
```

-```text
+```armasm
mul.o: file format elf64-littleaarch64

Disassembly of section .text:
@@ -202,14 +206,16 @@ Disassembly of section .text:
```

ARM Instructions:
-- `mov <Rd>, <Imm>`
-- `mul <Rd>, <Rn>, <Rm>`
+```armasm
+mov <Rd>, <Imm>
+mul <Rd>, <Rn>, <Rm>
+```

The compiler falls back to the `mul` instruction because multiplication by the constant `11` cannot be decomposed into only two shift or add instructions.

If the compiler were to adopt a manual shift-and-subtract strategy,
the code generator would need to output three instructions:
-```text
+```armasm
add w8, w0, w0, lsl #1 // w8 = x + 2x = 3x
lsl w8, w8, #2 // w8 = w8 << 2 = 3x << 2 = 3x * 4 = 12x
sub w0, w8, w0 // w0 = 12x - x = 11x
@@ -233,7 +239,7 @@ int mul(int x) {
$ rm -f (path filter *.o); clang -O2 -target aarch64-linux-gnu -c mul.c; llvm-objdump -d mul.o
```

-```text
+```armasm
mul.o: file format elf64-littleaarch64

Disassembly of section .text:
@@ -245,8 +251,10 @@ Disassembly of section .text:
```

ARM Instructions:
-- `lsl <Rd>, <Rn>, #<shift>`
-- `sub <Rd>, <Rn>, <Rm>, lsl #<shift>`
+```armasm
+lsl <Rd>, <Rn>, #<shift>
+sub <Rd>, <Rn>, <Rm>, lsl #<shift>
+```

The computation of `14x` demonstrates the flexibility of the `sub` instruction with shifted operands.
The compiler first calculates `w8 = w0 << 4 = 16 * w0`.
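
A minimal C sketch of this strategy follows. The helper name `mul14_shift_sub` is hypothetical, and the second step (subtracting `2x`) is inferred from the `sub <Rd>, <Rn>, <Rm>, lsl #<shift>` form listed above and from the arithmetic `16x - 2x = 14x`, since the hunk ends before the original text describes it. Unsigned arithmetic keeps the shifts and wraparound well defined.

```c
#include <stdint.h>

/* Hypothetical helper: computes 14 * x with a shift and a shifted subtract.
   Results wrap modulo 2^32, matching 32-bit register arithmetic. */
uint32_t mul14_shift_sub(uint32_t x) {
    uint32_t t = x << 4; /* step 1: 16x (lsl #4) */
    return t - (x << 1); /* step 2 (assumed): 16x - 2x = 14x (sub ..., lsl #1) */
}
```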