Go calling conventions, CGO callback internals | Go —> C

A few notes on Go calling conventions, on setting a custom calling convention in IDA Pro with __usercall, and on Go internals. Malware trainings that cover Go internals do not always cover Go DLLs. Malware that incorporates reflective loaders written in Go is a new trend, which is why I decided to write this quick guide for myself to refer to while reversing Go binaries.

Go calling conventions

The official Go documentation illustrates the Go calling convention with the following example:

func f(a1 uint8, a2 [2]uintptr, a3 uint8) (r1 struct { x uintptr; y [2]uintptr }, r2 string) 
// on a 64-bit architecture with hypothetical integer registers R0–R9.
On entry:
  a1 goes to R0
  a3 goes to R1 
Stack frame is laid out in the following sequence:
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —  lower address
  a2      [2]uintptr    <— stack-assigned arguments
  r1.x    uintptr       <— stack-assigned results 
  r1.y    [2]uintptr    <— stack-assigned results
  a1Spill uint8         <— arg spill area
  a3Spill uint8         <— arg spill area
  _       [6]uint8      <— alignment padding
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —  higher address
[!] In the stack frame, only the a2 field is initialized on entry, the rest of the frame is left uninitialized.
[!] Only arguments, not results, are assigned a spill area on the stack.
On exit:
	r2.base goes to R0  	<-- Result r2 is decomposed into its components, which are individually register-assigned. 
	r2.len goes to R1
	r1.x and r1.y are initialized in the stack frame.
[!] a2 and r1 are stack-assigned because they contain arrays.
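
To double-check the frame layout above, here is a minimal sketch that mirrors it as a Go struct and asks the compiler for its size and offsets. The exampleFrame type and both helper functions are made up for this check (the compiler defines no such type), and the numbers assume a 64-bit target where uintptr is 8 bytes.

```go
package main

import (
	"fmt"
	"unsafe"
)

// exampleFrame is a hypothetical mirror of the stack frame from the example.
// It exists only so that sizes and offsets can be checked.
type exampleFrame struct {
	a2      [2]uintptr // stack-assigned argument
	r1x     uintptr    // stack-assigned result r1.x
	r1y     [2]uintptr // stack-assigned result r1.y
	a1Spill uint8      // spill slot for register argument a1
	a3Spill uint8      // spill slot for register argument a3
	_       [6]uint8   // alignment padding
}

// frameSize returns the total size of the modelled frame in bytes.
func frameSize() uintptr { return unsafe.Sizeof(exampleFrame{}) }

// spillOffset returns the offset at which the argument spill area starts.
func spillOffset() uintptr { return unsafe.Offsetof(exampleFrame{}.a1Spill) }

func main() {
	fmt.Println("frame size:", frameSize())
	fmt.Println("spill area offset:", spillOffset())
}
```

On a 64-bit target the frame is 48 bytes and the spill area starts at offset 40, which is handy when matching [rsp+...] operands against the layout.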

Go ABI

There are numerous calling conventions, which makes debugging tiresome, especially in Go, where several of them are mixed in a single binary. Some of them are:

gcc ABI: Used by gcc for C/C++ code.

x86_64 (64-bit)

  • Linux:
    • Arguments: the first six integer arguments go in rdi, rsi, rdx, rcx, r8, r9.
    • Additional arguments: stack.
    • Return value: rax.
  • Windows:
    • Arguments: the first four integer arguments go in rcx, rdx, r8, r9.
    • Additional arguments: stack.
    • Return value: rax.

x86 (32-bit)

  • Linux and Windows:
    • Arguments: stack.
    • Return value: eax.

Go ABI: A general term for Go’s calling conventions (includes ABI0 and ABIInternal).

  • Platform-specific:
    • When Go interacts with external code (e.g., C or assembly), it uses the platform’s native ABI (e.g., gcc ABI on Linux).
    • For example:
      • On Linux x86-64, Go uses rdi, rsi, rdx, rcx, r8, r9 for arguments (matching the gcc ABI).
      • On Windows x86-64, Go uses rcx, rdx, r8, r9 for arguments (matching the Windows ABI).

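ABI0: The original, stack-based Go ABI. Compiled Go code on amd64 no longer uses it, but assembly functions (including much of the runtime) still follow it.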

ABIInternal: The current, register-based Go ABI.

  • Registers used for arguments:
    • On amd64, integer arguments and results are assigned, in order, to rax, rbx, rcx, rdi, rsi, r8, r9, r10, r11; floating-point values use xmm0 to xmm14.
    • Which registers a given function uses follows from its signature: a value that does not fit in the remaining registers, or that contains an array of more than one element, is assigned to the stack.
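
As a rough aid for guessing __usercall prototypes, the sketch below models integer register assignment on amd64. It is a simplification under stated assumptions: the value type and the assign function are hypothetical, floating-point registers are ignored, and each argument is described only by how many register-sized words it needs and whether it contains an array of more than one element.

```go
package main

import "fmt"

// intRegs is the amd64 ABIInternal integer register sequence.
var intRegs = []string{"rax", "rbx", "rcx", "rdi", "rsi", "r8", "r9", "r10", "r11"}

// value is a hypothetical, simplified description of one argument or result.
type value struct {
	name      string
	words     int  // number of register-sized integer components
	stackOnly bool // true for values containing arrays of length > 1
}

// assign models register assignment for a list of values. A value is placed
// in registers as a whole or not at all; otherwise it goes to the stack.
func assign(vals []value) map[string][]string {
	out := make(map[string][]string)
	next := 0
	for _, v := range vals {
		if v.stackOnly || next+v.words > len(intRegs) {
			out[v.name] = []string{"stack"}
			continue
		}
		out[v.name] = intRegs[next : next+v.words]
		next += v.words
	}
	return out
}

func main() {
	// The signature from the documentation example:
	// func f(a1 uint8, a2 [2]uintptr, a3 uint8) (r1 struct{...}, r2 string)
	args := []value{{"a1", 1, false}, {"a2", 2, true}, {"a3", 1, false}}
	results := []value{{"r1", 3, true}, {"r2", 2, false}}
	fmt.Println(assign(args))
	fmt.Println(assign(results))
}
```

Arguments and results are assigned independently, each starting again from rax, which is why r2 ends up in rax and rbx in this model.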

ABI Platform: The platform-specific ABI used for interfacing with external code (also used by Go when calling C code )

Go entities

To aid the analysis, knowledge of Go structures is required:

  • g - Goroutine, a lightweight thread in Go; it has a user stack associated with it.
  • p - Processor, a logical entity that manages the scheduling of goroutines onto threads.
  • m - Machine, an OS-level thread; it has a system stack associated with it (also known as the g0 stack) and, on Unix platforms, a signal stack (the gsignal stack).
  • p.GoF - the Go function GoF exported from package p (in this notation p is a package name, not a processor).
  • m.g0 - the system goroutine associated with an OS thread.

All g, m, and p objects are heap allocated, but are never freed, so their memory remains type stable. As a result, the runtime can avoid write barriers in the depths of the scheduler.

  • getg().m.curg - to get the current user g
  • getg() - returns the current g, but when executing on the system or signal stacks, this will return the current M’s “g0” or “gsignal”, respectively
  • getg() == getg().m.curg - to determine if you’re running on the user stack or the system stack.
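
The relationship between these checks can be pictured with a toy model. getg is only available inside the runtime, so the sketch below passes the "current g" explicitly; the g and m types, newM and onUserStack are hypothetical stand-ins, not the runtime's real definitions.

```go
package main

import "fmt"

// m is a toy stand-in for the runtime's M; only two fields are modelled.
type m struct {
	g0   *g // goroutine that owns the system stack
	curg *g // current user goroutine
}

// g is a toy stand-in for the runtime's G.
type g struct {
	name string
	m    *m
}

// newM builds an M with a g0 and one user goroutine attached.
func newM() *m {
	mp := &m{}
	mp.g0 = &g{name: "g0", m: mp}
	mp.curg = &g{name: "user", m: mp}
	return mp
}

// onUserStack mirrors the getg() == getg().m.curg check.
func onUserStack(cur *g) bool { return cur == cur.m.curg }

func main() {
	mp := newM()
	fmt.Println(onUserStack(mp.curg)) // running as the user goroutine
	fmt.Println(onUserStack(mp.g0))   // running on the system stack
}
```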

figure

CGO callback / Go —> C

Here is what happens when a gcc-compiled function f calls back into Go. To make it possible for gcc-compiled C code to call a Go function p.GoF, cgo writes a gcc-compiled function named GoF (not p.GoF, since gcc doesn’t know about packages), which acts as a bridge. This GoF function is written in C and is an intermediary between the C code and the Go runtime. The gcc-compiled C function f calls GoF. GoF initializes “frame”, a structure containing all of its arguments and slots for p.GoF’s results. It then calls crosscall2(_cgoexp_GoF, frame, framesize, ctxt) using the gcc ABI.

crosscall2

crosscall2 is a four-argument adapter from the gcc function call ABI to the gc function call ABI. Its code runs in the Go runtime, but it is still executing on m.g0’s stack and outside the $GOMAXPROCS limit. crosscall2 saves the C callee-saved registers and calls cgocallback(_cgoexp_GoF, frame, ctxt) using the gc ABI.

void crosscall2 (void (*_cgoexp_GoF)(void *), void * frame, __int32 framesize , __int64 ctxt);

func crosscall2 (_cgoexp_GoF, frame unsafe.Pointer, framesize int32, ctxt uintptr)
// Called using the gcc ABI.
// _cgoexp_GoF is the PC of a func(frame unsafe.Pointer) function.

crosscall2 is a low-level function that varies from architecture to architecture; for the sake of brevity, only amd64 is examined here.

crosscall2:
    push rdi ; Save registers (PUSH_REGS_HOST_TO_ABI0)
    push rsi
    push rbx
    push rbp
    push r12
    push r13
    push r14
    push r15
    sub rsp, 0x18 
#ifndef GOOS_windows
    ; Store arguments on the stack (non-Windows)
    mov [rsp], rdi        ; fn
    mov [rsp + 0x8], rsi  ; arg
    ; Skip n in DX
    mov [rsp + 0x10], rcx ; ctxt
#else
    ; Store arguments on the stack (Windows)
    mov [rsp], rcx        ; fn
    mov [rsp + 0x8], rdx  ; arg
    ; Skip n in R8
    mov [rsp + 0x10], r9  ; ctxt
#endif
    call runtime_cgocallback ; Call runtime·cgocallback    
    add rsp, 0x18     ; Restore stack
    pop r15     ; Restore registers (POP_REGS_HOST_TO_ABI0)
    pop r14
    pop r13
    pop r12
    pop rbp
    pop rbx
    pop rsi
    pop rdi
    ret 

The parameters of the crosscall2 call are passed via rcx, rdx, r8d and r9, which matches the Windows platform ABI.

figure

Moving further inside crosscall2, we see the call to cgocallback, which takes the same arguments we passed to crosscall2 (minus framesize), but this time they are passed on the stack, much like __cdecl. Note how the caller cleans up the stack.

figure

Here we can explicitly make IDA see the arguments passed via the stack by setting a __usercall prototype.

void __usercall runtime_cgocallback (
  __int64 *cgoexp_GoF@<0:^0.8>, 
  void *frame@<0:^8.8>,
  __int64 *ctxt@<0:^16.8>
 );

We are all set!

figure

cgocallback

cgocallback switches from m.g0’s stack to the original g (m.curg)’s stack, on which it calls cgocallbackg(_cgoexp_GoF, frame, ctxt). cgocallback saves the current stack pointer SP as m.g0.sched.sp, so that any use of m.g0’s stack during the execution of the callback will be done below the existing stack frames. Before overwriting m.g0.sched.sp, it pushes the old value on the m.g0 stack, so that it can be restored later.

void __usercall runtime_cgocallbackg_0 (
  __int64 *cgoexp_GoF@<0:^0.8>, 
  void *frame@<0:^8.8>, 
  __int64 *ctxt@<0:^16.8>
  );

[!] Quick note: Parsing the golang metadata did not help here, because most of the metadata was stripped from the binary, which is why the golang plugin failed. It is still worth a try (Edit > Other > golang:detect_and_parse).

figure

cgocallbackg

The code of this function now executes on a real goroutine stack (not m.g0). It is mostly responsible for unwinding m.g0.sched.sp if a panic occurs. It also calls runtime.exitsyscall, which blocks until the $GOMAXPROCS limit allows this goroutine to run. In addition, this function uses the gc ABI, meaning that __golang applies. It then calls _cgoexp_GoF(frame).

void __usercall runtime_cgocallbackg(
  void (*fn)(void *)@<rax>, 
  void *frame@<rbx>, 
  unsigned __int64 ctxt@<rcx>
  );

For some reason IDA refuses to recognize the __golang convention: setting it does not automatically annotate the arguments passed to the function. That is why we again set a custom __usercall prototype.

figure

[!] Quick note: the ctxt variable is of type uintptr, which is “an integer type that is large enough to contain the bit pattern of any pointer” in Go. Despite its confusing name, uintptr is not a pointer but an integer, which is why the ctxt variable is of type unsigned __int64 and not unsigned __int64 *.
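
The difference is easy to see from the Go side. This is a small sketch with made-up helper names (sameWidth, addOffset): uintptr has the same width as a pointer, but it supports ordinary integer arithmetic, which pointer types do not.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sameWidth reports whether uintptr and unsafe.Pointer have the same size.
func sameWidth() bool {
	return unsafe.Sizeof(uintptr(0)) == unsafe.Sizeof(unsafe.Pointer(nil))
}

// addOffset does plain integer arithmetic on a uintptr value.
// The same expression would not compile for a pointer type.
func addOffset(ctxt, off uintptr) uintptr { return ctxt + off }

func main() {
	fmt.Println(sameWidth())
	fmt.Printf("%#x\n", addOffset(0x1000, 0x18))
}
```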

figure

cgocallbackg1

The last frontier of our journey is cgocallbackg1

figure

Now here is another trick. We see a register-based call, which is not the same as a call to a constant function address. IDA can be a little buggy when setting call and value types, so here we examine two slightly different ways to get the function arguments recognized properly. Initially we are presented with this state: cgocallbackg1 eventually calls the function fn with the void* frame variable, but frame is nowhere to be seen inside fn's parentheses. Clearly it was passed via rax.

figure

One might say we can just set __golang, but that will not work, because ABIInternal is not a stable, standardized way of passing arguments. We might stumble upon a function whose arguments are passed via rsi and rdi, and setting __golang will not help IDA recognize them. By pressing Y we set the value type to the following signature.

void (__usercall *) (void *@<rax>); // value type

Then select Force call type.

figure

Now the function arguments are recognized correctly.

figure

Or, the other way, by setting the call type:

void __usercall fn(void *frame@<rax>); // call type

Now IDA properly recognizes the function call and its arguments:

figure

This matches the Go runtime source:

func cgocallbackg1(fn, frame unsafe.Pointer, ctxt uintptr) {
	// code ...
	// Invoke callback. This function is generated by cmd/cgo and
	// will unpack the argument frame and call the Go function.
	var cb func(frame unsafe.Pointer)
	cbFV := funcval{uintptr(fn)}
	*(*unsafe.Pointer)(unsafe.Pointer(&cb)) = noescape(unsafe.Pointer(&cbFV))
	cb(frame)    // fn is the same as cb; cb stands for callback
	// code ...
}

Upon examination of the function to be invoked, we see that it has no return value, so we set the return type to void. This is our long-awaited callback function—exactly the function that was meant to be called from the very beginning. Here, the callback is invoked via rsi, with rax being passed as an argument. rax takes its value from rbx, which in turn takes its value from the stack at the 0x78 offset.

figure

_cgoexp_GoF(frame) aka cb(frame)

_cgoexp_GoF unpacks the arguments from frame, calls p.GoF, writes the results back to frame, and returns. From here the whole process starts to unwind. Indeed, this function, which accepts its parameter via rax, uses it to unpack the arguments into registers.
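
The frame round trip can be modelled in pure Go. This is only a sketch: addFrame, add, cgoexpAdd and callThroughFrame are hypothetical names standing in for the frame structure, p.GoF, _cgoexp_GoF and the C-side GoF stub, and the real frame layout is generated by cgo.

```go
package main

import (
	"fmt"
	"unsafe"
)

// addFrame is a hypothetical frame for a Go function add(a, b int32) int32:
// argument slots first, then a slot for the result.
type addFrame struct {
	a, b int32
	r    int32
}

// add plays the role of p.GoF.
func add(a, b int32) int32 { return a + b }

// cgoexpAdd plays the role of _cgoexp_GoF: it unpacks the arguments from the
// frame, calls the Go function, and writes the result back into the frame.
func cgoexpAdd(frame unsafe.Pointer) {
	f := (*addFrame)(frame)
	f.r = add(f.a, f.b)
}

// callThroughFrame plays the role of the C-side GoF stub: it fills the frame,
// hands over a single pointer, and reads the result slot afterwards.
func callThroughFrame(a, b int32) int32 {
	fr := addFrame{a: a, b: b}
	cgoexpAdd(unsafe.Pointer(&fr))
	return fr.r
}

func main() {
	fmt.Println(callThroughFrame(2, 40))
}
```

This is why every function in the chain only ever passes one opaque frame pointer: the argument and result types are known only at the two ends.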

figure

figure

Unwinding

Execution now returns to cgocallbackg. cgocallbackg calls entersyscall and returns to cgocallback. cgocallback switches back to m.g0’s stack, restores the old m.g0.sched.sp value from the stack and returns to crosscall2. crosscall2 restores the callee-saved registers for gcc and returns to GoF, which unpacks any result values and returns to f. So the final chain of called functions looks like this:

runtime.crosscall2()

runtime.cgocallback()

runtime.cgocallbackg()

_cgoexp_GoF()

CGO / C —> Go

Before you read this section, check out this cool article on CGO internals, which covers WinAPI function address resolution. The author also explains the internal structure of the cgocall and asmcgocall functions. I started reversing Go internals without knowing about that article beforehand, so some things might overlap. On Windows, whenever the Go runtime calls a C (WinAPI) function, it invokes the stdcall function from the runtime package. stdcall has its own internal mechanics: it involves the following chain of functions, called in consecutive order.

[package] runtime

stdcall

asmcgocall

asmstdcall

The stdcall function has 9 wrappers (stdcall0 … stdcall8), one per argument count. The picture below is missing the runtime_stdcall8 name. These names are commonly seen in a binary's function list:

figure

Case study - initHighResTimer

We will look inside a Go function from the runtime package: runtime.initHighResTimer.

figure

figure

Right from the start we can see that IDA picked the wrong calling convention, __fastcall, along with the wrong number of arguments. stdcall4 has only 5 parameters. The Go stdcall4 prototype is defined as follows:

func stdcall4(fn stdFunction, a0, a1, a2, a3 uintptr) uintptr {
	mp := getg().m
	mp.libcall.n = 4
	mp.libcall.args = uintptr(noescape(unsafe.Pointer(&a0)))
	return stdcall(fn)
}
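
The &a0 trick works because the arguments sit in consecutive words, so the address of the first one plus a count describes all of them. Here is a reduced model of that idea. The libcall type below keeps only three fields and stores args as unsafe.Pointer rather than uintptr (the runtime uses uintptr) so that the sketch stays within Go's pointer rules; pack and unpack are hypothetical helpers.

```go
package main

import (
	"fmt"
	"unsafe"
)

// libcall is a reduced model of the runtime's libcall record.
type libcall struct {
	fn   uintptr        // address of the function to call
	n    uintptr        // number of arguments
	args unsafe.Pointer // address of the first argument
}

// pack records the argument count and the address of the first argument,
// like stdcall4 does with &a0.
func pack(fn uintptr, args *[4]uintptr) libcall {
	return libcall{fn: fn, n: uintptr(len(args)), args: unsafe.Pointer(&args[0])}
}

// unpack walks n consecutive words starting at args, which is roughly what
// the assembly side does before loading them into registers.
func unpack(c libcall) []uintptr {
	return unsafe.Slice((*uintptr)(c.args), int(c.n))
}

func main() {
	a := [4]uintptr{0, 0, 2, 3} // placeholder argument values
	c := pack(0x1234, &a)       // 0x1234 is a placeholder function address
	fmt.Println(unpack(c))
}
```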

So we change that a little to make it conform to the gcc ABI, by setting __stdcall and removing the unnecessary trailing parameter.

figure

figure

Here we see how rax receives an offset to another offset that points to the WinAPI function CreateWaitableTimerExW. The 16-byte store from xmm15 needs care: in Go's ABIInternal on amd64, xmm15 is the dedicated zero register, so this store most likely zeroes two adjacent 8-byte argument slots at once rather than passing a float. IDA, however, sees a single 16-byte value that occupies the space of two arguments on the stack, and typing it as one __m128 makes the decompiler show four arguments instead of five. The function also uses the platform ABI, as it is about to call a Windows-specific function written in C (stdcall).

figure

The right calling convention is as follows:

__int64 __stdcall runtime_stdcall4(__int64 fn, __m128  a0, __int64 a1, __int64 a2)

This is equivalent to the following custom __usercall prototype:

void __usercall runtime_stdcall4(
	__int64 fn@<^0.8>,  //  __int64, offset 0x0, size 8 bytes 
	__m128 a0@<^8.16>,  //  __m128, offset 0x8, size 16 bytes
	__int64 a1@<^18.8>, //  __int64, offset 0x18, size 8 bytes 
	__int64 a2@<^20.8>  //  __int64, offset 0x20, size 8 bytes
	);

Now the call to stdcall takes place. The stdcall function looks like this:

func stdcall(fn stdFunction) uintptr {
	gp := getg()
	mp := gp.m
	mp.libcall.fn = uintptr(unsafe.Pointer(fn))
	resetLibcall := false
	if mp.profilehz != 0 && mp.libcallsp == 0 {
		mp.libcallg.set(gp) // leave pc/sp for cpu profiler
		mp.libcallpc = getcallerpc()
		// sp must be the last, because once async cpu profiler finds
		// all three values to be non-zero, it will use them
		mp.libcallsp = getcallersp()
		resetLibcall = true // See comment in sys_darwin.go:libcCall
	}
	asmcgocall(asmstdcallAddr, unsafe.Pointer(&mp.libcall))
	if resetLibcall {
		mp.libcallsp = 0
	}
	return mp.libcall.r1
}

asmcgocall function signature:

func asmcgocall(fn, arg unsafe.Pointer) int32

asmcgocall internal structure:

asmcgocall:
    ; Load arguments from stack (Go ABI)
    mov rax, [rsp + 8]        ; fn 
    mov rbx, [rsp + 16]       ; arg 
    mov rdx, rsp              ; Save SP

    ; Check if we need to switch to m->g0 stack
    mov rcx, gs:[0]           ; get_tls(CX)
    mov rdi, [rcx]            ; g = tls->g
    test rdi, rdi
    jz nosave                 ; Skip if g == nil

    ; Check if already on gsignal or g0 stack
    mov r8, [rdi + 8]         ; g.m (offset of m in g struct)
    mov rsi, [r8 + 0x10]      ; m->gsignal
    cmp rdi, rsi
    je nosave                 ; Already on gsignal stack
    mov rsi, [r8 + 0x18]      ; m->g0
    cmp rdi, rsi
    je nosave                 ; Already on g0 stack
	
    call gosave_systemstack_switch	; Switch to system stack (m->g0)
    mov [rcx], rsi            ; tls->g = m->g0
    mov rsp, [rsi + 0x28]     ; Restore SP from g0.sched.sp

    ; Prepare stack for C ABI (16-byte aligned)
    sub rsp, 16
    and rsp, -16              ; Align to 16 bytes
    mov [rsp + 8], rdi        ; Save original g
    mov rdi, [rdi + 0x30]     ; g->stack.hi
    sub rdi, rdx              ; Calculate stack depth
    mov [rsp], rdi            
    call runtime.asmcgocall_landingpad     
    ; Restore stack and registers
    mov rcx, gs:[0]           ; get_tls(CX)
    mov rdi, [rsp + 8]        ; Restore original g
    mov rsi, [rdi + 0x30]     ; g->stack.hi
    sub rsi, [rsp]            ; Calculate new SP
    mov [rcx], rdi            ; tls->g = original g
    mov rsp, rsi              ; Restore SP
    mov [rsp + 24], eax       ; Store return value
    ret

nosave:
    ; Already on a system stack (m->g0 or m->gsignal). No g to save.
    sub rsp, 16                ; Allocate 16 bytes for alignment
    and rsp, -16               ; Align stack to 16 bytes (~15 = 0xFFFFFFFFFFFFFFF0)

    ; Prepare stack for debugging (even though no g is saved)
    mov qword [rsp + 8], 0     ; Store 0 where g would normally be saved (for debuggers)
    mov [rsp], rdx             ; Save original SP (from DX) at [rsp + 0]
    call runtime.asmcgocall_landingpad 
    mov rsi, [rsp]             ; Restore original stack pointer
    mov rsp, rsi               
    mov dword [rsp + 24], eax  ; Store 32-bit return value
    ret
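
The "sub rsp, 16" followed by "and rsp, -16" pair in the listing is the usual way to reserve scratch space and align the stack for C. The same arithmetic in Go, as a small sketch (alignForC is a made-up name):

```go
package main

import "fmt"

// alignForC mirrors "sub rsp, 16; and rsp, -16": reserve 16 bytes, then
// clear the low four bits so the result is a multiple of 16.
func alignForC(sp uintptr) uintptr {
	return (sp - 16) &^ 15
}

func main() {
	fmt.Printf("%#x\n", alignForC(0x1008))
	fmt.Printf("%#x\n", alignForC(0x1000))
}
```

Both inputs end up at 0xff0: the subtraction guarantees at least 16 usable bytes in the first case only after masking, which is why the two saved values are stored at [rsp] and [rsp + 8] after the alignment rather than before it.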

asmcgocall IDA view:

figure

More examples

Example 1

By default IDA can’t recognize Go function arguments, so it just skips them. For example, a decompiled Go function:

__int64 __fastcall fn_1() // no parameters

But if we look at the disassembly, there are actually a few arguments passed to the function. Disassembly:

mov     rax, [rsp+88h]
mov     rbx, [rsp+40h]
call    fn_1
mov     [rsp+0B8h], rax
mov     [rsp+70h], rbx

Correct function prototypes:

struct go_str __usercall fn_1@<rax,rbx>(struct go_str s@<rax,rbx>);

void __usercall fn_2( __int64 a1@<rdi>, __int64 a2@<rsi>);

void __usercall fn_3(__int64 r1@<rax>,__int64 r2@<rbx>, __int64 r3@<rcx>, __int64 r4@<rdi>);

Example 2

How it looks in IDA:

figure

Function signature:

void __usercall sub_65F5C5BA(__int64 dst@<rdi>, __int64 src@<rsi>);

The function actually just copies 80 bytes (5 x 16) from the rsi pointer to the rdi pointer:

figure
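
In Go terms the routine behaves roughly like the sketch below: five 16-byte moves from src to dst. copy80 is a made-up name, and the chunked loop only imitates the shape of the assembly; the real code uses SSE moves.

```go
package main

import "fmt"

// copy80 copies 80 bytes from src to dst in five 16-byte chunks.
func copy80(dst, src *[80]byte) {
	for i := 0; i < 5; i++ {
		copy(dst[i*16:(i+1)*16], src[i*16:(i+1)*16])
	}
}

func main() {
	var src, dst [80]byte
	for i := range src {
		src[i] = byte(i)
	}
	copy80(&dst, &src)
	fmt.Println(dst == src)
}
```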

Resources

  • https://go.dev/src/runtime/HACKING
  • https://go.dev/src/runtime/os_windows.go
  • https://go.dev/src/runtime/cgo/asm_386.s
  • https://go.dev/src/runtime/cgo/asm_amd64.s
  • https://medium.com/@aditimishra_541/what-are-go-processors-bfc13b38095e
  • https://leandrofroes.github.io/posts/An-in-depth-look-at-Golang-Windows-calls/
  • https://hex-rays.com/blog/igors-tip-of-the-week-51-custom-calling-conventions
This post is licensed under CC BY 4.0 by the author.