Go calling conventions, CGO callback internals | Go —> C
A few notes on Go calling conventions, on setting a custom calling convention in IDA Pro with __usercall, and on Go internals. Malware-analysis trainings that cover Go internals do not always cover Go DLLs. Malware that embeds reflective loaders written in Go is a new trend, which is why I decided to write this quick guide for myself to refer to while reversing Go binaries.
Go calling conventions
The official Go documentation illustrates the Go calling convention with the following example:
func f(a1 uint8, a2 [2]uintptr, a3 uint8) (r1 struct { x uintptr; y [2]uintptr }, r2 string)
// on a 64-bit architecture with hypothetical integer registers R0–R9.
On entry:
a1 goes to R0
a3 goes to R1
Stack frame is laid out in the following sequence:
-------------------------------- lower address
a2 [2]uintptr     <-- stack-assigned arguments
r1.x uintptr      <-- stack-assigned results
r1.y [2]uintptr   <-- stack-assigned results
a1Spill uint8     <-- arg spill area
a3Spill uint8     <-- arg spill area
_ [6]uint8        <-- alignment padding
-------------------------------- higher address
[!] In the stack frame, only the a2 field is initialized on entry, the rest of the frame is left uninitialized.
[!] Only arguments, not results, are assigned a spill area on the stack.
On exit:
r2.base goes to R0 <-- Result r2 is decomposed into its components, which are individually register-assigned.
r2.len goes to R1
r1.x and r1.y are initialized in the stack frame.
[!] a2 and r1 are stack-assigned because they contain arrays of length greater than one, which the register-assignment rules never place in registers.
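The frame layout above can be cross-checked with a small Go sketch. It mirrors the stack frame as an ordinary struct and reports the field offsets. The type name frameLayout, its field names, and the helper frameOffsets are made up for illustration, and the numbers assume a 64-bit target where uintptr is 8 bytes.

```go
package main

import (
	"fmt"
	"unsafe"
)

// frameLayout is a hypothetical struct that mirrors the stack frame of f
// from the example above. It is only a model: the real frame is laid out
// by the compiler, not declared as a Go type.
type frameLayout struct {
	a2      [2]uintptr // stack-assigned argument
	r1x     uintptr    // stack-assigned result r1.x
	r1y     [2]uintptr // stack-assigned result r1.y
	a1Spill uint8      // spill slot for register argument a1
	a3Spill uint8      // spill slot for register argument a3
	_       [6]uint8   // alignment padding
}

// frameOffsets returns the offsets of a2, r1.x, r1.y, a1Spill and a3Spill,
// followed by the total frame size.
func frameOffsets() [6]uintptr {
	var f frameLayout
	return [6]uintptr{
		unsafe.Offsetof(f.a2),
		unsafe.Offsetof(f.r1x),
		unsafe.Offsetof(f.r1y),
		unsafe.Offsetof(f.a1Spill),
		unsafe.Offsetof(f.a3Spill),
		unsafe.Sizeof(f),
	}
}

func main() {
	fmt.Println(frameOffsets())
}
```

On a 64-bit build the two spill bytes plus the six padding bytes round the frame up to a multiple of 8, which is why the padding line appears in the layout.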
Go ABI
There are numerous calling conventions, and they make debugging tiresome, especially in Go, where several of them are mixed in a single binary. Some of them include:
gcc ABI: Used by gcc for C/C++ code.
x86_64 (64-bit)
- Linux:
  - Arguments: the first six integer arguments go in rdi, rsi, rdx, rcx, r8, r9.
  - Additional arguments: stack.
  - Return value: rax.
- Windows:
  - Arguments: the first four integer arguments go in rcx, rdx, r8, r9.
  - Additional arguments: stack.
  - Return value: rax.
x86 (32-bit)
- Linux and Windows:
  - Arguments: stack.
  - Return value: eax.
Go ABI: A general term for Go’s calling conventions (includes ABI0 and ABIInternal).
- Platform-specific:
  - When Go interacts with external code (e.g., C or assembly), it uses the platform’s native ABI (e.g., gcc ABI on Linux).
  - For example:
    - On Linux x86-64, Go uses rdi, rsi, rdx, rcx, r8, r9 for arguments (matching the gcc ABI).
    - On Windows x86-64, Go uses rcx, rdx, r8, r9 for arguments (matching the Windows ABI).
ABI0: The original, stack-based Go ABI. Hand-written Go assembly still uses it, and the toolchain generates wrappers between ABI0 and ABIInternal where needed.
ABIInternal: The current, register-based Go ABI. It is unstable and may change between Go releases.
- Registers used for arguments:
  - On amd64, integer arguments and results are assigned, in order, to rax, rbx, rcx, rdi, rsi, r8, r9, r10, r11; floating-point values go to xmm0 through xmm14.
  - Which values end up in registers is determined by the function signature; anything that cannot be register-assigned is passed on the stack.
ABI Platform: The platform-specific ABI used for interfacing with external code (also used by Go when calling C code )
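As a quick reference while reading disassembly, the register orders above can be written down as a small lookup table. This is a simplified sketch: it only models scalar integer arguments on x86-64, ignores floating-point values and aggregates, and the map keys (sysv, win64, abiinternal, abi0) and the helper argLocation are labels invented for this example. The ABIInternal order is taken from my reading of the Go internal ABI document and may change between Go releases.

```go
package main

import "fmt"

// intArgRegs lists, per ABI, the registers used for the leading integer
// arguments on x86-64. The keys are labels chosen for this sketch.
var intArgRegs = map[string][]string{
	"sysv":        {"rdi", "rsi", "rdx", "rcx", "r8", "r9"},
	"win64":       {"rcx", "rdx", "r8", "r9"},
	"abiinternal": {"rax", "rbx", "rcx", "rdi", "rsi", "r8", "r9", "r10", "r11"},
	"abi0":        {}, // everything is passed on the stack
}

// argLocation reports where the n-th (0-based) integer argument lives
// under the given ABI: a register name or "stack".
func argLocation(abi string, n int) string {
	regs, ok := intArgRegs[abi]
	if !ok {
		return "unknown ABI"
	}
	if n >= 0 && n < len(regs) {
		return regs[n]
	}
	return "stack"
}

func main() {
	for _, abi := range []string{"sysv", "win64", "abiinternal", "abi0"} {
		fmt.Printf("%-12s arg0=%s arg4=%s\n", abi, argLocation(abi, 0), argLocation(abi, 4))
	}
}
```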
Go entities
To aid the analysis, knowledge of Go structures is required:
- g - Goroutine, a lightweight thread in Go; it has a user stack associated with it.
- p - Processor, a logical entity that manages scheduling of goroutines onto threads.
- m - Machine, an OS-level thread; it has a system stack associated with it, also known as the g0 stack (and, on Unix platforms, a signal stack).
- p.GoF - the Go function GoF in package p, as named in the cgo documentation (here p is a package name, not the scheduler's P).
- m.g0 - the system goroutine associated with an OS thread.
All g, m, and p objects are heap allocated, but are never freed, so their memory remains type stable. As a result, the runtime can avoid write barriers in the depths of the scheduler.
- getg().m.curg - gets the current user g.
- getg() - returns the current g; when executing on the system or signal stacks, this returns the current M’s “g0” or “gsignal”, respectively.
- getg() == getg().m.curg - determines whether you’re running on the user stack or the system stack.
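The relationship between g, m, g0, gsignal and curg can be sketched with toy types. getg is not callable from user code, so the current g is passed in explicitly here. The struct fields and the helpers newM and onUserStack are simplified stand-ins for illustration, not the runtime's real definitions.

```go
package main

import "fmt"

// g and m are toy stand-ins for the runtime structures; only the fields
// needed for the stack check are modelled.
type g struct {
	name string
	m    *m
}

type m struct {
	g0      *g // goroutine that owns the system stack
	gsignal *g // goroutine that owns the signal stack (Unix)
	curg    *g // current user goroutine
}

// onUserStack mirrors the runtime idiom getg() == getg().m.curg.
func onUserStack(cur *g) bool {
	return cur == cur.m.curg
}

// newM wires up one m with its three associated goroutines.
func newM() *m {
	mp := &m{}
	mp.g0 = &g{name: "g0", m: mp}
	mp.gsignal = &g{name: "gsignal", m: mp}
	mp.curg = &g{name: "user", m: mp}
	return mp
}

func main() {
	mp := newM()
	for _, cur := range []*g{mp.curg, mp.g0, mp.gsignal} {
		fmt.Printf("%s: on user stack = %v\n", cur.name, onUserStack(cur))
	}
}
```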
CGO callback / Go —> C
When a gcc-compiled function f calls back into Go, the following happens. To make it possible for gcc-compiled C code to call a Go function p.GoF, cgo writes a gcc-compiled function named GoF (not p.GoF, since gcc doesn’t know about packages), which acts as a bridge. This GoF function is written in C and is an intermediary between the C code and the Go runtime. The gcc-compiled C function f calls GoF. GoF initializes “frame”, a structure containing all of its arguments and slots for p.GoF’s results. It then calls crosscall2(_cgoexp_GoF, frame, framesize, ctxt) using the gcc ABI.
crosscall2
crosscall2 is a four-argument adapter from the gcc function call ABI to the gc function call ABI. At this point the code is running in the Go runtime, but it is still executing on m.g0’s stack and outside the $GOMAXPROCS limit. crosscall2 saves the C callee-saved registers and calls cgocallback(_cgoexp_GoF, frame, ctxt) using the gc ABI.
void crosscall2 (void (*_cgoexp_GoF)(void *), void * frame, __int32 framesize , __int64 ctxt);
func crosscall2(_cgoexp_GoF, frame unsafe.Pointer, framesize int32, ctxt uintptr)
// Called using the gcc ABI.
// _cgoexp_GoF is the PC of a func(frame unsafe.Pointer) function.
crosscall2 is a low-level function that varies from architecture to architecture; for the sake of brevity, only amd64 is examined here.
crosscall2:
push rdi ; Save registers (PUSH_REGS_HOST_TO_ABI0)
push rsi
push rbx
push rbp
push r12
push r13
push r14
push r15
sub rsp, 0x18
#ifndef GOOS_windows
; Store arguments on the stack (non-Windows)
mov [rsp], rdi ; fn
mov [rsp + 0x8], rsi ; arg
; Skip n in DX
mov [rsp + 0x10], rcx ; ctxt
#else
; Store arguments on the stack (Windows)
mov [rsp], rcx ; fn
mov [rsp + 0x8], rdx ; arg
; Skip n in R8
mov [rsp + 0x10], r9 ; ctxt
#endif
call runtime_cgocallback ; Call runtime·cgocallback
add rsp, 0x18 ; Restore stack
pop r15 ; Restore registers (POP_REGS_HOST_TO_ABI0)
pop r14
pop r13
pop r12
pop rbp
pop rbx
pop rsi
pop rdi
ret
The parameters of the crosscall2 call are passed via rcx, rdx, r8d, and r9. This matches the platform ABI, here Windows.
Moving further inside crosscall2, we see the call to cgocallback, which takes the same arguments we passed into crosscall2 (minus framesize), but this time __cdecl is used. Note how the caller cleans up the stack.
Here we can explicitly make the function see its arguments on the stack by setting our own __usercall prototype in IDA.
void __usercall runtime_cgocallback (
__int64 *cgoexp_GoF@<0:^0.8>,
void *frame@<0:^8.8>,
__int64 *ctxt@<0:^16.8>
);
We are all set!
cgocallback
Switches from m.g0’s stack to the original g (m.curg)’s stack, on which it calls cgocallbackg (_cgoexp_GoF, frame, ctxt). cgocallback saves the current stack pointer SP as m.g0.sched.sp, so that any use of m.g0’s stack during the execution of the callback will be done below the existing stack frames. Before overwriting m.g0.sched.sp, it pushes the old value on the m.g0 stack, so that it can be restored later.
void __usercall runtime_cgocallbackg_0 (
__int64 *cgoexp_GoF@<0:^0.8>,
void *frame@<0:^8.8>,
__int64 *ctxt@<0:^16.8>
);
[!] Quick note: Parsing the Go metadata did not help here, because most of it was stripped from the binary, which is why the golang plugin failed. We can still try it anyway (Edit > Other > golang:detect_and_parse).
cgocallbackg
The code of this function now executes on a real goroutine stack (not m.g0). It first calls runtime.exitsyscall, which blocks until the $GOMAXPROCS limit allows this goroutine to run. It is also responsible for unwinding m.g0.sched.sp if a panic occurs. In addition, this function uses the gc ABI, meaning that __golang is used. Then it calls _cgoexp_GoF(frame).
void __usercall runtime_cgocallbackg(
void (*fn)(void *)@<rax>,
void *frame@<rbx>,
unsigned __int64 ctxt@<rcx>
);
For some reason IDA refuses to recognize the __golang convention: setting it does not automatically annotate the arguments passed into the function. That is why we again set a custom __usercall prototype.
[!] Quick note: the ctxt variable is of type uintptr, which is “an integer type that is large enough to contain the bit pattern of any pointer” in Go. Despite its confusing name, uintptr is not a pointer but an integer, which is why the ctxt variable is of type unsigned __int64 and not unsigned __int64 *.
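The point about uintptr can be checked from ordinary Go code. The sketch below uses only the standard reflect and unsafe packages; the helper names uintptrIsInteger, sameWidthAsPointer and addOffset are invented for this example.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// uintptrIsInteger reports whether reflect classifies uintptr as its own
// integer kind rather than as a pointer kind.
func uintptrIsInteger() bool {
	k := reflect.TypeOf(uintptr(0)).Kind()
	return k == reflect.Uintptr && k != reflect.Ptr && k != reflect.UnsafePointer
}

// sameWidthAsPointer reports whether uintptr is exactly as wide as a pointer.
func sameWidthAsPointer() bool {
	return unsafe.Sizeof(uintptr(0)) == unsafe.Sizeof(unsafe.Pointer(nil))
}

// addOffset shows that plain integer arithmetic is allowed on uintptr,
// which is not the case for real pointer types in Go.
func addOffset(base, off uintptr) uintptr {
	return base + off
}

func main() {
	fmt.Println(uintptrIsInteger(), sameWidthAsPointer(), addOffset(0x1000, 0x78))
}
```

This is why typing ctxt as an integer in IDA, rather than as a pointer, reflects what the Go source actually declares.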
cgocallbackg1
The last frontier of our journey is cgocallbackg1.
Now here is another trick: we see a register-based call, which is not the same as a call to a constant function address. Sometimes IDA can be a little buggy when setting call and value types. Here we examine two somewhat different ways to achieve proper recognition of function arguments. Initially we are presented with this state: cgocallbackg1 eventually calls the function fn with the void* frame variable, but frame is nowhere to be seen inside fn's parentheses. Clearly it was passed via rax.
One might say we can just set __golang, but that will not work, simply because ABIInternal is not a standardized way of passing arguments to functions. We might stumble upon a function whose arguments are passed via rsi and rdi, and setting __golang will not help IDA recognize them. By pressing Y we set the value type to the following signature.
void (__usercall *) (void *@<rax>); // value type
Then select Force call type.
Now the function arguments are recognized correctly.
The other way is to set the call type:
void __usercall fn(void *frame@<rax>); // call type
Now IDA properly recognizes the function call and its arguments:
This matches the Go runtime source:
func cgocallbackg1(fn, frame unsafe.Pointer, ctxt uintptr) {
	// code ...

	// Invoke callback. This function is generated by cmd/cgo and
	// will unpack the argument frame and call the Go function.
	var cb func(frame unsafe.Pointer)
	cbFV := funcval{uintptr(fn)}
	*(*unsafe.Pointer)(unsafe.Pointer(&cb)) = noescape(unsafe.Pointer(&cbFV))
	cb(frame) // fn is the same as cb; cb stands for callback

	// code ...
}
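The funcval trick in cgocallbackg1 can be reproduced in user code, which helps explain why the decompiler shows an indirect call through a register. The sketch below is an illustration only: it relies on the gc toolchain's current representation of func values (a pointer to a record whose first word is the code pointer), which is an implementation detail and not a language guarantee. The local funcval type and the names target, callByPC and bump are made up for this example.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// funcval mimics the runtime's funcval header: the first word of the
// record a Go func value points to is the code pointer.
type funcval struct {
	fn uintptr
}

// target is a stand-in for the cgo-generated _cgoexp_GoF wrapper.
// It treats frame as a pointer to an int and increments it.
func target(frame unsafe.Pointer) {
	*(*int)(frame) += 1
}

// callByPC builds a func value from a raw code pointer, in the same spirit
// as cgocallbackg1, and invokes it with frame.
func callByPC(pc uintptr, frame unsafe.Pointer) {
	fv := &funcval{fn: pc}
	// Reinterpret the *funcval as a func value and call through it.
	cb := *(*func(unsafe.Pointer))(unsafe.Pointer(&fv))
	cb(frame)
}

// bump calls target indirectly through its entry PC and returns the result.
func bump(n int) int {
	pc := reflect.ValueOf(target).Pointer() // entry PC of target
	callByPC(pc, unsafe.Pointer(&n))
	return n
}

func main() {
	fmt.Println(bump(41))
}
```

The call through cb compiles to a load of the code pointer followed by an indirect call, which is the pattern seen in the disassembly of cgocallbackg1.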
Upon examination of the function to be invoked, we see that it has no return value, so we set the return type to void. This is our long-awaited callback function—exactly the function that was meant to be called from the very beginning. Here, the callback is invoked via rsi, with rax being passed as an argument. rax takes its value from rbx, which in turn takes its value from the stack at the 0x78 offset.
_cgoexp_GoF(frame) aka cb(frame)
Unpacks the arguments from frame, calls p.GoF, writes the results back to frame, and returns. Now we start unwinding this whole process. Indeed, this function accepts its parameter via rax and uses it to unpack the arguments into registers.
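The unpack, call, write-back sequence can be modelled in plain Go. This is a sketch under assumptions: the frame struct, the function goF (standing in for p.GoF), and the helpers cgoexpGoF and callViaFrame are hypothetical, and the real wrapper is generated by cmd/cgo with a layout that depends on the exported function's signature.

```go
package main

import (
	"fmt"
	"unsafe"
)

// goF stands in for the exported Go function p.GoF (hypothetical).
func goF(a, b int32) int64 {
	return int64(a) + int64(b)
}

// frame mimics the block that the C-side GoF wrapper fills in:
// the arguments first, followed by slots for the results.
type frame struct {
	a, b int32 // arguments
	r    int64 // result slot
}

// cgoexpGoF plays the role of _cgoexp_GoF: it receives an untyped pointer
// to the frame, unpacks the arguments, calls goF, and stores the result.
func cgoexpGoF(p unsafe.Pointer) {
	f := (*frame)(p)
	f.r = goF(f.a, f.b)
}

// callViaFrame plays the role of the C-side caller: it builds the frame,
// hands it over as a raw pointer, and reads the result slot afterwards.
func callViaFrame(a, b int32) int64 {
	f := frame{a: a, b: b}
	cgoexpGoF(unsafe.Pointer(&f))
	return f.r
}

func main() {
	fmt.Println(callViaFrame(2, 40))
}
```

The single pointer argument is what arrives in rax in the disassembly; everything else is read from or written to memory behind that pointer.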
Unwinding.
Now execution returns to cgocallbackg. cgocallbackg calls entersyscall and returns to cgocallback. cgocallback switches back to m.g0’s stack, restores the old m.g0.sched.sp value from the stack, and returns to crosscall2. crosscall2 restores the callee-saved registers for gcc and returns to GoF, which unpacks any result values and returns to f. So the resulting chain of called functions looks like this:
runtime.crosscall2()
runtime.cgocallback()
runtime.cgocallbackg()
_cgoexp_GoF()
CGO / C —> Go
Before you read this section, check out this cool article on CGO internals, which covers the resolution of WinAPI function addresses. The author also explains the internal structure of the cgocalls and asmcgocall functions. I started reversing Go internals without knowing about this article beforehand, so some things might overlap. On Windows, whenever Go calls a C function such as a WinAPI function, it invokes the stdcall function from the runtime package. stdcall has its own internal machinery: it involves the following chain of functions, called in consecutive order.
[package] runtime
stdcall
asmcgocall
asmstdcall
The stdcall function has 9 wrappers (stdcall0 … stdcall8), one per argument count. The picture below is missing the runtime_stdcall8 name. It is common to see these names in a binary's function list:
Case study - initHighResTimer
We will look inside a Go function from the runtime package: runtime.initHighResTimer.
Right from the start we can see that IDA incorrectly determined the calling convention as __fastcall, along with the number of arguments. stdcall4 has only 5 parameters. The Go stdcall4 prototype is defined as follows:
func stdcall4(fn stdFunction, a0, a1, a2, a3 uintptr) uintptr {
	mp := getg().m
	mp.libcall.n = 4
	mp.libcall.args = uintptr(noescape(unsafe.Pointer(&a0)))
	return stdcall(fn)
}
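The key idea in stdcall4 is that it records only the argument count and the address of the first argument, and a lower layer later reads n consecutive machine words from that address. A simplified model of that idea follows. The libcall struct here is cut down, its fn field is a Go closure instead of a raw WinAPI address, args is an unsafe.Pointer rather than the runtime's uintptr, and the names dispatch, call4 and sum are invented for the sketch.

```go
package main

import (
	"fmt"
	"unsafe"
)

// libcall is a simplified model of the runtime's libcall record.
type libcall struct {
	fn   func(args []uintptr) uintptr // hypothetical stand-in for the WinAPI target
	n    uintptr                      // number of arguments
	args unsafe.Pointer               // address of the first argument
	r1   uintptr                      // primary return value
}

// dispatch plays the role of the low-level call stub: it reads n
// consecutive words starting at args and hands them to fn.
func dispatch(c *libcall) {
	words := unsafe.Slice((*uintptr)(c.args), int(c.n))
	c.r1 = c.fn(words)
}

// call4 mirrors the shape of stdcall4: store the count and the address
// of the first argument, then let dispatch do the rest.
func call4(fn func([]uintptr) uintptr, a0, a1, a2, a3 uintptr) uintptr {
	// An explicit array keeps the arguments contiguous in this model;
	// the runtime instead relies on its own stack layout.
	argv := [4]uintptr{a0, a1, a2, a3}
	c := libcall{fn: fn, n: 4, args: unsafe.Pointer(&argv[0])}
	dispatch(&c)
	return c.r1
}

// sum is a trivial callee used to show that all four words arrive.
func sum(args []uintptr) uintptr {
	var s uintptr
	for _, a := range args {
		s += a
	}
	return s
}

func main() {
	fmt.Println(call4(sum, 1, 2, 3, 4))
}
```

Because only the address of the first argument is stored, the arguments have to sit next to each other in memory, which is why they show up as adjacent stack slots in the disassembly.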
So we change that a little to make it conform to the gcc ABI by setting __stdcall and removing the unnecessary trailing parameter.
Here we see how rax receives an offset to another offset that points to the WinAPI function CreateWaitableTimerExW. Next, a single 16-byte store from the xmm15 register fills the following stack slots. Note that xmm15 is the dedicated zero register in ABIInternal on amd64, so this store most likely zeroes two adjacent 8-byte arguments at once rather than passing a float. Modelling that 16-byte slot as one __m128 argument is a convenient way to make IDA display it, which is why the total count of arguments in the prototype is four, not five. The function also uses the platform ABI, as it is about to call a Windows-specific function written in C (stdcall).
The right calling convention is as follows:
__int64 __stdcall runtime_stdcall4(__int64 fn, __m128 a0, __int64 a1, __int64 a2)
This is equivalent to a custom __usercall prototype.
void __usercall runtime_stdcall4(
__int64 fn@<^0.8>, // __int64, offset 0x0, size 8 bytes
__m128 a0@<^8.16>, // __m128, offset 0x8, size 16 bytes
__int64 a1@<^18.8>, // __int64, offset 0x18, size 8 bytes
__int64 a2@<^20.8> // __int64, offset 0x20, size 8 bytes
);
Next comes the call to stdcall, which looks like this:
func stdcall(fn stdFunction) uintptr {
	gp := getg()
	mp := gp.m
	mp.libcall.fn = uintptr(unsafe.Pointer(fn))
	resetLibcall := false
	if mp.profilehz != 0 && mp.libcallsp == 0 {
		mp.libcallg.set(gp) // leave pc/sp for cpu profiler
		mp.libcallpc = getcallerpc()
		// sp must be the last, because once async cpu profiler finds
		// all three values to be non-zero, it will use them
		mp.libcallsp = getcallersp()
		resetLibcall = true // See comment in sys_darwin.go:libcCall
	}
	asmcgocall(asmstdcallAddr, unsafe.Pointer(&mp.libcall))
	if resetLibcall {
		mp.libcallsp = 0
	}
	return mp.libcall.r1
}
asmcgocall function signature:
func asmcgocall(fn, arg unsafe.Pointer) int32
asmcgocall internal structure:
asmcgocall:
; Load arguments from stack (Go ABI)
mov rax, [rsp + 8] ; fn
mov rbx, [rsp + 16] ; arg
mov rdx, rsp ; Save SP
; Check if we need to switch to m->g0 stack
mov rcx, gs:[0] ; get_tls(CX)
mov rdi, [rcx] ; g = tls->g
test rdi, rdi
jz nosave ; Skip if g == nil
; Check if already on gsignal or g0 stack
mov r8, [rdi + 8] ; g.m (offset of m in g struct)
mov rsi, [r8 + 0x10] ; m->gsignal
cmp rdi, rsi
je nosave ; Already on gsignal stack
mov rsi, [r8 + 0x18] ; m->g0
cmp rdi, rsi
je nosave ; Already on g0 stack
call gosave_systemstack_switch ; Switch to system stack (m->g0)
mov [rcx], rsi ; tls->g = m->g0
mov rsp, [rsi + 0x28] ; Restore SP from g0.sched.sp
; Prepare stack for C ABI (16-byte aligned)
sub rsp, 16
and rsp, -16 ; Align to 16 bytes
mov [rsp + 8], rdi ; Save original g
mov rdi, [rdi + 0x30] ; g->stack.hi
sub rdi, rdx ; Calculate stack depth
mov [rsp], rdi
call runtime.asmcgocall_landingpad
; Restore stack and registers
mov rcx, gs:[0] ; get_tls(CX)
mov rdi, [rsp + 8] ; Restore original g
mov rsi, [rdi + 0x30] ; g->stack.hi
sub rsi, [rsp] ; Calculate new SP
mov [rcx], rdi ; tls->g = original g
mov rsp, rsi ; Restore SP
mov [rsp + 24], eax ; Store return value
ret
nosave:
; Already on a system stack (m->g0 or m->gsignal). No g to save.
sub rsp, 16 ; Allocate 16 bytes for alignment
and rsp, -16 ; Align stack to 16 bytes (~15 = 0xFFFFFFFFFFFFFFF0)
; Prepare stack for debugging (even though no g is saved)
mov qword [rsp + 8], 0 ; Store 0 where g would normally be saved (for debuggers)
mov [rsp], rdx ; Save original SP (from DX) at [rsp + 0]
call runtime.asmcgocall_landingpad
mov rsi, [rsp] ; Restore original stack pointer
mov rsp, rsi
mov dword [rsp + 24], eax ; Store 32-bit return value
ret
asmcgocall IDA view:
More examples
Example 1
By default IDA can’t recognize Go function arguments, so it just skips them. For example, a decompiled Go function:
__int64 __fastcall fn_1() // no parameters
But if we look at the disassembly, there are actually a few arguments passed to the function. Disassembly:
mov rax, [rsp+88h]
mov rbx, [rsp+40h]
call fn_1
mov [rsp+0B8h], rax
mov [rsp+70h], rbx
Correct function prototypes:
struct go_str __usercall fn_1@<rax,rbx>(struct go_str s@<rax,rbx>);
void __usercall fn_2( __int64 a1@<rdi>, __int64 a2@<rsi>);
void __usercall fn_3(__int64 r1@<rax>,__int64 r2@<rbx>, __int64 r3@<rcx>, __int64 r4@<rdi>);
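The reason a Go string occupies two registers (rax and rbx in the fn_1 prototype) is that a string value is a two-word header: a data pointer and a length. The sketch below makes that visible from Go. The goStr type is a local model named after the go_str struct in the IDA prototype, and the helper header is invented for this example; the layout is the one the gc toolchain currently uses.

```go
package main

import (
	"fmt"
	"unsafe"
)

// goStr models the two-word string header: data pointer, then length.
type goStr struct {
	ptr    unsafe.Pointer
	length int
}

// header reinterprets a Go string as its two-word header.
func header(s string) goStr {
	return *(*goStr)(unsafe.Pointer(&s))
}

// isTwoWords reports whether a string value is exactly two machine words.
func isTwoWords() bool {
	return unsafe.Sizeof("") == 2*unsafe.Sizeof(uintptr(0))
}

func main() {
	h := header("hello")
	// The first word points at the bytes, the second word is the length.
	fmt.Println(isTwoWords(), h.length, string(*(*byte)(h.ptr)))
}
```

When the string is passed under ABIInternal, the first word travels in one integer register and the second in the next, which is what the paired mov instructions before the call show.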
Example 2
How it looks in IDA:
Function signature:
void __usercall sub_65F5C5BA(__int64 dst@<rdi>, __int64 src@<rsi>);
The function actually just copies 80 bytes (5 x 16) from the rsi pointer to the rdi pointer:
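In Go terms, the behaviour of that routine is roughly the following. This is only a behavioural sketch with invented names (copy80, roundTrip): the real routine is emitted by the compiler as unrolled 16-byte moves, not as a loop.

```go
package main

import "fmt"

// copy80 copies 80 bytes from src to dst in five 16-byte chunks,
// mirroring the five load/store pairs seen in the disassembly.
func copy80(dst, src *[80]byte) {
	const chunk = 16
	for off := 0; off < len(src); off += chunk {
		copy(dst[off:off+chunk], src[off:off+chunk])
	}
}

// roundTrip fills a source buffer, copies it, and reports whether the
// destination matches.
func roundTrip() bool {
	var src, dst [80]byte
	for i := range src {
		src[i] = byte(i)
	}
	copy80(&dst, &src)
	return dst == src
}

func main() {
	fmt.Println(roundTrip())
}
```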
Resources
- https://go.dev/src/runtime/HACKING
- https://go.dev/src/runtime/os_windows.go
- https://go.dev/src/runtime/cgo/asm_386.s
- https://go.dev/src/runtime/cgo/asm_amd64.s
- https://medium.com/@aditimishra_541/what-are-go-processors-bfc13b38095e
- https://leandrofroes.github.io/posts/An-in-depth-look-at-Golang-Windows-calls/
- https://hex-rays.com/blog/igors-tip-of-the-week-51-custom-calling-conventions





















