November 27, 2020

Malware Development Pt. 1: Dynamic Module Loading in Go

Original Post from SpecterOps
Author: Dwight Hohnstein


As a blend between offensive security engineer and developer, I find myself frustrated in attempting to adhere to the software development lifecycle (SDLC). The modern-day security consultant requires many disparate tools from a variety of maintainers to be successful in operations, and integrating them into a workflow is awkward at best. Worse, the way these tools are deployed into an environment usually cannot be changed by the user, leaving few or no alternatives. Many agents are compiled with a static set of commands, usually with some functionality that allows the end user to extend them in a limited capacity. Besides these commands being immutable once compiled and loaded, if an agent were to be compromised by a clever analyst, they’d be able to create detections around it and its entire feature set.

Instead, what if we could load agent functionality as it was needed? A minimal agent core is delivered to the target whose primary function is to load commands from the control server. Once loaded, the commands could then be executed by the agent, dispatching requisite data to the modules and communicating the results. This is exciting because it solves many of the problems outlined above:

  1. If any agent in a mesh were compromised, the capabilities exposed would only be limited to that which was loaded in memory at that time. If the agent isn’t carved from memory, a defender only sees the bare-bones loading functionality. If one were to clean up their memory after executing a module, the module functionality also remains safe unless a defender dumps the memory of the machine while that module is executing.
  2. Modules can live as their own separate code repositories, allowing for easier maintainability and QA testing.
  3. Modules can be written in any language so long as they compile to shared libraries.
  4. Modules can have versioning associated with them, which is a model Cody Thomas’s Mythic C2 framework supports.
  5. In shops where a dedicated development team does not exist, and new innovations are driven by individuals, it becomes much easier to integrate new functionality any one person develops either on assessment or otherwise.

In this article, I’ll outline a proof-of-concept (POC) written in Go that is capable of loading shared libraries during its run time, and demonstrate how to accomplish two-way communication between an arbitrary shared library and the application core. This POC is written for Linux for the sake of simplicity; however, the same concept can be applied to Windows using Stephen Fewer’s excellent Reflective DLL Injection project.

Why Go?

Go was appealing as an application core for three reasons. First, Go is capable of cross-platform compilation. It’d provide a stable code warehouse for every operating system (OS), and building is (for the most part) straightforward for whatever OS you’d want to deploy on. Second, the ability for Go to interact with C code allows a developer to more easily manage C code without having to code in pure C. Moreover, C gives us direct access to native APIs so that we don’t have to perform reflection to access them. Lastly, I wanted to learn a new language and enjoy Go quite a bit. It’s a minor thing, but hey, it’s important to enjoy what you do.

Application Design Considerations

Before we write any code, we should define what functionality it is we’re trying to build. Our aim is to create a shared library loader that can load libraries in memory, invoke function exports from said library, and return the results of that function. The results will be wildly different from function to function, so results should be stuffed into a datagram structure that both the application and library agree upon. These datagrams should be flexible enough that, when received, the application or library can perform more complex logic with the data within. A module loaded this way may be long running and need to stream output back to the application core. As such, the loading application needs to expose callback functionality a library can call. Conversely, a loaded library may require more data from the application core (such as in the case of chunked file downloads), and thus the library must also have a callback function the application core can feed data into. Both the application and library callback functions must exist for two-way communication to occur, which is critical for more complex functionality.

These requirements are defined more succinctly as follows:

  1. The application must be capable of loading libraries in memory* (Linux is a special snowflake).
  2. The application must be capable of parsing function exports from these in-memory libraries.
  3. The application must expose an interface for the library so that the newly loaded library can stream data to the application core.
  4. The loaded library must expose an interface for the application core to invoke so that it may receive more data (if required).
  5. Communications must adhere to a datagram specification such that both the application and library can parse received data correctly.

A Brief Foreword on CGo

Before proceeding to the implementation, I want to say a few words on how CGo (i.e., how Go can call C and vice versa) works. This is by no means a complete primer or a replacement for the stellar package documentation (linked at the end of this article). Instead, I intend to give a high-level overview of the critical concepts required to implement this in Go.

First, C can call Go functions so long as those functions are exported in your package. This is done by placing an //export comment directly before your Go function definition. You then declare in your C code, using the extern keyword, that the function is defined outside the scope of your C file so that you can invoke it. The documentation explains this more succinctly, and it can be summarized by this excerpt:

Documentation excerpt showing how C can call Go functions.

Conversely, Go can invoke C code directly by importing the “C” pseudo-package, so long as the C function is declared in the comment preamble above that import, either directly or through an included header file. Again, the documentation demonstrates this clearly here:

Documentation excerpt calling C code from Go.

The last key concept critical to this POC is understanding how pointers in applications work. A pointer is an address in memory that points to something, be it an object like a datagram or the address of a function. In this POC we’ll be passing pointers of both datagrams and functions betwixt the libraries and application; however, we cannot pass Go function pointers directly. Moreover, the documentation states that “Calling C function pointers is currently not supported, however you can declare Go variables which hold C function pointers and pass them back and forth between Go and C. C code may call function pointers received from Go.”

This gives us everything we need for two-way communication. Our Go code can expose a function that C can obtain the address of. C can define a function to invoke an arbitrary function pointer, and Go can invoke this newly defined C function. The address of this C function can be passed between the application core and a newly loaded library, which would complete our requirement of two-way communication.

Datagram Specification

For the purpose of this article, I’ll define a datagram as a special message type that holds data to be passed between the application core and a library. These datagrams are structs with a predictable memory layout so that (in theory) it’s possible to pass pointers to these structs amongst a variety of languages and receive the datagram properly. In this first iteration, a datagram holds the following data:

  1. The data that is being sent between the application and library.
  2. The length of the data being sent.
  3. The name of the module that is sending or receiving the data. This is important so that the application core can route data from the C2, like a file chunk, to the module requiring the data.
  4. The type of message being sent. This allows for more complex application logic for whatever function is receiving the datagram.

This is simply the first iteration of the datagram and I’m sure there are oversights, but for the sake of this proof of concept, it’ll suffice. In the POC code, it’s defined as the following:
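The POC's own listing is not reproduced in this copy of the post, so what follows is only an illustrative stand-in. It models the four items above as a Go type and encodes them with fixed-width, little-endian length fields so that the byte layout is explicit. The field names, the message type values, and the wire format are all assumptions; the POC shares a C struct by pointer rather than serializing.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Hypothetical message types; the POC may use different values.
const (
	MsgData  uint32 = 1
	MsgError uint32 = 2
)

const headerLen = 12 // three little-endian uint32 fields

// Datagram models the four items listed above. The length is implied by
// len(Data) in Go and written explicitly when the datagram is encoded.
type Datagram struct {
	MessageType uint32
	ModuleName  string
	Data        []byte
}

// Marshal encodes d as [type][nameLen][dataLen][name][data].
func (d Datagram) Marshal() []byte {
	buf := make([]byte, headerLen, headerLen+len(d.ModuleName)+len(d.Data))
	binary.LittleEndian.PutUint32(buf[0:4], d.MessageType)
	binary.LittleEndian.PutUint32(buf[4:8], uint32(len(d.ModuleName)))
	binary.LittleEndian.PutUint32(buf[8:12], uint32(len(d.Data)))
	buf = append(buf, d.ModuleName...)
	return append(buf, d.Data...)
}

// Unmarshal decodes a buffer produced by Marshal and rejects bad lengths.
func Unmarshal(b []byte) (Datagram, error) {
	if len(b) < headerLen {
		return Datagram{}, errors.New("datagram: short header")
	}
	nameLen := uint64(binary.LittleEndian.Uint32(b[4:8]))
	dataLen := uint64(binary.LittleEndian.Uint32(b[8:12]))
	if uint64(len(b)) != headerLen+nameLen+dataLen {
		return Datagram{}, errors.New("datagram: length mismatch")
	}
	name := b[headerLen : headerLen+nameLen]
	data := b[headerLen+nameLen:]
	return Datagram{
		MessageType: binary.LittleEndian.Uint32(b[0:4]),
		ModuleName:  string(name),
		Data:        append([]byte(nil), data...), // copy so the caller owns it
	}, nil
}

func main() {
	wire := Datagram{MessageType: MsgData, ModuleName: "helloworld", Data: []byte("chunk")}.Marshal()
	d, err := Unmarshal(wire)
	fmt.Println(len(wire), d.ModuleName, string(d.Data), err)
	// prints: 27 helloworld chunk <nil>
}
```

Carrying the module name in every datagram is what lets the receiving side route a message without any other context, which is the role item 3 above assigns to it.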

Go Module Specification

We’re going to start with the module specification as I believe it to be easier to understand than the application core. As discussed earlier, each module needs to adhere to the datagram specification and be able to invoke an application callback function to communicate data to the loader. In our discussion of CGo and function pointers we saw that Go code cannot invoke C function pointers directly; however, it can define a bridge function in C to invoke these C function pointers. If we can pass the application callback function pointer to the invoked module, that module will be able to invoke that function pointer through the C bridge function.

A minimal project folder is going to hold three files:

  1. A file, bridge.h, which declares what a callback function is and the bridge function to invoke that callback function pointer.
  2. A file, bridge.c, that defines the bridge function and invokes the function pointer passed to it with some data.
  3. A file, main.go, that exports two functions — a main and callback function for the application core to invoke. The main executing function of this module will take a pointer to a datagram, a pointer to the application callback function, and a pointer to a datagram struct to be populated by the module.

Finally, let’s define our Go shared library. For this example, we’re going to export a function named helloworld that’ll be invoked by the loading core. In main.go of our library, we’ll define the function as follows:

As we can see, the function takes three arguments. The first argument is a pointer to a datagram that’s forwarded from the application core. The second argument is a C function pointer to the C bridge function in the application core, such that the invoked module can call the application callback function. Lastly, the third argument is a pointer to the resultant datagram to be populated by the module. In our example, we see on line 6 we type cast the pointer to a datagram structure pointer and print out the data that was passed to the function. Further down, at line 16, we pass data from our module back to the loading application by invoking the bridge function defined in bridge.h. Then, before finishing the routine, we populate the resultant datagram pointer with some data to be parsed by the application core.

Our module definition is still incomplete. According to our requirements above, the module may need to receive data from the loading agent, and as such should expose a callback function. We defined our module callback function in our example as follows:

This function simply takes a pointer to a datagram, then should perform additional logic based on the message type or type of data it’s expecting to receive from the loading application. You can imagine a global shared data structure in this module that the main helloworld function blocks on until helloworldCallback populates it, then proceeds with whatever application logic it wants to do next.
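The blocking pattern imagined above can be modeled in plain Go with a channel, leaving out cgo and the datagram pointer entirely. This is a sketch of the idea rather than the module's code: inbox, moduleCallback, and moduleMain are hypothetical stand-ins for the shared structure, helloworldCallback, and helloworld, and the timeout is an addition so the main routine cannot wait forever.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// inbox is the shared structure that the callback fills and the main
// routine waits on. A buffer of one lets the callback return immediately.
var inbox = make(chan []byte, 1)

// moduleCallback stands in for the exported callback: the application core
// calls it to hand the module more data.
func moduleCallback(data []byte) {
	inbox <- append([]byte(nil), data...) // copy, the caller may reuse its buffer
}

// moduleMain stands in for the main module function: it blocks until the
// callback delivers data or the timeout expires.
func moduleMain(timeout time.Duration) (string, error) {
	select {
	case chunk := <-inbox:
		return "received " + string(chunk), nil
	case <-time.After(timeout):
		return "", errors.New("timed out waiting for callback data")
	}
}

func main() {
	// The application core delivers a chunk while the module is waiting.
	go moduleCallback([]byte("chunk-1"))

	out, err := moduleMain(time.Second)
	fmt.Println(out, err)
}
```

A channel is used here instead of a mutex-guarded variable because the hand-off and the wake-up then happen in one step; a real module would also need to decide what to do with data that arrives when nothing is waiting for it.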

Once this is built into a shared library, we should see two functions exported (helloworld and helloworldCallback) for our loading application to get handles to.

Application Core POC

In-Memory Module Loading

On Linux, this has been a solved problem for quite some time, and this POC doesn’t implement a novel loading solution. It allocates a file in RAM using the memfd suite of functions and bootstraps the library and function calls using dlopen and dlsym respectively. I extended the code from @TheXC3LL’s blog post on the subject to suit the project requirements by:

  1. Returning exported function pointers using the dlsym function call.
  2. Creating data structures to hold requisite function pointers for the main module logic and module callback functions.
  3. Creating Go package wrappers to manage RAM files and in-memory libraries loaded this way (memfile and memlibrary packages respectively).
Graphical control flow of how in memory library loading works.

The memlibrary Package

The memfile package is short and relatively straightforward, so I’ll focus instead on the implementation of the library loading functionality, which is more complex.

Let’s first cover what aspects of an in-memory library we’d care about as it pertains to our project goals. First, we want the short name of the library we’re invoking so that we can route data to that library at a later time if required. Second, we’d want to know where in memory this library lives, and in our case this will be a pointer to the RAM file created using memfd_create. Lastly, we want to hold function pointers to the exported functions of that library that we care about. Because we defined the specification for a new module above, we only care about two functions: the function that performs the main module logic, and the callback function of that library if it requires more data. Given this, I define an in-memory module as follows:

Let’s for a moment take for granted that these function pointers are populated successfully by whatever library loading logic is implemented. We’d need two primary wrappers around each module — one to invoke the main function of the module, and another to send data to that module. In Go, we define this to be as follows:

We’ll cover how results are passed between the library and the main application on line 11 later on, but for a moment let’s discuss line 5. As shown previously, C code can invoke function pointers passed to it, while Go code cannot. This function, call_module_callback, is a bridge function that allows us to call the module’s exported callback function from Go. The definition for this function is no different than the callback defined in the module’s bridge.c file, so we won’t cover it further here.

The call_module_function is not dissimilar to the call_module_callback function. The only difference between them is the definition of the function pointers they’re invoking — namely, callbacks only take one argument, while a module function requires three. To get the resultant datagram from the module, we define call_module_function as follows:

(Note: datagram is the struct defined previously, except redefined in the C header file)

Finally, let’s stitch all the pieces together by creating a new InMemoryModule. Here’s the Go code for declaring a new module:

Let’s start by inspecting the parameters. We need to know where in memory this library is located, hence the requirement for the InMemoryFile. Next, if we want to route data from the application to the newly loaded module, we need some sort of identifier for it, which we define as moduleName above. In the actual “Module Specification” section, we have this defined as a constant MODULE_NAME. Finally, we need to tell the application which functions to look for in the library’s export table so that it can acquire function pointers to them; these are denoted by functionName and callbackName for the main module function and callback function respectively. The result from C.load_module will be a pointer to a ModuleFunctions structure that has the desired function pointers should they exist, and if not, we return an error to the caller. Lastly, we add this new InMemoryModule to a module manager within the package so that we can call the module from anywhere in our source code before returning a pointer to the new InMemoryModule.
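The module manager mentioned at the end of that paragraph can be pictured as a registry keyed by short name. The sketch below is a plain-Go model under stated assumptions, not the memlibrary package: Module, Manager, Register, Invoke, and Send are hypothetical names, and ordinary Go funcs stand in for the C function pointers that the POC resolves from the library.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Module models an in-memory module. In the POC the two functions are C
// function pointers; plain Go funcs stand in for them here.
type Module struct {
	Name     string
	Main     func(input []byte) ([]byte, error)
	Callback func(data []byte)
}

// Manager routes calls to loaded modules by their short name.
type Manager struct {
	mu      sync.RWMutex
	modules map[string]*Module
}

func NewManager() *Manager {
	return &Manager{modules: make(map[string]*Module)}
}

// Register adds a module and fails when a required piece is missing.
func (m *Manager) Register(mod *Module) error {
	if mod == nil || mod.Name == "" || mod.Main == nil {
		return errors.New("module needs a name and a main function")
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, exists := m.modules[mod.Name]; exists {
		return fmt.Errorf("module %q already loaded", mod.Name)
	}
	m.modules[mod.Name] = mod
	return nil
}

func (m *Manager) lookup(name string) (*Module, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	mod, ok := m.modules[name]
	if !ok {
		return nil, fmt.Errorf("module %q not loaded", name)
	}
	return mod, nil
}

// Invoke runs the main function of the named module.
func (m *Manager) Invoke(name string, input []byte) ([]byte, error) {
	mod, err := m.lookup(name)
	if err != nil {
		return nil, err
	}
	return mod.Main(input)
}

// Send feeds extra data to the named module through its callback.
func (m *Manager) Send(name string, data []byte) error {
	mod, err := m.lookup(name)
	if err != nil {
		return err
	}
	if mod.Callback == nil {
		return fmt.Errorf("module %q has no callback", name)
	}
	mod.Callback(data)
	return nil
}

func main() {
	mgr := NewManager()
	_ = mgr.Register(&Module{
		Name: "helloworld",
		Main: func(in []byte) ([]byte, error) { return append([]byte("hello, "), in...), nil },
	})
	out, err := mgr.Invoke("helloworld", []byte("core"))
	fmt.Println(string(out), err)
}
```

Keeping the lookup behind a read-write lock means modules can be invoked from several goroutines while new ones are still being registered.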

The load_module function defined in memlibrary.h is defined as follows:

The function takes three arguments: the path to the library to be loaded (libraryPath), the name of the main module function (functionName), and the name of the module callback to use should the module require more data from the application core (callbackName). On line 7 we load and acquire a handle to the library using the dlopen function (wrapped by load_library here), and allocate a new structure to hold our function pointers from within that library. We then retrieve our function pointers using wrappers around the dlsym function (get_main_export and get_main_callback_export) and populate the structure before returning the resultant ModuleFunctions structure.

The Proof of Concept

Piecing all of the above together, our main application logic ends up looking like the following:

The logic here is relatively straightforward. We first allocate a new file in RAM (line 3) using the memfile package and write our library to that file. In this example, the library is the example “helloworld” library defined in the “Module Specification” section (Note: While we’re grabbing this file from disk, you could deliver this byte array through any means you’d like). This library has two exports, helloworld and helloworldCallback, which we tell the memlibrary package to go fetch. The library is returned as the lib variable (line 13), and we can directly invoke it by passing a byte array of data to it (line 19). We can even send data to the library as shown on line 21. Finally, because the package implements a module manager internally, we can invoke the module anywhere in the main application, so long as we know the short name associated with it, as shown on line 23. Compiling and running the application returns the following output:

Result of running the POC above.
Graphical control flow of the proof of concept.


Software development in the offensive security space is rapid, volatile, and spread out amongst individuals each developing their own applications. By moving towards a modular agent design, we can solve many of the problems of integration we face with most open source agents. New modules can be designed on the fly and integrated without having to redeploy or recompile a new callback. These modules could (in theory) be written in any language a developer chooses so long as they adhere to the specification above, facilitating contributions from disparate individuals with varying degrees of experience. Moving towards a modular specification allows an operator to easily tweak, on the fly, any agent command they so choose. Finally, because the agent core is so lightweight, it becomes extremely difficult to signature or determine if the binary is genuinely malicious or not. While this proof of concept is limited in scope, I believe it to be extendable to a variety of operating systems, and hope that it has at least stimulated the creativity of other developers in our field. If you’d like the full source code for this proof of concept, it can be found at the links below.


Application Core POC:

Shared Library POC:

Mythic C2:

CGo Documentation:

Library Loading on Linux in C:

Reflective DLL Injection:

Malware Development Pt. 1: Dynamic Module Loading in Go was originally published in Posts By SpecterOps Team Members on Medium.
