
The magic of eBPF III: Development playground

Introduction

 At some point, we had to dive into developing programs in eBPF, and that time has finally come. In this post, we'll explore several different approaches to writing eBPF programs, including powerful tools like Cilium and BCC. I'll highlight the methods that I find most efficient and convenient, because as developers, our goal is to write code quickly and effectively, without unnecessary complications. So let's get straight to the point and see how we can streamline our eBPF development workflow.

 I should clarify that my go-to method for writing eBPF programs is Cilium and their bpf2go tool. It is a spectacular and simple way of writing kernel-space programs in C-like syntax, with a very comfortable way of writing the userspace side in Golang. It turns out that all you need to do that is the big brain of the people in Cilium. I won't spoil anything just yet, but keep in mind that all my tinkering with eBPF has been done with bpf2go.

I strongly advise you to start from my previous posts if you have no idea what eBPF is. Here I intend to get our feet wet for the first time with some simple code, but simple and eBPF don't go hand in hand, so to have a bit more context on this topic, please go and read The magic of eBPF I: What is this? and The magic of eBPF II: The not-so-good side first. If you already know what this is and want to know how to start writing code, of course feel free to keep reading.

The grandiose Toolchain

 If you’re new to eBPF, you might find it a bit surprising how much excitement and enthusiasm it sparks within the tech community. On the surface, it might seem like a niche technology—one that operates quietly in the background, away from the limelight. Yet, eBPF has earned a loyal following for a good reason. It's a game-changer, unlocking powerful capabilities within the Linux kernel, and the community can’t seem to get enough of it.

 The sheer number of tools built both with and for eBPF is mind-blowing. From performance monitoring and security to networking and beyond, eBPF is proving to be incredibly versatile. While it’s impossible to cover every tool out there (seriously, we’d be here all day), I can point you toward some of the best options to start your own eBPF adventure.

 When it comes to coding and developing with eBPF, you’ve got four main paths to choose from: Cilium, BCC, BPFTrace, and libbpf. Each one offers its own unique approach and strengths, so you can find the right fit for your project or interests. Let’s take a closer look at what each of these brings to the table, so you can make an informed decision and dive into the world of eBPF with confidence.

 Below there is a table where I compared the most relevant aspects in my opinion. In general, each one of them will force you to choose a programming language. Cilium runs with Golang, BCC with Python, BPFTrace is a language in and of itself and libbpf runs with C. More often than not, the eBPF tutorials you can find on the internet use BCC for its ease of use. Python is a really easy language to learn, and as such, is the choice of many to start coding.

Feature | Cilium | BCC | BPFTrace | libbpf
------- | ------ | --- | -------- | ------
Simplicity | High-level and relatively easy to use with Kubernetes integration | Moderate, requires familiarity with Python and C | High, script-based with simpler syntax | Low, direct interaction with BPF with more complexity
Performance | Optimized for production with minimal overhead | Good, but some overhead due to higher-level abstractions | Good for tracing, but can be less performant in demanding scenarios | The best, provides low-level access for maximum performance
Ease of Coding | Medium, needs you to be fluent in Go | Easy, requires coding in both Python and C | Very easy, scripting-focused with minimal code required | Challenging, requires deep knowledge of C and BPF internals
Compatibility | Strong Kubernetes support | Wide support across Linux distributions | Widely compatible, works with most modern Linux systems | Broad compatibility, but more manual setup required

 First, let's clear this up. Technically, the eBPF Go library and Cilium aren’t exactly the same thing. Cilium is like the Swiss Army knife of cloud computing and distributed systems, with way more tricks up its sleeve than just eBPF. On the other hand, the eBPF Go library is your go-to toolkit for writing eBPF applications in Go.

 But here’s the fun part: Cilium uses the eBPF Go library under the hood and makes it super easy to plug it into your Go code. So, for the sake of simplicity (and our sanity), let’s treat them as one and the same from here on out. Cool? Cool.

Development flow

 Now, I know I’ve been dropping hints like breadcrumbs, but let’s make it official: we’re using Cilium (aka the eBPF Go library) for this project. Here’s the game plan:

  1. Write Your eBPF Program: Start by coding your eBPF program in C-like syntax. Save it with a .bpf.c extension.
  2. Craft Your Userspace Code in Go: Next up, switch gears to Go and write the userspace code. This is the part that interacts with the kernel and retrieves data from the eBPF maps. Think of it as the bridge between your eBPF program and the outside world.
  3. Add the Go Generate Directive: Add the //go:generate directive to your Go file. This directive tells go generate to run bpf2go, which compiles your BPF file and generates the Go code that loads your programs and maps.
  4. Run the Show.
  5. Debug: What, you thought it was going to work first try? Is this your first time?
  6. Loop: Repeat steps 4 and 5 until it works.

Development Environment

  Now we are more or less ready to set up our lovely development environment and get our hands dirty with this. About time, right? First, we need the Linux kernel utilities. These will allow you to compile and run eBPF code in the Kernel Space.

sudo pacman -S linux-tools linux-headers

  These packages (in Arch) will prompt you to choose exactly what you want to install. If you choose nothing, the package manager will install everything, so be careful. It should look something like this:

  See how I chose option number 2 in the linux-tools selector? Those are the bpf tools the Linux kernel needs to operate. In contrast, the linux-headers selector is left blank, because it depends on the kernel version you have installed in your system, so you need to find that one out. Just running uname -r in the terminal should be enough.

 Then we need to install the compilers, both clang and LLVM. Just run the following.

sudo pacman -S clang llvm

 Next, we need the libraries for the language, and Go itself. For Golang you can follow the official guide to install it, or you can install it with pacman just like any other package.

sudo pacman -S libbpf go

  Finally, you might not have git installed on your system. We don't strictly need it, but it's always nice to have. Also, I don't know to what extent go get needs git to retrieve packages from third-party repositories. Regardless, I'd follow this step even if you don't think you need it.

sudo pacman -S git

  I would say this is more than enough to start tinkering with eBPF in Go. There's a chance you might be missing some other libraries I already have installed in my OS by default, but it shouldn't happen. If it does, you can always investigate a little further to see what you might be missing. Also, an important note: keep in mind that these packages are for Arch-based distros. Their names might vary from distro to distro, and you may need to look for the particular name yours uses.

  For your convenience, I'm leaving you a single command that installs everything listed above.

sudo pacman -S git go libbpf llvm clang linux-headers linux-tools

First test

  I mentioned earlier that you might need to install more libraries than I do. Let's put that to the test! We'll start by writing our first eBPF program in Golang to ensure everything is set up correctly.

  When developing an eBPF program, I usually begin by scripting the BPF code in the Kernel Space. Why? It helps me to first get a clear picture of what I need before moving on to the User Space code that processes it. So, without further ado, let's do further.

The BPF code

 The BPF script is usually composed of three different pieces: a map, a struct and a function. The map stores the information so it can be sent to the User Space, the struct defines the information that is going to be sent, and the function is where you code the logic of the program. Remember, keep it simple. It's supposed to have little overhead, and it's Kernel Programming after all, so don't get too excited.

  Let's start by creating a code.bpf.c file and then specifying the map. There are plenty of map types, and each one serves a different and particular purpose. Some of them are more general-purpose than others, but that's too advanced for this post. For now, and up until you learn eBPF in depth, most of the time you will use just one kind of map, which is the one shown below.

// Define a perf event map to send events to user space
struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(max_entries, 1024);
} execve_events SEC(".maps");

  There are a few things to break down here. First, let's talk about the declaration. You might find it a bit unusual to see a C struct used to declare a map. The struct is only a description: you define the type of map you'll be using and specify the maximum number of entries, and the loader creates the actual map from it. Keep in mind that different types of maps behave differently (this one is an array of per-CPU perf event buffers rather than a hash map), and the fields required for a proper declaration can vary. This is where the documentation becomes incredibly useful.

  Secondly, after the closing curly bracket, you give the map a name as if it were a variable, and then use the SEC() macro to place it in the .maps section of the compiled object file, which is where the loader looks for map definitions. That's it! The map is now declared and ready for use.

  Now, let’s move on to declaring the struct that we’ll send to the map. The idea here is to store important values that our program needs to process in User Space. It’s crucial to focus on values that actually matter for what we’re doing. Just throwing in random data just in case won’t help your performance at all. This struct is just a standard C struct, so nothing new or tricky here.

// Structure to hold execve event data
struct execve_event {
    uint32_t pid;              // Process ID
    char comm[TASK_COMM_LEN];  // Command name (process name)
};

  For now, this struct is just a placeholder. We don’t really need this info to print "Hello World!" in User Space, but it’s useful to keep around so I can explain eBPF a bit more. Basically, it contains the PID (Process ID) and the COMM (the name of the executable without the path). This all boils down to saying that this struct holds details about the process making the execve call.

  Next up, we need to implement the function that actually grabs the values we want and sends them off to User Space.

// Tracepoint for syscalls:sys_enter_execve
SEC("tracepoint/syscalls/sys_enter_execve")
int trace_execve(struct trace_event_raw_sys_enter *ctx) {
    struct execve_event event = {};

    // Get current task (process)
    struct task_struct *task = (struct task_struct *)bpf_get_current_task();

    // Populate event data with PID and process name
    event.pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&event.comm, sizeof(event.comm));

    // Send the event to user space through the perf event map
    bpf_perf_event_output(ctx, &execve_events, 
    	BPF_F_CURRENT_CPU, &event, sizeof(event));

    return 0;
}

  Here we meet the SEC() macro again. This time the section name tells the loader what kind of program this is and where to hook it in the kernel. In this case, the code will run every time a process enters the execve syscall. After that, it’s just the usual stuff: the return type, function name, and a parameter which is the context of the program. This context struct is sometimes necessary to grab certain data from the kernel syscall. Right now, it’s just passed as a parameter to the bpf_perf_event_output helper function, but it can be useful later on.

  I'll assume the rest of the code and the BPF helpers within the function are self-explanatory. We get the PID, get the COMM, put them in the struct and send the struct to the User Space through the map. I just mentioned that the context is not used for anything other than being a parameter for the bpf_perf_event_output helper. Do not remove it. It's a non-optional parameter, so it stays there.
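  One detail worth spelling out is the >> 32 in the PID line. bpf_get_current_pid_tgid() returns a single 64-bit value: the upper 32 bits hold the thread group ID (what User Space calls the PID) and the lower 32 bits hold the thread ID. Here is a minimal Go sketch of the same bit arithmetic, with made-up values, just to make the split visible:

```go
package main

import "fmt"

// splitPidTgid mirrors what the BPF program does with the value returned by
// bpf_get_current_pid_tgid(): the upper 32 bits are the thread group ID
// (the PID as seen from user space), the lower 32 bits are the thread ID.
func splitPidTgid(v uint64) (pid, tid uint32) {
	return uint32(v >> 32), uint32(v)
}

func main() {
	// Hypothetical value: process 1234, thread 1250.
	v := uint64(1234)<<32 | uint64(1250)
	pid, tid := splitPidTgid(v)
	fmt.Printf("pid=%d tid=%d\n", pid, tid) // pid=1234 tid=1250
}
```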

  Now the three main parts of the BPF code are completed. The only thing left to be done is to add some sprinkles so the compiler doesn't cry about us not having the necessary includes, a valid License for the helpers and whatnot. Right below is the complete code of the code.bpf.c file.

code.bpf.c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#include <linux/bpf.h>
#include <linux/bpf_common.h>

#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>

#define TASK_COMM_LEN 16

// Structure to hold execve event data
struct execve_event {
    uint32_t pid;              // Process ID
    char comm[TASK_COMM_LEN];  // Command name (process name)
};

// Define a perf event map to send events to user space
struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(max_entries, 1024);
} execve_events SEC(".maps");

// Tracepoint for syscalls:sys_enter_execve
SEC("tracepoint/syscalls/sys_enter_execve")
int trace_execve(struct trace_event_raw_sys_enter *ctx) {
    struct execve_event event = {};

    // Get current task (process)
    struct task_struct *task = (struct task_struct *)bpf_get_current_task();

    // Populate event data with PID and process name
    event.pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&event.comm, sizeof(event.comm));

    // Send the event to user space through the perf event map
    bpf_perf_event_output(ctx, &execve_events, 
    	BPF_F_CURRENT_CPU, &event, sizeof(event));

    return 0;
}

// The license is important, don't forget to put it
char _license[] SEC("license") = "GPL";

  It has happened to me a couple of times that I forgot to put that final line with the license, or the license I put wasn't the one the BPF code was under. It throws a weird error and it might take you longer than it should to figure it out. So first check whether the license is correct, or whether it is there at all.
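  The reason the license matters is that some helpers, bpf_perf_event_output among them as far as I know, are only available to programs that declare a GPL-compatible license. The sketch below mirrors, in Go, the list of strings I understand the kernel accepts as GPL-compatible (the check lives in include/linux/license.h). Treat the list as an assumption and verify it against your kernel source.

```go
package main

import "fmt"

// gplCompatible reports whether a license string is one the kernel treats
// as GPL-compatible. The list is an assumption based on the kernel's
// license_is_gpl_compatible(); check it against your kernel version.
func gplCompatible(license string) bool {
	switch license {
	case "GPL", "GPL v2", "GPL and additional rights",
		"Dual BSD/GPL", "Dual MIT/GPL", "Dual MPL/GPL":
		return true
	}
	return false
}

func main() {
	for _, l := range []string{"GPL", "Dual MIT/GPL", "MIT"} {
		fmt.Printf("%q gpl-compatible: %v\n", l, gplCompatible(l))
	}
}
```

  The comparison is exact, so a typo or a lowercase "gpl" counts as not compatible.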

  In future posts we will also cover a generated header file called "vmlinux.h" that contains mostly everything you might need to code without having to include many more files.

The Golang Code

  So far so good. Now we have to code the User Space, that is, the part of the program that processes all that raw information coming from the BPF maps. As I said before, we are using bpf2go, a Cilium tool that compiles BPF code and lets us run it from Go with relative ease. In all honesty it does more than this, but for now, this explanation is enough.

  Again, the User Space consists of three main parts, just like the BPF code. A struct with the information, a probe to attach the BPF code to an event and a reader to extract data from the map. There's a few more things to it and, obviously, the more complex the program is the more there is to the User Space, but as a general rule, it always stems from the same starting point. It may help you to know that the complexity escalates way more abruptly in the User Space than it does in the Kernel Space. These BPF scripts stay roughly the same, with little increase in difficulty.

 First, we need to retrieve a few libraries from GitHub for our code to run. Mainly two: cilium/ebpf/perf, which lets us read information from a BPF_MAP_TYPE_PERF_EVENT_ARRAY, and cilium/ebpf/link, which is the easiest way to attach a BPF program to a probe and run it. The code looks like so:

package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
    "log"
    "os"
    "os/signal"
    "syscall"
    
    "github.com/cilium/ebpf/perf"
    "github.com/cilium/ebpf/link"
)

// execveEvent must match the structure from code.bpf.c
type execveEvent struct {
    Pid  uint32
    Comm [16]byte
}

  As you can see, the struct is exactly the same as the one we wrote in the code.bpf.c file. It has to be identical unless you want to manually parse the bytes (a process known as unmarshaling bytes) and carefully place each value with the correct offset in the struct fields. We’ll dive into that in another post. For now, let’s let the Go library handle that for us.

  Now, we need to create the BPF objects, attach the program to a specific probe, and read from it. But here's the catch—we don’t actually have those objects yet. So where are they? Well, nowhere. We’ll get to that soon. For now, let’s pretend we do have them and take a look at the next steps:

func main() {
    // Load the eBPF objects (programs and maps) generated by bpf2go
    objs := codeObjects{}
    if err := loadCodeObjects(&objs, nil); err != nil {
        log.Fatalf("loading objects: %v", err)
    }
    defer objs.Close()

    // Attach the program to the execve syscall tracepoint
    link, err := link.Tracepoint("syscalls", "sys_enter_execve",
    	objs.codePrograms.TraceExecve, nil)
    if err != nil {
        log.Fatalf("Failed to attach tracepoint: %v", err)
    }
    defer link.Close()
	...
}

 If you remember, at some point I talked about a directive that generated Go code from the eBPF file. That directive is the last thing I'm going to explain, but just know that the things we're missing right now, such as the codeObjects{} struct or the loadCodeObjects() function, come from that autogenerated file. The names will change depending on what you specify as the output name in the directive, so if you set it to "myBPFCode" for instance, the names will be myBPFCodeObjects{} and loadMyBPFCodeObjects() respectively.

  One more thing we should take a gander at is the call to link.Tracepoint. The first parameter is the tracepoint group, but the most interesting ones are the second and the third. The second is the name of the tracepoint the program attaches to, whereas the third is the program generated from the function we coded in the BPF file. You can verify that yourself.

  Alright, let's move on to the next part, the perf reader. That is what we need to extract and parse data from the BPF map, and this is the exact reason the structs have to match between BPF and the Golang code. At some point you'll code with structs that are relatively complex and this automatic solution will no longer work, but for now let's use it.

  To code a perf reader you just need to declare it as follows:

...
	// Create a perf event reader to read events from the execve_events map
    reader, err := perf.NewReader(objs.ExecveEvents, 4096)
    if err != nil {
        log.Fatalf("Failed to create perf event reader: %v", err)
    }
    defer reader.Close()
...

  With that done, we also need a way for the program to handle a Keyboard Interruption (^C). In Go this is easy enough thanks to the power of channels. We register a channel to receive the signals and start a new goroutine that blocks until one arrives. The moment it does, the goroutine prints a message and exits the whole process. Just like so:

...
	// Set up signal handling to clean up on exit
    sigs := make(chan os.Signal, 1)
    signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
    go func() {
        <-sigs
        fmt.Println("Exiting...")
        os.Exit(0)
    }()
...

  The last thing we need to do is to infinitely loop over the reader so every event that arrives is correctly processed. In our case this is very simple since the only thing we do is to print "Hello World!" every time some event is pushed to the PerfMap. Seems simple enough right? Let's go:

...
	// Read events from the perf event reader
    for {
        record, err := reader.Read()
        if err != nil {
            log.Fatalf("Failed to read from perf event reader: %v", err)
        }

        // Decode the event data
        var event execveEvent
        err = binary.Read(bytes.NewReader(record.RawSample), binary.LittleEndian, &event)
        if err != nil {
            log.Fatalf("Failed to decode event: %v", err)
        }

        // Show proof the program is working correctly
        fmt.Printf("Hello, World! Execv call executed!\n")
    }
}
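  If you would rather print something more useful than "Hello, World!", the decoded event already has what you need. The only catch is that Comm is a fixed-size byte array padded with NUL bytes, so it has to be trimmed before printing. A minimal sketch, where commString is a helper name I made up and the values are hypothetical:

```go
package main

import (
	"bytes"
	"fmt"
)

// commString converts the fixed-size, NUL-padded comm field into a Go string.
func commString(comm [16]byte) string {
	if i := bytes.IndexByte(comm[:], 0); i >= 0 {
		return string(comm[:i])
	}
	return string(comm[:])
}

func main() {
	var comm [16]byte
	copy(comm[:], "bash") // hypothetical value, as the kernel would deliver it
	fmt.Printf("pid=%d comm=%s\n", 1234, commString(comm)) // pid=1234 comm=bash
}
```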

  Cool, all this should be enough. I'm going to leave the complete code down below so you can copy and paste it more comfortably. I won't explain what this section does because it's pretty much obvious. Since there's nothing here that is advanced programming or new, I assume you can read the code and know what it does. If that's not the case, consider starting with something simpler than eBPF. Anyway, here you go!

main.go
//go:generate go run github.com/cilium/ebpf/cmd/bpf2go -target bpfel -cc clang code code.bpf.c  -- -I/usr/include/linux/bpf.h

package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
    "log"
    "os"
    "os/signal"
    "syscall"
    
    "github.com/cilium/ebpf/perf"
    "github.com/cilium/ebpf/link"
)

// execveEvent must match the structure from code.bpf.c
type execveEvent struct {
    Pid  uint32
    Comm [16]byte
}

func main() {
    // Load the eBPF objects (programs and maps) generated by bpf2go
    objs := codeObjects{}
    if err := loadCodeObjects(&objs, nil); err != nil {
        log.Fatalf("loading objects: %v", err)
    }
    defer objs.Close()

    // Attach the program to the execve syscall tracepoint
    link, err := link.Tracepoint("syscalls", "sys_enter_execve", objs.codePrograms.TraceExecve, nil)
    if err != nil {
        log.Fatalf("Failed to attach tracepoint: %v", err)
    }
    defer link.Close()

    // Create a perf event reader to read events from the execve_events map
    reader, err := perf.NewReader(objs.ExecveEvents, 4096)
    if err != nil {
        log.Fatalf("Failed to create perf event reader: %v", err)
    }
    defer reader.Close()

    // Set up signal handling to clean up on exit
    sigs := make(chan os.Signal, 1)
    signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
    go func() {
        <-sigs
        fmt.Println("Exiting...")
        os.Exit(0)
    }()

    // Read events from the perf event reader
    for {
        record, err := reader.Read()
        if err != nil {
            log.Fatalf("Failed to read from perf event reader: %v", err)
        }

        // Decode the event data
        var event execveEvent
        err = binary.Read(bytes.NewReader(record.RawSample), binary.LittleEndian, &event)
        if err != nil {
            log.Fatalf("Failed to decode event: %v", err)
        }

        // Show proof the program is working correctly
        fmt.Printf("Hello, World! Execv call executed!\n")
    }
}

  The last thing I have to tell you before concluding this already long enough post is this line:
//go:generate go run github.com/cilium/ebpf/cmd/bpf2go -target bpfel -cc clang code code.bpf.c -- -I/usr/include/linux/bpf.h. This is what generates the Golang Code from the eBPF program so both Kernel and User spaces can communicate. Its syntax is pretty simple, but can be tricky to understand so we're going to break it down.

  • go:generate: This is the directive. When you execute go generate, it looks for these comments and runs whatever comes next to them.
  • go run <package>: Tells the directive what to do, in this case, run a Golang program located in a GitHub repo.
  • -target bpfel: Name of the target. bpfel stands for little-endian BPF, and bpfeb is its big-endian counterpart. Different targets allow different code and structs, so if you change the target, you may need to change the code for it to work properly.
  • -cc clang <identifier> <bpf-file-name>: Specifies the compiler to be used, the identifier (this gives the name for all those Go structs and functions we used) and the file to compile. You cannot compile more than one file per directive.
  • -- -I<path-to-headers>: Everything after -- is passed to the compiler as-is. The -I flag adds a path where the compiler looks for header files. Note that -I expects a directory, so /usr/include is the more usual form.

Executing the program

  Now all we have left to do is compile and execute the program. To do that you need to follow three steps:

  1. Generate go code from eBPF with go generate ./...
  2. Build the executable binaries with go build
  3. Run the program with root privileges

  To generate the go code, you need to go to the root directory of the program (where the file go.mod is located) and execute the following command:

go generate ./...

  You should see a couple of files appear, both named after a mix of the identifier specified in the directive and the target, with extensions .o and .go. In this case the files should look like so: code_bpfel.*.

  Then we need to build the executable by running the following command:

go build main.go code_bpfel.go

  Remember we need to include every go file we coded with, especially the autogenerated ones, so the compiler knows where the objects referenced in the main file are located.

  Finally, we execute the recently compiled binary. We need to do so with sudo because loading eBPF programs into the kernel requires root privileges. If you don't, the program will fail to load.

sudo ./main

  You can also use go run without building the binary first. It compiles the given files into a temporary binary and runs it right away. If you want to do so, execute the command below:

sudo go run main.go code_bpfel.go
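  If you tend to forget the sudo, a small check at the start of main can give you a friendlier message than a failed load. This is only a sketch: checkRoot is a name I made up, and it assumes root is required, as described above. Depending on your kernel, specific capabilities may be enough, in which case this check would be too strict.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// checkRoot returns an error unless the effective user ID is 0.
func checkRoot(euid int) error {
	if euid != 0 {
		return errors.New("this program must be run as root (try sudo)")
	}
	return nil
}

func main() {
	if err := checkRoot(os.Geteuid()); err != nil {
		fmt.Println("warning:", err)
		return
	}
	fmt.Println("running as root, eBPF objects can be loaded")
}
```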

  If you've done everything correctly, the output should look something like this:

  If anything went wrong, check the code and re-read the article. It should work. Also, there's a chance you may need to tinker with it and find what works (do a little detective work), since this is Kernel-dependent and your version might differ from mine. Anyway, I hope you had a ton of fun writing your first toy program in eBPF.

What's next?

  Well, we're done now. It's been a long post, hasn't it? There's a lot to explain about this wonderful topic. Nevertheless, I tried my best to keep it concise and not beat around the bush too much. I know I left a few things out, but be patient. This will go on for a few posts, I hope.

  Regardless, now you should have a properly configured development playground where you can start coding more and more complex eBPF tools. Who knows, maybe you'll end up contributing to Cilium. Be that as it may, I can't tell you for certain what's coming up next. Perhaps another post like this one, perhaps something with NASM, or maybe a vim/nvim quickstart to get things ready and working for you. Who knows. My advice is to stay tuned and see what's next.

  Hope to see you soon, keep coding!
