name: welcome
class: title
<p>Linux <i><b>e</b>xtended <b>B</b>erkeley <b>P</b>acket <b>F</b>ilters</i></p>
<p>.footnote[ <strong>Be kind to the WiFi!</strong><br/> <em>Be kind to others</em><br/> <em>Thank you!</em></p>
<p><strong>Slides: <a href="https://workshop.bpf.sh/">https://workshop.bpf.sh/</a></strong> ]</p>
---
name: presenters
class: extra-details
<ul> <li><p>Hello! We are:</p> <ul> <li><p>.emoji[🐕] David Calavera (<a href="https://twitter.com/calavera">@calavera</a>, Netlify)</p></li> <li><p>.emoji[🐕] Lorenzo Fontana (<a href="https://twitter.com/fntlnz">@fntlnz</a>, Sysdig)</p></li> </ul></li> <li><p>The workshop will run from 9:00am to 12:30pm</p></li> <li><p>Feel free to interrupt for questions at any time</p></li> </ul>
---
name: book
class:
<p>.pic[ <img src="/img/book_cover.jpg" alt="BPF book cover" /> ]</p>
---
name: requirements
class: extra-details
<h1 id="pre-requirements">Prerequisites</h1>
<ul> <li><p>A machine with Linux Kernel 4.18+ (or the provided Vagrant machine)</p></li> <li><p>Be comfortable with the UNIX command line</p> <ul> <li><p>navigating directories</p></li> <li><p>editing files</p></li> <li><p>a little bit of bash-fu (environment variables, loops)</p></li> </ul></li> </ul>
---
name: inspire
class: title
<p><em>Tell me and I forget.</em> <br/> <em>Teach me and I remember.</em> <br/> <em>Involve me and I learn.</em></p>
<p>Misattributed to Benjamin Franklin</p>
<p><a href="https://www.barrypopik.com/index.php/new_york_city/entry/tell_me_and_i_forget_teach_me_and_i_may_remember_involve_me_and_i_will_lear/">(Probably inspired by Chinese Confucian philosopher Xunzi)</a></p>
---
name: environment-setup-tmux
class:
<h1 id="terminals">Terminals</h1>
<p>Once in a while, the instructions will say: <br/>“Open a new terminal.”</p>
<p>There are multiple ways to do this:</p>
<ul> <li><p>create a new window or tab on your machine, and SSH into the VM;</p></li> <li><p>use screen or tmux on the VM and open a new window from
there;</p></li> <li><p>or, if you are working on a local Linux machine, just open a new terminal there.</p></li> </ul>
<p>You are welcome to use the method that you feel the most comfortable with.</p>
---
name: environment-tmux-cheat
class:
<h1 id="tmux-cheatsheet">Tmux cheatsheet</h1>
<p><a href="https://en.wikipedia.org/wiki/Tmux">Tmux</a> is a terminal multiplexer like <code>screen</code>.</p>
<p><em>You don’t have to use it or even know about it to follow along. <br/> But some of us like to use it to switch between terminals. <br/> It comes preinstalled in the Vagrant machine we provided.</em></p>
<ul> <li>Ctrl-b c → create a new window</li> <li>Ctrl-b n → go to next window</li> <li>Ctrl-b p → go to previous window</li> <li>Ctrl-b " → split window top/bottom</li> <li>Ctrl-b % → split window left/right</li> <li>Ctrl-b Alt-1 → rearrange panes in columns</li> <li>Ctrl-b Alt-2 → rearrange panes in rows</li> <li>Ctrl-b arrows → navigate to other panes</li> <li>Ctrl-b d → detach session</li> <li>tmux attach → reattach to session</li> </ul>
---
name: environment-vagrant
class:
<h1 id="vagrant">Vagrant</h1>
<ul> <li>If you don’t know what Vagrant is, <strong>don’t worry</strong>.</li> <li>It’s just a tool for creating virtual machines; we use it to give everyone a common VM with all the eBPF tools!</li> <li>This workshop comes with a reference environment expressed in a <code>Vagrantfile</code>.</li> <li>You don’t have to use this one, but be prepared to install stuff!</li> </ul>
<p>OS X</p>
<pre><code>brew install --cask virtualbox
brew install --cask vagrant
</code></pre>
<p>Windows</p>
<p>Download <a href="https://www.vagrantup.com/downloads.html">Vagrant</a> and <a href="https://www.virtualbox.org/">Virtualbox</a></p>
<p>Ubuntu</p>
<pre><code>apt install vagrant virtualbox
</code></pre>
---
name: environment-others
class:
<h1 id="non-vagrant-aka-all-the-other-environments">Non-Vagrant (aka.
All the other environments)</h1>
<p>Make sure to have:</p>
<ul> <li>git</li> <li>an editor of your choice</li> <li>gcc</li> <li>clang</li> <li>go</li> </ul>
<p>The other tools we need, <code>bpftrace</code> and <code>bcc</code>, have their own setup instructions in their respective chapters.</p>
---
name: environment-vagrant-cheatsheet
class:
<h1 id="vagrant">Vagrant</h1>
<p>After cloning the workshop repository, enter the environment folder:</p>
<pre><code>git clone https://github.com/bpftools/bpf-workshop.git
cd bpf-workshop/environment
</code></pre>
<p>Then there are four main things you can do:</p>
<pre><code class="language-bash"># Start the environment
vagrant up
# Stop the environment
vagrant halt
# Destroy the environment
vagrant destroy
# Obtain a shell
vagrant ssh
</code></pre>
---
name: intro-exercises
class:
<h2 id="hands-on-sections">Hands-on sections</h2>
<ul> <li><p>The whole workshop is hands-on</p></li> <li><p>We are going to write some eBPF programs</p></li> <li><p>All hands-on sections are clearly identified, like the gray rectangle below</p></li> </ul>
<p>.exercise[ - This is the stuff you’re supposed to do!
]</p>
---
name: toc
class:
<h1 id="table-of-content">Table of contents</h1>

- [Introduction](#introduction)
- [The BPF in-kernel Virtual Machine](#bpf-vm)
- [BCC](#bcc)
- [bpftrace](#bpftrace)
- [eBPF and Kubernetes](#kubernetes)
- [eBPF and Linux Networking](#networking)
- [Linux Kernel security and eBPF](#security)

---
name: introduction
class: title
Introduction
.nav[ [Previous section](#) | [Back to table of contents](#toc) | [Next section](#bpf-vm) ]
---
name: introduction-content
class:
<h1 id="introduction">Introduction</h1>
<ul> <li>BPF originates from the paper <em>The BSD Packet Filter: A New Architecture for User-level Packet Capture</em> (Steven McCanne and Van Jacobson)</li> <li>A virtual machine designed to work efficiently on register-based CPUs</li> <li>Packet filtering without copying data</li> </ul>
---
name: extended-bpf-implementation
class:
<h1 id="the-extended-bpf-implementation-ebpf">The extended BPF implementation (eBPF)</h1>
<ul> <li>Introduced in 2014 by Alexei Starovoitov</li> <li>Grew the register set from two 32-bit registers to ten 64-bit registers</li> <li>Initially designed to optimize network filters</li> </ul>
---
name: bpf-vm
class: title
The BPF in-kernel Virtual Machine
.nav[ [Previous section](#introduction) | [Back to table of contents](#toc) | [Next section](#bcc) ]
---
name: bpfvm-explaination
class:
<h1 id="the-bpf-in-kernel-virtual-machine">The BPF in-kernel Virtual Machine</h1>
<ul> <li>Implements a general-purpose, low-level RISC instruction set</li> <li>Runs the instructions in response to events triggered by the kernel</li> <li>Implements a verifier, so that your programs can’t break the kernel</li> <li>Has different interfaces for different types of programs</li> <li>Widely supported in the kernel</li> <li>Has an upstream LLVM backend, so you can compile eBPF code with clang</li> </ul>
---
name: bpfvm-diagram
class:
<h1 id="the-bpf-in-kernel-virtual-machine">The BPF in-kernel Virtual Machine</h1>
<p>.pic[ <img src="/img/bpf-vm-diagram.svg" alt="eBPF Virtual Machine Diagram" /> ]</p>
---
name: bpfvm-bpf-ebpf
class:
<h1 id="bpf-ebpf-emoji">BPF … eBPF … .emoji[🤔]</h1>
<ul> <li>BPF is the classic implementation, suitable only for basic filtering; it is also referred to as cBPF;</li> <li>The eBPF instruction set is wider than the BPF instruction set;</li> <li>BPF does not support maps, eBPF does;</li> <li>eBPF has general-purpose registers and a stack, BPF has only an accumulator and a scratch memory store;</li> </ul>
---
name: bpfvm-bpf-maps
class:
<h1 id="maps">Maps</h1>
<ul> <li>BPF maps are data stores that live in the kernel;</li> <li>They can be accessed by any BPF program that knows about them;</li> <li>Programs that run in user space can also access these maps by using file descriptors;</li> <li>You can store any kind of data in a map, as long as you specify the key and value sizes correctly beforehand;</li> <li>The kernel treats keys and values as binary blobs and doesn’t care what you keep in a map;</li> <li>This is what lets userspace programs extract information from, or feed information into, BPF programs running in the kernel!</li> </ul>
---
name: bpfvm-bpf-maps-types
class:
<p><strong>Many different types of maps</strong></p>
<ul> <li>Hash table: BPF_MAP_TYPE_HASH</li> <li>Array: BPF_MAP_TYPE_ARRAY</li> <li>Program array maps: BPF_MAP_TYPE_PROG_ARRAY, this one is magic: it stores references to BPF programs so that you can jump between them (tail calls);</li> <li>Perf events array maps: BPF_MAP_TYPE_PERF_EVENT_ARRAY</li> <li>Per-CPU hash maps: BPF_MAP_TYPE_PERCPU_HASH</li> <li>Per-CPU array maps: BPF_MAP_TYPE_PERCPU_ARRAY</li> <li>Stack trace maps: BPF_MAP_TYPE_STACK_TRACE</li> <li>Cgroup array maps: BPF_MAP_TYPE_CGROUP_ARRAY</li> <li>Hash and per-CPU hash maps with LRU eviction: BPF_MAP_TYPE_LRU_HASH, BPF_MAP_TYPE_LRU_PERCPU_HASH</li> <li>Longest Prefix Match (LPM) trie: BPF_MAP_TYPE_LPM_TRIE</li> <li>Array of maps and hash of maps: <code>BPF_MAP_TYPE_ARRAY_OF_MAPS</code> and <code>BPF_MAP_TYPE_HASH_OF_MAPS</code></li> <li>And many more!
Find all of them in <code>man 2 bpf</code></li> </ul>
---
name: bpfvm-bpf-maps-operations
class:
<p><strong>Map operations</strong></p>
<ul> <li>Look up a single element value, <code>bpf_map_lookup_elem</code></li> <li>Remove an element, <code>bpf_map_delete_elem</code></li> <li>Iterate over elements</li> <li>Update an element, <code>bpf_map_update_elem</code></li> <li>Get the next key in the map, <code>bpf_map_get_next_key</code></li> <li>Look up, get the value, and delete in a single atomic operation, <code>bpf_map_lookup_and_delete_elem</code></li> <li>Concurrent access to a map value can be serialized with <code>bpf_spin_lock</code>, a spin lock embedded in the value;</li> </ul>
---
name: bpfvm-bpf-programs
class:
<h1 id="bpf-programs">BPF programs</h1>
<ul> <li>Code that’s triggered based on events in the kernel</li> <li>Context arguments that depend on the event triggered</li> <li>Must always terminate</li> <li>Cannot include unbounded loops</li> <li>Limited in the number of instructions to execute (changing soon)</li> <li>Can trigger other BPF programs</li> </ul>
---
name: bpfvm-bpf-program-helpers
class:
<h1 id="bpf-program-helpers">BPF program helpers</h1>
<ul> <li>General helpers available to any program, like <code>bpf_trace_printk</code> and <code>bpf_get_current_pid_tgid</code></li> <li>Specialized helpers available only to specific types of programs, like <code>bpf_perf_event_output</code></li> <li><a href="https://github.com/iovisor/bpf-docs/blob/master/bpf_helpers.rst">https://github.com/iovisor/bpf-docs/blob/master/bpf_helpers.rst</a></li> </ul>
---
name: bpfvm-bpf-program-types
class:
<h1 id="bpf-program-types">BPF program types</h1>
<ul> <li>Socket filtering: BPF_PROG_TYPE_SOCKET_FILTER, BPF_PROG_TYPE_SK_SKB, BPF_PROG_TYPE_SK_MSG, BPF_PROG_TYPE_SK_REUSEPORT</li> <li>Tracing: BPF_PROG_TYPE_KPROBE, BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_RAW_TRACEPOINT</li> <li>XDP: BPF_PROG_TYPE_XDP</li> <li>Perf events: BPF_PROG_TYPE_PERF_EVENT</li>
<li>Cgroups: BPF_PROG_TYPE_CGROUP_SKB, BPF_PROG_TYPE_CGROUP_SOCK, BPF_PROG_TYPE_CGROUP_DEVICE, BPF_PROG_TYPE_CGROUP_SOCK_ADDR</li> <li>Infrared devices: BPF_PROG_TYPE_LIRC_MODE2</li> </ul>
---
name: bpfvm-bpf-program-exercise
class:
<h1 id="bpf-program-example">BPF program example</h1>
<p>.exercise[</p>
<pre><code>#include <uapi/linux/bpf.h>

/* Place a function or variable in a named ELF section */
#define SEC(NAME) __attribute__((section(NAME), used))

/* Runs on every execve(2); the message goes to the kernel trace pipe */
SEC("tracepoint/syscalls/sys_enter_execve")
int bpf_prog(void *ctx) {
  char msg[] = "Hello, BPF World!";
  bpf_trace_printk(msg, sizeof(msg));
  return 0;
}

char _license[] SEC("license") = "GPL";
</code></pre>
<p>]</p>
---
name: bpfvm-bpf-program-exercise
class:
<h1 id="bpf-program-example-part-2">BPF program example (part 2)</h1>
<p>.exercise[</p>
<pre><code>clang -O2 -target bpf -c hello_world_kern.c -o hello_world_kern.o
</code></pre>
<p>]</p>
---
name: bpfvm-bpf-program-exercise
class:
<h1 id="bpf-program-example-part-3">BPF program example (part 3)</h1>
<p>.exercise[</p>
<pre><code>#include <stdio.h>
#include "bpf_load.h"

int main(int argc, char **argv) {
  /* Load the compiled object file into the kernel */
  if (load_bpf_file("hello_world_kern.o") != 0) {
    printf("The kernel didn't load the BPF program\n");
    return -1;
  }

  /* Print whatever the BPF program writes to the trace pipe */
  read_trace_pipe();
  return 0;
}
</code></pre>
<p>]</p>
---
name: bpfvm-bpf-program-resources
class:
<h1 id="other-resources">Other resources</h1>
<ul> <li><a href="https://bpf.sh/usdt-report-doc">https://bpf.sh/usdt-report-doc</a></li> <li><a href="https://fntlnz.wtf/post/xdp-ip-iproute/">https://fntlnz.wtf/post/xdp-ip-iproute/</a></li> </ul>
---
name: bcc
class: title
BCC
.nav[ [Previous section](#bpf-vm) | [Back to table of contents](#toc) | [Next section](#bpftrace) ]
---
name: bcc-explaination
class:
<h1 id="the-bpf-compiler-collection">The BPF Compiler Collection</h1>
<ul> <li>Toolkit to create and manipulate BPF programs</li> <li>Connects BPF programs with high-level programming languages</li> <li>C++, Python, Lua, and Go frontends</li> <li>Dynamic load and unload of BPF programs</li> </ul>
---
name: bcc-tools
class:
<h1 id="bcc-included-tools">BCC included tools</h1>
<ul> <li>Tracing and monitoring</li> <li>Networking</li> <li>Introspection</li> </ul>
---
name: bcc-hello-world
class:
<h1 id="bcc-hello-world">BCC hello world</h1>
<p>.exercise[ - In the <code>bcc/examples</code> folder; - With root permissions; - Execute the <code>hello_world.py</code> tool; ]</p>
---
name: bcc-hello-world-destilled
class:
<h1 id="bcc-hello-world-destilled">BCC hello world distilled</h1>
<pre><code>from bcc import BPF

# The backslash is doubled so that the C compiler, not Python, sees "\n"
source = """
int kprobe__sys_clone(void *ctx) {
  bpf_trace_printk("Hello, World!\\n");
  return 0;
}
"""

BPF(text = source).trace_print()
</code></pre>
---
name: bcc-perf-events
class:
<h1 id="bcc-perf-events">BCC perf events</h1>
<ul> <li>Real-time event channel between the BPF program and the frontend</li> <li>Active buffer polling</li> </ul>
---
name: bcc-perf-events-exercise-1
class:
<h1 id="bcc-perf-events-exercise-part-1">BCC perf events exercise (part 1)</h1>
<p>.exercise[</p>
<pre><code>bpf_source = """
#include <uapi/linux/ptrace.h>

BPF_PERF_OUTPUT(events);

struct data_t {
  char comm[16];
};
"""
</code></pre>
<p>]</p>
---
name: bcc-perf-events-exercise-1
class:
<h1 id="bcc-perf-events-exercise-part-2">BCC perf events exercise (part 2)</h1>
<p>.exercise[</p>
<pre><code>bpf_source += """
int on_execve(struct pt_regs *ctx,
              const char __user *filename,
              const char __user *const __user *__argv,
              const char __user *const __user *__envp) {
  struct data_t data = {};
  bpf_get_current_comm(&data.comm, sizeof(data.comm));
  events.perf_submit(ctx, &data, sizeof(data));
  return 0;
}
"""
</code></pre>
<p>]</p>
---
name: bcc-perf-events-exercise-3
class:
<h1 id="bcc-perf-events-exercise-part-3">BCC perf events exercise (part 3)</h1>
<p>.exercise[</p>
<pre><code>from bcc import BPF
from bcc.utils import printb

# Called once for every event submitted by the BPF program
def dump_data(cpu, data, size):
    event = bpf["events"].event(data)
    printb(b"%-16s" % event.comm)

bpf = BPF(text = bpf_source)
execve_function = bpf.get_syscall_fnname("execve")
bpf.attach_kprobe(event =
execve_function, fn_name = "on_execve")
bpf["events"].open_perf_buffer(dump_data)

while True:
    bpf.perf_buffer_poll()
</code></pre>
<p>]</p>
---
name: bcc-perf-events-source
class:
<h1 id="bcc-perf-events-source">BCC perf events source</h1>
<p><a href="https://workshop.bpf.sh/exercises/mini_exec_snoop.py">https://workshop.bpf.sh/exercises/mini_exec_snoop.py</a></p>
---
name: bcc-profile
class:
<h1 id="bcc-profile">BCC Profile</h1>
<ul> <li>Samples stack traces to profile CPU usage</li> <li>Shows where a running application is spending CPU time</li> </ul>
---
name: bcc-profile-exercise
class:
<h1 id="bcc-profile-exercise">BCC Profile exercise</h1>
<p>.exercise[</p>
<pre><code>sudo tools/profile -p PID
</code></pre>
<p>]</p>
---
name: bcc-profile-exercise
class:
<h1 id="bcc-profile-exercise-part-2">BCC Profile exercise (Part 2)</h1>
<p>.exercise[ - Download the FlameGraph scripts:</p>
<pre><code>git clone https://github.com/brendangregg/FlameGraph
</code></pre>
<ul> <li>Generate a flame graph for your profiled data:</li> </ul>
<pre><code>sudo tools/profile -p PID -f > /tmp/profile.out
./FlameGraph/flamegraph.pl /tmp/profile.out > /tmp/profile-graph.svg \
  && firefox /tmp/profile-graph.svg
</code></pre>
<p>]</p>
---
name: bcc-takeaways
class:
<h1 id="takeaways">Takeaways</h1>
<ul> <li>Convenient interop with other languages</li> <li>Write one-off tools as well as long-running background processes</li> </ul>
---
name: bpftrace
class: title
bpftrace
.nav[ [Previous section](#bcc) | [Back to table of contents](#toc) | [Next section](#kubernetes) ]
---
name: bpftrace-intro
class:
<h1 id="bpftrace-bpf-observability-front-end">bpftrace: BPF observability front-end</h1>
<p>On GitHub <a href="https://github.com/iovisor/bpftrace">https://github.com/iovisor/bpftrace</a></p>
<p><em>What it is</em>:</p>
<ul> <li>Higher-level language to write eBPF programs;</li> <li>Built from the ground up for BPF and Linux;</li> <li>Used in production at Netflix, Facebook, etc;</li> <li>Custom one-liners;</li> <li>Comes with
tools;</li> <li>It is just for tracing;</li> </ul>
<p><em>What it is NOT</em>:</p>
<ul> <li>A framework to build your loaders;</li> <li>You can’t do classic BPF with it (like seccomp programs or socket probe types);</li> <li>It does not support traffic control or XDP;</li> </ul>
---
name: bpftrace-install
class:
<h1 id="bpftrace-installation">bpftrace: Installation</h1>
<p>We will do some exercises with bpftrace. If you are not using the Vagrant environment, you might want to install it now!</p>
<p>Ubuntu snap package</p>
<pre><code>sudo snap install --devmode bpftrace
sudo snap connect bpftrace:system-trace
</code></pre>
<p>Fedora (28 or later)</p>
<pre><code>sudo dnf install bpftrace
</code></pre>
<p>You can find further instructions <a href="https://github.com/iovisor/bpftrace/blob/master/INSTALL.md">here</a></p>
---
name: bpftrace-syntax
class:
<h1 id="bpftrace-syntax">bpftrace: Syntax</h1>
<p>.pic[ <img src="/img/bpftrace-syntax.png" alt="bpftrace-syntax" /> ]</p>
---
name: bpftrace-probes
class:
<h1 id="bpftrace-probes">bpftrace: Probes</h1>
<p>.pic[ <img src="/img/probe.png" alt="supported bpf probe types" /> ]</p>
---
name: probe type shortcuts
class:
<h1 id="bpftrace-probe-type-shortcuts">bpftrace: Probe type shortcuts</h1>
<table> <thead>
<tr> <th>full</th> <th>shortcut</th> <th>Description</th> </tr>
</thead> <tbody>
<tr> <td>tracepoint</td> <td>t</td> <td>Kernel static tracepoints</td> </tr>
<tr> <td>usdt</td> <td>U</td> <td>User-level statically defined tracing</td> </tr>
<tr> <td>kprobe</td> <td>k</td> <td>Kernel function tracing</td> </tr>
<tr> <td>kretprobe</td> <td>kr</td> <td>Kernel function returns</td> </tr>
<tr> <td>uprobe</td> <td>u</td> <td>User-level function tracing</td> </tr>
<tr> <td>uretprobe</td> <td>ur</td> <td>User-level function returns</td> </tr>
<tr> <td>profile</td> <td>p</td> <td>Timed sampling across all CPUs</td> </tr>
<tr> <td>interval</td> <td>i</td> <td>Interval output</td> </tr>
<tr> <td>software</td> <td>s</td>
<td>Kernel software events</td> </tr>
<tr> <td>hardware</td> <td>h</td> <td>Processor hardware events</td> </tr>
</tbody> </table>
---
name: bpftrace-filters
class:
<h1 id="bpftrace-filters">bpftrace: Filters</h1>
<ul> <li>/pid == 181/</li> <li>/comm != "sshd"/</li> <li>/@ts[tid]/</li> </ul>
---
name: bpftrace-actions
class:
<h1 id="bpftrace-actions">bpftrace: Actions</h1>
<p><strong>Per-event output</strong></p>
<ul> <li>printf()</li> <li>system()</li> <li>join()</li> <li>time()</li> </ul>
<p><strong>Map Summaries</strong></p>
<ul> <li>@ = count() or @++</li> <li>@ = hist()</li> </ul>
---
name: bpftrace-functions
class:
<h1 id="bpftrace-functions">bpftrace: Functions</h1>
<table> <thead>
<tr> <th>function</th> <th>description</th> </tr>
</thead> <tbody>
<tr> <td>hist(int n)</td> <td>Produce a log2 histogram of values of n</td> </tr>
<tr> <td>lhist(int n, int min, int max, int step)</td> <td>Produce a linear histogram of values of n</td> </tr>
<tr> <td>count()</td> <td>Count the number of times this function is called</td> </tr>
<tr> <td>sum(int n)</td> <td>Sum this value</td> </tr>
<tr> <td>min(int n)</td> <td>Record the minimum value seen</td> </tr>
<tr> <td>max(int n)</td> <td>Record the maximum value seen</td> </tr>
<tr> <td>avg(int n)</td> <td>Average this value</td> </tr>
<tr> <td>stats(int n)</td> <td>Return the count, average, and total for this value</td> </tr>
<tr> <td>delete(@x)</td> <td>Delete the map element passed in as an argument</td> </tr>
<tr> <td>str(char *s [, int length])</td> <td>Returns the string pointed to by s</td> </tr>
<tr> <td>printf(char *fmt, …)</td> <td>Print formatted to stdout</td> </tr>
</tbody> </table>
---
name: bpftrace-functions-contd
class:
<h1 id="bpftrace-functions-cont-d">bpftrace: Functions (cont’d)</h1>
<table> <thead>
<tr> <th>function</th> <th>description</th> </tr>
</thead> <tbody>
<tr> <td>print(@x[, int top [, int div]])</td> <td>Print a map, with optional top entry count and divisor</td> </tr>
<tr>
<td>clear(@x)</td> <td>Delete all key/values from a map</td> </tr>
<tr> <td>sym(void *p)</td> <td>Resolve kernel address</td> </tr>
<tr> <td>usym(void *p)</td> <td>Resolve user space address</td> </tr>
<tr> <td>ntop([int af, ]int|char[4|16] addr)</td> <td>Convert IP address data to text</td> </tr>
<tr> <td>kaddr(char *name)</td> <td>Resolve kernel symbol name</td> </tr>
<tr> <td>uaddr(char *name)</td> <td>Resolve user space symbol name</td> </tr>
<tr> <td>reg(char *name)</td> <td>Returns the value stored in the named register</td> </tr>
<tr> <td>join(char *arr[] [, char *delim])</td> <td>Prints the string array</td> </tr>
<tr> <td>time(char *fmt)</td> <td>Print the current time</td> </tr>
<tr> <td>cat(char *filename)</td> <td>Print file content</td> </tr>
<tr> <td>system(char *fmt)</td> <td>Execute shell command</td> </tr>
<tr> <td>exit()</td> <td>Quit bpftrace</td> </tr>
</tbody> </table>
---
name: bpftrace-variable-types
class:
<h1 id="bpftrace-variable-types">bpftrace: Variable types</h1>
<p><strong>Basic Variables</strong></p>
<ul> <li>@global</li> <li>@thread_local[tid]</li> <li>$scratch</li> </ul>
<p><strong>Associative Arrays</strong></p>
<ul> <li>@array[key] = value</li> </ul>
<p><strong>Builtins</strong></p>
<ul> <li>pid</li> <li>…</li> </ul>
---
name: bpftrace-builtin-variables
class:
<h1 id="bpftrace-builtin-variables">bpftrace: Builtin Variables</h1>
<table> <thead>
<tr> <th>variable</th> <th>description</th> </tr>
</thead> <tbody>
<tr> <td>tid</td> <td>Thread ID (kernel pid)</td> </tr>
<tr> <td>cgroup</td> <td>Cgroup ID of the current process</td> </tr>
<tr> <td>uid</td> <td>User ID</td> </tr>
<tr> <td>gid</td> <td>Group ID</td> </tr>
<tr> <td>nsecs</td> <td>Nanosecond timestamp</td> </tr>
<tr> <td>elapsed</td> <td>Nanoseconds since bpftrace initialization</td> </tr>
<tr> <td>cpu</td> <td>Processor ID</td> </tr>
<tr> <td>comm</td> <td>Process name</td> </tr>
</tbody> </table>
---
name: bpftrace-builtin-variables-contd
class:
<h1 id="bpftrace-builtin-variables-cont-d">bpftrace: Builtin
Variables (cont’d)</h1>
<table> <thead>
<tr> <th>variable</th> <th>description</th> </tr>
</thead> <tbody>
<tr> <td>pid</td> <td>Process ID (kernel tgid)</td> </tr>
<tr> <td>stack</td> <td>Kernel stack trace</td> </tr>
<tr> <td>ustack</td> <td>User stack trace</td> </tr>
<tr> <td>arg0, arg1, … etc.</td> <td>Arguments to the function being traced</td> </tr>
<tr> <td>retval</td> <td>Return value from function being traced</td> </tr>
<tr> <td>func</td> <td>Name of the function currently being traced</td> </tr>
<tr> <td>probe</td> <td>Full name of the probe</td> </tr>
<tr> <td>curtask</td> <td>Current task_struct as a u64</td> </tr>
<tr> <td>rand</td> <td>Random number of type u32</td> </tr>
<tr> <td>$1, $2, … etc.</td> <td>Positional parameters to the bpftrace program</td> </tr>
</tbody> </table>
---
name: bpftrace-exercise-tools
class: extra-details
<h2 id="bpftrace-hands-on-tools">bpftrace hands on: Tools!</h2>
<ol> <li>We will clone the bpftrace repository on our Linux machine;</li> <li>We are not cloning it to install bpftrace itself, but to get all the tools under the <code>tools</code> folder</li> </ol>
<p>.exercise[ - Clone the bpftrace repo</p>
<pre><code class="language-bash">git clone https://github.com/iovisor/bpftrace.git
cd bpftrace/tools
</code></pre>
<p>]</p>
---
name: bpftrace-exercise-tools-tcpretrans
class: extra-details
<h2 id="bpftrace-hands-on-trace-or-count-tcp-retransmits">bpftrace hands on: Trace or count TCP retransmits</h2>
<ul> <li>In the bpftrace tools folder, there’s a tool called <code>tcpretrans.bt</code>;</li> <li>TCP <em>guarantees</em> that all received bytes are identical to, and in the same order as, those sent; the technique it uses for this is called <strong>positive acknowledgement with re-transmission</strong>;</li> <li>Many retransmits can add significant overhead to your system, so you want to know when a retransmit occurs:
<code>tcpretrans.bt</code> does just that;</li> <li>Retransmits are usually a sign of poor network health, and this tool is useful for investigating them. Unlike tcpdump, this tool has very low overhead, as it only traces the retransmit function. It also prints additional kernel details: the state of the TCP session at the time of the retransmit.</li> </ul>
---
name: bpftrace-exercise-tools-tcpretrans-contd
class: extra-details
<h2 id="bpftrace-hands-on-trace-or-count-tcp-retransmits-cont-d">bpftrace hands on: Trace or count TCP retransmits (cont’d)</h2>
<p>.exercise[ - In the <code>bpftrace/tools</code> folder; - With root permissions; - Execute the <code>tcpretrans.bt</code> tool;</p>
<pre><code class="language-bash">bpftrace tcpretrans.bt
</code></pre>
<ul> <li>Once it’s started, the best way to trigger some retransmits is to try to connect to a closed port;</li> <li>Try it in a new terminal while leaving <code>tcpretrans.bt</code> active!</li> </ul>
<pre><code class="language-bash">telnet bpf.sh 9090
</code></pre>
<p>]</p>
---
name: bpftrace-oneliners-read-kernel-tracing-vfs
class: extra-details
<h1 id="bpftrace-hands-on-tracing-read-bytes-using-a-kretprobe">bpftrace hands on: tracing read bytes using a kretprobe</h1>
<ul> <li>We will use bpftrace to instrument the <code>vfs_read</code> function in the kernel using a <code>kretprobe</code>;</li> <li>We will create a map called <code>@bytes</code> that holds a linear histogram built with <code>lhist()</code>, whose arguments are: value, min, max, step.
The first argument, <code>retval</code>, is the return value of <code>vfs_read()</code>: the number of bytes read;</li> </ul>
<p>.exercise[ - Execute this one-liner using bpftrace, let it run for a while, then use <code>Ctrl-C</code> to dump the results</p>
<pre><code class="language-bash">bpftrace -e 'kretprobe:vfs_read { @bytes = lhist(retval, 0, 2000, 200); }'
</code></pre>
<p>]</p>
<p>.footnote[.smaller[ In Linux, all files are accessed through the Virtual Filesystem Switch, or VFS, a layer of code which implements generic filesystem actions and vectors requests to the correct specific code to handle the request. ]]</p>
---
name: bpftrace-oneliners-read-syscall
class: extra-details
<h1 id="bpftrace-hands-on-tracing-read-bytes-using-a-tracepoint">bpftrace hands on: tracing read bytes using a tracepoint</h1>
<ul> <li>We want to do the same thing we did with the <code>kretprobe</code> in the previous exercise</li> </ul>
<p>.exercise[ - Execute this one-liner using bpftrace</p>
<pre><code class="language-bash">bpftrace -e 'tracepoint:syscalls:sys_exit_read { @bytes = lhist(args->ret, 0, 2000, 200); }'
</code></pre>
<ul> <li>Let it run for a while, then use <code>Ctrl-C</code> to dump the results ]</li> </ul>
<p><strong>What’s the difference?</strong> While very powerful (it can trace almost any kernel function), the <code>kretprobe</code> approach can’t be considered “stable”, because internal kernel functions can change between kernel versions. Tracepoints, on the other hand, are much more stable, because kernel developers treat them as a user-facing feature rather than an internal one.
Whenever possible, use tracepoints instead of kprobe/kretprobe.</p>
---
name: bpftrace-oneliners-uretprobe
class: extra-details
<h1 id="bpftrace-hands-on-reading-userspace-returns">bpftrace hands on: reading userspace returns</h1>
<p>We have a Go program that prints a random number every second.</p>
<pre><code class="language-go">package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	for {
		time.Sleep(time.Second * 1)
		fmt.Printf("%d\n", giveMeNumber())
	}
}

func giveMeNumber() int {
	return rand.Intn(100) + rand.Intn(900)
}
</code></pre>
<p>We want to get the random number out of it using a bpftrace program.</p>
---
name: bpftrace-oneliners-uretprobe-contd
class: extra-details
<h1 id="bpftrace-hands-on-reading-userspace-returns">bpftrace hands on: reading userspace returns</h1>
<p>.exercise[ - Create a file named <code>main.go</code> with the code from <a href="#bpftrace-oneliners-uretprobe">previous slide</a>; - Then, compile it with:</p>
<pre><code class="language-bash">go build -o randomnumbers main.go
</code></pre>
<ul> <li>This will create a binary named <code>randomnumbers</code> in the current folder;</li> <li>Once that is done, we just start the program <code>./randomnumbers</code>;</li> <li>Now, in a new terminal, we instrument the program using bpftrace and a <code>uretprobe</code>:</li> </ul>
<pre><code class="language-bash">bpftrace -e \
  'uretprobe:./randomnumbers:"main.giveMeNumber" { printf("%d\n", retval) }'
</code></pre>
<p>]</p>
<p>.footnote[.smaller[Bonus point!
Try running <code>objdump -t randomnumbers | grep -i giveMe</code>: what do you notice?]]</p>
---
name: bpftrace-internals
class:
<h1 id="bpftrace-internals">bpftrace: Internals</h1>
<p>.pic[ <img src="/img/bpftrace-internals.png" alt="bpftrace internals" /> ]</p>
---
name: bpftrace-takeaways
class:
<h1 id="takeaways">Takeaways</h1>
<ul> <li>There’s a higher-level language for eBPF, called <code>bpftrace</code>;</li> <li><code>bpftrace</code> can be used only for eBPF-based tracing;</li> <li>It’s pretty magic;</li> <li>There are a <strong>LOT</strong> of premade tools you can use in the <code>bpftrace/tools</code> folder, which saves a lot of time;</li> </ul>
---
name: bpftrace-credits
class:
<h1 id="credits-and-references">Credits and References</h1>
<ul> <li><a href="https://github.com/iovisor/bpf-docs/blob/master/bpftrace_public_template_jun2019.odp">Brendan Gregg’s presentation on bpftrace</a></li> <li><a href="https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md">IOVisor’s bpftrace reference guide</a></li> <li><a href="https://github.com/iovisor/bpftrace">IOVisor’s bpftrace repository</a></li> <li><a href="https://en.wikipedia.org/wiki/Transmission_Control_Protocol">Wikipedia article on TCP</a></li> <li><a href="https://github.com/iovisor/bpftrace/blob/b6c4136fabf2527fc736bc08ee875625156b5431/docs/tutorial_one_liners.md">IOVisor’s bpftrace one-liners tutorial</a></li> </ul>
---
name: kubernetes
class: title
eBPF and Kubernetes
.nav[ [Previous section](#bpftrace) | [Back to table of contents](#toc) | [Next section](#networking) ]
---
name: kubernetes-just-a-container
class:
<h1 id="approach-1-just-use-a-container">Approach #1: Just use a container</h1>
<ul> <li>A sidecar container sharing the process namespace;</li> <li>You just provide an image with an eBPF loader as the entrypoint;</li> <li>The loader will just load the program and execute it;</li> <li>Not extremely generic but does the job!</li> <li>A very flexible approach!</li> </ul>
<p>.small[.small[</p>
<pre><code class="language-yaml">apiVersion: v1
kind: Pod
metadata:
  name: happy-borg
spec:
  shareProcessNamespace: true
  containers:
  - name: execsnoop
    image: calavera/execsnoop
    securityContext:
      privileged: true
    volumeMounts:
    - name: sys # mount the debug filesystem
      mountPath: /sys
      readOnly: true
    - name: headers # mount the kernel headers required by bcc
      mountPath: /usr/src
      readOnly: true
    - name: modules # mount the kernel modules required by bcc
      mountPath: /lib/modules
      readOnly: true
  - name: container doing random work
    ...
</code></pre>
<p>]]</p>
---
name: kubernetes-kubectl-trace
class:
<h1 id="approach-2-kubectl-trace">Approach #2: kubectl-trace</h1>
<ul> <li>It’s basically <a href="#bpftrace"><em>bpftrace</em></a>, but for kubectl!</li> <li>It’s on GitHub <a href="https://github.com/iovisor/kubectl-trace">iovisor/kubectl-trace</a></li> </ul>
<p>.pic[ <img src="img/kubernetes-kubectl-trace.png" alt="img/kubernetes-kubectl-trace.png" /> ]</p>
---
name: networking
class: title
eBPF and Linux Networking
.nav[ [Previous section](#kubernetes) | [Back to table of contents](#toc) | [Next section](#security) ]
---
name: networking-intro
class:
<h1 id="ebpf-and-linux-networking">eBPF and Linux Networking</h1>
<p><strong>Main use cases</strong></p>
<ul> <li>Retrospective analysis of network traffic captured on a live system, using the pcap format for example;</li> <li>Live packet filtering, e.g. allow only UDP traffic and discard everything else;</li> <li>Live observation of a filtered set of packets flowing into a live system;</li> </ul>
<p><strong>At different levels</strong></p>
<ul> <li>cBPF packet filtering</li> <li>Raw packet filtering (<code>BPF_PROG_TYPE_SOCKET_FILTER</code>)</li> <li>Traffic control</li> <li>XDP</li> </ul>
---
name: networking-tcpdump
class:
<h1 id="cbpf-and-packet-filtering">cBPF and packet filtering</h1>
<ul> <li>Packet filtering is done using an accumulator on which filters are applied, the classic BPF way;</li> <li>One
of the most popular use cases for it is <code>tcpdump</code>;</li> <li>It doesn’t support the use of maps;</li> </ul> <p><strong>Tcpdump</strong></p> <ul> <li>Probably the best-known tool for live packet observation;</li> <li>It is implemented as a frontend for <code>libpcap</code>;</li> <li>It lets you define high-level filtering expressions that are then converted to a BPF filter program;</li> <li>Tcpdump can dump the generated BPF program for inspection;</li> <li>It can read from an existing pcap file and filter on it</li> </ul> --- name: networking-tcpdump-exercise class: <h1 id="hands-on-tcpdump-packet-filtering">hands on: Tcpdump packet filtering</h1> <p>.exercise[ In a new terminal, execute <code>tcpdump</code> with a filter and use the <code>-d</code> option to dump the generated BPF assembly.</p> <pre><code class="language-bash">tcpdump -d 'ip and tcp port 8080'
</code></pre> <p>What do you see? Anything noteworthy? ]</p> --- name: networking-tcpdump-exercise-result class: <h1 id="tcpdump-hands-on-what-is-that-stuff">tcpdump hands on: What is that stuff?</h1> <p>.pic[ <img src="/img/tcpdump.png" alt="tcpdump explanation" /> ]</p> --- name: networking-raw-packets class: <h1 id="raw-packets-filtering">Raw packets filtering</h1> <ul> <li>Attach a BPF program to a socket</li> <li>All the packets received by the socket are passed to the program as an <code>__sk_buff</code> struct</li> <li>The program decides whether to discard or allow each packet based on its own logic</li> </ul> <p>Here’s an example. The program type is given by the <code>SEC("socket")</code> definition, which gets translated to <code>BPF_PROG_TYPE_SOCKET_FILTER</code>.</p> <p>.small[</p> <pre><code class="language-c">SEC("socket")
int socket_prog(struct __sk_buff *skb)
{
    // Read the protocol field of the IP header that follows the Ethernet header
    int proto = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
    int one = 1;
    // Look up the packet counter stored for this protocol
    int *el = bpf_map_lookup_elem(&countmap, &proto);
    if (el) {
        (*el)++;
    } else {
        el = &one; // first packet seen for this protocol
    }
    bpf_map_update_elem(&countmap, &proto, el, BPF_ANY);
    return 0;
}
</code></pre> <p>]</p> --- name: networking-traffic-control class: <h1 id="traffic-control-tc-and-ebpf">Traffic Control (tc) and eBPF</h1> <ul> <li>tc is the kernel packet scheduling subsystem;</li> <li>It’s made of mechanisms and queuing systems that decide how packets flow and whether they are accepted into the system;</li> <li>It has a classifier, called <code>cls_bpf</code>, that can use a BPF program to make those decisions;</li> </ul> <p>Among tc use cases there are:</p> <ul> <li>Prioritizing certain kinds of packets</li> <li>Dropping specific kinds of packets</li> <li>Bandwidth distribution</li> </ul> --- name: networking-traffic-control class: <h1 id="traffic-control-cls-bpf-hook-points">Traffic Control cls_bpf hook points</h1> <ul> <li>cls_bpf can hook into both ingress and egress</li> <li>That means you can manipulate both the packets your machine receives and the packets it sends!</li> <li>Programs receive an <code>__sk_buff</code></li> </ul> <p>Here’s a diagram showing the interactions:</p> <p>.pic[<img src="/img/tc-flow-bpf-cls.png" alt="cls_bpf interactions" />]</p> --- name: networking-tc-example class: <h1 id="example-tc-program-to-drop-all-tcp-packets">Example: TC program to drop all TCP packets</h1> <p>.small[.small[</p> <pre><code class="language-c">SEC("classifier")
static inline int classification(struct __sk_buff *skb)
{
    void *data_end = (void *)(long)skb->data_end;
    void *data = (void *)(long)skb->data;
    struct ethhdr *eth = data;
    __u16 h_proto;
    __u64 nh_off = 0;

    nh_off = sizeof(*eth);
    // Make sure the Ethernet header fits in the packet
    if (data + nh_off > data_end) {
        return TC_ACT_OK;
    }
    h_proto = eth->h_proto;
    // Only IPv4 packets are inspected
    if (h_proto != bpf_htons(ETH_P_IP)) {
        return TC_ACT_OK;
    }
    struct iphdr *iph = data + nh_off;
    // Make sure the IP header fits in the packet
    if ((void *)(iph + 1) > data_end) {
        return TC_ACT_OK;
    }
    // Drop TCP, let everything else through
    if (iph->protocol == IPPROTO_TCP) {
        return TC_ACT_SHOT;
    }
    return TC_ACT_OK;
}
</code></pre> <p>]]</p> <p>.small[ The classifier program is added to the qdisc using <code>tc</code>:</p> <pre><code>tc filter add dev eth0 ingress bpf obj classifier.o flowid 0:
</code></pre>
<p>]</p> --- name: networking-xdp class: <h1 id="xpress-data-path">eXpress Data Path (XDP)</h1> <ul> <li>Programs are of type <code>BPF_PROG_TYPE_XDP</code></li> <li>There are three operation modes: <ul> <li><em>Native:</em> the network card driver supports XDP, and the code runs on the driver receive path;</li> <li><em>Offloaded:</em> the network card hardware supports XDP, and the NIC itself executes the program;</li> <li><em>Generic:</em> a test mode for developers, for trying out XDP programs without the proper hardware;</li> </ul></li> <li>Once a packet is processed, the XDP program returns one of the possible action codes: <ul> <li><em>XDP_DROP</em>: drop the packet;</li> <li><em>XDP_TX</em>: send the packet, possibly modified, back out of the NIC it arrived on;</li> <li><em>XDP_REDIRECT</em>: similar to XDP_TX, but forwards to another NIC or to a map of type CPU map;</li> <li><em>XDP_PASS</em>: allow the packet into the network stack</li> </ul></li> </ul> --- name: networking-xdp-tc-differences class: <h1 id="differences-between-tc-and-xdp">Differences between TC and XDP</h1> <ul> <li>XDP programs are executed earlier in the ingress data path, before packets enter the main kernel network stack;</li> <li>XDP programs do not have access to the socket buffer struct sk_buff, unlike tc programs;</li> <li>XDP programs instead take a different structure called xdp_buff, which is an eager representation of the packet without metadata;</li> </ul> <p><strong>All this comes with advantages and disadvantages</strong>:</p> <p>Being executed before the kernel network stack, XDP programs can drop packets very efficiently.
Unlike tc programs, however, XDP programs can only be attached to ingress traffic.</p> --- name: networking-xdp-interactions class: <h1 id="xdp-packets-processor">XDP packets processor</h1> <p>.footnote[.smaller[
- It executes BPF programs for XDP packets
- It coordinates the interaction between them and the network stack
- It ensures that packets are readable and writable, and allows attaching post-processing verdicts in the form of packet processor actions
- The return codes shown earlier are its return actions!</p> <p>]] .pic[ <img src="/img/xdp-interaction-diagram.png" alt="xdp packets processor" /> ]</p> --- name: networking-xdp-example class: <h1 id="example-xdp-program-to-drop-all-tcp-packets">Example: XDP program to drop all TCP packets</h1> <p>.small[</p> <pre><code class="language-c">SEC("mysection")
int myprogram(struct xdp_md *ctx)
{
    int ipsize = 0;
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;
    struct iphdr *ip;

    // The IP header starts right after the Ethernet header
    ipsize = sizeof(*eth);
    ip = data + ipsize;
    ipsize += sizeof(struct iphdr);
    // Drop packets that are too short to hold both headers
    if (data + ipsize > data_end) {
        return XDP_DROP;
    }
    if (ip->protocol == IPPROTO_TCP) {
        return XDP_DROP;
    }
    return XDP_PASS;
}
</code></pre> <p>It can be loaded on an interface (here <code>enp0s8</code>) using:</p> <p><code> ip link set dev enp0s8 xdp obj udp.o sec mysection </code>]</p> --- name: security class: title Linux Kernel security and eBPF .nav[ [Previous section](#networking) | [Back to table of contents](#toc) | [Next section](#) ] --- name: security-seccomp class: extra-details <h1 id="seccomp">Seccomp</h1> <ul> <li>Stands for Secure Computing;</li> <li>Implements a filtering backend based on cBPF;</li> <li>You can write a BPF program that filters the execution of any syscall, allowing or disallowing the ones you want based on your logic;</li> </ul> <p>Here’s the seccomp data structure for filters, as defined in <code>linux/seccomp.h</code>:</p> <pre><code class="language-c">struct seccomp_data {
    int nr;     // system call number
    __u32 arch; // AUDIT_ARCH_* value identifying the architecture
    __u64
instruction_pointer; // CPU instruction pointer at the time of the call
    __u64 args[6]; // up to six system call arguments
};
</code></pre> <p>This allows filtering based on the system call, its arguments, or a combination of them.</p> --- name: security-lsm class: extra-details <h1 id="lsm-hooks">LSM Hooks</h1> <ul> <li>The Linux Security Modules (LSM) framework has a set of hooks to control the execution of (e)BPF programs;</li> <li>It allows creating a fine-grained set of privileges around them when using a module that implements BPF hook support</li> <li>Currently implemented by Landlock and SELinux</li> <li>The only in-tree implementation is SELinux</li> <li>It’s based on the concept of hook calls instead of syscalls</li> </ul> --- name: security-seccomp-exercise class: extra-details <h2 id="hands-on-seccomp-filters-using-bpf-programs">hands on: Seccomp filters using bpf programs</h2> <p>.exercise[ - Clone the exercise repository and cd into it</p> <pre><code class="language-bash">git clone https://gist.github.com/fntlnz/08ae20befb91befd9a53cd91cdc6d507 seccomp-exercise
cd seccomp-exercise
</code></pre> <ul> <li>After following the instructions in <code>README.md</code>, what do you notice? ]</li> </ul> --- name: credits class: extra-details <h2 id="credits">Credits</h2> <ul> <li>Thanks to all the eBPF authors and tool makers for their awesome work;</li> <li>Many thanks to <a href="https://github.com/jpetazzo">Jérôme Petazzoni</a>: we adapted Jérôme’s template from <a href="https://container.training">container.training</a> to Hugo, and we also used the terminal setup instructions and the tmux cheatsheet from that deck!</li> <li>Thanks to the <a href="https://github.com/gnab/remark">remark</a> authors for their work on it; it’s the tool we use to generate the slides;</li> <li>Thanks to the <a href="https://gohugo.io/">hugo</a> authors for the awesome static site generator;</li> </ul> ---