AFL Internals - Instrumentation

Introduction

American Fuzzy Lop is a fuzzer developed by lcamtuf and many others joined him to make it better. For example, a forkserver is designed by J. Horn for faster target executions. Also, the pre-assembly pass has been replaced by a proper LLVM pass written by L. Szekeres. In this article, I’m going to give a shot to explain the internals since it is almost always pleasant to follow brilliant people’s work.

The Macro Perspective

There are severals components that make up AFL. Three most important programs are:

afl-gcc - The GCC front-end that is basically a replacement for gcc as in CC=afl-gcc (though it calls & needs gcc),
afl-as - The assembler that modifies the assembly code before piping it to the GNU assembler, adds instrumentation,
afl-fuzz - The executable that generates the inputs, executes the binary, tracks the progress and finally maintains & classifies the outputs.

Compilation of AFL is pretty straightforward. The Makefile does say it all:

afl-gcc: afl-gcc.c $(COMM_HDR) | test_x86
    $(CC) $(CFLAGS) $@.c -o $@ $(LDFLAGS)
    set -e;for i in afl-g++ afl-clang afl-clang++; do ln -sf afl-gcc $$i; done
afl-as: afl-as.c afl-as.h $(COMM_HDR) | test_x86
    $(CC) $(CFLAGS) $@.c -o $@ $(LDFLAGS)
    ln -sf afl-as as
afl-fuzz: afl-fuzz.c $(COMM_HDR) | test_x86
    $(CC) $(CFLAGS) $@.c -o $@ $(LDFLAGS)

The same binary is used for all other front-ends: afl-g++, afl-clang, afl-clang++ and decision is based on argv[0]. For the curious, in a Makefile, the following variables are defined in a target context:

$@: the target, left-hand side of the semicolon, usually the binary,
$<: first prerequisite, first token just after the semicolon, usually the source file,
$^: all prerequisites, all after the semicolon.

afl-gcc

The entrypoint is pretty basic:

/* Main entry point */
int main(int argc, char** argv) {
  if (isatty(2) && !getenv("AFL_QUIET")) {
    SAYF(cCYA "afl-cc " cBRI VERSION cRST " by <lcamtuf@google.com>\n");
  } else be_quiet = 1;
                                                                                           
  if (argc < 2) {
    SAYF("\n"
        ... removed for the sake brevity ...
        );
    exit(1);
  }
                                                                                           
  find_as(argv[0]);
  edit_params(argc, argv);
  execvp(cc_params[0], (char**)cc_params);
  FATAL("Oops, failed to execute '%s' - check your PATH", cc_params[0]);
  return 0;
}

When argc < 2, it shows a help message that says just replace your usual CC=gcc (CC=clang) with CC=afl-gcc (CC=afl-clang) depending on you toolchain. The result is saved in the global static u8* as_path;.

The program starts with locating the afl-as either in the same directory as afl-gcc or in the one defined by getenv("AFL_PATH"). Then, edit_params() is called that prepares the parameters for the execution of the real C compiler on the next line.

Recall that execvp() replaces the current process (see execvp(3)) so FATAL() will never be reached unless it fails to execute the compiler.

    cc_params[cc_par_cnt++] = "-B";
    cc_params[cc_par_cnt++] = as_path;

In edit_params(), the assembler path is added to parameters with a -B switch. In this directory, gcc will look for the binary called as and since we know from the Makefile that afl-as is symbolically linked to as, it will use afl-as as an assembler.

afl-as

The core of the compiler front-end is afl-as.c. The entrypoint is as follows:

/* Main entry point */
int main(int argc, char** argv) {

  s32 pid;
  u32 rand_seed;
  int status;
  u8* inst_ratio_str = getenv("AFL_INST_RATIO");

  struct timeval tv;
  struct timezone tz;

  clang_mode = !!getenv(CLANG_ENV_VAR);

  if (isatty(2) && !getenv("AFL_QUIET")) {
    SAYF(cCYA "afl-as " cBRI VERSION cRST " by <lcamtuf@google.com>\n");
  } else be_quiet = 1;

  if (argc < 2) {
    SAYF("\n"
    ... removed for the sake of brevity ...
    exit(1);
  }

  gettimeofday(&tv, &tz);
  rand_seed = tv.tv_sec ^ tv.tv_usec ^ getpid();
  srandom(rand_seed);
  edit_params(argc, argv);
  if (inst_ratio_str) {
    if (sscanf(inst_ratio_str, "%u", &inst_ratio) != 1 || inst_ratio > 100)
      FATAL("Bad value of AFL_INST_RATIO (must be between 0 and 100)");
  }

  if (getenv(AS_LOOP_ENV_VAR))
    FATAL("Endless loop when calling 'as' (remove '.' from your PATH)");

  setenv(AS_LOOP_ENV_VAR, "1", 1);

/* When compiling with ASAN, we don't have a particularly elegant way to skip
   ASAN-specific branches. But we can probabilistically compensate for
   that... */

  if (getenv("AFL_USE_ASAN") || getenv("AFL_USE_MSAN")) {
    sanitizer = 1;
    inst_ratio /= 3;
  }

  if (!just_version) add_instrumentation();

  if (!(pid = fork())) {
    execvp(as_params[0], (char**)as_params);
    FATAL("Oops, failed to execute '%s' - check your PATH", as_params[0]);
  }

  if (pid < 0) PFATAL("fork() failed");
  if (waitpid(pid, &status, 0) <= 0) PFATAL("waitpid() failed");
  if (!getenv("AFL_KEEP_ASSEMBLY")) unlink(modified_file);
  exit(WEXITSTATUS(status));
}

Similary, there are two interesting functions called by main(). The first one is edit_params(). Basically, it copies most of the arguments to a new array for GNU as invocation. The last argument is the input file and this argument is saved in global as in static u8* input_file = argv[argc-1];. Instead of the input file, as the last argument, an output file, allocated in a temporary directory, is passed.

  modified_file = alloc_printf("%s/.afl-%u-%u.s", tmp_dir, getpid(),
                               (u32)time(NULL));

wrap_things_up:

  as_params[as_par_cnt++] = modified_file;
  as_params[as_par_cnt]   = NULL;

Next call is add_instrumentation() that does the heavy lifting. After the input file is modified by this function, fork() is called. In the child (0 == fork()), the assembler is invoked with the new parameters while the parent (0 != child_pid = fork()) waits for the child to conclude.

$ ls -alt /tmp/.afl*
total 64
drwxr-xr-x 13 user  users  4096 May 14 19:37 ..
drwxr-xr-x  2 user  users  4096 May 14 18:36 .
-rw-------  1 user  users 27195 May 14 18:36 .afl-269117-1652542607.s
-rw-------  1 user  users 27195 May 14 18:35 .afl-269063-1652542539.s

Setting the environment variable AFL_KEEP_ASSEMBLY to a nonzero value is useful to keep the instrumented assembly files. It is possible to observe these files under the temporary directory after compilation.

Enrichment

So what is the idea here? Or better, what does afl-as is trying to achieve? Basically, AFL aims to find particular inputs that reach every edge that resulted in branching in the target code. However, unless an emulator is used, there is no way to signal the top-level runner in which edge the code is currently on. This problem is solved by instrumentation.

Let’s see how void add_instrumentation() works.

/* Process input file, generate modified_file. Insert instrumentation in all
   the appropriate places. */
static void add_instrumentation(void) {
  static u8 line[MAX_LINE];

  FILE* inf;
  FILE* outf;
  s32 outfd;
  u32 ins_lines = 0;

  u8  instr_ok = 0, skip_csect = 0, skip_next_label = 0,
      skip_intel = 0, skip_app = 0, instrument_next = 0;

  if (input_file) {
    inf = fopen(input_file, "r");
    if (!inf) PFATAL("Unable to read '%s'", input_file);
  } else inf = stdin;

  outfd = open(modified_file, O_WRONLY | O_EXCL | O_CREAT, 0600);
  if (outfd < 0) PFATAL("Unable to write to '%s'", modified_file);
  outf = fdopen(outfd, "w");
  if (!outf) PFATAL("fdopen() failed");

First of all, input assembly file is opened as FILE *inf; and temporary output file s32 outfd;. The function also defines several flags:

instr_ok: Nonzero when we are in a text section,
skip_csect: Skip 32bit code when compiling 64bit code,
skip_next_label: Skip labels likes .LBB0,
skip_intel: Skip Intel style assembly, matches *.intel_syntax",
skip_app: Skip code between lines starting with #APP and #NO_APP,
instrument_next: Add instrumentation code before the next instruction.

while (fgets(line, MAX_LINE, inf)) {
/* In some cases, we want to defer writing the instrumentation trampoline
   until after all the labels, macros, comments, etc. If we're in this
   mode, and if the line starts with a tab followed by a character, dump
   the trampoline now. */
  if (!pass_thru && !skip_intel && !skip_app && !skip_csect && instr_ok &&
      instrument_next && line[0] == '\t' && isalpha(line[1])) {
    fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32,
            R(MAP_SIZE));

    instrument_next = 0;
    ins_lines++;
  }
  /* Output the actual line, call it a day in pass-thru mode. */
  fputs(line, outf);
  if (pass_thru) continue;

The loop starts by reading a line from the input. If certain flags are matched, the instrumentation block located at trampoline_fmt_{32,64} is added to the output. Next, the line is copied to the output. Flags like skip_app, skip_csect, skip_intel help decide whether or not the next block is instrumented.

if (line[0] == '\t' && line[1] == '.') {
/* OpenBSD puts jump tables directly inline with the code, which is
   a bit annoying. They use a specific format of p2align directives
   around them, so we use that as a signal. */

  if (!clang_mode && instr_ok && !strncmp(line + 2, "p2align ", 8) &&
      isdigit(line[10]) && line[11] == '\n') skip_next_label = 1;

  if (!strncmp(line + 2, "text\n", 5) ||
      !strncmp(line + 2, "section\t.text", 13) ||
      !strncmp(line + 2, "section\t__TEXT,__text", 21) ||
      !strncmp(line + 2, "section __TEXT,__text", 21)) {
    instr_ok = 1;
    continue;
  }

  if (!strncmp(line + 2, "section\t", 8) ||
      !strncmp(line + 2, "section ", 8) ||
      !strncmp(line + 2, "bss\n", 4) ||
      !strncmp(line + 2, "data\n", 5)) {
    instr_ok = 0;
    continue;
  }
}

The first line tries to match an assembler directive. Next one tries not to interfere with the alignment when we are in a text section (nonzero instr_ok). Irrelevant sections are skipped by setting instr_ok to zero.

/* Detect off-flavor assembly (rare, happens in gdb). When this is
   encountered, we set skip_csect until the opposite directive is
   seen, and we do not instrument. */
if (strstr(line, ".code")) {
  if (strstr(line, ".code32")) skip_csect = use_64bit;
  if (strstr(line, ".code64")) skip_csect = !use_64bit;
}

skip_csect flag tries to skip 32-bit code when we are compiling a 64-bit binary.

/* Detect syntax changes, as could happen with hand-written assembly.
   Skip Intel blocks, resume instrumentation when back to AT&T. */
 if (strstr(line, ".intel_syntax")) skip_intel = 1;
 if (strstr(line, ".att_syntax")) skip_intel = 0;

Skips Intel syntax assembly.

/* Detect and skip ad-hoc __asm__ blocks, likewise skipping them. */
 if (line[0] == '#' || line[1] == '#') {
   if (strstr(line, "#APP")) skip_app = 1;
   if (strstr(line, "#NO_APP")) skip_app = 0;
 }

Skips ad-hoc blocks between #APP and #NO_APP.

/* If we're in the right mood for instrumenting, check for function
   names or conditional labels. This is a bit messy, but in essence,
   we want to catch:

     ^main:      - function entry point (always instrumented)
     ^.L0:       - GCC branch label
     ^.LBB0_0:   - clang branch label (but only in clang mode)
     ^\tjnz foo  - conditional branches

     ...but not:

     ^# BB#0:    - clang comments
     ^ # BB#0:   - ditto
     ^.Ltmp0:    - clang non-branch labels
     ^.LC0       - GCC non-branch labels
     ^.LBB0_0:   - ditto (when in GCC mode)
     ^\tjmp foo  - non-conditional jumps

   Additionally, clang and GCC on MacOS X follow a different convention
   with no leading dots on labels, hence the weird maze of #ifdefs
   later on. */
if (skip_intel || skip_app || skip_csect || !instr_ok ||
    line[0] == '#' || line[0] == ' ') continue;

First check skips comments like #.. and #...,

/* Conditional branch instruction (jnz, etc). We append the instrumentation
   right after the branch (to instrument the not-taken path) and at the
   branch destination label (handled later on). */
if (line[0] == '\t') {
  if (line[1] == 'j' && line[2] != 'm' && R(100) < inst_ratio) {
    fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32,
            R(MAP_SIZE));
    ins_lines++;
  }
  continue;
}

Next, the instrumentation code trampoline_fmt_{32,64} is added depending on the arch of the binary. The conditions are the instruction should be non-conditional branching and instruction ratio must satisfy R(100) < instr_ratio.

#ifdef __APPLE__
    /* Apple: L<whatever><digit>: */
    if ((colon_pos = strstr(line, ":"))) {
      if (line[0] == 'L' && isdigit(*(colon_pos - 1))) {
#else
    /* Everybody else: .L<whatever>: */
    if (strstr(line, ":")) {
      if (line[0] == '.') {
#endif /* __APPLE__ */
        /* .L0: or LBB0_0: style jump destination */
#ifdef __APPLE__
        /* Apple: L<num> / LBB<num> */
        if ((isdigit(line[1]) || (clang_mode && !strncmp(line, "LBB", 3)))
            && R(100) < inst_ratio) {
#else
        /* Apple: .L<num> / .LBB<num> */
        if ((isdigit(line[2]) || (clang_mode && !strncmp(line + 1, "LBB", 3)))
            && R(100) < inst_ratio) {
#endif /* __APPLE__ */
          /* An optimization is possible here by adding the code only if the   
             label is mentioned in the code in contexts other than call / jmp. 
             That said, this complicates the code by requiring two-pass        
             processing (messy with stdin), and results in a speed gain        
             typically under 10%, because compilers are generally pretty good  
             about not generating spurious intra-function jumps.               
                                                                               
             We use deferred output chiefly to avoid disrupting                
             .Lfunc_begin0-style exception handling calculations (a problem on 
             MacOS X). */

          if (!skip_next_label) instrument_next = 1; else skip_next_label = 0;
        }
      } else {
        /* Function label (always instrumented, deferred mode). */
        instrument_next = 1;
      }
    }
  }

This one is the maze that has been mentioned in the comments section before. So we have already instrumented when the condition is false, this one instruments when the condition is true. If the parser matches any branch labels such as ^.LXX or ^.LBBX_X, it enables the flag instrument_next so that the loop prologue would do the necessary adjustments.

if (ins_lines)
  fputs(use_64bit ? main_payload_64 : main_payload_32, outf);

if (input_file) fclose(inf);
fclose(outf);

Finally, main_payload_{32,64} is written to the output file before closing it and sending it to the GNU as for the assembly.

Instrumentation

In this section, I’ll try to explain how adding trampoline_fmt_{32,64} and main_payload_{32,64} allows it to signal the fuzzer. They are both defined in afl-as.h. For the sake of simplicity, I’ll follow the 32bit version. Let’s start with trampoline_fmt_32.

static const u8* trampoline_fmt_32 =
  "\n" 
  "/* --- AFL TRAMPOLINE (32-BIT) --- */\n"
  "\n"                                     
  ".align 4\n"                             
  "\n"                                     
  "leal -16(%%esp), %%esp\n"               
  "movl %%edi,  0(%%esp)\n"                
  "movl %%edx,  4(%%esp)\n"                
  "movl %%ecx,  8(%%esp)\n"                
  "movl %%eax, 12(%%esp)\n"                
  "movl $0x%08x, %%ecx\n"                  
  "call __afl_maybe_log\n"                 
  "movl 12(%%esp), %%eax\n"                
  "movl  8(%%esp), %%ecx\n"                
  "movl  4(%%esp), %%edx\n"                
  "movl  0(%%esp), %%edi\n"                
  "leal 16(%%esp), %%esp\n"                
  "\n"                                     
  "/* --- END --- */\n"                    
  "\n";

Recall this code block is added to every edge of the program. Basically, it loads the address %esp-16 (space for 4 32bit registers) to %esp and saves registers %edi, %edx, %ecx, %eax in that location and calls __afl_maybe_log with the parameter in %ecx. The value of this parameter is R(MAP_SIZE). MAP_SIZE is defined in config.h as (1 << MAP_SIZE_POW2) where MAP_SIZE_POW2 is 16. R() picks a random number and it is defined in types.h as follows:

#define R(x) (random() % (x))

Therefore instrumentation code tags every edge with a random number between 0 and 64k. If your code has more edges than 64k, try increasing this value. After the call, registers are restored as well as the stack pointer %esp.

static const u8* main_payload_32 = 
  "\n" "/* --- AFL MAIN PAYLOAD (32-BIT) --- */\n" "\n"
  ".text\n"
  ".att_syntax\n"
  ".code32\n"
  ".align 8\n" "\n"

  "__afl_maybe_log:\n" "\n"
  "  lahf\n"
  "  seto %al\n" "\n"
  "  /* Check if SHM region is already mapped. */\n" "\n"
  "  movl  __afl_area_ptr, %edx\n"
  "  testl %edx, %edx\n"
  "  je    __afl_setup\n" "\n"
  "__afl_store:\n"

Prelude of the main_payload_32 denotes we are injecting code in ATT syntax and want it to be aligned at 256 bytes boundary. __afl_maybe_log starts with saving status flags into the AH register $ (AH \leftarrow EFLAGS(SF:ZF:0:AF:0:PF:1:CF)) $. seto sets AL if OF is true. Next, the code checks whether the __afl_area_ptr is initialized and calls __afl_setup if not.

  ".align 8\n" "\n"
  "__afl_setup:\n" "\n"
  "  /* Do not retry setup if we had previous failures. */\n" "\n"
  "  cmpb $0, __afl_setup_failure\n"
  "  jne  __afl_return\n" "\n"
  "/* Map SHM, jumping to __afl_setup_abort if something goes wrong.\n"
  "   We do not save FPU/MMX/SSE registers here, but hopefully, nobody\n"
  "   will notice this early in the game. */\n" "\n"
  "  pushl %eax\n"
  "  pushl %ecx\n" "\n"
  "  pushl $.AFL_SHM_ENV\n"
  "  call  getenv\n"
  "  addl  $4, %esp\n" "\n"
  "  testl %eax, %eax\n"
  "  je    __afl_setup_abort\n" "\n"
  "  pushl %eax\n"
  "  call  atoi\n"
  "  addl  $4, %esp\n" "\n"
  "  pushl $0          /* shmat flags    */\n"
  "  pushl $0          /* requested addr */\n"
  "  pushl %eax        /* SHM ID         */\n"
  "  call  shmat\n"

This code block call getenv to get the value of an environment variable called __AFL_SHM_ID defined in config.h as follows:

#define SHM_ENV_VAR         "__AFL_SHM_ID"

The value of this variable is exactly the one returned by shmget() in afl_fuzz.c. Basically, the fuzzer maps a shared memory region passed the id of this region to the instrumented binary via this environment. The memory region is then mapped in the binary via a call to shmat. The return value in %eax is the mapped address and saved in __afl_area_ptr.

"  pushl $0          /* shmat flags    */\n"
"  pushl $0          /* requested addr */\n"
"  pushl %eax        /* SHM ID         */\n"
"  call  shmat\n"
"  addl  $12, %esp\n" "\n"
"  cmpl $-1, %eax\n"
"  je   __afl_setup_abort\n" "\n"
"  /* Store the address of the SHM region. */\n" "\n"
"  movl %eax, __afl_area_ptr\n"
"  movl %eax, %edx\n"

Since now we know what __afl_area_ptr points to, let’s go back to __afl_maybe_log and see how it stores the values:

  "__afl_store:\n" "\n"
  "/* Calculate and store hit for the code location specified in ecx. There\n"
  "   is a double-XOR way of doing this without tainting another register,\n"
  "   and we use it on 64-bit systems; but it's slower for 32-bit ones. */\n"
#ifndef COVERAGE_ONLY
  "  movl __afl_prev_loc, %edi\n"
  "  xorl %ecx, %edi\n"
  "  shrl $1, %ecx\n"
  "  movl %ecx, __afl_prev_loc\n"
#else
  "  movl %ecx, %edi\n"
#endif /* ^!COVERAGE_ONLY */
  "\n"
#ifdef SKIP_COUNTS
  "  orb  $1, (%edx, %edi, 1)\n"
#else
  "  incb (%edx, %edi, 1)\n"
#endif /* ^SKIP_COUNTS */
  "\n"
  "__afl_return:\n"
  "\n"
  "  addb $127, %al\n"
  "  sahf\n"
  "  ret\n"

Recall that %ecx stores the random number that denotes the edge. First, previous location seen by the execution is saved in __afl_prev_loc. It is zero when the program starts. Also, %edx points to __afl_area_ptr so the location (%edx, %edi, 1) points to the offset for the current edge in shared memory region. This value is a byte and incremented each time we pass through this branch. Finally, in __afl_return, flags are restored for execution to continue.

The Forkserver

"__afl_forkserver:\n" "\n"
"  /* Enter the fork server mode to avoid the overhead of execve() calls. */"
"  pushl %eax\n"
"  pushl %ecx\n"
"  pushl %edx\n" "\n"
"\n"

The forkserver basically tries to eliminate the overhead of restart the process again and again. The prelude start by saving some registers.

"/* Phone home and tell the parent that we're OK. (Note that signals with\n"
"  no SA_RESTART will mess it up). If this fails, assume that the fd is\n"
"  closed because we were execve()d from an instrumented binary, or because\n"
"  the parent doesn't want to use the fork server. */\n"
"\n"
"  pushl $4          /* length    */\n"
"  pushl $__afl_temp /* data      */\n"
"  pushl $" STRINGIFY((FORKSRV_FD + 1)) "  /* file desc */\n"
"  call  write\n"
"  addl  $12, %esp\n"

The above code writes 4 bytes pointed by __afl_temp to one of the forkservers file descriptor. The variable is defined at the end as follows:

"  .comm   __afl_temp, 4, 32\n"

Similarly, FORKSERVER_FD is defined in config.h as follows:

#define FORKSRV_FD          198

"\n"
"  cmpl  $4, %eax\n"
"  jne   __afl_fork_resume\n"
"\n"

The result is compared with 4 to see whether the write is successful or not.

"__afl_fork_wait_loop:\n" "\n"
"  /* Wait for parent by reading from the pipe. Abort if read fails. */\n"
"  pushl $4          /* length    */\n"
"  pushl $__afl_temp /* data      */\n"
"  pushl $" STRINGIFY(FORKSRV_FD) "        /* file desc */\n"
"  call  read\n"
"  addl  $12, %esp\n"
"\n"
"  cmpl  $4, %eax\n"
"  jne   __afl_die\n"
"\n"

This basically waits for server to respond before continuing any execution and calls fork() afterwards. Recall, we write to FORKSERVER_FD+1 and read from FORKSERVER.

"/* Once woken up, create a clone of our process. This is an excellent use\n"
"   case for syscall(__NR_clone, 0, CLONE_PARENT), but glibc boneheadedly\n"
"   caches getpid() results and offers no way to update the value, breaking\n"
"   abort(), raise(), and a bunch of other things :-( */\n"
"\n"
"  call fork\n"
"\n"
"  cmpl $0, %eax\n"
"  jl   __afl_die\n"
"  je   __afl_fork_resume\n"

If the return value is zero, that means we are in the child so the follow-up is __afl_fork_resume.

"__afl_fork_resume:\n" "\n"
"  /* In child process: close fds, resume execution. */\n" "\n"
"  pushl $" STRINGIFY(FORKSRV_FD) "\n"
"  call  close\n"
"\n"
"  pushl $" STRINGIFY((FORKSRV_FD + 1)) "\n"
"  call  close\n"
"\n"
"  addl  $8, %esp\n"
"\n"
"  popl %edx\n"
"  popl %ecx\n"
"  popl %eax\n"
"  jmp  __afl_store\n"

In __afl_fork_resume, two forkserver file descriptors are closed and code jumps to __afl_store to save edge id in %ecx as usual. Finally, let’s get back to the parent process.

"  /* In parent process: write PID to pipe, then wait for child. */\n"
"  movl  %eax, __afl_fork_pid\n"
"\n"
"  pushl $4              /* length    */\n"
"  pushl $__afl_fork_pid /* data      */\n"
"  pushl $" STRINGIFY((FORKSRV_FD + 1)) "      /* file desc */\n"
"  call  write\n"
"  addl  $12, %esp\n"

Saves the PID in __afl_fork_pid and tells the forkserver about it.

"  pushl $0             /* no flags  */\n"
"  pushl $__afl_temp    /* status    */\n"
"  pushl __afl_fork_pid /* PID       */\n"
"  call  waitpid\n"
"  addl  $12, %esp\n"
"\n"
"  cmpl  $0, %eax\n"
"  jle   __afl_die\n"

Waits for child to conclude and saves the status in __afl_temp.

"  /* Relay wait status to pipe, then loop back. */\n" "\n"
"  pushl $4          /* length    */\n"
"  pushl $__afl_temp /* data      */\n"
"  pushl $" STRINGIFY((FORKSRV_FD + 1)) "  /* file desc */\n"
"  call  write\n"
"  addl  $12, %esp\n"
"\n"
"  jmp __afl_fork_wait_loop\n"
"\n"

Tells the child exit status to the forkserver and jumps to __afl_fork_wait_loop. Let’s see the loop body.

"__afl_fork_wait_loop:\n" "\n"
"  /* Wait for parent by reading from the pipe. Abort if read fails. */\n"
"  pushl $4          /* length    */\n"
"  pushl $__afl_temp /* data      */\n"
"  pushl $" STRINGIFY(FORKSRV_FD) "        /* file desc */\n"
"  call  read\n"
"  addl  $12, %esp\n"
"\n"
"  cmpl  $4, %eax\n"
"  jne   __afl_die\n"
"\n"

Reads 4 bytes from the forkserver. Probably, this read call returns when the current child concludes and we will start a new child soon.

"/* Once woken up, create a clone of our process. This is an excellent use\n"
"   case for syscall(__NR_clone, 0, CLONE_PARENT), but glibc boneheadedly\n"
"   caches getpid() results and offers no way to update the value, breaking\n"
"   abort(), raise(), and a bunch of other things :-( */\n"
"\n"
"  call fork\n"
"\n"
"  cmpl $0, %eax\n"
"  jl   __afl_die\n"
"  je   __afl_fork_resume\n"

Yup, just fork and have a new child. Child does the right thing and jumps to __afl_fork_resume.

"  /* In parent process: write PID to pipe, then wait for child. */\n"
"  movl  %eax, __afl_fork_pid\n"
"\n"
"  pushl $4              /* length    */\n"
"  pushl $__afl_fork_pid /* data      */\n"
"  pushl $" STRINGIFY((FORKSRV_FD + 1)) "      /* file desc */\n"
"  call  write\n"
"  addl  $12, %esp\n"
"\n"
"  pushl $0             /* no flags  */\n"
"  pushl $__afl_temp    /* status    */\n"
"  pushl __afl_fork_pid /* PID       */\n"
"  call  waitpid\n"
"  addl  $12, %esp\n"
"\n"
"  cmpl  $0, %eax\n"
"  jle   __afl_die\n"
"\n"
"  /* Relay wait status to pipe, then loop back. */\n"
"\n"
"  pushl $4          /* length    */\n"
"  pushl $__afl_temp /* data      */\n"
"  pushl $" STRINGIFY((FORKSRV_FD + 1)) "  /* file desc */\n"
"  call  write\n"
"  addl  $12, %esp\n"
"\n"
"  jmp __afl_fork_wait_loop\n"

This is exactly the same as before. Waits for the child to conclude, save the exit status and relay this information back to the server so that the server would know how the input did.

This concludes the instrumentation code overview.

afl-fuzz

Let’s see how the runner manages the target, gathers information and achieves coverage.

Let’s cover some relevant calls. After parsing arguments, signal handlers are installed. For instance the following handles timeouts while running the target.

static void handle_timeout(int sig) {
  if (child_pid > 0) {
    child_timed_out = 1;
    kill(child_pid, SIGKILL);
  } else if (child_pid == -1 && forksrv_pid > 0) {
    child_timed_out = 1;
    kill(forksrv_pid, SIGKILL);
  }
}

Next, shared memory region is initialized and an environment variable is set so that the instrumentation can locate the memory region for registering edges.

/* Configure shared memory and virgin_bits. This is called at startup. */
EXP_ST void setup_shm(void) {
  u8* shm_str;

  if (!in_bitmap) memset(virgin_bits, 255, MAP_SIZE);

  memset(virgin_tmout, 255, MAP_SIZE);
  memset(virgin_crash, 255, MAP_SIZE);

  shm_id = shmget(IPC_PRIVATE, MAP_SIZE, IPC_CREAT | IPC_EXCL | 0600);

  if (shm_id < 0) PFATAL("shmget() failed");
  atexit(remove_shm);
  shm_str = alloc_printf("%d", shm_id);

  /* If somebody is asking us to fuzz instrumented binaries in dumb mode,
     we don't want them to detect instrumentation, since we won't be sending
     fork server commands. This should be replaced with better auto-detection
     later on, perhaps? */
  if (!dumb_mode) setenv(SHM_ENV_VAR, shm_str, 1);
  ck_free(shm_str);
  trace_bits = shmat(shm_id, NULL, 0);
  if (trace_bits == (void *)-1) PFATAL("shmat() failed");
}

The target memory regions are defined at the beginning of afl-fuzz.c as follows:

EXP_ST u8* trace_bits;                /* SHM with instrumentation bitmap  */
EXP_ST u8  virgin_bits[MAP_SIZE],     /* Regions yet untouched by fuzzing */
           virgin_tmout[MAP_SIZE],    /* Bits we haven't seen in tmouts   */
           virgin_crash[MAP_SIZE];    /* Bits we haven't seen in crashes  */
SHM_ENV_VAR is defined in config.h:

#define SHM_ENV_VAR         "__AFL_SHM_ID"

After setting up shared memory, initial testcases are loaded into a queue in static void read_testcases(void).

for (i = 0; i < nl_cnt; i++) {
  struct stat st;
                                                                                      
  u8* fn = alloc_printf("%s/%s", in_dir, nl[i]->d_name);
  u8* dfn = alloc_printf("%s/.state/deterministic_done/%s",
                            in_dir, nl[i]->d_name);
  u8  passed_det = 0;
                                                                                      
  free(nl[i]); /* not tracked */
                                                                                      
  if (lstat(fn, &st) || access(fn, R_OK))
    PFATAL("Unable to access '%s'", fn);
                                                                                      
  /* This also takes care of . and .. */
  if (!S_ISREG(st.st_mode) || !st.st_size
     || strstr(fn, "/README.testcases")) {
    ck_free(fn);
    ck_free(dfn);
    continue;
  }
                                                                                      
  if (st.st_size > MAX_FILE)
    FATAL("Test case '%s' is too big (%s, limit is %s)", fn,
          DMS(st.st_size), DMS(MAX_FILE));
                                                                                      
  /* Check for metadata that indicates that deterministic fuzzing
     is complete for this entry. We don't want to repeat deterministic
     fuzzing when resuming aborted scans, because it would be pointless
     and probably very time-consuming. */
  if (!access(dfn, F_OK)) passed_det = 1;
  ck_free(dfn);
  add_to_queue(fn, st.st_size, passed_det);
}

Basically, here fn denotes the testcase filename and it is passed to add_to_queue(). Then, perform_dry_run() is called that calls calibrate_case() in which the forkserver is fired up.

void init_forkserver(char **argv)

So the idea is explained here. I’ll try to go over the function once more here. Basically, it create two pipes one for state and the other for control and forks.

EXP_ST void init_forkserver(char** argv) {
  static struct itimerval it;
  int st_pipe[2], ctl_pipe[2];
  int status;
  s32 rlen;
                                                                               
  ACTF("Spinning up the fork server...");
  if (pipe(st_pipe) || pipe(ctl_pipe)) PFATAL("pipe() failed");
  forksrv_pid = fork();
  if (forksrv_pid < 0) PFATAL("fork() failed");

In the child, it basically tries to get/set sane limits.

  if (!forksrv_pid) {
    struct rlimit r;
    /* Umpf. On OpenBSD, the default fd limit for root users is set to
       soft 128. Let's try to fix that... */
    if (!getrlimit(RLIMIT_NOFILE, &r) && r.rlim_cur < FORKSRV_FD + 2) {
      r.rlim_cur = FORKSRV_FD + 2;
      setrlimit(RLIMIT_NOFILE, &r); /* Ignore errors */
    }
                                                                               
    if (mem_limit) {
      r.rlim_max = r.rlim_cur = ((rlim_t)mem_limit) << 20;
#ifdef RLIMIT_AS
      setrlimit(RLIMIT_AS, &r); /* Ignore errors */
#else
      /* This takes care of OpenBSD, which doesn't have RLIMIT_AS, but
         according to reliable sources, RLIMIT_DATA covers anonymous
         maps - so we should be getting good protection against OOM bugs. */
      setrlimit(RLIMIT_DATA, &r); /* Ignore errors */
#endif /* ^RLIMIT_AS */
    }
    /* Dumping cores is slow and can lead to anomalies if SIGKILL is delivered 
       before the dump is complete. */
    r.rlim_max = r.rlim_cur = 0;
    setrlimit(RLIMIT_CORE, &r); /* Ignore errors */

Then, it ties stdout and stderr to /dev/null and also stdin if out_file is empty.

    /* Isolate the process and configure standard descriptors. If out_file is
       specified, stdin is /dev/null; otherwise, out_fd is cloned instead. */
    setsid();
    dup2(dev_null_fd, 1);
    dup2(dev_null_fd, 2);
    if (out_file) {
      dup2(dev_null_fd, 0);
    } else {
      dup2(out_fd, 0);
      close(out_fd);
    }

The following block is interesting as it ties child side of the control pipe to FORKSRV_FD and state side of the forkserver to FORKSRV_FD+1. After all goes OK, unncessary descriptors are closed.

    /* Set up control and status pipes, close the unneeded original fds. */
    if (dup2(ctl_pipe[0], FORKSRV_FD) < 0) PFATAL("dup2() failed");
    if (dup2(st_pipe[1], FORKSRV_FD + 1) < 0) PFATAL("dup2() failed");
                                                                               
    close(ctl_pipe[0]);
    close(ctl_pipe[1]);
    close(st_pipe[0]);
    close(st_pipe[1]);
                                                                               
    close(out_dir_fd);
    close(dev_null_fd);
    close(dev_urandom_fd);
    close(fileno(plot_file));

Finally, options regarding to ASAN and MSAN are set in the environment and the target binary is executed via an execv() call. Recall that the target binary will stop at a read() call at the first instrumentation edge and wait for forkserver.

    /* This should improve performance a bit, since it stops the linker from
       doing extra work post-fork(). */
                                                                                         
    if (!getenv("LD_BIND_LAZY")) setenv("LD_BIND_NOW", "1", 0);
    /* Set sane defaults for ASAN if nothing else specified. */
    setenv("ASAN_OPTIONS", "abort_on_error=1:"
                           "detect_leaks=0:"
                           "symbolize=0:"
                           "allocator_may_return_null=1", 0);
                                                                                         
    /* MSAN is tricky, because it doesn't support abort_on_error=1 at this
       point. So, we do this in a very hacky way. */
    setenv("MSAN_OPTIONS", "exit_code=" STRINGIFY(MSAN_ERROR) ":"
                           "symbolize=0:"
                           "abort_on_error=1:"
                           "allocator_may_return_null=1:"
                           "msan_track_origins=0", 0);
                                                                                         
    execv(target_path, argv);
    /* Use a distinctive bitmap signature to tell the parent about execv()
       falling through. */
    *(u32*)trace_bits = EXEC_FAIL_SIG;
    exit(0);
  }

Meanwhile the forkserver parent proceeds as follows;

  /* Close the unneeded endpoints. */
  close(ctl_pipe[0]);
  close(st_pipe[1]);

  fsrv_ctl_fd = ctl_pipe[1];
  fsrv_st_fd  = st_pipe[0];

Closes the child side of the pipes and now fsrv_ctl_fd will be written and fsrv_st_fd will be read by the parent.

  /* Wait for the fork server to come up, but don't wait too long. */
  it.it_value.tv_sec = ((exec_tmout * FORK_WAIT_MULT) / 1000);
  it.it_value.tv_usec = ((exec_tmout * FORK_WAIT_MULT) % 1000) * 1000;
                                                                            
  setitimer(ITIMER_REAL, &it, NULL);

Set the timer if anything goes wrong during fork().

  rlen = read(fsrv_st_fd, &status, 4);
                                                                            
  it.it_value.tv_sec = 0;
  it.it_value.tv_usec = 0;
                                                                            
  setitimer(ITIMER_REAL, &it, NULL);
  /* If we have a four-byte "hello" message from the server, we're all set. 
     Otherwise, try to figure out what went wrong. */
  if (rlen == 4) {
    OKF("All right - fork server is up.");
    return;
  }
  if (child_timed_out)
    FATAL("Timeout while initializing fork server (adjusting -t may help)");
                                                                            
  if (waitpid(forksrv_pid, &status, 0) <= 0)
    PFATAL("waitpid() failed");

Above code reads the hello message of 4 bytes and determines if the fork() call is successful. The rest of the function has more checks and omitted for brevity. All we have now is a channel that which we will communicate with the target: fsrv_ctl_fd for writes and fsrv_st_fd for reads.

The Fuzzing Loop

The main fuzzing loop appears in afl-fuzz.c as follows:

while (1) {
  u8 skipped_fuzz;
  cull_queue();
                                                                   
  if (!queue_cur) {
    queue_cycle++;
    current_entry     = 0;
    cur_skipped_paths = 0;
    queue_cur         = queue;
                                                                   
    while (seek_to) {
      current_entry++;
      seek_to--;
      queue_cur = queue_cur->next;
    }
                                                                   
    show_stats();
                                                                   
    if (not_on_tty) {
      ACTF("Entering queue cycle %llu.", queue_cycle);
      fflush(stdout);
    }
                                                                   
    /* If we had a full queue cycle with no new finds, try
       recombination strategies next. */
    if (queued_paths == prev_queued) {
      if (use_splicing) cycles_wo_finds++; else use_splicing = 1;
    } else cycles_wo_finds = 0;
                                                                   
    prev_queued = queued_paths;
    if (sync_id && queue_cycle == 1 && getenv("AFL_IMPORT_FIRST")) 
      sync_fuzzers(use_argv);
  }
  skipped_fuzz = fuzz_one(use_argv);
  if (!stop_soon && sync_id && !skipped_fuzz) {
    if (!(sync_interval_cnt++ % SYNC_INTERVAL))
      sync_fuzzers(use_argv);
  }
  if (!stop_soon && exit_1) stop_soon = 2;
  if (stop_soon) break;
                                                                   
  queue_cur = queue_cur->next;
  current_entry++;
}

Loop body starts with cull_queue() that basically selects the testcase to run. sync_fuzzers() synchronizes this instance among other instances if there are any (ie. multi-core fuzzing). The main fuzzing function is naturally fuzz_one(). Exit from this infinite loop is controlled by the variable stop_soon.

static u8 fuzz_one(char **argv);

fuzz_one() starts with variable definitions. Then, it looks for one in the queue and maps the test data into memory pointed by in_buf. This buffer is then copied into out_buf which will be the input for the target.

static u8 fuzz_one(char** argv) {
  s32 len, fd, temp_len, i, j;
  u8  *in_buf, *out_buf, *orig_in, *ex_tmp, *eff_map = 0;
  u64 havoc_queued,  orig_hit_cnt, new_hit_cnt;
  u32 splice_cycle = 0, perf_score = 100, orig_perf, prev_cksum, eff_cnt = 1;
  u8  ret_val = 1, doing_det = 0;
  u8  a_collect[MAX_AUTO_EXTRA];
  u32 a_len = 0;

  .. skipped for brevity ..

  /* Map the test case into memory. */
  fd = open(queue_cur->fname, O_RDONLY);
  if (fd < 0) PFATAL("Unable to open '%s'", queue_cur->fname);
  len = queue_cur->len;
  orig_in = in_buf = mmap(0, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
  if (orig_in == MAP_FAILED) PFATAL("Unable to mmap '%s'", queue_cur->fname);
  close(fd);
 /* We could mmap() out_buf as MAP_PRIVATE, but we end up clobbering every
    single byte anyway, so it wouldn't give us any performance or memory usage
    benefits. */
  out_buf = ck_alloc_nozero(len);
  subseq_tmouts = 0;
  cur_depth = queue_cur->depth;
  /*******************************************
   * CALIBRATION (only if failed earlier on) *

  .. skipped for brevity ..
  memcpy(out_buf, in_buf, len);
  .. skipped for brevity ..

After this point, all fuzzing stages are applied on out_buf and target is run to alter trace_bits. We are ready to see a fuzzing stage. Let’s cover a stage here and see how the results are evaluated.

Here is the code block for the bitflip 2/1.

 /* Two walking bits. */
 stage_name  = "bitflip 2/1";
 stage_short = "flip2";
 stage_max   = (len << 3) - 1;
                                                                    
 orig_hit_cnt = new_hit_cnt;
 for (stage_cur = 0; stage_cur < stage_max; stage_cur++) {
   stage_cur_byte = stage_cur >> 3;
   FLIP_BIT(out_buf, stage_cur);
   FLIP_BIT(out_buf, stage_cur + 1);
   if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
   FLIP_BIT(out_buf, stage_cur);
   FLIP_BIT(out_buf, stage_cur + 1);
 }
 new_hit_cnt = queued_paths + unique_crashes;
 stage_finds[STAGE_FLIP2]  += new_hit_cnt - orig_hit_cnt;
 stage_cycles[STAGE_FLIP2] += stage_max;

len is multiplied by 8 in stage_max since we are walking over every bit. This stage basically flips two consecutive bits and runs the target. FLIP_BIT macro is defined as follows:

#define FLIP_BIT(_ar, _b) do { \
    u8* _arf = (u8*)(_ar); \
    u32 _bf = (_b); \
    _arf[(_bf) >> 3] ^= (128 >> ((_bf) & 7)); \ 
  } while (0)

After the stage concludes, the results are saved in stage_finds and stage_cycles. The only call here is to common_fuzz_stuff(). Arguments are the command line arguments, the test data along with its integral length.

EXP_ST u8 common_fuzz_stuff(char** argv, u8* out_buf, u32 len) {
  u8 fault;
                                                                      
  if (post_handler) {
    out_buf = post_handler(out_buf, &len);
    if (!out_buf || !len) return 0;
  }
  write_to_testcase(out_buf, len);
  fault = run_target(argv, exec_tmout);
  if (stop_soon) return 1;
  if (fault == FAULT_TMOUT) {
    if (subseq_tmouts++ > TMOUT_LIMIT) {
      cur_skipped_paths++;
      return 1;
    }
  } else subseq_tmouts = 0;
  /* Users can hit us with SIGUSR1 to request the current input
     to be abandoned. */
  if (skip_requested) {
     skip_requested = 0;
     cur_skipped_paths++;
     return 1;
  }
  /* This handles FAULT_ERROR for us: */
  queued_discovered += save_if_interesting(argv, out_buf, len, fault);
  if (!(stage_cur % stats_update_freq) || stage_cur + 1 == stage_max)
    show_stats();
  return 0;
}

The function writes the testcase, runs the target and looks for a timeout. Finally, calls save_if_interesting() to see if there are any interesting results.

static void write_to_testcase(void* mem, u32 len) {
  s32 fd = out_fd;
  if (out_file) {
    unlink(out_file); /* Ignore errors. */
    fd = open(out_file, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0) PFATAL("Unable to create '%s'", out_file);
  } else lseek(fd, 0, SEEK_SET);
                                                             
  ck_write(fd, mem, len, out_file);
  if (!out_file) {
    if (ftruncate(fd, len)) PFATAL("ftruncate() failed");
    lseek(fd, 0, SEEK_SET);
  } else close(fd);
}

write_to_testcase() is pretty straightforward. Let’s see the run_target().

/* Execute target application, monitoring for timeouts. Return status
   information. The called program will update trace_bits[]. */
static u8 run_target(char** argv, u32 timeout) {
  static struct itimerval it;
  static u32 prev_timed_out = 0;
  static u64 exec_ms = 0;
                                                                                    
  int status = 0;
  u32 tb4;
  child_timed_out = 0;
  /* After this memset, trace_bits[] are effectively volatile, so we
     must prevent any earlier operations from venturing into that
     territory. */
  memset(trace_bits, 0, MAP_SIZE);
  MEM_BARRIER();

This starts with clearing out the trace_bits and calls a MEM_BARRIER() so that it waits for memory write to finish.

  /* If we're running in "dumb" mode, we can't rely on the fork server
     logic compiled into the target program, so we will just keep calling
     execve(). There is a bit of code duplication between here and
     init_forkserver(), but c'est la vie. */
  if (dumb_mode == 1 || no_forkserver) {

  .. skipped_for_brevity ..

  } else {
    s32 res;
    /* In non-dumb mode, we have the fork server up and running, so simply
       tell it to have at it, and then read back PID. */
    if ((res = write(fsrv_ctl_fd, &prev_timed_out, 4)) != 4) {
      if (stop_soon) return 0;
      RPFATAL(res, "Unable to request new process from fork server (OOM?)");
    }

Let’s call the child to say that we are ready. prev_timed_out will be zero in the first run.

    if ((res = read(fsrv_st_fd, &child_pid, 4)) != 4) {
      if (stop_soon) return 0;
      RPFATAL(res, "Unable to request new process from fork server (OOM?)");
    }
    if (child_pid <= 0) FATAL("Fork server is misbehaving (OOM?)");
  }

Child returns the PID of itself.

  /* Configure timeout, as requested by user, then wait for child
     to terminate. */  
  it.it_value.tv_sec = (timeout / 1000);                                            
  it.it_value.tv_usec = (timeout % 1000) * 1000;                                    
                                                                                    
  setitimer(ITIMER_REAL, &it, NULL);                                                
                                                                                    
  /* The SIGALRM handler simply kills the child_pid and sets
     child_timed_out. */    
  if (dumb_mode == 1 || no_forkserver) {                                            
    if (waitpid(child_pid, &status, 0) <= 0) PFATAL("waitpid() failed");            
  } else {                                                                          
    s32 res;                                                                        
    if ((res = read(fsrv_st_fd, &status, 4)) != 4) {                                
      if (stop_soon) return 0;                                                      
      RPFATAL(res, "Unable to communicate with fork server (OOM?)");                
    }                                                                               
  }                                                                                 
  if (!WIFSTOPPED(status)) child_pid = 0;

Read the status of the child and wait until it concludes or crashes.

  getitimer(ITIMER_REAL, &it);                                                      
  exec_ms = (u64) timeout - (it.it_value.tv_sec * 1000 +                            
                             it.it_value.tv_usec / 1000);                           
                                                                                    
  it.it_value.tv_sec = 0;                                                           
  it.it_value.tv_usec = 0;                                                          
                                                                                    
  setitimer(ITIMER_REAL, &it, NULL);                                                
  total_execs++;                                                                    
  /* Any subsequent operations on trace_bits must not be moved by the               
     compiler below this point. Past this location, trace_bits[] behave             
     very normally and do not have to be treated as volatile. */                    
  MEM_BARRIER();                                                                    
  tb4 = *(u32*)trace_bits;

Execute another MEM_BARRIER() to sync shared memory and now we have the trace generated by the instrumentation that belongs to the current test case. Next, there is a call to classify_counts() to classify our findings. The rest is error handling for various causes.

#ifdef WORD_SIZE_64                                                                 
  classify_counts((u64*)trace_bits);                                                
#else                                                                               
  classify_counts((u32*)trace_bits);                                                
#endif /* ^WORD_SIZE_64 */                                                          
                                                                                    
  prev_timed_out = child_timed_out;                                                 
  /* Report outcome to caller. */                                                   
  if (WIFSIGNALED(status) && !stop_soon) {                                          
    kill_signal = WTERMSIG(status);                                                 
    if (child_timed_out && kill_signal == SIGKILL) return FAULT_TMOUT;              
    return FAULT_CRASH;                                                             
  }                                                                                 
                                                                                    
  /* A somewhat nasty hack for MSAN, which doesn't support abort_on_error and       
     must use a special exit code. */                                               
  if (uses_asan && WEXITSTATUS(status) == MSAN_ERROR) {                             
    kill_signal = 0;                                                                
    return FAULT_CRASH;                                                             
  }                                                                                 

 /* A somewhat nasty hack for MSAN, which doesn't support abort_on_error and        
     must use a special exit code. */                                               
  if (uses_asan && WEXITSTATUS(status) == MSAN_ERROR) {                             
    kill_signal = 0;                                                                
    return FAULT_CRASH;                                                             
  }                                                                                 
                                                                                    
  if ((dumb_mode == 1 || no_forkserver) && tb4 == EXEC_FAIL_SIG)                    
    return FAULT_ERROR;                                                             
                                                                                    
  /* It makes sense to account for the slowest units only if the testcase
     was run under the user defined timeout. */                                                
  if (!(timeout > exec_tmout) && (slowest_exec_ms < exec_ms)) {                     
    slowest_exec_ms = exec_ms;                                                      
  }                                                                                 
  return FAULT_NONE;                                                                
}

classify_counts() is nothing more than a hand-crafted logarithmic classifier depending on the hitcount. If the hitcount is 6, it’ll be 8, if the hitcount is 55 then it’ll be 64, etc.

static const u8 count_class_lookup8[256] = {                                   
  [0]           = 0,                                                           
  [1]           = 1,                                                           
  [2]           = 2,                                                           
  [3]           = 4,                                                           
  [4 ... 7]     = 8,                                                           
  [8 ... 15]    = 16,                                                          
  [16 ... 31]   = 32,                                                          
  [32 ... 127]  = 64,                                                          
  [128 ... 255] = 128                                                          
};

static u16 count_class_lookup16[65536];
EXP_ST void init_count_class16(void) {                                         
  u32 b1, b2;                                                                  
  for (b1 = 0; b1 < 256; b1++)                                                 
    for (b2 = 0; b2 < 256; b2++)                                               
      count_class_lookup16[(b1 << 8) + b2] =                                   
        (count_class_lookup8[b1] << 8) | count_class_lookup8[b2];              
}

#ifdef WORD_SIZE_64
  .. skipped for brevity ..
#else
static inline void classify_counts(u32* mem) {                                 
  u32 i = MAP_SIZE >> 2;                                                       
  while (i--) {                                                                
    /* Optimize for sparse bitmaps. */                                         
    if (unlikely(*mem)) {                                                      
      u16* mem16 = (u16*)mem;                                                  
                                                                               
      mem16[0] = count_class_lookup16[mem16[0]];                               
      mem16[1] = count_class_lookup16[mem16[1]];                               
    }                                                                          
    mem++;                                                                     
  }                                                                            
}
#endif /* ^WORD_SIZE_64 */

Saving Interesting Cases

This is the last piece in our puzzle. Once we figure out whether the current execution leads to anything interesting we’ll save and probably favor it in our queue.

/* Check if the result of an execve() during routine fuzzing is interesting,       
   save or queue the input test case for further analysis if so. Returns 1 if      
   entry is saved, 0 otherwise. */
static u8 save_if_interesting(char** argv, void* mem, u32 len, u8 fault) {         
  u8  *fn = "";                                                                    
  u8  hnb;                                                                         
  s32 fd;                                                                          
  u8  keeping = 0, res;                                                            
                                                                                   
  if (fault == crash_mode) {                                                       
     .. skipped for brevity ..
  }
  switch (fault) {                                                                 
    case FAULT_TMOUT:                                                              
      /* Timeouts are not very interesting, but we're still obliged to keep        
         a handful of samples. We use the presence of new bits in the              
         hang-specific bitmap as a signal of uniqueness. In "dumb" mode, we        
         just keep everything. */                                                  
                                                                                   
      total_tmouts++;                                                              
      if (unique_hangs >= KEEP_UNIQUE_HANG) return keeping;                        
                                                                                   
      if (!dumb_mode) {                                                            
                                                                                   
#ifdef WORD_SIZE_64                                                                
        simplify_trace((u64*)trace_bits);                                          
#else                                                                              
        simplify_trace((u32*)trace_bits);                                          
#endif /* ^WORD_SIZE_64 */                                                         
                                                                                   
        if (!has_new_bits(virgin_tmout)) return keeping;                           
      }                                                                            
                                                                                   
      unique_tmouts++;                                                             
      /* Before saving, we make sure that it's a genuine hang by re-running        
         the target with a more generous timeout (unless the default timeout       
         is already generous). */                                                  
      if (exec_tmout < hang_tmout) {                                               
        u8 new_fault;                                                              
        write_to_testcase(mem, len);
        new_fault = run_target(argv, hang_tmout);                                  
                                                                                   
        /* A corner case that one user reported bumping into: increasing the       
           timeout actually uncovers a crash. Make sure we don't discard it if     
           so. */                                                                  
        if (!stop_soon && new_fault == FAULT_CRASH) goto keep_as_crash;            
                                                                                   
        if (stop_soon || new_fault != FAULT_TMOUT) return keeping;                 
      }                                                                            
#ifndef SIMPLE_FILES                                                               
      fn = alloc_printf("%s/hangs/id:%06llu,%s", out_dir,                          
                        unique_hangs, describe_op(0));                             
#else                                                                              
      fn = alloc_printf("%s/hangs/id_%06llu", out_dir,                             
                        unique_hangs);                                             
#endif /* ^!SIMPLE_FILES */                                                        
      unique_hangs++;                                                              
      last_hang_time = get_cur_time();                                             
      break;

This basically saves timeouts unless unique_hangs >= KEEP_UNIQUE_HANGS. Upper bounds are defined in config.h as

#define KEEP_UNIQUE_HANG    500
#define KEEP_UNIQUE_CRASH   5000

It also handles the case where it is really no a hang but a crash but timeout value was too low to see that. Finally it allocated a filename to write the testcase for later use in fn. Let’s see what simplify_trace() does.

/* Destructively simplify trace by eliminating hit count information
   and replacing it with 0x80 or 0x01 depending on whether the tuple
   is hit or not. Called on every new crash or timeout, should be
   reasonably fast. */
static const u8 simplify_lookup[256] = {
  [0]         = 1,
  [1 ... 255] = 128
};

#ifdef WORD_SIZE_64
 .. skipped for brevity ..
#else
static void simplify_trace(u32* mem) {
  u32 i = MAP_SIZE >> 2;
  while (i--) {
    /* Optimize for sparse bitmaps. */
    if (unlikely(*mem)) {
      u8* mem8 = (u8*)mem;
      mem8[0] = simplify_lookup[mem8[0]];
      mem8[1] = simplify_lookup[mem8[1]];
      mem8[2] = simplify_lookup[mem8[2]];
      mem8[3] = simplify_lookup[mem8[3]];
    } else *mem = 0x01010101;
    mem++;
  }
}
#endif /* ^WORD_SIZE_64 */

This basically simplifies the trace map once more and replaces the values with 1 or 128 depending on whether or not there is a hit. So far so good, let’s see how crashes are handled.

    case FAULT_CRASH:
keep_as_crash:
      /* This is handled in a manner roughly similar to timeouts,
         except for slightly different limits and no need to re-run test
         cases. */
      total_crashes++;
      if (unique_crashes >= KEEP_UNIQUE_CRASH) return keeping;
      if (!dumb_mode) {
#ifdef WORD_SIZE_64
        simplify_trace((u64*)trace_bits);
#else
        simplify_trace((u32*)trace_bits);
#endif /* ^WORD_SIZE_64 */
        if (!has_new_bits(virgin_crash)) return keeping;
      }
      if (!unique_crashes) write_crash_readme();
#ifndef SIMPLE_FILES
      fn = alloc_printf("%s/crashes/id:%06llu,sig:%02u,%s", out_dir,
                        unique_crashes, kill_signal, describe_op(0));
#else
      fn = alloc_printf("%s/crashes/id_%06llu_%02u", out_dir, unique_crashes,
                        kill_signal);
#endif /* ^!SIMPLE_FILES */
      unique_crashes++;
                                                                                   
      last_crash_time = get_cur_time();
      last_crash_execs = total_execs;
      break;
    case FAULT_ERROR: FATAL("Unable to execute target application");
    default: return keeping;
  }

Similarly, unless unique_crashes >= KEEP_UNIQUE_CRASH, we keep this testcase. If this is first crash, it writes a readme to the standard output.

 /* If we're here, we apparently want to save the crash or hang
     test case, too. */
  fd = open(fn, O_WRONLY | O_CREAT | O_EXCL, 0600);
  if (fd < 0) PFATAL("Unable to create '%s'", fn);
  ck_write(fd, mem, len, fn);
  close(fd);
   
  ck_free(fn);
  return keeping;
}

Finally, in the epilogue, we save the testcase into a file named by fn and write the test data. describe_op() generates a substring for the filename that relates the testcase.

Conclusion

In this article, I’ve tried to go over the details of the instrumentation of a target by AFL Fuzzer. The instrumentation code is injected in the pre-assembly phase by an assembly text parser. Edges are instrumented at runtime by injecting a call to the a function that flips the bits in a memory region shared with the runner thereby increasing the hitcount. This allows fuzzer to achieve code coverage and guides it to the depths of the branches.

To make the process execution fast, a forkserver is designed that basically stops at the entrypoint and clones the image when a signal is sent by the server. High execution speeds are achieved through this mechanism.

Fuzzer stages are located in fuzz_one() and this is a huge function with all fuzzer stages are executed sequentially. However, I was only able to show a sample bitflip stage in this article.

Until next time!

AFL Internals