AFL Internals - libtokencap

Introduction

In a previous article, I covered American Fuzzy Lop internals. This time, I am going to look into libtokencap, a library that ships with AFL. The name suggests that it is related to capturing tokens, and it is: the library captures string constants at runtime. So why do we need such a shared library? Read on.

libtokencap

The library is a single file located under the libtokencap/ directory. The README describes it as follows:

This Linux-only companion library allows you to instrument strcmp(),
memcmp(), and related functions to automatically extract syntax tokens
passed to any of these libcalls. The resulting list of tokens may be
then given as a starting dictionary to afl-fuzz (the -x option) to
improve coverage on subsequent fuzzing runs.

Basically, we want to collect tokens for a dictionary that we can hand to afl-fuzz via the -x parameter, and libtokencap comes in handy for generating them. One interesting note: libtokencap is not strictly tied to AFL and can be used with other fuzzers too. One drawback: it is Linux-only.
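
For reference, an AFL dictionary is just a text file with one quoted token per line; entries may optionally carry a name, and non-printable bytes can be \xNN-escaped. A made-up example:

# comments and empty lines are ignored
header_get="GET"
"PUT"
crlf="\x0d\x0a"

Conveniently, libtokencap’s output (plain quoted tokens, one per line) is already in this format.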

The library is a single file, so let’s start with libtokencap/libtokencap.so.c. A shared library has no entrypoint, but a function marked __attribute__((constructor)) is called automatically when the library is loaded into memory (e.g. via LD_PRELOAD or dlopen()).

/* Init code to open the output file (or default to stderr). */
__attribute__((constructor)) void __tokencap_init(void) {
  u8* fn = getenv("AFL_TOKEN_FILE");
  if (fn) __tokencap_out_file = fopen(fn, "a");
  if (!__tokencap_out_file) __tokencap_out_file = stderr;
}

Basically, it opens the file named by the AFL_TOKEN_FILE environment variable, falling back to stderr if that fails. __tokencap_out_file is defined globally as:

static FILE* __tokencap_out_file;
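
The constructor mechanism itself is easy to try in isolation. Here is a minimal sketch (the file and function names are mine):

/* ctor_demo.c */
#include <stdio.h>

/* Runs automatically when the shared object is mapped into the
   process, e.g. via LD_PRELOAD or an explicit dlopen(). */
__attribute__((constructor)) static void ctor_demo_init(void) {
  fprintf(stderr, "ctor_demo loaded\n");
}

Compile it with cc -shared -fPIC ctor_demo.c -o ctor_demo.so, then run LD_PRELOAD=./ctor_demo.so /bin/true: the message appears before true even starts.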

Several functions are defined in this library. Basically, they are reimplementations of common libc comparison functions such as strcmp(), strncmp(), strcasecmp(), memcmp(), strstr(), and memmem().

Redefining common libc functions

Let’s start with one of the redefinitions. I pick you, strcmp!

#undef strcmp
int strcmp(const char* str1, const char* str2) {
  if (__tokencap_is_ro(str1)) __tokencap_dump(str1, strlen(str1), 1);
  if (__tokencap_is_ro(str2)) __tokencap_dump(str2, strlen(str2), 1);

  while (1) {
    unsigned char c1 = *str1, c2 = *str2;

    if (c1 != c2) return (c1 > c2) ? 1 : -1;
    if (!c1) return 0;
    str1++; str2++;
  }
}

It starts by undefining any macro named strcmp. The function’s type signature is exactly the same as strcmp()’s. Only two lines are added at the beginning: for each argument, if __tokencap_is_ro() reports it as read-only, __tokencap_dump() records it. The rest is a plain strcmp() implementation that returns 0, 1 or -1 depending on the comparison.

The dump function basically dumps the string into the file pointed to by __tokencap_out_file. The other one is a predicate that decides whether the argument is read-only (ro) or not. So how does one decide if a char * is read-only?

/proc/self/maps

Let’s see a sample maps file. I’ve removed the Nix store hashes for brevity.

00400000-00407000 r--p 00000000 fe:04 20073134 /nix/store/.../bin/coreutils
00407000-00536000 r-xp 00007000 fe:04 20073134 /nix/store/.../bin/coreutils
00536000-00590000 r--p 00136000 fe:04 20073134 /nix/store/.../bin/coreutils
00591000-0059c000 r--p 00190000 fe:04 20073134 /nix/store/.../bin/coreutils
0059c000-0059d000 rw-p 0019b000 fe:04 20073134 /nix/store/.../bin/coreutils

This file is provided by the Linux kernel and lists the memory mappings of a process. The first two columns are the start and end addresses of the mapping. The third shows the permissions, the fourth the offset into the file, the fifth the major:minor numbers of the device that backs the file, and the sixth the inode, followed by the filename. Pretty common Linux stuff.
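
To get a feel for where things live, here is a small self-contained sketch (mine, not part of libtokencap) that prints the address of a string literal and of a stack buffer next to the process’s own mappings:

/* maps_demo.c */
#include <stdio.h>

int main(void) {
  const char* literal = "hello";  /* usually ends up in read-only .rodata */
  char stack_buf[] = "hello";     /* lives on the writable stack */

  printf("literal:   %p\n", (void*)literal);
  printf("stack_buf: %p\n", (void*)stack_buf);

  /* Dump our own mappings so the two addresses above can be
     matched against the r--p / rw-p lines by eye. */
  FILE* f = fopen("/proc/self/maps", "r");
  if (!f) return 1;

  int c;
  while ((c = fgetc(f)) != EOF) putchar(c);
  fclose(f);

  return 0;
}

The literal’s address falls into one of the r--p ranges of the binary, while stack_buf falls into the rw-p [stack] mapping. That is exactly the distinction libtokencap exploits.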

Loading mappings and __tokencap_is_ro()

So we need to load these mappings into our own memory so that we can base our decision on them.

/* Mapping data and such */
#define MAX_MAPPINGS 1024

static struct mapping {
  void *st, *en;
} __tokencap_ro[MAX_MAPPINGS];

The above defines the mapping struct that basically holds start and end addresses. These are the only values we are interested in storing.

static u32   __tokencap_ro_cnt;
static u8    __tokencap_ro_loaded;

These hold the mapping count and a flag that tells us whether the mappings have been loaded. Let’s look into the function that actually loads them.

/* Identify read-only regions in memory. Only parameters that fall into these
   ranges are worth dumping when passed to strcmp() and so on. Read-write
   regions are far more likely to contain user input instead. */
static void __tokencap_load_mappings(void) {
  u8 buf[MAX_LINE];
  FILE* f = fopen("/proc/self/maps", "r");

  __tokencap_ro_loaded = 1;

  if (!f) return;

It tries to open the file and bails out silently on failure. Note that __tokencap_ro_loaded is set before the file is even checked, so a failed open will not be retried later.

  while (fgets(buf, MAX_LINE, f)) {
    u8 rf, wf;
    void* st, *en;

Parse each line and check if it is readable and not writable. If so, register it in our mappings.

    if (sscanf(buf, "%p-%p %c%c", &st, &en, &rf, &wf) != 4) continue;
    if (wf == 'w' || rf != 'r') continue;

    __tokencap_ro[__tokencap_ro_cnt].st = (void*)st;
    __tokencap_ro[__tokencap_ro_cnt].en = (void*)en;

Bail out if we have too many mappings.

    if (++__tokencap_ro_cnt == MAX_MAPPINGS) break;
  }
  fclose(f);
}

Now we have every read-only mapping in the global __tokencap_ro[] array. Let’s see how our predicate uses it to decide whether a string is ro or not.

/* Check an address against the list of read-only mappings. */
static u8 __tokencap_is_ro(const void* ptr) {
  u32 i;

If mappings are not loaded, then load’em up!

  if (!__tokencap_ro_loaded) __tokencap_load_mappings();

Check if the address of the pointer (the argument) is between the start and end values of a mapping. If so, then we know it is ro.

  for (i = 0; i < __tokencap_ro_cnt; i++)
    if (ptr >= __tokencap_ro[i].st && ptr <= __tokencap_ro[i].en) return 1;

  return 0;
}

In the library, the maximum number of mappings is defined as 1024. I’ve checked my own system: the process with the most mappings (a well-known browser) has around 9000, and only about 300 of them are read-only. Therefore, I assume this value is pretty safe.

Dumping tokens into AFL_TOKEN_FILE

All that is left is dumping the tokens to our output file; then we can use them with afl-fuzz’s -x dictionary parameter. Below is a straightforward implementation that you might enjoy. Note that only tokens between MIN_AUTO_EXTRA and MAX_AUTO_EXTRA bytes (3 and 32 by default, defined in config.h) are kept.

/* Dump an interesting token to output file, quoting and escaping it
   properly. */
static void __tokencap_dump(const u8* ptr, size_t len, u8 is_text) {
  u8 buf[MAX_AUTO_EXTRA * 4 + 1];
  u32 i;
  u32 pos = 0;

  if (len < MIN_AUTO_EXTRA || len > MAX_AUTO_EXTRA || !__tokencap_out_file)
    return;

  for (i = 0; i < len; i++) {
    if (is_text && !ptr[i]) break;

    switch (ptr[i]) {
      case 0 ... 31:
      case 127 ... 255:
      case '\"':
      case '\\':

        sprintf(buf + pos, "\\x%02x", ptr[i]);
        pos += 4;
        break;
      default:
        buf[pos++] = ptr[i];
    }
  }
  buf[pos] = 0;
  fprintf(__tokencap_out_file, "\"%s\"\n", buf);
}
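
To make this concrete, consider a hypothetical victim program (the names and strings are mine), compiled with -fno-builtin-strcmp so that the compiler emits a real strcmp() call instead of inlining its builtin (more on that below):

/* victim.c */
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv) {
  if (argc < 2) return 1;

  /* "MAGIC_HEADER" sits in read-only .rodata, so libtokencap dumps
     it; argv[1] is in writable memory and is skipped. */
  if (!strcmp(argv[1], "MAGIC_HEADER")) puts("header ok");

  /* Embedded quotes get escaped as \x22 in the token file. */
  if (!strcmp(argv[1], "v1\"beta\"")) puts("version ok");

  return 0;
}

Running this under the preloaded library, I would expect AFL_TOKEN_FILE to end up with roughly:

"MAGIC_HEADER"
"v1\x22beta\x22"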

Gimme! Gimme! Gimme tokens tonite!

The Makefile tells us how to build the shared library.

CFLAGS      ?= -O3 -funroll-loops
CFLAGS      += -Wall -D_FORTIFY_SOURCE=2 -g -Wno-pointer-sign

all: libtokencap.so

libtokencap.so: libtokencap.so.c ../config.h
	$(CC) $(CFLAGS) -shared -fPIC $< -o $@ $(LDFLAGS)

So we can load the library with LD_PRELOAD and specify the output file with AFL_TOKEN_FILE. However, we need to recompile the target binary with the -fno-builtin-* flags so that the compiler emits real calls to our reimplementations instead of inlining its builtins. There is an easy way, though: just set AFL_NO_BUILTIN=1 before compiling with afl-gcc and it will set the necessary flags for you. As far as I can tell from afl-gcc’s source, that amounts to something like this (the exact flag list may vary between AFL versions):
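
AFL_NO_BUILTIN=1 afl-gcc target.c -o target

# roughly equivalent, as far as the builtins are concerned, to:
gcc -fno-builtin-strcmp -fno-builtin-strncmp -fno-builtin-strcasecmp \
    -fno-builtin-strncasecmp -fno-builtin-memcmp target.c -o target

Finally, we get the dictionary as follows: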

export AFL_TOKEN_FILE=$PWD/temp_output.txt

for i in <out_dir>/queue/id*; do
  LD_PRELOAD=/path/to/libtokencap.so \
    /path/to/target/program [...params, including $i...]
done

sort -u temp_output.txt >afl_dictionary.txt

Conclusion

In this short article, I’ve given the details of libtokencap, a library that reimplements some common libc functions in order to capture tokens that might be useful for fuzzing with AFL or any other fuzzer.

Although the idea seems fine, I ran libtokencap against the openssh client and was able to capture only 11 (eleven) tokens of various lengths. That doesn’t help much, I guess. We’ll come back to the subject of capturing tokens later on.

Hope you enjoyed it!