AFL Internals - Stages
Introduction
In this article, I’ll go over the various fuzzing stages included in AFL: bitflip, arithmetic, interesting, extras, havoc and splice. Do you think all of these stages are deterministic? Or are you curious which ones are probabilistic? Read on.
static u8 fuzz_one(char **argv)
All we know up to this point is that there is a huge function called fuzz_one() that does the fuzzing. It starts by picking an input from the queue and memory-mapping it (orig_in = in_buf); the data is then copied to out_buf, the buffer that every stage mutates. Another interesting function we have studied is common_fuzz_stuff(). Basically, it takes the arguments and an output buffer along with its length, executes the target binary, and tells us about its exit condition. These are fundamental to all the stages I’m going to describe.
Let’s have an overview and see what each stage focuses on.
- bitflip
Flips consecutive bits by XOR’ing them with all ones. This stage has variations for 1, 2, 4, 8, 16 and 32 consecutive bits,
- arithmetic
Employs arithmetic manipulations such as addition and subtraction to find new edges,
- interesting
Works by modifying the data using a small set of interesting values provided in the codebase,
- extras
Overwrites or inserts dictionary tokens (extras) taken from a user-supplied dictionary or auto-collected during earlier stages,
- havoc
Probabilistic stage that applies stacked rounds of randomly chosen mutations,
- splice
This is also a probabilistic stage; it splices the input at a random point with another item from the queue, trying to discover new edges.
Stage: bitflip
Let’s look into the first fuzzing stage. It flips a single bit and is called bitflip 1/1.
/* Single walking bit. */
stage_short = "flip1";
stage_max = len << 3;
stage_name = "bitflip 1/1";
stage_val_type = STAGE_VAL_NONE;
orig_hit_cnt = queued_paths + unique_crashes;
prev_cksum = queue_cur->exec_cksum;
for (stage_cur = 0; stage_cur < stage_max; stage_cur++) {
stage_cur_byte = stage_cur >> 3;
FLIP_BIT(out_buf, stage_cur);
if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
FLIP_BIT(out_buf, stage_cur);
}
new_hit_cnt = queued_paths + unique_crashes;
stage_finds[STAGE_FLIP1] += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_FLIP1] += stage_max;
It starts by setting the stage name and short name. stage_max is set to the number of bits, since we are going to flip every bit of the input and run the target. The original hit count is saved in orig_hit_cnt. After the stage is executed, new_hit_cnt is calculated, and the stats are saved in stage_finds and stage_cycles.
FLIP_BIT is a macro defined as follows:
#define FLIP_BIT(_ar, _b) do { \
u8* _arf = (u8*)(_ar); \
u32 _bf = (_b); \
_arf[(_bf) >> 3] ^= (128 >> ((_bf) & 7)); \
} while (0)
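To make the indexing concrete, here is a minimal standalone sketch; the buffer and the bit position are made up for illustration:

#include <assert.h>
#include <stdint.h>

typedef uint8_t  u8;
typedef uint32_t u32;

#define FLIP_BIT(_ar, _b) do { \
    u8* _arf = (u8*)(_ar); \
    u32 _bf = (_b); \
    _arf[(_bf) >> 3] ^= (128 >> ((_bf) & 7)); \
  } while (0)

int main(void) {
  u8 buf[2] = { 0x00, 0x00 };
  /* Bit 10 lives in byte 10 >> 3 = 1; the mask is 128 >> (10 & 7) = 0x20,
     so bit 0 of the file maps to the most significant bit of byte 0. */
  FLIP_BIT(buf, 10);
  assert(buf[0] == 0x00 && buf[1] == 0x20);
  /* Flipping the same bit again restores the buffer, which is exactly how
     the walking-bit loop undoes each mutation after the run. */
  FLIP_BIT(buf, 10);
  assert(buf[1] == 0x00);
  return 0;
}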
The bitflip stage has variations where the number of consecutive flipped bits increases in powers of two. For brevity, I’m only going to show the 4-bit variation here; it is almost the same apart from the number of bits flipped.
/* Four walking bits. */
stage_name = "bitflip 4/1";
stage_short = "flip4";
stage_max = (len << 3) - 3;
orig_hit_cnt = new_hit_cnt;
for (stage_cur = 0; stage_cur < stage_max; stage_cur++) {
stage_cur_byte = stage_cur >> 3;
FLIP_BIT(out_buf, stage_cur);
FLIP_BIT(out_buf, stage_cur + 1);
FLIP_BIT(out_buf, stage_cur + 2);
FLIP_BIT(out_buf, stage_cur + 3);
if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
FLIP_BIT(out_buf, stage_cur);
FLIP_BIT(out_buf, stage_cur + 1);
FLIP_BIT(out_buf, stage_cur + 2);
FLIP_BIT(out_buf, stage_cur + 3);
}
new_hit_cnt = queued_paths + unique_crashes;
stage_finds[STAGE_FLIP4] += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_FLIP4] += stage_max;
When we arrive at the byte flip (8 bits), the stage is overloaded with something called the effector map. It basically tries to identify which bytes have any effect on trace_bits by comparing checksums.
/* Effector map setup. These macros calculate:
EFF_APOS - position of a particular file offset in the map.
EFF_ALEN - length of a map with a particular number of bytes.
EFF_SPAN_ALEN - map span for a sequence of bytes.
*/
#define EFF_APOS(_p) ((_p) >> EFF_MAP_SCALE2)
#define EFF_REM(_x) ((_x) & ((1 << EFF_MAP_SCALE2) - 1))
#define EFF_ALEN(_l) (EFF_APOS(_l) + !!EFF_REM(_l))
#define EFF_SPAN_ALEN(_p, _l) (EFF_APOS((_p) + (_l) - 1) - EFF_APOS(_p) + 1)
The constants are defined in config.h as follows:
/* Scaling factor for the effector map used to skip some of the more
expensive deterministic steps. The actual divisor is set to
2^EFF_MAP_SCALE2 bytes: */
#define EFF_MAP_SCALE2 3
/* Minimum input file length at which the effector logic kicks in: */
#define EFF_MIN_LEN 128
/* Maximum effector density past which everything is just fuzzed
unconditionally (%): */
#define EFF_MAX_PERC 90
So, EFF_MAP_SCALE2 maps file offsets to effector map positions; each map entry covers 2^3 = 8 bytes. EFF_MIN_LEN is the minimum input length at which the effector logic kicks in. EFF_MAX_PERC is the density threshold past which everything is fuzzed unconditionally. We will see how these are used in a second.
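To make the macros concrete, here is a minimal sketch; the 20-byte input length is just an example, not something taken from AFL:

#include <assert.h>

#define EFF_MAP_SCALE2    3
#define EFF_APOS(_p)      ((_p) >> EFF_MAP_SCALE2)
#define EFF_REM(_x)       ((_x) & ((1 << EFF_MAP_SCALE2) - 1))
#define EFF_ALEN(_l)      (EFF_APOS(_l) + !!EFF_REM(_l))
#define EFF_SPAN_ALEN(_p, _l) (EFF_APOS((_p) + (_l) - 1) - EFF_APOS(_p) + 1)

int main(void) {
  /* Each effector map entry covers 2^3 = 8 input bytes. */
  assert(EFF_APOS(0)  == 0);          /* offset 0  -> map slot 0          */
  assert(EFF_APOS(19) == 2);          /* offset 19 -> map slot 2          */
  assert(EFF_ALEN(20) == 3);          /* a 20-byte file needs 3 map slots */
  assert(EFF_SPAN_ALEN(5, 10) == 2);  /* bytes 5..14 span map slots 0..1  */
  return 0;
}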
/* Initialize effector map for the next step (see comments below). Always
flag first and last byte as doing something. */
eff_map = ck_alloc(EFF_ALEN(len));
eff_map[0] = 1;
if (EFF_APOS(len - 1) != 0) {
eff_map[EFF_APOS(len - 1)] = 1;
eff_cnt++;
}
Nothing interesting so far: allocate eff_map and mark the first and last bytes.
/* Walking byte. */
stage_name = "bitflip 8/8";
stage_short = "flip8";
stage_max = len;
orig_hit_cnt = new_hit_cnt;
for (stage_cur = 0; stage_cur < stage_max; stage_cur++) {
stage_cur_byte = stage_cur;
out_buf[stage_cur] ^= 0xFF;
if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
/* We also use this stage to pull off a simple trick: we identify
bytes that seem to have no effect on the current execution path
even when fully flipped - and we skip them during more expensive
deterministic stages, such as arithmetics or known ints. */
if (!eff_map[EFF_APOS(stage_cur)]) {
Let’s see how it handles unmarked bytes.
u32 cksum;
/* If in dumb mode or if the file is very short, just flag everything
without wasting time on checksums. */
if (!dumb_mode && len >= EFF_MIN_LEN)
cksum = hash32(trace_bits, MAP_SIZE, HASH_CONST);
else
cksum = ~queue_cur->exec_cksum;
Compare the checksum to see if there is any change in trace_bits, then mark those bytes.
if (cksum != queue_cur->exec_cksum) {
eff_map[EFF_APOS(stage_cur)] = 1;
eff_cnt++;
}
}
out_buf[stage_cur] ^= 0xFF;
}
/* If the effector map is more than EFF_MAX_PERC dense, just flag the
whole thing as worth fuzzing, since we wouldn't be saving much time
anyway. */
if (eff_cnt != EFF_ALEN(len) && eff_cnt * 100 / EFF_ALEN(len) > EFF_MAX_PERC) {
memset(eff_map, 1, EFF_ALEN(len));
blocks_eff_select += EFF_ALEN(len);
} else {
blocks_eff_select += eff_cnt;
}
If the density is higher than 90 percent (and the map is not already fully flagged), mark the whole map as worth fuzzing; either way, the number of selected blocks is added to blocks_eff_select.
blocks_eff_total += EFF_ALEN(len);
new_hit_cnt = queued_paths + unique_crashes;
stage_finds[STAGE_FLIP8] += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_FLIP8] += stage_max;
The stage ends by storing the total number of blocks in blocks_eff_total.
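As a quick sanity check of the density threshold, take a hypothetical 1024-byte input (the number is made up), so EFF_ALEN(1024) = 128. If 120 of those 128 blocks ended up flagged, the integer arithmetic gives

\[ \frac{120 \cdot 100}{128} = 93 > 90, \]

so the whole map would be flagged as worth fuzzing.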
Stage: arithmetic
The arithmetic stage basically employs addition and subtraction of values in [1..ARITH_MAX]. ARITH_MAX is defined to be 35 in config.h. Let’s see the first variation.
/* 8-bit arithmetics. */
stage_name = "arith 8/8";
stage_short = "arith8";
stage_cur = 0;
stage_max = 2 * len * ARITH_MAX; // = 35
stage_val_type = STAGE_VAL_LE;
orig_hit_cnt = new_hit_cnt;
for (i = 0; i < len; i++) {
u8 orig = out_buf[i];
/* Let's consult the effector map... */
if (!eff_map[EFF_APOS(i)]) {
stage_max -= 2 * ARITH_MAX;
continue;
}
Here we see the effector map in play: we skip the byte when it doesn’t yield anything interesting in trace_bits.
stage_cur_byte = i;
for (j = 1; j <= ARITH_MAX; j++) {
u8 r = orig ^ (orig + j);
Here, the difference between the mutated and the original value is:
\[ R = ORIG \oplus (ORIG + j) \]
Note that, because addition carries across bit positions, R is generally not just j. If R could not be the product of a bitflip, let’s run the target and see if it yields anything interesting.
/* Do arithmetic operations only if the result couldn't be a product
of a bitflip. */
if (!could_be_bitflip(r)) {
stage_cur_val = j;
out_buf[i] = orig + j;
if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
stage_cur++;
} else stage_max--;
This time let’s try the following:
\[ R = ORIG \oplus (ORIG - j) \]
 r = orig ^ (orig - j);
if (!could_be_bitflip(r)) {
stage_cur_val = -j;
out_buf[i] = orig - j;
if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
stage_cur++;
} else stage_max--;
out_buf[i] = orig;
}
}
new_hit_cnt = queued_paths + unique_crashes;
stage_finds[STAGE_ARITH8] += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_ARITH8] += stage_max;
Epilogue as usual, saves total counts for stats. Let’s see the 16-bit variation.
stage_name = "arith 16/8";
stage_short = "arith16";
stage_cur = 0;
stage_max = 4 * (len - 1) * ARITH_MAX;
orig_hit_cnt = new_hit_cnt;
for (i = 0; i < len - 1; i++) {
u16 orig = *(u16*)(out_buf + i);
/* Let's consult the effector map... */
if (!eff_map[EFF_APOS(i)] && !eff_map[EFF_APOS(i + 1)]) {
stage_max -= 4 * ARITH_MAX;
continue;
}
We skip if both bytes are uninteresting in the effector map.
stage_cur_byte = i;
for (j = 1; j <= ARITH_MAX; j++) {
u16 r1 = orig ^ (orig + j),
r2 = orig ^ (orig - j),
r3 = orig ^ SWAP16(SWAP16(orig) + j),
r4 = orig ^ SWAP16(SWAP16(orig) - j);
The first two (r1, r2) are for little-endian, the latter two for big-endian.
/* Try little endian addition and subtraction first. Do it only
if the operation would affect more than one byte (hence the
& 0xff overflow checks) and if it couldn't be a product of
a bitflip. */
stage_val_type = STAGE_VAL_LE;
if ((orig & 0xff) + j > 0xff && !could_be_bitflip(r1)) {
stage_cur_val = j;
*(u16*)(out_buf + i) = orig + j;
if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
stage_cur++;
} else stage_max--;
So we evaluate whether the addition carries out of the least significant byte (i.e. affects more than one byte) and whether r1 could not be the product of a bitflip (i.e. not already covered by the bitflip stage).
if ((orig & 0xff) < j && !could_be_bitflip(r2)) {
stage_cur_val = -j;
*(u16*)(out_buf + i) = orig - j;
if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
stage_cur++;
} else stage_max--;
Similarly, we evaluate whether the least significant byte underflows (borrows from the high byte) and whether r2 could not be the product of a bitflip.
/* Big endian comes next. Same deal. */
stage_val_type = STAGE_VAL_BE;
if ((orig >> 8) + j > 0xff && !could_be_bitflip(r3)) {
stage_cur_val = j;
*(u16*)(out_buf + i) = SWAP16(SWAP16(orig) + j);
if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
stage_cur++;
} else stage_max--;
if ((orig >> 8) < j && !could_be_bitflip(r4)) {
stage_cur_val = -j;
*(u16*)(out_buf + i) = SWAP16(SWAP16(orig) - j);
if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
stage_cur++;
} else stage_max--;
*(u16*)(out_buf + i) = orig;
}
}
new_hit_cnt = queued_paths + unique_crashes;
stage_finds[STAGE_ARITH16] += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_ARITH16] += stage_max;
Big-endian evaluations and the epilogue are almost the same. There is one interesting point though: the value actually written back is
\[ R = ORIG + j \]
(the XOR shown earlier is only used for the could_be_bitflip() check); in the big-endian case the word is byte-swapped before the addition and swapped back afterwards. The 32-bit arithmetic stage is similar to the 16-bit version and is skipped for brevity.
Before we proceed to the next stage, what about that could_be_bitflip()? Recall that it is called as could_be_bitflip(orig ^ (orig + j)).
static u8 could_be_bitflip(u32 xor_val) {
u32 sh = 0;
if (!xor_val) return 1;
The function returns 1 immediately when
\[ 0 = orig \oplus (orig + j) \iff orig = orig + j \iff j = 0. \]
 /* Shift left until first bit set. */
while (!(xor_val & 1)) { sh++; xor_val >>= 1; }
Shift out the trailing zero bits so the lowest set bit of the xor value lands at position 0, remembering the shift count in sh.
/* 1-, 2-, and 4-bit patterns are OK anywhere. */
if (xor_val == 1 || xor_val == 3 || xor_val == 15) return 1;
So, let’s try to understand the above statement. Here are a few calculations that lead to result = 3:
\[ 0b00 \oplus (0b00 + 0b11) = 0b11 \]
\[ 0b01 \oplus (0b01 + 0b01) = 0b11 \]
\[ 0b010 \oplus (0b010 + 0b010) = 0b110 \]
The last one is 3 after a shift right. So the first and the second can be achieved by two bitflips; the last one is still a double bitflip, just one position to the left. On the other hand, the following is not covered, and it is three bitflips.
\[ 0b11 \oplus (0b11 + 0b01) = 0b11 \oplus 0b100 = 0b111 \]
This makes sense if you recall that the bitflip variations are powers of 2. Still not convinced? Let’s try another approach.
\[ orig \oplus (orig + j) = 1 \]
\[ orig \oplus 1 = orig + j \]
Now the problem becomes: how do we cancel orig?
\[ 00 \oplus 1 = 01 = 00 + j \implies j = 1 \]
\[ 01 \oplus 1 = 00 = 01 + j \implies j = -1 \]
\[ 10 \oplus 1 = 11 = 10 + j \implies j = 1 \]
\[ 11 \oplus 1 = 10 = 11 + j \implies j = -1 \]
So j alternates sign depending on the least significant bit of orig. Let’s try when the result is 2.
\[ 00 \oplus 10 = 10 = 00 + j \implies j = 2 \]
\[ 01 \oplus 10 = 11 = 01 + j \implies j = 2 \]
\[ 10 \oplus 10 = 00 = 10 + j \implies j = -2 \]
\[ 11 \oplus 10 = 01 = 11 + j \implies j = -2 \]
Don’t get confused, it’s the same thing, just the grouping is different. If bit 1 of orig ((orig >> 1) & 1) is clear, we get 2, otherwise -2. So let’s try 3:
\[ 00 \oplus 11 = 11 = 00 + j \implies j = 3 \]
\[ 01 \oplus 11 = 10 = 01 + j \implies j = 1 \]
\[ 10 \oplus 11 = 01 = 10 + j \implies j = -1 \]
\[ 11 \oplus 11 = 00 = 11 + j \implies j = -3 \]
It is apparent that the final j is the sum of the corresponding j’s in the previous two equations. Aha! Now I see the logic behind it. If we are comparing the result with a run of n ones, then j is of the form
\[ j = \sum_{i=0}^{n-1} a_i 2^i, \qquad a_i \in \{ 1, -1 \}, \]
where the sign of each \( a_i \) is determined by bit i of orig. Let’s move on.
/* 8-, 16-, and 32-bit patterns are OK only if shift factor is
divisible by 8, since that's the stepover for these ops. */
if (sh & 7) return 0;
This checks that the shift count is a multiple of 8, i.e. that the run of ones starts at a byte boundary; the 8-, 16- and 32-bit flips walk the buffer one byte at a time, so anything else cannot be their product.
if (xor_val == 0xff || xor_val == 0xffff || xor_val == 0xffffffff)
return 1;
return 0;
}
The final check matches full byte, word and dword flips. This concludes the arithmetic stages.
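A few sample values make the behaviour concrete; the test values below are mine, and the function is condensed from the listing above (comments stripped) so the snippet compiles on its own:

#include <assert.h>
#include <stdint.h>

typedef uint8_t  u8;
typedef uint32_t u32;

static u8 could_be_bitflip(u32 xor_val) {
  u32 sh = 0;
  if (!xor_val) return 1;
  while (!(xor_val & 1)) { sh++; xor_val >>= 1; }
  if (xor_val == 1 || xor_val == 3 || xor_val == 15) return 1;
  if (sh & 7) return 0;
  if (xor_val == 0xff || xor_val == 0xffff || xor_val == 0xffffffff) return 1;
  return 0;
}

int main(void) {
  assert(could_be_bitflip(0x01));    /* single bit            -> bitflip 1/1      */
  assert(could_be_bitflip(0x0c));    /* 0b1100: two bits      -> bitflip 2/1      */
  assert(could_be_bitflip(0xf0));    /* four bits, any offset -> bitflip 4/1      */
  assert(could_be_bitflip(0xff00));  /* whole byte on a byte boundary -> flip 8/8 */
  assert(!could_be_bitflip(0x07));   /* three bits: no 3-bit variant exists       */
  assert(!could_be_bitflip(0x0ff0)); /* eight ones, but not byte-aligned (sh = 4) */
  return 0;
}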
Stage: interesting
These stages employ constant values that tend to be interesting when fuzzing a target. Let’s look into the first variation.
stage_name = "interest 8/8";
stage_short = "int8";
stage_cur = 0;
stage_max = len * sizeof(interesting_8);
stage_val_type = STAGE_VAL_LE;
orig_hit_cnt = new_hit_cnt;
/* Setting 8-bit integers. */
for (i = 0; i < len; i++) {
u8 orig = out_buf[i];
/* Let's consult the effector map... */
if (!eff_map[EFF_APOS(i)]) {
stage_max -= sizeof(interesting_8);
continue;
}
We skip if the Effector map tells us this byte has no effect.
stage_cur_byte = i;
for (j = 0; j < sizeof(interesting_8); j++) {
/* Skip if the value could be a product of bitflips or arithmetics. */
if (could_be_bitflip(orig ^ (u8)interesting_8[j]) ||
could_be_arith(orig, (u8)interesting_8[j], 1)) {
stage_max--;
continue;
}
We check if this could be a bitflip or arithmetic. Otherwise, proceed into fuzzing.
stage_cur_val = interesting_8[j];
out_buf[i] = interesting_8[j];
if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
out_buf[i] = orig;
stage_cur++;
}
}
Set the interesting value in the buffer, fuzz, and restore the original byte.
new_hit_cnt = queued_paths + unique_crashes;
stage_finds[STAGE_INTEREST8] += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_INTEREST8] += stage_max;
Finally, store stats about the stage. So, what are those constants? afl-fuzz.c has the following definitions.
static s8 interesting_8[] = { INTERESTING_8 };
static s16 interesting_16[] = { INTERESTING_8, INTERESTING_16 };
static s32 interesting_32[] = { INTERESTING_8, INTERESTING_16, INTERESTING_32 };
The rest is defined in config.h:
/* List of interesting values to use in fuzzing. */
#define INTERESTING_8 \
-128, /* Overflow signed 8-bit when decremented */ \
-1, /* */ \
0, /* */ \
1, /* */ \
16, /* One-off with common buffer size */ \
32, /* One-off with common buffer size */ \
64, /* One-off with common buffer size */ \
100, /* One-off with common buffer size */ \
127 /* Overflow signed 8-bit when incremented */
Some of the values trigger signed overflows and underflows; others are -1, 0 and 1; the rest are off-by-one values around common buffer sizes. Finally, there is 100 :) Let’s see the 16-bit variation.
stage_name = "interest 16/8";
stage_short = "int16";
stage_cur = 0;
stage_max = 2 * (len - 1) * (sizeof(interesting_16) >> 1);
orig_hit_cnt = new_hit_cnt;
for (i = 0; i < len - 1; i++) {
u16 orig = *(u16*)(out_buf + i);
/* Let's consult the effector map... */
if (!eff_map[EFF_APOS(i)] && !eff_map[EFF_APOS(i + 1)]) {
stage_max -= sizeof(interesting_16);
continue;
}
Again, skip if the Effector tells us that these two bytes are uninteresting.
stage_cur_byte = i;
for (j = 0; j < sizeof(interesting_16) / 2; j++) {
stage_cur_val = interesting_16[j];
/* Skip if this could be a product of a bitflip, arithmetics,
or single-byte interesting value insertion. */
if (!could_be_bitflip(orig ^ (u16)interesting_16[j]) &&
!could_be_arith(orig, (u16)interesting_16[j], 2) &&
!could_be_interest(orig, (u16)interesting_16[j], 2, 0)) {
stage_val_type = STAGE_VAL_LE;
*(u16*)(out_buf + i) = interesting_16[j];
if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
stage_cur++;
} else stage_max--;
Little-endian fuzzing, applied only if the value could not be the product of a bitflip, arithmetic, or single-byte interesting mutation.
if ((u16)interesting_16[j] != SWAP16(interesting_16[j]) &&
!could_be_bitflip(orig ^ SWAP16(interesting_16[j])) &&
!could_be_arith(orig, SWAP16(interesting_16[j]), 2) &&
!could_be_interest(orig, SWAP16(interesting_16[j]), 2, 1)) {
stage_val_type = STAGE_VAL_BE;
*(u16*)(out_buf + i) = SWAP16(interesting_16[j]);
if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
stage_cur++;
} else stage_max--;
}
*(u16*)(out_buf + i) = orig;
}
Big-endian fuzzing block.
new_hit_cnt = queued_paths + unique_crashes;
stage_finds[STAGE_INTEREST16] += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_INTEREST16] += stage_max;
Store stats as usual. The 4-byte variation is similar to the 2-byte version so I skip it for brevity. Let’s see the interesting 2-byte and 4-byte values.
#define INTERESTING_16 \
-32768, /* Overflow signed 16-bit when decremented */ \
-129, /* Overflow signed 8-bit */ \
128, /* Overflow signed 8-bit */ \
255, /* Overflow unsig 8-bit when incremented */ \
256, /* Overflow unsig 8-bit */ \
512, /* One-off with common buffer size */ \
1000, /* One-off with common buffer size */ \
1024, /* One-off with common buffer size */ \
4096, /* One-off with common buffer size */ \
32767 /* Overflow signed 16-bit when incremented */
#define INTERESTING_32 \
-2147483648LL, /* Overflow signed 32-bit when decremented */ \
-100663046, /* Large negative number (endian-agnostic) */ \
-32769, /* Overflow signed 16-bit */ \
32768, /* Overflow signed 16-bit */ \
65535, /* Overflow unsig 16-bit when incremented */ \
65536, /* Overflow unsig 16 bit */ \
100663045, /* Large positive number (endian-agnostic) */ \
2147483647 /* Overflow signed 32-bit when incremented */
Before proceeding, let’s see could_be_arith(). I promise this is easier than could_be_bitflip(). The call looks like could_be_arith(orig, (u8)interesting_8[j], 1).
static u8 could_be_arith(u32 old_val, u32 new_val, u8 blen) {
u32 i, ov = 0, nv = 0, diffs = 0;
if (old_val == new_val) return 1;
/* See if one-byte adjustments to any byte could produce this result. */
for (i = 0; i < blen; i++) {
u8 a = old_val >> (8 * i),
b = new_val >> (8 * i);
if (a != b) { diffs++; ov = a; nv = b; }
}
Count the differing bytes, saving the old and new values of the differing byte in ov and nv.
/* If only one byte differs and the values are within range, return 1. */
if (diffs == 1) {
if ((u8)(ov - nv) <= ARITH_MAX ||
(u8)(nv - ov) <= ARITH_MAX) return 1;
}
If the difference is within ARITH_MAX, return 1 (i.e. already covered by the arithmetic stage). Recall that ARITH_MAX is defined in config.h to be 35.
if (blen == 1) return 0;
/* See if two-byte adjustments to any byte would produce this result. */
diffs = 0;
for (i = 0; i < blen / 2; i++) {
u16 a = old_val >> (16 * i),
b = new_val >> (16 * i);
if (a != b) { diffs++; ov = a; nv = b; }
}
/* If only one word differs and the values are within range, return 1. */
if (diffs == 1) {
if ((u16)(ov - nv) <= ARITH_MAX ||
(u16)(nv - ov) <= ARITH_MAX) return 1;
Here exactly one 16-bit word differs; compare the (little-endian) difference with ARITH_MAX and decide.
ov = SWAP16(ov); nv = SWAP16(nv);
if ((u16)(ov - nv) <= ARITH_MAX ||
(u16)(nv - ov) <= ARITH_MAX) return 1;
}
This is basically the big-endian variation of the previous check.
/* Finally, let's do the same thing for dwords. */
if (blen == 4) {
if ((u32)(old_val - new_val) <= ARITH_MAX ||
(u32)(new_val - old_val) <= ARITH_MAX) return 1;
So the difference spans more than a single word; nevertheless, check whether the two dwords are close enough.
new_val = SWAP32(new_val);
old_val = SWAP32(old_val);
if ((u32)(old_val - new_val) <= ARITH_MAX ||
(u32)(new_val - old_val) <= ARITH_MAX) return 1;
}
return 0;
}
Finally, the big-endian check is applied. The only function left is could_be_interest(). It is given below and looks complicated. However, if you look closely enough, you’ll see that parts of the old value are replaced by constants from the interesting_{8,16,32} arrays, and a check is applied to see whether the result matches the new value. Depending on the size of the interesting value, it masks and OR’s the values.
static u8 could_be_interest(u32 old_val, u32 new_val, u8 blen, u8 check_le) {
u32 i, j;
if (old_val == new_val) return 1;
/* See if one-byte insertions from interesting_8 over old_val could
produce new_val. */
for (i = 0; i < blen; i++) {
for (j = 0; j < sizeof(interesting_8); j++) {
u32 tval = (old_val & ~(0xff << (i * 8))) |
(((u8)interesting_8[j]) << (i * 8));
if (new_val == tval) return 1;
}
}
/* Bail out unless we're also asked to examine two-byte LE insertions
as a preparation for BE attempts. */
if (blen == 2 && !check_le) return 0;
/* See if two-byte insertions over old_val could give us new_val. */
for (i = 0; i < blen - 1; i++) {
for (j = 0; j < sizeof(interesting_16) / 2; j++) {
u32 tval = (old_val & ~(0xffff << (i * 8))) |
(((u16)interesting_16[j]) << (i * 8));
if (new_val == tval) return 1;
/* Continue here only if blen > 2. */
if (blen > 2) {
tval = (old_val & ~(0xffff << (i * 8))) |
(SWAP16(interesting_16[j]) << (i * 8));
if (new_val == tval) return 1;
}
}
}
if (blen == 4 && check_le) {
/* See if four-byte insertions could produce the same result
(LE only). */
for (j = 0; j < sizeof(interesting_32) / 4; j++)
if (new_val == (u32)interesting_32[j]) return 1;
}
return 0;
}
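To see could_be_arith() in action, here is a minimal sketch. The function body is reassembled from the listing above so the snippet compiles on its own; the SWAP16/SWAP32 helpers are simplified stand-ins for AFL's macros, and the test values are made up:

#include <assert.h>
#include <stdint.h>

typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

#define ARITH_MAX 35  /* from config.h */

/* Simplified stand-ins for AFL's SWAP16/SWAP32 macros. */
static u16 SWAP16(u16 x) { return (u16)((x << 8) | (x >> 8)); }
static u32 SWAP32(u32 x) {
  return (x << 24) | ((x & 0xff00) << 8) | ((x >> 8) & 0xff00) | (x >> 24);
}

static u8 could_be_arith(u32 old_val, u32 new_val, u8 blen) {
  u32 i, ov = 0, nv = 0, diffs = 0;
  if (old_val == new_val) return 1;
  for (i = 0; i < blen; i++) {
    u8 a = old_val >> (8 * i), b = new_val >> (8 * i);
    if (a != b) { diffs++; ov = a; nv = b; }
  }
  if (diffs == 1) {
    if ((u8)(ov - nv) <= ARITH_MAX || (u8)(nv - ov) <= ARITH_MAX) return 1;
  }
  if (blen == 1) return 0;
  diffs = 0;
  for (i = 0; i < blen / 2; i++) {
    u16 a = old_val >> (16 * i), b = new_val >> (16 * i);
    if (a != b) { diffs++; ov = a; nv = b; }
  }
  if (diffs == 1) {
    if ((u16)(ov - nv) <= ARITH_MAX || (u16)(nv - ov) <= ARITH_MAX) return 1;
    ov = SWAP16(ov); nv = SWAP16(nv);
    if ((u16)(ov - nv) <= ARITH_MAX || (u16)(nv - ov) <= ARITH_MAX) return 1;
  }
  if (blen == 4) {
    if ((u32)(old_val - new_val) <= ARITH_MAX ||
        (u32)(new_val - old_val) <= ARITH_MAX) return 1;
    new_val = SWAP32(new_val);
    old_val = SWAP32(old_val);
    if ((u32)(old_val - new_val) <= ARITH_MAX ||
        (u32)(new_val - old_val) <= ARITH_MAX) return 1;
  }
  return 0;
}

int main(void) {
  assert(could_be_arith(0x00, 0x20, 1));      /* single byte, diff 32 <= 35        */
  assert(!could_be_arith(0x00, 0x30, 1));     /* single byte, diff 48  > 35        */
  assert(could_be_arith(0x12f0, 0x1310, 2));  /* 16-bit LE: 0x12f0 + 32            */
  assert(could_be_arith(0xf012, 0x1013, 2));  /* 16-bit BE: 0x12f0 + 32 after swap */
  return 0;
}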
This concludes the interesting stage.
Stage: extras
These extra fuzzing stages allow us to replace a part of the input or insert special bytes into the input. The user_extras (over) variant just overwrites the input with the provided extra.
stage_name = "user extras (over)";
stage_short = "ext_UO";
stage_cur = 0;
stage_max = extras_cnt * len;
stage_val_type = STAGE_VAL_NONE;
orig_hit_cnt = new_hit_cnt;
for (i = 0; i < len; i++) {
u32 last_len = 0;
stage_cur_byte = i;
/* Extras are sorted by size, from smallest to largest. This means
that we don't have to worry about restoring the buffer in
between writes at a particular offset determined by the outer
loop. */
for (j = 0; j < extras_cnt; j++) {
/* Skip extras probabilistically if extras_cnt > MAX_DET_EXTRAS. Also
skip them if there's no room to insert the payload, if the token
is redundant, or if its entire span has no bytes set in the effector
map. */
if ((extras_cnt > MAX_DET_EXTRAS && UR(extras_cnt) >= MAX_DET_EXTRAS) ||
extras[j].len > len - i ||
!memcmp(extras[j].data, out_buf + i, extras[j].len) ||
!memchr(eff_map + EFF_APOS(i), 1, EFF_SPAN_ALEN(i, extras[j].len))) {
stage_max--;
continue;
}
Check whether the extra’s data is already present at that offset, and whether the effector map says there is nothing interesting in the affected span. One might argue that this stage is probabilistic, as you might have noticed the condition UR(extras_cnt) >= MAX_DET_EXTRAS. That’s actually correct: the data is deterministic, but the decision to apply the mutation is probabilistic. MAX_DET_EXTRAS is defined in config.h as 200. So, say we have 400 extras; then a particular extra will be included with probability \( \frac{1}{2} \). Interestingly enough, with 200 extras or fewer, all of them are applied deterministically. Therefore, MAX_DET_EXTRAS puts an upper bound on the determinism, and the stage switches to probabilistic mode when we have more than 200 extras.
last_len = extras[j].len;
memcpy(out_buf + i, extras[j].data, last_len);
if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
stage_cur++;
}
The mutation is applied to the data at offset i and the target is executed.
/* Restore all the clobbered memory. */
memcpy(out_buf + i, in_buf + i, last_len);
}
new_hit_cnt = queued_paths + unique_crashes;
stage_finds[STAGE_EXTRAS_UO] += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_EXTRAS_UO] += stage_max;
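As a side note on the probabilistic skipping above, here is a minimal sketch (not part of afl-fuzz.c) that computes the chance that one particular extra is tried at a given offset:

#include <stdio.h>

#define MAX_DET_EXTRAS 200  /* from config.h */

/* Probability that UR(extras_cnt) < MAX_DET_EXTRAS, i.e. that a given extra
   survives the probabilistic skip at a given offset. */
static double extra_try_prob(unsigned extras_cnt) {
  if (extras_cnt <= MAX_DET_EXTRAS) return 1.0;        /* fully deterministic */
  return (double)MAX_DET_EXTRAS / (double)extras_cnt;  /* probabilistic mode  */
}

int main(void) {
  printf("%.2f\n", extra_try_prob(100));   /* 1.00 */
  printf("%.2f\n", extra_try_prob(400));   /* 0.50 */
  printf("%.2f\n", extra_try_prob(1000));  /* 0.20 */
  return 0;
}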
The user extras (insert) stage is similar to the previous overwrite stage; however, it inserts the extras instead of overwriting.
stage_name = "user extras (insert)";
stage_short = "ext_UI";
stage_cur = 0;
stage_max = extras_cnt * (len + 1);
orig_hit_cnt = new_hit_cnt;
ex_tmp = ck_alloc(len + MAX_DICT_FILE);
for (i = 0; i <= len; i++) {
stage_cur_byte = i;
for (j = 0; j < extras_cnt; j++) {
if (len + extras[j].len > MAX_FILE) {
stage_max--;
continue;
}
Skip if the length of the extra plus the current buffer is larger than MAX_FILE. The upper bound is defined in config.h as 1 MiB (1 * 1024 * 1024).
/* Insert token */
memcpy(ex_tmp + i, extras[j].data, extras[j].len);
Insert the extra at offset i.
/* Copy tail */
memcpy(ex_tmp + i + extras[j].len, out_buf + i, len - i);
Copy the rest of the data (the tail) into the temporary buffer ex_tmp.
if (common_fuzz_stuff(argv, ex_tmp, len + extras[j].len)) {
ck_free(ex_tmp);
goto abandon_entry;
}
stage_cur++;
}
/* Copy head */
ex_tmp[i] = out_buf[i];
}
So, this is a little bit strange, don’t you think? What about the bytes before i? The trick is that the buffer is allocated only once, and it is that last line that keeps the head in sync: at the end of each outer iteration, ex_tmp[i] = out_buf[i] copies one more byte of the original, so the prefix [0..i] always matches out_buf ;)
ck_free(ex_tmp);
new_hit_cnt = queued_paths + unique_crashes;
stage_finds[STAGE_EXTRAS_UI] += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_EXTRAS_UI] += stage_max;
The epilogue of the stage is similar to the others. The only question we are left with is: where do these extras[] come from? Let’s start with the global definitions.
struct extra_data {
u8* data; /* Dictionary token data */
u32 len; /* Dictionary token length */
u32 hit_cnt; /* Use count in the corpus */
};
static struct extra_data* extras; /* Extra tokens to fuzz with */
static u32 extras_cnt; /* Total number of tokens read */
static struct extra_data* a_extras; /* Automatically selected extras */
static u32 a_extras_cnt; /* Total number of tokens available */
Next, let’s see how these extras are provided by the user.
case 'x': /* dictionary */
if (extras_dir) FATAL("Multiple -x options not supported");
extras_dir = optarg;
break;
}
.. skipped for brevity ..
if (extras_dir) load_extras(extras_dir);
Aha! So the -x command-line switch is related to extras[], and it is load_extras()’s duty to load the extras into the global array. I’m not going to cover the whole function, just a small part at the end.
check_and_sort:
if (!extras_cnt) FATAL("No usable files in '%s'", dir);
qsort(extras, extras_cnt, sizeof(struct extra_data), compare_extras_len);
OKF("Loaded %u extra tokens, size range %s to %s.", extras_cnt,
DMS(min_len), DMS(max_len));
if (max_len > 32)
WARNF("Some tokens are relatively large (%s) - consider trimming.",
DMS(max_len));
if (extras_cnt > MAX_DET_EXTRAS)
WARNF("More than %u tokens - will use them probabilistically.",
MAX_DET_EXTRAS);
}
The epilogue of the function tells us several things. First, extras[] is sorted by token length. Next, it is better to keep tokens below 32 bytes. Finally, if the number of dictionary items is larger than MAX_DET_EXTRAS, the stage switches to probabilistic mode. Recall that the value is 200.
The next variation, auto extras (over), is another overwriting variation.
stage_name = "auto extras (over)";
stage_short = "ext_AO";
stage_cur = 0;
stage_max = MIN(a_extras_cnt, USE_AUTO_EXTRAS) * len;
stage_val_type = STAGE_VAL_NONE;
orig_hit_cnt = new_hit_cnt;
for (i = 0; i < len; i++) {
u32 last_len = 0;
stage_cur_byte = i;
for (j = 0; j < MIN(a_extras_cnt, USE_AUTO_EXTRAS); j++) {
/* See the comment in the earlier code; extras are sorted by size. */
if (a_extras[j].len > len - i ||
!memcmp(a_extras[j].data, out_buf + i, a_extras[j].len) ||
!memchr(eff_map + EFF_APOS(i), 1, EFF_SPAN_ALEN(i, a_extras[j].len))) {
stage_max--;
continue;
}
last_len = a_extras[j].len;
memcpy(out_buf + i, a_extras[j].data, last_len);
if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
stage_cur++;
}
/* Restore all the clobbered memory. */
memcpy(out_buf + i, in_buf + i, last_len);
}
new_hit_cnt = queued_paths + unique_crashes;
stage_finds[STAGE_EXTRAS_AO] += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_EXTRAS_AO] += stage_max;
Very similar to the user extras (over) stage, with one question: where does a_extras[] come from?
auto collect
I must admit I’ve cheated and removed a code block in the first bitflip stage. Let’s go ahead and revisit that.
/* Single walking bit. */
stage_short = "flip1";
stage_max = len << 3;
stage_name = "bitflip 1/1";
stage_val_type = STAGE_VAL_NONE;
orig_hit_cnt = queued_paths + unique_crashes;
prev_cksum = queue_cur->exec_cksum;
for (stage_cur = 0; stage_cur < stage_max; stage_cur++) {
stage_cur_byte = stage_cur >> 3;
FLIP_BIT(out_buf, stage_cur);
if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
FLIP_BIT(out_buf, stage_cur);
Until this point, everything seems in order.
/* While flipping the least significant bit in every byte, pull of an extra
trick to detect possible syntax tokens. In essence, the idea is that if
you have a binary blob like this:
xxxxxxxxIHDRxxxxxxxx
...and changing the leading and trailing bytes causes variable or no
changes in program flow, but touching any character in the "IHDR" string
always produces the same, distinctive path, it's highly likely that
"IHDR" is an atomically-checked magic value of special significance to
the fuzzed format.
We do this here, rather than as a separate stage, because it's a nice
way to keep the operation approximately "free" (i.e., no extra execs).
Empirically, performing the check when flipping the least significant bit
is advantageous, compared to doing it at the time of more disruptive
changes, where the program flow may be affected in more violent ways.
The caveat is that we won't generate dictionaries in the -d mode or -S
mode - but that's probably a fair trade-off.
This won't work particularly well with paths that exhibit variable
behavior, but fails gracefully, so we'll carry out the checks anyway.
*/
if (!dumb_mode && (stage_cur & 7) == 7) {
Branch when we have just flipped the last bit of a byte (its least significant bit).
u32 cksum = hash32(trace_bits, MAP_SIZE, HASH_CONST);
Calculate the checksum of trace_bits.
if (stage_cur == stage_max - 1 && cksum == prev_cksum) {
/* If at end of file and we are still collecting a string, grab the
final character and force output. */
if (a_len < MAX_AUTO_EXTRA) a_collect[a_len] = out_buf[stage_cur >> 3];
a_len++;
if (a_len >= MIN_AUTO_EXTRA && a_len <= MAX_AUTO_EXTRA)
maybe_add_auto(a_collect, a_len);
Collect the final byte into the a_collect array. If we have gathered enough bytes, let maybe_add_auto() decide.
} else if (cksum != prev_cksum) {
/* Otherwise, if the checksum has changed, see if we have something
worthwhile queued up, and collect that if the answer is yes. */
if (a_len >= MIN_AUTO_EXTRA && a_len <= MAX_AUTO_EXTRA)
maybe_add_auto(a_collect, a_len);
a_len = 0;
prev_cksum = cksum;
}
If the checksum differs (i.e. something interesting happened), let maybe_add_auto() decide. Notice that the counter is reset in this branch and prev_cksum is updated.
/* Continue collecting string, but only if the bit flip actually made
any difference - we don't want no-op tokens. */
if (cksum != queue_cur->exec_cksum) {
if (a_len < MAX_AUTO_EXTRA) a_collect[a_len] = out_buf[stage_cur >> 3];
a_len++;
}
}
}
So the code above still collects bytes into the a_collect buffer deterministically. The rest is similar to any bitflip stage.
new_hit_cnt = queued_paths + unique_crashes;
stage_finds[STAGE_FLIP1] += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_FLIP1] += stage_max;
Constants are defined in config.h as follows.
#define MIN_AUTO_EXTRA 3
#define MAX_AUTO_EXTRA 32
The last piece of the puzzle is maybe_add_auto().
/* Maybe add automatic extra. */
static void maybe_add_auto(u8* mem, u32 len) {
u32 i;
/* Allow users to specify that they don't want auto dictionaries. */
if (!MAX_AUTO_EXTRAS || !USE_AUTO_EXTRAS) return;
Check whether MAX_AUTO_EXTRAS or USE_AUTO_EXTRAS is zero. They are defined in config.h as follows:
#define USE_AUTO_EXTRAS 50
#define MAX_AUTO_EXTRAS (USE_AUTO_EXTRAS * 10)
Next, reject tokens that are just a run of the same byte.
/* Skip runs of identical bytes. */
for (i = 1; i < len; i++)
if (mem[0] ^ mem[i]) break;
if (i == len) return;
Check whether the value is already covered by the built-in interesting values.
/* Reject builtin interesting values. */
if (len == 2) {
i = sizeof(interesting_16) >> 1;
while (i--)
if (*((u16*)mem) == interesting_16[i] ||
*((u16*)mem) == SWAP16(interesting_16[i])) return;
}
if (len == 4) {
i = sizeof(interesting_32) >> 2;
while (i--)
if (*((u32*)mem) == interesting_32[i] ||
*((u32*)mem) == SWAP32(interesting_32[i])) return;
}
Check to see if it is already in the dictionary.
/* Reject anything that matches existing extras. Do a case-insensitive
match. We optimize by exploiting the fact that extras[] are sorted
by size. */
for (i = 0; i < extras_cnt; i++)
if (extras[i].len >= len) break;
for (; i < extras_cnt && extras[i].len == len; i++)
if (!memcmp_nocase(extras[i].data, mem, len)) return;
/* Last but not least, check a_extras[] for matches. There are no
guarantees of a particular sort order. */
auto_changed = 1;
Check whether it was previously auto-collected.
for (i = 0; i < a_extras_cnt; i++) {
if (a_extras[i].len == len && !memcmp_nocase(a_extras[i].data, mem, len)) {
a_extras[i].hit_cnt++;
goto sort_a_extras;
}
}
Finally, if everything seems OK, save it in a_extras[].
/* At this point, looks like we're dealing with a new entry. So, let's
append it if we have room. Otherwise, let's randomly evict some other
entry from the bottom half of the list. */
if (a_extras_cnt < MAX_AUTO_EXTRAS) {
a_extras = ck_realloc_block(a_extras, (a_extras_cnt + 1) *
sizeof(struct extra_data));
a_extras[a_extras_cnt].data = ck_memdup(mem, len);
a_extras[a_extras_cnt].len = len;
a_extras_cnt++;
} else {
i = MAX_AUTO_EXTRAS / 2 +
UR((MAX_AUTO_EXTRAS + 1) / 2);
ck_free(a_extras[i].data);
a_extras[i].data = ck_memdup(mem, len);
a_extras[i].len = len;
a_extras[i].hit_cnt = 0;
}
sort_a_extras:
/* First, sort all auto extras by use count, descending order. */
qsort(a_extras, a_extras_cnt, sizeof(struct extra_data),
compare_extras_use_d);
/* Then, sort the top USE_AUTO_EXTRAS entries by size. */
qsort(a_extras, MIN(USE_AUTO_EXTRAS, a_extras_cnt),
sizeof(struct extra_data), compare_extras_len);
}
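A quick sanity check on the eviction index: with MAX_AUTO_EXTRAS = 500 the array is full at 500 entries, MAX_AUTO_EXTRAS / 2 = 250 and (MAX_AUTO_EXTRAS + 1) / 2 = 250, so

\[ i = 250 + UR(250) \in [250, 499], \]

meaning only entries from the bottom half of the use-count-sorted list can ever be evicted.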
This concludes all deterministic stages. The next stage is called havoc and it is pretty exciting since it is probabilistic. Curious?
Stage: havoc
This is a probabilistic stage where mutators are selected and stacked at random. The havoc stage is also intertwined with the splicing stage: when we are in havoc, splice_cycle is zero. doing_det is set to 1 at the beginning of fuzz_one() (when the deterministic stages are not skipped) and stays set.
havoc_stage:
stage_cur_byte = -1;
/* The havoc stage mutation code is also invoked when splicing files; if the
splice_cycle variable is set, generate different descriptions and such. */
if (!splice_cycle) {
stage_name = "havoc";
stage_short = "havoc";
stage_max = (doing_det ?
HAVOC_CYCLES_INIT /* 1024 */:
HAVOC_CYCLES /* 256 */) * perf_score / havoc_div / 100;
When the deterministic stages were run (doing_det), the base value of stage_max is HAVOC_CYCLES_INIT (1024), otherwise HAVOC_CYCLES (256); the result is then scaled by perf_score and divided by havoc_div.
} else {
static u8 tmp[32];
perf_score = orig_perf;
sprintf(tmp, "splice %u", splice_cycle);
stage_name = tmp;
stage_short = "splice";
stage_max = SPLICE_HAVOC * perf_score / havoc_div / 100;
}
The above is the splice stage’s initial configuration; I will revisit it when we deal with that particular stage.
if (stage_max < HAVOC_MIN) stage_max = HAVOC_MIN;
HAVOC_MIN is set to 16 in config.h, so stage_max is clamped to at least 16.
temp_len = len;
orig_hit_cnt = queued_paths + unique_crashes;
havoc_queued = queued_paths;
/* We essentially just do several thousand runs (depending on perf_score)
where we take the input file and make random stacked tweaks. */
for (stage_cur = 0; stage_cur < stage_max; stage_cur++) {
u32 use_stacking = 1 << (1 + UR(HAVOC_STACK_POW2));
stage_cur_val = use_stacking;
So the stacking count is set here. HAVOC_STACK_POW2 is 7 in config.h, therefore the number of stacked mutations is in \( \{ 2, 4, 8, 16, 32, 64, 128 \} \). The idea is to select mutators randomly and stack them together before fuzzing the target. Next, we’ll see a large switch statement with a case for each mutator. These mutators are similar to the ones we have already seen; one important note is that each mutation is applied at a random offset, as in out_buf[UR(temp_len)] = ....
for (i = 0; i < use_stacking; i++) {
switch (UR(15 + ((extras_cnt + a_extras_cnt) ? 2 : 0))) {
case 0:
/* Flip a single bit somewhere. Spooky! */
FLIP_BIT(out_buf, UR(temp_len << 3));
break;
One bitflip mutator.
case 1:
/* Set byte to interesting value. */
out_buf[UR(temp_len)] = interesting_8[UR(sizeof(interesting_8))];
break;
case 2:
/* Set word to interesting value, randomly choosing endian. */
if (temp_len < 2) break;
if (UR(2)) {
*(u16*)(out_buf + UR(temp_len - 1)) =
interesting_16[UR(sizeof(interesting_16) >> 1)];
} else {
*(u16*)(out_buf + UR(temp_len - 1)) = SWAP16(
interesting_16[UR(sizeof(interesting_16) >> 1)]);
}
break;
case 3:
/* Set dword to interesting value, randomly choosing endian. */
if (temp_len < 4) break;
if (UR(2)) {
*(u32*)(out_buf + UR(temp_len - 3)) =
interesting_32[UR(sizeof(interesting_32) >> 2)];
} else {
*(u32*)(out_buf + UR(temp_len - 3)) = SWAP32(
interesting_32[UR(sizeof(interesting_32) >> 2)]);
}
break;
Interesting-value mutators for different sizes (8, 16, 32 bits). A random value is picked from the interesting_{8,16,32} arrays, and UR(2), which returns a random value in \( \{ 0, 1 \} \), chooses the endianness.
case 4:
/* Randomly subtract from byte. */
out_buf[UR(temp_len)] -= 1 + UR(ARITH_MAX);
break;
case 5:
/* Randomly add to byte. */
out_buf[UR(temp_len)] += 1 + UR(ARITH_MAX);
break;
case 6:
/* Randomly subtract from word, random endian. */
if (temp_len < 2) break;
if (UR(2)) {
u32 pos = UR(temp_len - 1);
*(u16*)(out_buf + pos) -= 1 + UR(ARITH_MAX);
} else {
u32 pos = UR(temp_len - 1);
u16 num = 1 + UR(ARITH_MAX);
*(u16*)(out_buf + pos) =
SWAP16(SWAP16(*(u16*)(out_buf + pos)) - num);
}
break;
case 7:
/* Randomly add to word, random endian. */
if (temp_len < 2) break;
if (UR(2)) {
u32 pos = UR(temp_len - 1);
*(u16*)(out_buf + pos) += 1 + UR(ARITH_MAX);
} else {
u32 pos = UR(temp_len - 1);
u16 num = 1 + UR(ARITH_MAX);
*(u16*)(out_buf + pos) =
SWAP16(SWAP16(*(u16*)(out_buf + pos)) + num);
}
break;
case 8:
/* Randomly subtract from dword, random endian. */
if (temp_len < 4) break;
if (UR(2)) {
u32 pos = UR(temp_len - 3);
*(u32*)(out_buf + pos) -= 1 + UR(ARITH_MAX);
} else {
u32 pos = UR(temp_len - 3);
u32 num = 1 + UR(ARITH_MAX);
*(u32*)(out_buf + pos) =
SWAP32(SWAP32(*(u32*)(out_buf + pos)) - num);
}
break;
case 9:
/* Randomly add to dword, random endian. */
if (temp_len < 4) break;
if (UR(2)) {
u32 pos = UR(temp_len - 3);
*(u32*)(out_buf + pos) += 1 + UR(ARITH_MAX);
} else {
u32 pos = UR(temp_len - 3);
u32 num = 1 + UR(ARITH_MAX);
*(u32*)(out_buf + pos) =
SWAP32(SWAP32(*(u32*)(out_buf + pos)) + num);
}
break;
These mutators randomly apply arithmetic operations with randomly selected operands and endianness.
case 10:
/* Just set a random byte to a random value. Because,
why not. We use XOR with 1-255 to eliminate the
possibility of a no-op. */
out_buf[UR(temp_len)] ^= 1 + UR(255);
break;
XORs the byte at a randomly selected position with a random value in 1-255, so the operation can never be a no-op.
case 11 ... 12: {
/* Delete bytes. We're making this a bit more likely
than insertion (the next option) in hopes of keeping
files reasonably small. */
u32 del_from, del_len;
if (temp_len < 2) break;
/* Don't delete too much. */
del_len = choose_block_len(temp_len - 1);
del_from = UR(temp_len - del_len + 1);
memmove(out_buf + del_from, out_buf + del_from + del_len,
temp_len - del_from - del_len);
temp_len -= del_len;
break;
}
This one deletes a block; the length of the block and its position are selected randomly. Let’s see how the block length is chosen in choose_block_len().
static u32 choose_block_len(u32 limit) {
u32 min_value, max_value;
u32 rlim = MIN(queue_cycle, 3);
if (!run_over10m) rlim = 1;
switch (UR(rlim)) {
case 0: min_value = 1;
max_value = HAVOC_BLK_SMALL;
break;
case 1: min_value = HAVOC_BLK_SMALL;
max_value = HAVOC_BLK_MEDIUM;
break;
default:
if (UR(10)) {
min_value = HAVOC_BLK_MEDIUM;
max_value = HAVOC_BLK_LARGE;
} else {
min_value = HAVOC_BLK_LARGE;
max_value = HAVOC_BLK_XL;
}
}
if (min_value >= limit) min_value = 1;
return min_value + UR(MIN(max_value, limit) - min_value + 1);
}
run_over10m is a flag that tells whether the fuzzer has been running for more than 10 minutes; when the fuzzer first starts, it only selects small blocks. queue_cycle is the number of queue cycles this fuzzer instance has completed, where a cycle is one full pass over the entire queue. So from the third cycle on, rlim is 3. Finally, the havoc block sizes are defined in config.h as follows:
#define HAVOC_BLK_SMALL 32
#define HAVOC_BLK_MEDIUM 128
#define HAVOC_BLK_LARGE 1500
#define HAVOC_BLK_XL 32768
The next mutator is somewhat unlucky ;) It has two modes: it either clones bytes from the input or inserts a block of constant bytes. The mode is selected randomly via the flag u8 actually_clone = UR(4). When the flag is zero (the insert mode, probability \( \frac{1}{4} \)), clone_len is a random block length bounded by the maximum havoc block size (HAVOC_BLK_XL) and clone_from is zero. clone_to is a random position within the input. A new buffer new_buf is allocated; the first clone_to bytes of the input are copied into it, the next clone_len bytes are filled with a single constant byte (either a random value or a byte taken from a random position in the input), and finally the rest of the input (the tail) is appended. When actually_clone is non-zero (the clone mode, probability \( \frac{3}{4} \)), clone_len is a random havoc block size bounded by the input length and clone_from is random in [0..temp_len-clone_len]. The head is copied as before, then clone_len bytes of the input starting at clone_from are cloned in at clone_to, and the tail follows.
case 13:
if (temp_len + HAVOC_BLK_XL < MAX_FILE) {
/* Clone bytes (75%) or insert a block of constant bytes (25%). */
u8 actually_clone = UR(4);
u32 clone_from, clone_to, clone_len;
u8* new_buf;
if (actually_clone) {
clone_len = choose_block_len(temp_len);
clone_from = UR(temp_len - clone_len + 1);
} else {
clone_len = choose_block_len(HAVOC_BLK_XL);
clone_from = 0;
}
clone_to = UR(temp_len);
new_buf = ck_alloc_nozero(temp_len + clone_len);
/* Head */
memcpy(new_buf, out_buf, clone_to);
/* Inserted part */
if (actually_clone)
memcpy(new_buf + clone_to, out_buf + clone_from, clone_len);
else
memset(new_buf + clone_to,
UR(2) ? UR(256) : out_buf[UR(temp_len)], clone_len);
/* Tail */
memcpy(new_buf + clone_to + clone_len, out_buf + clone_to,
temp_len - clone_to);
ck_free(out_buf);
out_buf = new_buf;
temp_len += clone_len;
}
break;
The next mutator is similar to the previous one and also has two modes. copy_len is first set to a random havoc block size. With probability \( \frac{3}{4} \), copy_len bytes are copied in place (memmove) from offset copy_from to offset copy_to. With probability \( \frac{1}{4} \), the destination is instead overwritten with a single constant byte, which is either a random value or a byte taken from a random position in the input, with equal probability.
case 14: {
/* Overwrite bytes with a randomly selected chunk (75%) or fixed
bytes (25%). */
u32 copy_from, copy_to, copy_len;
if (temp_len < 2) break;
copy_len = choose_block_len(temp_len - 1);
copy_from = UR(temp_len - copy_len + 1);
copy_to = UR(temp_len - copy_len + 1);
if (UR(4)) {
if (copy_from != copy_to)
memmove(out_buf + copy_to, out_buf + copy_from, copy_len);
} else memset(out_buf + copy_to,
UR(2) ? UR(256) : out_buf[UR(temp_len)], copy_len);
break;
}
The next two mutators are enabled only if there are dictionary items provided by the user or the auto-collect sub-stage gathered tokens during the bitflip stage. The first one overwrites bytes with an extra at a random position, while the second one inserts an extra.
/* Values 15 and 16 can be selected only if there are any extras
present in the dictionaries. */
case 15: {
/* Overwrite bytes with an extra. */
if (!extras_cnt || (a_extras_cnt && UR(2))) {
/* No user-specified extras or odds in our favor. Let's use an
auto-detected one. */
u32 use_extra = UR(a_extras_cnt);
u32 extra_len = a_extras[use_extra].len;
u32 insert_at;
if (extra_len > temp_len) break;
insert_at = UR(temp_len - extra_len + 1);
memcpy(out_buf + insert_at, a_extras[use_extra].data, extra_len);
} else {
/* No auto extras or odds in our favor. Use the dictionary. */
u32 use_extra = UR(extras_cnt);
u32 extra_len = extras[use_extra].len;
u32 insert_at;
if (extra_len > temp_len) break;
insert_at = UR(temp_len - extra_len + 1);
memcpy(out_buf + insert_at, extras[use_extra].data, extra_len);
}
break;
}
case 16: {
u32 use_extra, extra_len, insert_at = UR(temp_len + 1);
u8* new_buf;
/* Insert an extra. Do the same dice-rolling stuff as for the
previous case. */
if (!extras_cnt || (a_extras_cnt && UR(2))) {
use_extra = UR(a_extras_cnt);
extra_len = a_extras[use_extra].len;
if (temp_len + extra_len >= MAX_FILE) break;
new_buf = ck_alloc_nozero(temp_len + extra_len);
/* Head */
memcpy(new_buf, out_buf, insert_at);
/* Inserted part */
memcpy(new_buf + insert_at, a_extras[use_extra].data, extra_len);
} else {
use_extra = UR(extras_cnt);
extra_len = extras[use_extra].len;
if (temp_len + extra_len >= MAX_FILE) break;
new_buf = ck_alloc_nozero(temp_len + extra_len);
/* Head */
memcpy(new_buf, out_buf, insert_at);
/* Inserted part */
memcpy(new_buf + insert_at, extras[use_extra].data, extra_len);
}
/* Tail */
memcpy(new_buf + insert_at + extra_len, out_buf + insert_at,
temp_len - insert_at);
ck_free(out_buf);
out_buf = new_buf;
temp_len += extra_len;
break;
}
}
}
After use_stacking mutations have been applied, the target binary is run.
if (common_fuzz_stuff(argv, out_buf, temp_len)) goto abandon_entry;
/* out_buf might have been mangled a bit, so let's restore it to its
original size and shape. */
if (temp_len < len) out_buf = ck_realloc(out_buf, len);
temp_len = len;
memcpy(out_buf, in_buf, len);
Restore the buffer to a pristine state.
/* If we're finding new stuff, let's run for a bit longer, limits
permitting. */
if (queued_paths != havoc_queued) {
if (perf_score <= HAVOC_MAX_MULT * 100) {
stage_max *= 2;
perf_score *= 2;
}
havoc_queued = queued_paths;
}
}
This code also runs during the splicing stage, since splice jumps back into the havoc loop; we’ll see how that stage is configured in the next section.
new_hit_cnt = queued_paths + unique_crashes;
if (!splice_cycle) {
stage_finds[STAGE_HAVOC] += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_HAVOC] += stage_max;
} else {
stage_finds[STAGE_SPLICE] += new_hit_cnt - orig_hit_cnt;
stage_cycles[STAGE_SPLICE] += stage_max;
}
As usual, stats are saved for later use.
Stage: splice
By default, this stage is not enabled; it is turned on after a full queue cycle without any new findings. Let’s see the relevant lines in main():
/* If we had a full queue cycle with no new finds, try
recombination strategies next. */
if (queued_paths == prev_queued) {
if (use_splicing) cycles_wo_finds++; else use_splicing = 1;
} else cycles_wo_finds = 0;
Also, the -d command-line switch enables splicing (it additionally makes the fuzzer skip the deterministic stages).
#ifndef IGNORE_FINDS
/************
* SPLICING *
************/
/* This is a last-resort strategy triggered by a full round with no findings.
It takes the current input file, randomly selects another input, and
splices them together at some offset, then relies on the havoc
code to mutate that blob. */
retry_splicing:
if (use_splicing && splice_cycle++ < SPLICE_CYCLES &&
queued_paths > 1 && queue_cur->len > 1) {
struct queue_entry* target;
u32 tid, split_at;
u8* new_buf;
s32 f_diff, l_diff;
/* First of all, if we've modified in_buf for havoc, let's clean that up. */
if (in_buf != orig_in) {
ck_free(in_buf);
in_buf = orig_in;
len = queue_cur->len;
}
First, restore in_buf to a pristine state.
/* Pick a random queue entry and seek to it. Don't splice with yourself. */
do { tid = UR(queued_paths); } while (tid == current_entry);
splicing_with = tid;
target = queue;
while (tid >= 100) { target = target->next_100; tid -= 100; }
while (tid--) target = target->next;
/* Make sure that the target has a reasonable length. */
while (target && (target->len < 2 || target == queue_cur)) {
target = target->next;
splicing_with++;
}
if (!target) goto retry_splicing;
The above code basically tries to find a random item in the queue to splice with. If it fails, it jumps back to retry_splicing and tries again (up to SPLICE_CYCLES attempts).
/* Read the testcase into a new buffer. */
fd = open(target->fname, O_RDONLY);
if (fd < 0) PFATAL("Unable to open '%s'", target->fname);
new_buf = ck_alloc_nozero(target->len);
ck_read(fd, new_buf, target->len, target->fname);
close(fd);
Load the target test case into the new_buf buffer.
/* Find a suitable splicing location, somewhere between the first and
the last differing byte. Bail out if the difference is just a single
byte or so. */
locate_diffs(in_buf, new_buf, MIN(len, target->len), &f_diff, &l_diff);
This compares the two buffers in_buf and new_buf, looking for a suitable splice region: f_diff is the position where the two buffers first differ, and l_diff is the last position (within the compared length) where they differ. They are set to -1 if the search fails.
if (f_diff < 0 || l_diff < 2 || f_diff == l_diff) {
ck_free(new_buf);
goto retry_splicing;
}
Retry if the two buffers never differ, if the last differing byte falls within the first two positions, or if there is only a single differing position (f_diff == l_diff).
/* Split somewhere between the first and last differing byte. */
split_at = f_diff + UR(l_diff - f_diff);
Select a random splice point between f_diff and l_diff.
/* Do the thing. */
len = target->len;
memcpy(new_buf, in_buf, split_at);
in_buf = new_buf;
Do the splicing.
ck_free(out_buf);
out_buf = ck_alloc_nozero(len);
memcpy(out_buf, in_buf, len);
goto havoc_stage;
}
#endif /* !IGNORE_FINDS */
Copy the new buffer into out_buf, run the havoc stage, and we are done. Is that all? Nope! Let’s revisit the splicing-related configuration in the havoc code above.
} else {
static u8 tmp[32];
perf_score = orig_perf;
sprintf(tmp, "splice %u", splice_cycle);
stage_name = tmp;
stage_short = "splice";
stage_max = SPLICE_HAVOC * perf_score / havoc_div / 100;
}
The constant SPLICE_HAVOC is defined to be 32 in config.h. havoc_div defaults to 1 and is only increased for slow targets (based on the average execution speed). Therefore, for every 100 points of perf_score we get 32 rounds of mutations. What’s with this perf_score? There is an assignment at the beginning of fuzz_one():
orig_perf = perf_score = calculate_score(queue_cur);
Let’s visit the definition.
/* Calculate case desirability score to adjust the length of havoc fuzzing.
A helper function for fuzz_one(). Maybe some of these constants should
go into config.h. */
static u32 calculate_score(struct queue_entry* q) {
u32 avg_exec_us = total_cal_us / total_cal_cycles;
Average execution time provided by the calibration stage.
u32 avg_bitmap_size = total_bitmap_size / total_bitmap_entries;
Average bitmap size.
u32 perf_score = 100;
We start with an initial perf_score of 100.
/* Adjust score based on execution speed of this path, compared to the
global average. Multiplier ranges from 0.1x to 3x. Fast inputs are
less expensive to fuzz, so we're giving them more air time. */
if (q->exec_us * 0.1 > avg_exec_us) perf_score = 10;
else if (q->exec_us * 0.25 > avg_exec_us) perf_score = 25;
else if (q->exec_us * 0.5 > avg_exec_us) perf_score = 50;
else if (q->exec_us * 0.75 > avg_exec_us) perf_score = 75;
else if (q->exec_us * 4 < avg_exec_us) perf_score = 300;
else if (q->exec_us * 3 < avg_exec_us) perf_score = 200;
else if (q->exec_us * 2 < avg_exec_us) perf_score = 150;
Adjust according to the execution time. For example, if an input takes 10x the average, perf_score is set to 10. Note that these assignments are absolute (the score is set, not multiplied).
/* Adjust score based on bitmap size. The working theory is that better
coverage translates to better targets. Multiplier from 0.25x to 3x. */
if (q->bitmap_size * 0.3 > avg_bitmap_size) perf_score *= 3;
else if (q->bitmap_size * 0.5 > avg_bitmap_size) perf_score *= 2;
else if (q->bitmap_size * 0.75 > avg_bitmap_size) perf_score *= 1.5;
else if (q->bitmap_size * 3 < avg_bitmap_size) perf_score *= 0.25;
else if (q->bitmap_size * 2 < avg_bitmap_size) perf_score *= 0.5;
else if (q->bitmap_size * 1.5 < avg_bitmap_size) perf_score *= 0.75;
Relative adjustment according to bitmap_size; this time the score is multiplied. The thresholds are not exact reciprocals of the multipliers, but they are sane enough: for instance, bitmap_size * 0.3 > avg means the bitmap is more than 10/3 ≈ 3.33 times the average, which earns a 3x multiplier.
/* Adjust score based on handicap. Handicap is proportional to how late
in the game we learned about this path. Latecomers are allowed to run
for a bit longer until they catch up with the rest. */
if (q->handicap >= 4) {
perf_score *= 4;
q->handicap -= 4;
} else if (q->handicap) {
perf_score *= 2;
q->handicap--;
}
These compensate the latecomers, as explained in the comment.
/* Final adjustment based on input depth, under the assumption that fuzzing
deeper test cases is more likely to reveal stuff that can't be
discovered with traditional fuzzers. */
switch (q->depth) {
case 0 ... 3: break;
case 4 ... 7: perf_score *= 2; break;
case 8 ... 13: perf_score *= 3; break;
case 14 ... 25: perf_score *= 4; break;
default: perf_score *= 5;
}
Relative adjustments according to the queue depth.
/* Make sure that we don't go over limit. */
if (perf_score > HAVOC_MAX_MULT * 100) perf_score = HAVOC_MAX_MULT * 100;
return perf_score;
}
HAVOC_MAX_MULT is defined in config.h to be 16, so the last check caps perf_score at 1600. This concludes the scoring algorithm.
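To tie the pieces together, here is a worked example with made-up numbers (not measured data), assuming havoc_div is 1 and the deterministic stages ran:

#include <stdio.h>

#define HAVOC_MAX_MULT    16    /* from config.h */
#define HAVOC_CYCLES_INIT 1024  /* from config.h */
#define SPLICE_HAVOC      32    /* from config.h */

int main(void) {
  /* Hypothetical queue entry: runs at ~40% of the average execution time,
     touches a bit more than twice the average bitmap size, has a handicap
     of 2 and sits at depth 9 in the queue. */
  unsigned perf_score = 100;

  perf_score = 150;  /* exec_us * 2 < avg_exec_us            -> set to 150 */
  perf_score *= 2;   /* bitmap_size * 0.5 > avg_bitmap_size  -> 300        */
  perf_score *= 2;   /* handicap in 1..3                     -> 600        */
  perf_score *= 3;   /* depth in 8..13                       -> 1800       */

  if (perf_score > HAVOC_MAX_MULT * 100) perf_score = HAVOC_MAX_MULT * 100;

  printf("perf_score    = %u\n", perf_score);                               /* 1600  */
  printf("havoc rounds  = %u\n", HAVOC_CYCLES_INIT * perf_score / 1 / 100); /* 16384 */
  printf("splice rounds = %u\n", SPLICE_HAVOC * perf_score / 1 / 100);      /* 512   */
  return 0;
}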
After all stages are complete, fuzz_one() ends with cleanup.
ret_val = 0;
abandon_entry:
splicing_with = -1;
/* Update pending_not_fuzzed count if we made it through the calibration
cycle and have not seen this entry before. */
if (!stop_soon && !queue_cur->cal_failed && !queue_cur->was_fuzzed) {
queue_cur->was_fuzzed = 1;
pending_not_fuzzed--;
if (queue_cur->favored) pending_favored--;
}
munmap(orig_in, queue_cur->len);
if (in_buf != orig_in) ck_free(in_buf);
ck_free(out_buf);
ck_free(eff_map);
return ret_val;
#undef FLIP_BIT
}
Conclusion
Wow! Never thought we’d come this far. What have we learned in this chapter? Let’s have a recap. The bitflip stage flips consecutive bits, with 1-, 2-, 4-, 8-, 16- and 32-bit variants. The arithmetic stage adds or subtracts values in [1..35] and has 8-, 16- and 32-bit variants. The interesting stage tries to trigger weird behaviour by replacing values with constants; it also has 8-, 16- and 32-bit variants. The extras stage takes data from a dictionary provided by the user and has two variants, one that overwrites and one that inserts. The auto extras stage follows a similar path, but its dictionary is a dynamic one, auto-collected during the first bitflip stage. These are all deterministic stages, except that the extras stages become probabilistic when the number of dictionary items exceeds MAX_DET_EXTRAS. The havoc and splice stages run fully probabilistically, applying randomly selected mutations. The number of rounds in the splice stage depends on perf_score, which is calculated for each queue item independently.
Hope you enjoyed it.