2019년 1월 27일 일요일

[How Syzkaller Works] 02 - Terminology & Algorithm




:) Goal


- Understanding terminology used in syzkaller
- Understanding main fuzzing loop algorithm of syzkaller


:) Terminology


- Terminology that we'll consider is what is used in syzkaller.  But It's not limited to syzkaller, It's general terminology in fuzzing.

1. Program


    - The sequence of syscalls in userspace.
    - For example, Suppose that we want to fuzz ioctl().
      Since ioctl() will be failed if there is no prior call to open(), We must call open() first.
      It means that we should use the sequence of syscalls even if what we want to fuzz is just one syscall.
      Then, the sequence would be as follows. and this sequence is called Program.

fd = open("/dev/dev");
ioctl(fd, ..., ...);
              < Program for fuzzing ioctl() >

2. Generation


    - "Generation" is generating a random, new Program.
    - How this Generation works?  For understanding it, we should look at additional terms used in Generation.
       - Allowed Calls :  Program can only be consist of Allowed Calls.
                                  In other words, It's the set of syscalls that Program allows to run.
       - The number of Calls :   The number of syscalls in the Program.

    - Syzkaller does Generation with following algorithm.
      1) Generating Allowed Calls from the current state of Program.
      2) Put Allowed Calls into Program.
      3) If Program's number of call is smaller than The number of Calls,  Repeat.

    - The code snippet for that is as below. (code location :  prog/generation.go)
// Generate generates a random program of length ~ncalls.
// calls is a set of allowed syscalls, if nil all syscalls are used.
func (target *Target) Generate(rs rand.Source, ncalls int, ct *ChoiceTable) *Prog {
    p := &Prog{
        Target: target,
    } 
    r := newRand(target, rs)
    s := newState(target, ct)
    for len(p.Calls) < ncalls {
        calls := r.generateCall(s, p)
        for _, c := range calls {
            s.analyze(c)
            p.Calls = append(p.Calls, c)
        }
    }
    p.debugValidate()
    return p
}


3. Mutation


- "Mutation" mutates it from the generated "Program".
  Mutation may insert/remove a syscall, change the call args, or splice from the other corpus.

- Suppose that the Program is as below.
----------------------------
fd = open("/dev/test");
ioctl(fd, CMD0, arg); 
----------------------------
next, Do fuzzing with this Program, next mutate from it.

----------------------------
fd = open("/dev/test");
ioctl(fd, CMD1, arg); 
----------------------------
In this case, the call arg has modified by the mutation.
next, continues to mutate it.

----------------------------
fd = open("/dev/test");
ioctl(fd, CMD1, arg); 
ioctl(fd, CMD2, arg);
----------------------------
In this case, A new syscall has inserted by the mutation.

- See the code of syzkaller to understand how Mutation works. (code location :  prog/mutation.go)
  This code determine what it does as a Mutation, based on probability.
func (p *Prog) Mutate(rs rand.Source, ncalls int, ct *ChoiceTable, corpus []*Prog) {
...
for stop, ok := false, false; !stop; stop = ok && r.oneOf(3) {
switch {
    case r.oneOf(5):
        ok = ctx.squashAny()
    case r.nOutOf(1, 100):
        ok = ctx.splice()   // splice a corpus to Program (p)
    case r.nOutOf(20, 31):
        ok = ctx.insertCall()   // insert a syscall to Program (p)
    case r.nOutOf(10, 11):
        ok = ctx.mutateArg()  // change the arg of syscall
    default:
        ok = ctx.removeCall()  // remove a syscall from Program (p)
 }
}


4. Corpus


- Corpus is a set of Program. (An 1-dimensional array that has Program as an element)
- Recall what we looked at in previous posting related to coverage-guided fuzzing.
   If the coverage increased when we executed a Program, the Program is going to add to Corpus.
   In other words, Corpus could be extended according to changes of coverage, as below.

   < When the Fuzzer first executes Program >  - Corpus {}
   < Coverage increased > - Corpus { p }
   < Coverage increase > - Corpus {p, p2}  (p2 is the version that combining p1 with an interesting input that changes coverage)

- See the function "triageInput" in syz-fuzzer/proc.go.
  
func (proc *Proc) triageInput(item *WorkTriage) {
...  // Investigation -  Has this input changed coverage?
...
    item.p, item.call = prog.Minimize(item.p, item.call, false,  // Minimize the Program.
...
    proc.fuzzer.addInputToCorpus(item.p, inputSignal, sig)   // If it does, addInputToCorpus!!
....


:) Fuzzing Loop


- We can find the main fuzzing loop of syzkaller in syz-fuzzer/proc.go - loop()
  Look at more detail of loop() in the view of source code.

func (proc *Proc) loop() {
...
    for i := 0; ; i++ {   // infinite loop
    item := proc.fuzzer.workQueue.dequeue()
    if item != nil {
    switch item := item.(type) {
    case *WorkTriage:         // WorkTriage means that we should investigate the coverage here, and add the input to Corpus.
       proc.triageInput(item)
   case *WorkCandidate:   //  WorkCandidate are Program for hub.  It's handled as locally generated/mutated program.
       proc.execute(proc.execOpts, item.p, item.flags, StatCandidate)
    case *WorkSmash:       // WorkSmash are Program just added to Corpus.
        proc.smashInput(item)
    default:
        log.Fatalf("unknown work type: %#v", item)
    }
    continue

    ct := proc.fuzzer.choiceTable
    corpus := proc.fuzzer.corpusSnapshot()
    if len(corpus) == 0 || i%generatePeriod == 0 {
        // Generate a new prog.
        p := proc.fuzzer.target.Generate(proc.rnd, programLength, ct)   // Generate a new Program
        log.Logf(1, "#%v: generated", proc.pid)
        proc.execute(proc.execOpts, p, ProgNormal, StatGenerate)  // Execute Program
    } else {
        // Mutate an existing prog.
        p := corpus[proc.rnd.Intn(len(corpus))].Clone()
        p.Mutate(proc.rnd, programLength, ct, corpus)   // Mutate an existing Program with current Corpus.
        log.Logf(1, "#%v: mutated", proc.pid)
        proc.execute(proc.execOpts, p, ProgNormal, StatFuzz)   // Execute Program
    }
}

- and here is the code of execute().
func (proc *Proc) execute(execOpts *ipc.ExecOpts, p *prog.Prog, flags ProgTypes, stat Stat) *ipc.ProgInfo {
   info := proc.executeRaw(execOpts, p, stat)   // Execute!!
   ....
   proc.fuzzer.workQueue.enqueue(&WorkTriage{   // Add WorkTriage getting from the executeRaw() into the workQueue.
   p: p.Clone(),
   call: callIndex,
   info: info,
   flags: flags,
   })
   return info
}
  

- If we just start to fuzz, the code to generate a new program will be executed.
  and then, mutate from the first program, and execute the mutated program.
  and the WorkTriage is added into workQueue, It contains some information about coverage.
  next, tirageInput() will investigate the WorkTriage, and It determines whether or not add the input to the Corpus. 


:) References



2019년 1월 25일 금요일

[How Syzkaller Works] 01 - Coverage-guided fuzzing & KCOV



:) Goal


- Take a look at what coverage-guided fuzzing is.
- How syzkaller utilizes the concept of coverage-guided fuzzing.
- Look at the detail of KCOV that supports syzkaller to achieve coverage-guided fuzzing.


:) Coverage-guided fuzzing


- What is the coverage-guided fuzzing? Before to answer that, recall why the fuzzer is for.
  That is, finding a crash in the program or kernel.
  We don't know where a bug in, so that we should attempt to execute almost all pieces of code in the program to find a bug.

- Take a look at below example, figure 1.

static int func_c(int a, int b, int c)  // internal function
{
    return c / b + a;   // vulnerable to divided by zero.
}

static int func_b(int a, int b, int c) // internal function
{
    if (b == 20)
        return func_(a, b, c);
    return 0;
}

 // exported function. fuzzer is only able to call this function.
void func_a(int a, int b, int c) 
{
    int t;
    if (a == 10) {
         t = func_b(a, b, c);
         printf("%d\n", t);
    }
    return;
}
            < figure 1 >

- The fuzzer is only able to call func_a() because other functions are internal function.
and the bug is in func_c(),  It could be vulnerable to divided by zero, Because an attacker can control both c and b, therefore c / b could turn out to be divided by zero by an attacker.

- Suppose that the code of figure1 is sort of black-box. That is, The fuzzer can't see the code, can see the function type that takes int a, int b, int c as arguments. In this situation, How does it make the fuzzer reach to the bug in func_c()?

- Let's check it out first the simplest algorithm.

  1) Random fuzzing

      1-1) Select a, b, c in a random manner.
      1-2) Call func_a(random-a, random-b, random-c)
      1-3) Has the bug found? If not, go to 1-1), and repeat again.
      - What do you think of it?  Search space for this is 2^32 * 2^32 * 2^32 = 2^96...  Not reasonable space..
      - With this sort of random fuzzing, We may not be able to find a bug.
      - Then, What is the better one?

  2) Coverage-guided fuzzing

      2-1) Select a, b, c in a random manner.
      2-2) Call func_a(random-a, random-b, random-c)  ==> exactly same to previous one so far.
      2-3) Has the coverage increased?
              Suppose that we called func_a(10, 90, 100);
              Then, "if (a == 10) {" ==> Since We are able to in this branch,  It makes the coverage increase.
      2-4) If the coverage is increased, store the context of call we did,  and test it with the context.
             (It is usually called Regression Testing)
             For example, The next step of selection of a could not be done, because a could be fixed to 10.
             --> func_a(10, random-b, random-c)
      - If we keep going to fuzz with this way,  the search space would be 2^32 + 2^32 + 2^32 = 2^35!!!
      - As we've seen, The Coverage-guided fuzzing makes the fuzzer find a bug significantly faster than Random fuzzing.

  3) Coverage-guided fuzzing + API template

      - Is there any better way than Coverage-guided fuzzing?  How about consider to use API template?
      - Suppose that the func_a() has a template. 
         func_a(int a [10, 20, 30],  int b [10, 20, 30],  int c [all integer])
         - The above template means that a must be one of 10, 20, 30, and b must be one of 10, 20, 30.
           and c has no limitation that which value can be in.
         - and combine Coverage-guided fuzzing with this template,
           then the search space would be 3 + 3 + 2^32 ~= 2^32!!
      - Syzkaller is taking advantage of both coverage-guided fuzzing and system call template for effective fuzzing.


:) KCOV


- We've seen why coverage-guided fuzzing is needed. then the next question is,
  - How to record the coverage of kernel?
  - and,  How to make the fuzzer know whether or not the coverage is increased?

- To respond above two questions,  KCOV has come to Linux kernel. In other words, KCOV does
  - Instrumentation for recording the coverage.
  - Exporting the record of coverage via debugfs.

1) Instrumentation for recording the coverage


- For this to work, two components are involved.
   1-1) First, is SANCOV_PLUGIN as implemented as GCC plugin.
         - Kernel Code :  ./scripts/gcc-plugins/sancov_plugin.c
         - This plugin does simple task, inserts __sanitizer_cov_trace_pc() call at the start of all basic blocks in Linux kernel.
            In other words, What the coverage has increased means that the more number of basic blocks has executed.
   1-2) It's turn to see inside of __sanitizer_cov_trace_pc().
         - Kernel Code :  ./kernel/kcov.c
         - That does simple task too.  This function stores the code address to coverage buffers shared with user space.
           In the view of this function itself, the code address would be return address. (_RET_IP_)

- Are they instrumenting all?
  - No, KCOV has come to Linux kernel for supporting Syzkaller.  and Syzkaller aims to test input of syscall.
    Thus It does not collect coverage in hard/soft interrupt, and inherently non-deterministic or non-interesting parts in kernel (e.g. locking) is disabled.

2) Exporting the record of coverage via debugfs


- The fuzzer or user can send command to KCOV through debugfs-exported virtual file.
  - The file for that communication is "/sys/kernel/debug/kcov"
- Check it out [2] to know how to send command, and how to retrieve the result of coverage collection.


:) Example code to understand how to use KCOV


- See [3] to get details.


:) References