Friday, January 25, 2019

[How Syzkaller Works] 01 - Coverage-guided fuzzing & KCOV



:) Goal


- Look at what coverage-guided fuzzing is.
- See how syzkaller uses the concept of coverage-guided fuzzing.
- Look in detail at KCOV, the kernel feature that lets syzkaller do coverage-guided fuzzing.


:) Coverage-guided fuzzing


- What is coverage-guided fuzzing? Before answering that, recall what a fuzzer is for:
  finding a crash in a program or in the kernel.
  We don't know where a bug is, so we should try to execute as much of the program's code as possible to find it.

- Take a look at the example below, figure 1.

#include <stdio.h>

static int func_c(int a, int b, int c)  // internal function
{
    return c / b + a;   // unchecked division: divides by zero if b == 0
}

static int func_b(int a, int b, int c) // internal function
{
    if (b == 20)
        return func_c(a, b, c);
    return 0;
}

// Exported function. The fuzzer can only call this function.
void func_a(int a, int b, int c)
{
    int t;
    if (a == 10) {
        t = func_b(a, b, c);
        printf("%d\n", t);
    }
}
            < figure 1 >

- The fuzzer can only call func_a(), because the other functions are internal.
  The bug is in func_c(): it divides c by b without checking b, so it is vulnerable to division by zero whenever b is 0. (In this small example func_b() only passes b == 20 on, so treat func_c() as a stand-in for the buggy code we want the fuzzer to reach.)

- Suppose the code in figure 1 is a black box. That is, the fuzzer can't see the code; it can only see the function signature, which takes int a, int b, int c as arguments. In this situation, how can the fuzzer reach the bug in func_c()?

- Let's first look at the simplest algorithm.

  1) Random fuzzing

      1-1) Select a, b, c at random.
      1-2) Call func_a(random-a, random-b, random-c)
      1-3) Has a bug been found? If not, go back to 1-1) and repeat.
      - What do you think of it? The search space is 2^32 * 2^32 * 2^32 = 2^96, which is far too large to cover.
      - With this kind of random fuzzing, we may never find the bug.
      - Then, what is a better approach?
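
      - Steps 1-1) to 1-3) can be sketched in C roughly as follows. This is only an illustration under stated assumptions: reaches_func_c() is a hypothetical stand-in for figure 1, the arguments are narrowed to 8 bits so the loop can finish in reasonable time, and the small xorshift generator is there just to make runs reproducible.

```c
#include <stdint.h>

/* Tiny xorshift PRNG so that runs are reproducible (not for real fuzzers). */
static uint32_t rng_state = 1;

static uint32_t rng_next(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

/* Hypothetical stand-in for figure 1: returns 1 when func_c() would run. */
static int reaches_func_c(uint8_t a, uint8_t b, uint8_t c)
{
    (void)c;                    /* c does not affect which branch is taken */
    return a == 10 && b == 20;
}

/*
 * Steps 1-1) to 1-3): draw all three arguments at random until the bug
 * location is reached. Returns the number of calls made, or -1 if the
 * budget ran out first.
 */
static long fuzz_random(uint32_t seed, long budget)
{
    rng_state = seed ? seed : 1;        /* xorshift must not start at 0 */
    for (long calls = 1; calls <= budget; calls++) {
        uint8_t a = (uint8_t)(rng_next() >> 8);
        uint8_t b = (uint8_t)(rng_next() >> 8);
        uint8_t c = (uint8_t)(rng_next() >> 8);
        if (reaches_func_c(a, b, c))
            return calls;
    }
    return -1;
}
```

        Every call redraws all three values, so nothing learned from one call helps the next one. Even with 8-bit arguments, a and b must be right at the same time.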

  2) Coverage-guided fuzzing

      2-1) Select a, b, c at random.
      2-2) Call func_a(random-a, random-b, random-c)  ==> exactly the same as random fuzzing so far.
      2-3) Has the coverage increased?
              Suppose that we called func_a(10, 90, 100);
              Then "if (a == 10) {" is true ==> we enter this branch for the first time, so the coverage increases.
      2-4) If the coverage increased, store the input of that call and derive the next tests from it.
             (The set of stored inputs is usually called the corpus.)
             For example, a no longer needs to be selected at random, because it can stay fixed at 10.
             --> func_a(10, random-b, random-c)
      - If we keep fuzzing this way, each argument is searched separately, so the search space is about 2^32 + 2^32 + 2^32 = 3 * 2^32 (roughly 2^33.6) instead of 2^96.
      - As we've seen, coverage-guided fuzzing finds the bug significantly faster than random fuzzing.
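
      - Steps 2-1) to 2-4) can be sketched as a small C loop. This is a toy model under stated assumptions, not syzkaller's implementation: target() is a hypothetical version of figure 1 that records coverage by hand, the arguments are narrowed to 8 bits, and the corpus is a small fixed-size array.

```c
#include <stdint.h>

#define NPOINTS    2    /* coverage points in the toy target */
#define MAX_CORPUS 8

struct input { uint8_t a, b, c; };

static uint32_t rng_state = 1;

static uint32_t rng_next(void)          /* xorshift, for reproducible runs */
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

static uint8_t rand_byte(void) { return (uint8_t)(rng_next() >> 8); }

/* Toy target shaped like figure 1; cov[i] = 1 records that point i ran. */
static void target(struct input in, uint8_t cov[NPOINTS])
{
    if (in.a == 10) {
        cov[0] = 1;             /* inside "if (a == 10)" */
        if (in.b == 20)
            cov[1] = 1;         /* func_c() would be called here */
    }
}

/* Fold one run's coverage into the total; return 1 if a new point appeared. */
static int merge_coverage(uint8_t total[NPOINTS], const uint8_t run[NPOINTS])
{
    int grew = 0;
    for (int i = 0; i < NPOINTS; i++) {
        if (run[i] && !total[i]) {
            total[i] = 1;
            grew = 1;
        }
    }
    return grew;
}

/* Returns the number of calls needed to reach func_c(), or -1 on budget exhaustion. */
static long fuzz_guided(uint32_t seed, long budget)
{
    struct input corpus[MAX_CORPUS];
    int ncorpus = 0;
    uint8_t total[NPOINTS] = {0};

    rng_state = seed ? seed : 1;
    for (long calls = 1; calls <= budget; calls++) {
        struct input in;
        if (ncorpus == 0) {             /* nothing saved yet: fully random */
            in.a = rand_byte(); in.b = rand_byte(); in.c = rand_byte();
        } else {                        /* mutate one argument of a saved input */
            in = corpus[rng_next() % (uint32_t)ncorpus];
            switch (rng_next() % 3) {
            case 0:  in.a = rand_byte(); break;
            case 1:  in.b = rand_byte(); break;
            default: in.c = rand_byte(); break;
            }
        }

        uint8_t run[NPOINTS] = {0};
        target(in, run);
        if (merge_coverage(total, run) && ncorpus < MAX_CORPUS)
            corpus[ncorpus++] = in;     /* step 2-4): keep inputs that add coverage */
        if (total[1])
            return calls;
    }
    return -1;
}
```

        Once an input with a == 10 is in the corpus, later tests start from it and change only one argument at a time, which is why the arguments end up being searched one after another instead of all at once.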

  3) Coverage-guided fuzzing + API template

      - Is there anything better than plain coverage-guided fuzzing? How about using an API template?
      - Suppose that func_a() has a template:
         func_a(int a [10, 20, 30],  int b [10, 20, 30],  int c [all integer])
         - The template above means that a must be one of 10, 20, 30, and b must be one of 10, 20, 30,
           while c can take any integer value.
         - If we combine coverage-guided fuzzing with this template,
           the search space becomes 3 + 3 + 2^32 ~= 2^32.
      - Syzkaller takes advantage of both coverage-guided fuzzing and system call templates for effective fuzzing.
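
      - As a rough illustration, the template for func_a() could be turned into an input generator like the one below. The names (struct args, gen_from_template) are made up for this sketch, and this is plain C, not syzkaller's own system call description format.

```c
#include <stdlib.h>

struct args { int a, b, c; };

/* Allowed values taken from the template for func_a(). */
static const int a_choices[] = {10, 20, 30};
static const int b_choices[] = {10, 20, 30};

/* Generate one set of arguments that respects the template. */
static struct args gen_from_template(void)
{
    struct args in;
    in.a = a_choices[rand() % 3];   /* a is one of 10, 20, 30 */
    in.b = b_choices[rand() % 3];   /* b is one of 10, 20, 30 */
    in.c = rand();                  /* stand-in for "any integer"; rand() only yields 0..RAND_MAX */
    return in;
}
```

        Because a and b are always drawn from three values each, the generator never spends calls on values that the template rules out.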


:) KCOV


- We've seen why coverage-guided fuzzing is needed. The next questions are:
  - How do we record the coverage of the kernel?
  - And how does the fuzzer know whether the coverage has increased?

- KCOV was added to the Linux kernel to answer these two questions. In other words, KCOV does two things:
  - Instrumentation for recording the coverage.
  - Exporting the recorded coverage via debugfs.

1) Instrumentation for recording the coverage


- Two components are involved.
   1-1) The first is SANCOV_PLUGIN, which is implemented as a GCC plugin.
         - Kernel Code :  ./scripts/gcc-plugins/sancov_plugin.c
         - This plugin does a simple task: it inserts a __sanitizer_cov_trace_pc() call at the start of every basic block in the Linux kernel.
            In other words, "the coverage has increased" means that basic blocks that had not run before have now been executed.
   1-2) Next, let's look inside __sanitizer_cov_trace_pc().
         - Kernel Code :  ./kernel/kcov.c
         - It does a simple task too: it stores a code address into the coverage buffer shared with user space.
           From the point of view of this function, that code address is its own return address (_RET_IP_).
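
- To make 1-2) concrete, here is a simplified user-space model of the recording step. It is a sketch, not the kernel source: record_pc() and trace_pc_model() are hypothetical names, and the buffer layout used here (word 0 counts the recorded PCs, the PCs follow) is the one that user space sees through KCOV. In the kernel, _RET_IP_ expands to __builtin_return_address(0), which the model uses directly.

```c
/*
 * area[0] holds the number of recorded PCs; area[1..] hold the PCs.
 * size is the total number of words in area, including the counter.
 */
static void record_pc(unsigned long *area, unsigned long size,
                      unsigned long ip)
{
    unsigned long pos = area[0] + 1;    /* slot for the next PC */

    if (pos < size) {                   /* drop the PC if the buffer is full */
        area[pos] = ip;
        area[0] = pos;
    }
}

/*
 * Model of the call inserted at the start of each basic block: it records
 * its own return address, i.e. an address inside the instrumented block.
 * noinline keeps the return address meaningful (GCC/Clang specific).
 */
__attribute__((noinline))
static void trace_pc_model(unsigned long *area, unsigned long size)
{
    record_pc(area, size, (unsigned long)__builtin_return_address(0));
}
```

  A reader of the buffer only needs area[0] to know how many entries are valid, and two runs can be compared by checking whether the second one recorded an address that the first one did not.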

- Is everything instrumented?
  - No. KCOV was added to the Linux kernel to support syzkaller, and syzkaller aims to test system call inputs.
    Thus KCOV does not collect coverage in hard/soft interrupts, and instrumentation is disabled for inherently non-deterministic or uninteresting parts of the kernel (e.g. locking).

2) Exporting the record of coverage via debugfs


- The fuzzer or a user can send commands to KCOV through a virtual file exported via debugfs.
  - The file used for this communication is "/sys/kernel/debug/kcov".
- See [2] for how to send commands and how to retrieve the result of coverage collection.


:) Example code to understand how to use KCOV


- See [3] for details.
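
- For orientation, the usual flow looks roughly like the sketch below, which is modeled on the example in the kernel's KCOV documentation. It assumes a kernel built with CONFIG_KCOV, a mounted debugfs, and enough privileges to open the file; kcov_trace_one_syscall() is a name made up for this sketch, and it returns -1 when KCOV is not available.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

#define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
#define KCOV_ENABLE     _IO('c', 100)
#define KCOV_DISABLE    _IO('c', 101)
#define KCOV_TRACE_PC   0
#define COVER_SIZE      (64 << 10)      /* buffer size in unsigned longs */

/* Trace one system call; returns the number of PCs collected, or -1. */
static long kcov_trace_one_syscall(void)
{
    int fd = open("/sys/kernel/debug/kcov", O_RDWR);
    if (fd == -1)
        return -1;                      /* KCOV not available here */

    /* Tell KCOV the trace mode and the buffer size. */
    if (ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE)) {
        close(fd);
        return -1;
    }

    /* Map the buffer that the kernel shares with user space. */
    unsigned long *cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
                                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (cover == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Start collecting coverage for the current thread. */
    if (ioctl(fd, KCOV_ENABLE, KCOV_TRACE_PC)) {
        munmap(cover, COVER_SIZE * sizeof(unsigned long));
        close(fd);
        return -1;
    }

    /* Reset the counter, then run the system call to be traced. */
    __atomic_store_n(&cover[0], 0, __ATOMIC_RELAXED);
    ssize_t ignored = read(-1, NULL, 0);
    (void)ignored;

    /* cover[0] is the number of PCs; the PCs start at cover[1]. */
    unsigned long n = __atomic_load_n(&cover[0], __ATOMIC_RELAXED);
    for (unsigned long i = 0; i < n; i++)
        printf("0x%lx\n", cover[i + 1]);

    ioctl(fd, KCOV_DISABLE, 0);
    munmap(cover, COVER_SIZE * sizeof(unsigned long));
    close(fd);
    return (long)n;
}
```

  The printed addresses are kernel code addresses, so they can be mapped back to source lines with a tool such as addr2line run against the matching vmlinux.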


:) References



