From: Alexander Graf <graf@amazon.com>
To: <samcacc@amazon.com>, Sam Caccavale <samcacc@amazon.de>
Cc: <samcaccavale@gmail.com>, <nmanthey@amazon.de>,
	<wipawel@amazon.de>, <dwmw@amazon.co.uk>, <mpohlack@amazon.de>,
	<graf@amazon.de>, <karahmed@amazon.de>,
	<andrew.cooper3@citrix.com>, <JBeulich@suse.com>,
	<pbonzini@redhat.com>, <rkrcmar@redhat.com>, <tglx@linutronix.de>,
	<mingo@redhat.com>, <bp@alien8.de>, <hpa@zytor.com>,
	<paullangton4@gmail.com>, <anirudhkaushik@google.com>,
	<x86@kernel.org>, <kvm@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/3] Emulate simple x86 instructions in userspace
Date: Fri, 21 Jun 2019 15:28:34 +0200
Message-ID: <c5fbed80-5933-eca3-001e-0e2aaccfcd1d@amazon.com>
In-Reply-To: <7e0188fa-351f-157b-2815-ab19222f44b4@amazon.com>


On 12.06.19 17:19, samcacc@amazon.com wrote:
> On 5/31/19 10:38 AM, Alexander Graf wrote:
>> On 21.05.19 17:39, Sam Caccavale wrote:
>>
>>> +static void dump_state_after(const char *desc, struct state *state)
>>> +{
>>> +    debug(" -- State after %s --\n", desc);
>>> +    debug("mode: %s\n", x86emul_mode_string[state->ctxt.mode]);
>>> +    debug(" cr0: %lx\n", state->vcpu.cr[0]);
>>> +    debug(" cr3: %lx\n", state->vcpu.cr[3]);
>>> +    debug(" cr4: %lx\n", state->vcpu.cr[4]);
>>> +
>>> +    debug("Decode _eip: %lu\n", state->ctxt._eip);
>>> +    debug("Emulate eip: %lu\n", state->ctxt.eip);
>>> +
>>> +    debug("\n");
>>>    }
>>>      int step_emulator(struct state *state)
>>>    {
>>> -    return 0;
>>> +    int rc, prev_eip = state->ctxt.eip;
>>> +    int decode_size = state->data_available - decode_offset;
>>> +
>>> +    if (decode_size < 15) {
>>> +        rc = x86_decode_insn(&state->ctxt, &state->data[decode_offset],
>>> +                     decode_size);
>>> +    } else {
>>> +        rc = x86_decode_insn(&state->ctxt, NULL, 0);
>>
>> Isn't this going to fetch instructions from data as well? Why do we need
>> the < 15 special case at all?
>>
> I've changed the method of acquiring data in v2, but the 15 limit is
> still relevant.  If x86_decode_insn is called with a NULL pointer and
> instruction size 0, the bytes are fetched via the emulator_ops.fetch
> function.  This would be nice, but there is no way of limiting how many
> bytes it will try to fetch, and it usually grabs 15 since that is the
> longest x86 instruction (as of yet?).  When there are fewer than 15 bytes
> left, limiting the fetch size to the remaining bytes is important.


Please at least add a comment here explaining where the magic 15 comes
from and that you want to exercise the normal prefetch path while still
allowing the buffer to shrink below 15 bytes :). Maybe move MAX_INST_SIZE
from svm.c into a .h file and reuse it while you're at it.
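
Something along these lines, as a rough standalone sketch rather than a
drop-in replacement. struct decode_args and pick_decode_args() are names
made up for illustration, and MAX_INST_SIZE is defined locally here as a
stand-in for the define in svm.c. The returned pair is meant to be what
gets passed as the insn and insn_len arguments of x86_decode_insn():

```c
#include <assert.h>
#include <stddef.h>

/*
 * Longest possible x86 instruction, in bytes. Local stand-in for the
 * MAX_INST_SIZE define that currently lives in svm.c.
 */
#define MAX_INST_SIZE 15

/* Hypothetical helper type: the (insn, insn_len) pair for the decoder. */
struct decode_args {
	const unsigned char *insn;
	int insn_len;
};

/*
 * With at least MAX_INST_SIZE bytes left, return (NULL, 0) so that the
 * decoder goes through its normal fetch path. With fewer bytes left,
 * return an explicit buffer and length so that the decoder cannot read
 * past the end of the fuzz input.
 */
static struct decode_args pick_decode_args(const unsigned char *data,
					   size_t data_available,
					   size_t decode_offset)
{
	struct decode_args args = { NULL, 0 };
	size_t remaining;

	assert(decode_offset <= data_available);
	remaining = data_available - decode_offset;

	if (remaining < MAX_INST_SIZE) {
		args.insn = data + decode_offset;
		args.insn_len = (int)remaining;
	}

	return args;
}
```

Keeping the decision in one small helper also gives the comment about
the 15 an obvious place to live.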


[...]


>>> diff --git a/tools/fuzz/x86_instruction_emulation/scripts/bin_fuzz b/tools/fuzz/x86_instruction_emulation/scripts/bin_fuzz
>>> new file mode 100755
>>> index 000000000000..e570b17f9404
>>> --- /dev/null
>>> +++ b/tools/fuzz/x86_instruction_emulation/scripts/bin_fuzz
>>> @@ -0,0 +1,23 @@
>>> +#!/bin/bash
>>> +# SPDX-License-Identifier: GPL-2.0+
>>> +# This runs the afl-harness at $1, $2 times (or 100)
>>> +# It runs uniq and sorts the output to give an idea of what is causing the
>>> +# most crashes.  Useful for deciding what to implement next.
>>> +
>>> +if [ "$#" -lt 1 ]; then
>>> +  echo "Usage: './bin_fuzz path_to_afl-harness [number of times to run]"
>>> +  exit
>>> +fi
>>> +
>>> +mkdir -p fuzz
>>> +rm -f fuzz/*.in fuzz/*.out
>>> +
>>> +for i in $(seq 1 1 ${2:-100})
>>> +do
>>> +  {
>>> +  head -c 500 /dev/urandom | tee fuzz/$i.in | ./$1
>>> +  } > fuzz/$i.out 2>&1
>>> +
>>> +done
>>> +
>>> +find ./fuzz -name '*.out' -exec tail -1 {} \; | sed 's/.* Segmen/Segman/' | sed -r 's/^(\s[0-9a-f]{2})+$/misc instruction output/' | sort | uniq -c | sort -rn
>>
>> What is that Segman thing about?
>>
> This was for binning crashes-- check `tools/fuzz/x86ie/scripts/bin.sh`
> in v2 for the updated version.  Basically, it checks whether a
> segmentation fault has happened, and if so, launches a gdb session to
> see whether it was caused by an unimplemented x86_emulator_op.  This is
> useful in development for prioritizing the unimplemented features which
> are causing the most fake crashes.


I can see why you want to combine them, but I don't understand where 
"Segman" comes from. Where is there a man here?



Alex


Thread overview: 11+ messages
2019-05-21 15:39 x86 instruction emulator fuzzing Sam Caccavale
2019-05-21 15:39 ` [PATCH 1/3] Build target for emulate.o as a userspace binary Sam Caccavale
2019-05-31  8:02   ` Alexander Graf
2019-06-12 15:19     ` samcacc
2019-05-21 15:39 ` [PATCH 2/3] Emulate simple x86 instructions in userspace Sam Caccavale
2019-05-31  8:38   ` Alexander Graf
2019-06-12 15:19     ` samcacc
2019-06-21 13:28       ` Alexander Graf [this message]
2019-05-21 15:39 ` [PATCH 3/3] Demonstrating unit testing via simple-harness Sam Caccavale
2019-05-31  8:39 ` x86 instruction emulator fuzzing Alexander Graf
2019-06-12 15:19   ` samcacc
