linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>, X86 ML <x86@kernel.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 5/9] x86/dumpstack: Improve opcodes dumping in the Code: section
Date: Thu, 8 Mar 2018 15:20:58 -0800	[thread overview]
Message-ID: <CA+55aFwpXcoXJEPGShDSjovA7X20qc_iMioD129BSzGmNiQNaw@mail.gmail.com> (raw)
In-Reply-To: <20180308223632.GA15694@pd.tnic>

On Thu, Mar 8, 2018 at 2:36 PM, Borislav Petkov <bp@alien8.de> wrote:
>
> Btw, do we have any explanation for the two-thirds prologue? I dug it
> out to the patch below but it doesn't say why the prologue being bigger
> is more important than the epilogue.

Yes, so the reason the prologue is more important is that there's
really two cases for the "Code:" line:

 (a) it doesn't matter at all, because you have the kernel image
anyway, and can just get the code from there based on the faulting
instruction pointer address.

 (b) it ends up helping you match things up roughly, because maybe you
have a different compiler version and different code generation
(and/or maybe just different config options)

In the (a) case, the code doesn't matter at all, and you might as well
just look at the whole function by just doing "gdb vmlinux" and then
disassemble it. You have everything just from %rip.

In the (b) case, there are two reasons to give code "around" the
faulting instruction pointer:

 - to help match up when you compare your not-identical vmlinux sequence

 - to actually figure out what the register allocation in the faulting
kernel was, so that you can make sense of the register dump, because
your local kernel might have completely different register allocaiton.

And for that *second* case, the instructions that _precede_ the
faulting instructions are generally much more important than the
instructions that follow the faulting instruction.

The instructions that *follow* the faulting instructions generally
have little or no relevance for the register dump (obviously loops etc
_can_ make them  matter, but you see the rough point).

So the instructions that precede the faulting instruction are in many
ways much more relevant than the instructions that follow it. So you
want more of those.

Also, the instructions that precede the faulting instruction have
another issue - with variable-length instructions, it's hard to sync
up things. Even if we were to use a disassembler (which we almost
certainly don't at fault time), walking backwards can easily be
ambiguous.

So you want extra "slop" in the bytes that get dumped before the
instruction, because you don't know quite where the instruction
boundaries may be.  In contrast, the faulting instruction itself - and
the instrucvtions that follow it - you can tell where the instruction
boundaries are (unless you had an exception due to a corrupted
indirect branch, which does happen but is quite rare).

So there's two reasons for why the bytes _before_ the instruction are
more important than the bytes _after_ the instruction: the register
state is more relevant for them, and the slop for decoding.

The "two thirds" then just comes from that. It's just a random
estimate of "instruction bytes that precede the faulting instructions
are more important than later ones".

In fact, to some degree the "one third" can be seen as "the faulting
instruction itself needs some space".

An x86 instruction can be at most 15 bytes, and you do likely want at
least that. Any bytes after the first instruction aren't _useless_
(because they definitely can help line things up when you have
slightly different code generation due to compiler versions etc), so
"at least 15 bytes, and maybe a bit more" again makes that 2/3rds
thing work out well when you print out 64 bytes.

                      Linus

PS. Historical side note:

We haven't always printed out that much. This is Linux-0.01:

        for(i=0;i<10;i++)
                printk("%02x ",0xff & get_seg_byte(esp[1],(i+(char *)esp[0])));

so it just printed out 10 bytes, basically just the faulting
instruction (and not even all of it if it has a lot of immediates).

We eventually extended the 10 bytes into 20 bytes, again starting at
the existing instruction, because if you don't print out a lot, you
need to start at the faulting instruction itself because of slop.

The expansion from 20 to 64 bytes happened in 2004 in v2.6.9. And
that's also when we started dumping preceding instructions, because
once you print out a fair chunk of bytes, that's when "preceding
instructions are more important" comes in. Because before that, you
want to guarantee that you at least print out the faulting
instruction.

  reply	other threads:[~2018-03-08 23:21 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-06  9:49 [PATCH 0/9] x86/dumpstack: Cleanups and user opcode bytes Code: section, v1 Borislav Petkov
2018-03-06  9:49 ` [PATCH 1/9] panic: Add closing panic marker parenthesis Borislav Petkov
2018-03-08 11:03   ` [tip:core/core] " tip-bot for Borislav Petkov
2018-03-06  9:49 ` [PATCH 2/9] x86/fault: Do not print IP in show_fault_oops() Borislav Petkov
2018-03-08 11:09   ` [tip:x86/cleanups] " tip-bot for Borislav Petkov
2018-03-06  9:49 ` [PATCH 3/9] x86/dumpstack: Unify show_regs() Borislav Petkov
2018-03-08 11:10   ` [tip:x86/cleanups] " tip-bot for Borislav Petkov
2018-03-06  9:49 ` [PATCH 4/9] x86/dumpstack: Carve out Code: dumping into a function Borislav Petkov
2018-03-06  9:49 ` [PATCH 5/9] x86/dumpstack: Improve opcodes dumping in the Code: section Borislav Petkov
2018-03-06 18:47   ` Linus Torvalds
2018-03-07 10:13     ` Borislav Petkov
2018-03-07 13:25       ` Josh Poimboeuf
2018-03-07 14:16         ` Borislav Petkov
2018-03-07 21:08         ` Linus Torvalds
2018-03-08 10:16           ` Borislav Petkov
2018-03-08 18:00             ` Linus Torvalds
2018-03-08 22:36               ` Borislav Petkov
2018-03-08 23:20                 ` Linus Torvalds [this message]
2018-03-09 10:15                   ` Borislav Petkov
2018-03-06  9:49 ` [PATCH 6/9] x86/dumpstack: Add loglevel argument to show_opcodes() Borislav Petkov
2018-03-06  9:49 ` [PATCH 7/9] x86/fault: Dump user opcode bytes on fatal faults Borislav Petkov
2018-03-06  9:49 ` [PATCH 8/9] x86/dumpstack: Add a show_ip() function Borislav Petkov
2018-03-06  9:49 ` [PATCH 9/9] x86/dumpstack: Save first regs set for the executive summary Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+55aFwpXcoXJEPGShDSjovA7X20qc_iMioD129BSzGmNiQNaw@mail.gmail.com \
    --to=torvalds@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=jpoimboe@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=peterz@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).