All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Borkmann <daniel@iogearbox.net>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Jakub Kicinski <kubakici@wp.pl>
Cc: netdev@vger.kernel.org, Brenden Blanco <bblanco@plumgrid.com>,
	Thomas Graf <tgraf@suug.ch>, Wangnan <wangnan0@huawei.com>,
	He Kuang <hekuang@huawei.com>,
	kernel-team@fb.com
Subject: Re: bpf debug info
Date: Tue, 29 Nov 2016 21:23:10 +0100	[thread overview]
Message-ID: <583DE3AE.8030103@iogearbox.net> (raw)
In-Reply-To: <20161129185116.GA22581@ast-mbp.thefacebook.com>

On 11/29/2016 07:51 PM, Alexei Starovoitov wrote:
> On Tue, Nov 29, 2016 at 03:38:18PM +0000, Jakub Kicinski wrote:
[...]
>>>> So next step is to improve verifier messages to be more human friendly.
>>>> The step after is to introduce BPF_COMMENT pseudo instruction
>>>> that will be ignored by the interpreter yet it will contain the text
>>>> of original source code. Then llvm-objdump step won't be necessary.
>>>> The bpf loader will load both instructions and pieces of C sources.
>>>> Then verifier errors should be even easier to read and humans
>>>> can easily understand the purpose of the program.
>>>
>>> So the BPF_COMMENT pseudo insn will get stripped away from the insn array
>>> after verification step, so we don't need to hold/account for this mem? I
>>> assume in it's ->imm member it will just hold offset into text blob?
>>
>> Associating any form of opaque data with programs always makes me
>> worried about opening a side channel of communication with a specialized
>> user space implementations/compilers.  But I guess if the BPF_COMMENTs
>> are stripped in the verifier as Daniel assumes drivers and JITs will
>> never see it.
>
> yes. the idea that it's a comment. It can contain any text,
> not only C code, but any other language.
> It's definitely going to be stripped before JITs and kernel will
> not make any safety or translation decisions based on such comment.
>
>> Just to clarify, however - is there any reason why pushing the source
>> code into the kernel is necessary?  Or is it just for convenience?
>> Provided the user space loader has access to the debug info it should
>> have no problems matching the verifier output to code lines?
>
> correct. just for convenience. The user space has to keep .o around,
> since it can crash, would have to reload and so on.
> Only for some script that ssh-es into servers and wants to see
> what is being loaded, it might help to dump full asm and these comments
> along with prog_digest that Daniel is working on in parallel.

Which would mean we'd need to keep it around somewhere (prog aux data?)
in post-verification time (so potentially drivers/JITs could see it, too,
just not inside insn stream). Some API glue code could probably blind
this information for the JITing time to stop incentive of playing side
channel games (e.g. core code could encrypt the pointer value and only
core kernel knows how to access that data, no modules, no out-of-tree
code). The other thing I'm wondering is, when we strip this info anyway
from the insn stream to keep it in aux data (so it can later be reconstructed
on a dump), then perhaps that is best done before prog loading time? It
would then allow to keep complexity with stripping that insns out of the
verifier. If semantics are that these comments are acting as a hole/gap
(in a similar sense of what we have with cBPF today), then it can never
become a jmp target and loaders could strip it out already (instead of
teaching DFS, etc about it), and prepare a meta data structure in bpf_attr
for bpf(2), and verifier works based on that one. What makes this problematic
however is when you have rewrites in the kernel (ctx access, constant
blinding, etc), but perhaps they could just adjust the offsets from that
meta data thing as well?

> Alternatively instead of doing BPF_COMMENT we can load the whole .o
> as-is into bpffs as a blob. Later (based on digest) the kernel can
> dump such .o back for user space to run objdump on. It all can be
> done without kernel involvement. Like tc command can copy .o and so on.
> But not everything is using tc.

That means kernel must ensure/verify that loaded insns also come from
that claimed object file; not sure if easily possible w/o parsing elf.
It could work if the kernel loads everything based on the content of
the object file itself, but that is not possible anymore since we have
bpf(2) already for doing that (but also would add a lot of complexity).

> Another alternative is to do a decompiler from bpf binary
> into meaningful C code. It's not trivial and names will be lost.
> bpf_comment approach is pretty cheap from kernel point of view
> and greatly helps visibility when users don't cheat with debug info.

Agree, it's non-trivial, would be really nice to have, though, so also
non-cooperative -g users could be better examined.

Thanks,
Daniel

  reply	other threads:[~2016-11-29 20:23 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-29  6:42 bpf debug info Alexei Starovoitov
2016-11-29 15:11 ` Daniel Borkmann
2016-11-29 15:38   ` Jakub Kicinski
2016-11-29 18:51     ` Alexei Starovoitov
2016-11-29 20:23       ` Daniel Borkmann [this message]
2016-11-30  0:28         ` Alexei Starovoitov
2016-11-29 17:01   ` Alexei Starovoitov
2016-12-06 15:12     ` Arnaldo Carvalho de Melo
  -- strict thread matches above, loose matches on Subject: below --
2016-12-13 19:38 Alexei Starovoitov
2016-12-13 19:55 ` Daniel Borkmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=583DE3AE.8030103@iogearbox.net \
    --to=daniel@iogearbox.net \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bblanco@plumgrid.com \
    --cc=hekuang@huawei.com \
    --cc=kernel-team@fb.com \
    --cc=kubakici@wp.pl \
    --cc=netdev@vger.kernel.org \
    --cc=tgraf@suug.ch \
    --cc=wangnan0@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.