Re: [PATCH v2] panic: Taint kernel if fault injection has been used

BPF List
 help / color / mirror / Atom feed

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
To: KP Singh <kpsingh@kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	LKML <linux-kernel@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	Borislav Petkov <bp@alien8.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Kees Cook <keescook@chromium.org>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Florent Revest <revest@chromium.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Christoph Hellwig <hch@infradead.org>, Chris Mason <clm@meta.com>
Subject: Re: [PATCH v2] panic: Taint kernel if fault injection has been used
Date: Mon, 12 Dec 2022 00:14:31 +0900	[thread overview]
Message-ID: <20221212001431.ac84a6320cff8d5fa8aa943e@kernel.org> (raw)
In-Reply-To: <CACYkzJ72-hJweZoFN_YN8u3NOmp5x82M2xA-ZKBi5ubt6yrzZA@mail.gmail.com>

On Sun, 11 Dec 2022 08:49:01 +0100
KP Singh <kpsingh@kernel.org> wrote:

> On Sun, Dec 11, 2022 at 3:52 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > Hi Alexei,
> >
> > On Wed, 7 Dec 2022 20:36:28 -0800
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > > Yet for 2 days this 'taint' arguing is preventing people from looking at the bug.
> > > And that happens all the time on lkml. Somebody reports a bug and kernel devs
> > > jump on the poor person:
> > > "Can you repro without taint?",
> > > "Can you repro with upstream kernel?"
> > > This is discouraging.
> > > The 'taint' concept makes it easier for kernel devs to ignore bug reports
> > > and push back on the reporter.
> > > Do it few times and people stop reporting bugs.
> >
> > That seems off topic for me. You seems complained against the taint flag
> > itself.
> 
> The series is about adding a taint for, so discussing the user
> experience, when someone reports a "tainted crash" seems reasonable to
> me and not off topic.
> 
> >
> > > Say, this particular bug in rethook was found by one of our BPF CI developers.
> > > They're not very familiar with the kernel, but they can see plenty of 'rethook'
> > > references in the stack trace, lookup MAINTAINER file and ping Massami,
> > > but to the question "can you repro without taint?" they can only say NO,
> > > because this is how our CI works. So they will keep silence and the bug will be lost.
> >
> > BTW, this sounds like the BPF CI system design issue. If user is NOT easily
> > identifying what test caused the issue (e.g. what tests ran on the system
> > until the bug was found), the CI system is totally useless, because after
> > finding a problem, it must be investigated to solve the problem.
> >
> > Without investigation, how would you usually fix the bug??
> 
> Masami, this seems accusational and counter productive, it was never
> said that issues can be solved without investigation.

Let me apologies about my misunderstanding.  

> 
> The BPF CI does find issues, the BPF reviewers and maintainers
> regularly fix bugs using it. Alexei's point here is that a taint does
> not help in solving the problem, rather deter some people from even
> looking at it. (not BPF people, but other maintainers [distro, kernel]
> who would ask for a reproduction without a taint).

Hmm, that is a problem. Some taint flag should be useful hints
for finding the error patterns.

> Let's take a step back and focus on solving debuggability and
> introspection as we clearly have some perception issues about taints
> in the community. (distro maintainers, users) before we go and add
> more taints.

Agreed.

> > > That's not the only reason why I'm against generalizing 'taint'.
> > > Tainting because HW is misbehaving makes sense, but tainting because
> > > of OoO module or because of live-patching does not.
> > > It becomes an excuse that people abuse.
> >
> > yeah, it is possible to be abused. but that is the problem who
> > abuse it.
> 
> I am sorry, but it's our responsibility as developers to design
> features so that users don't face arduous pushbacks.

Sorry if I confuse you. I meant that taint flag abusing. :(

> 
> > > Right now syzbot is finding all sorts of bugs. Most of the time syzbot
> > > turns error injection on to find those allocation issues.
> > > If syzbot reports will start coming as tainted there will be even less
> > > attention to them. That will not be good.
> >
> > Hmm, what kind of error injection does syzbot do? I would like to know
> > how it is used. For example, does that use only a specify set of
> > injection points, or use all existing points?
> >
> > If the latter, I feel safer because syzbot ensures the current all
> > ALLOW_ERROR_INJECTION() functions will work with error injection. If not,
> > we need to consider removing the ALLOW_ERROR_INJECTION() from the
> > function which is not tested well (or add this taint flag.)
> >
> > Documentation/fault-injection/fault-injection.rst has no explanation
> > about ALLOW_ERROR_INJECTION(), but obviously the ALLOW_ERROR_INJECTION()
> > marked functions and its caller MUST be designed safely against the
> > error injection. e.g.
> >
> > - It must return an error code. (so EI_ETYPE_NONE must be removed)
> 
> This is already the case with BPF, the modify return trampolines
> further limits the error injection to functions that return errors.

OK, so I also should remove it from FEI.

> 
> > - Caller must check the return value always.
> >   (but I thought this was the reason why we need this test framework...)
> > - It should not run any 'effective' code before checking an error.
> >   For example, increment counter, call other functions etc.
> >   (this means it can return without any side-effect)
> 
> This is the case with modify_return trampolines in BPF which avoid
> side effects and limit the attachment surface further and avoiding
> side effects is a design goal. If we missed anything, let's fix that.
> 
> https://lwn.net/Articles/813724/

Yeah, if BPF tests already tested all ALLOW_ERROR_INJECTION() functions,
it may be checked already. I think we just need adding the above
explanation on the document, so that the people who will add additional
ALLOW_ERROR_INJECTION() on a function, can understand the limitation.

> 
> >
> > Anything else?
> >
> > [...]
> > > All these years we've been working on improving bpf introspection and
> > > debuggability. Today crash dumps look like this:
> > >   bpf_trace_printk+0xd3/0x170 kernel/trace/bpf_trace.c:377
> > >   bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk+0x2b/0x37
> > >   bpf_dispatcher_nop_func include/linux/bpf.h:1082 [inline]
> > >   __bpf_prog_run include/linux/filter.h:600 [inline]
> > >   bpf_prog_run include/linux/filter.h:607 [inline]
> > >
> > > The 2nd from the top is a bpf prog. The rest are kernel functions.
> > > bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk
> > >          ^^ is a prog tag   ^^ name of bpf prog
> > >
> > > If you do 'bpftool prog show' you can see both tag and name.
> > > 'bpftool prog dump jited'
> > > dumps x86 code mixed with source line text.
> > > Often enough +0x2b offset will have some C code right next to it.
> >
> > This is good, but this only works when the vmcore is dumped and
> > on the stack. My concern about the function error injection is
> > that makes some side effects, which can cause a problem afterwards
> > (this means after unloading the bpf prog)
> 
> I think careful choices need to be made on when error injection is
> allowed so that these situations don't occur. (as you mentioned in
> your comment). [1]. If a BPF program is unloaded, there is no error
> injection any more, let's ensure that we design the error injection
> allow list and the BPF logic to ensure this cannot happen.

OK. Actually, I trust the BPF logic itself will be handle this
correctly. I just concerned that some people who don't know much
(because it is not carefully documented) might add ALLOW_ERROR_INJECTION()
to a function which is not injectable by design. Thus I thought the
taint flag can help.
But if those are always Cc'd to bpf@vger and it will be tested by BPF
CI, I'm OK for that.

> > > One can monitor all prog load/unload via perf or via audit.
> 
> I would like us to focus on debuggability as it helps both the
> maintainers and the user. And I see a few things that need to be done:
> 
> 1. Revisit what is allowed for error injection in the kernel and if
> they can cause any subtle issues. My initial take is that functions
> that are directly called from syscall path should generally be okay.
> But let's check them for the patterns you mentioned.

Yeah, I agree that syscall entries should be safe.

> 2. If it helps, add the list of BPF modify return programs to stack
> traces. Although this is really needed if we don't do [1] properly.

Would you mean a list of enabled BPF programs in Oops code? If so,
I also want to add enabled FEI list on it.

> 3. Check if anything needs to be improved in the verification logic
> for modify return trampolines.

I think BPF logic itself is safe. But the targeted function itself
or the caller may not be designed for such error injection.
I think this is my fault that I have not documented about
ALLOW_ERROR_INJECTION() well. Sorry about that.

Thank you,


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

next prev parent reply	other threads:[~2022-12-11 15:14 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-04 22:22 [PATCH v2] panic: Taint kernel if fault injection has been used Masami Hiramatsu (Google)
2022-12-04 22:30 ` Alexei Starovoitov
2022-12-04 22:59   ` Masami Hiramatsu
2022-12-06  2:17     ` Alexei Starovoitov
2022-12-06  7:20       ` Masami Hiramatsu
2022-12-07  4:01         ` Alexei Starovoitov
2022-12-07  4:39           ` Steven Rostedt
2022-12-07  4:41             ` Steven Rostedt
2022-12-07  4:45             ` Alexei Starovoitov
2022-12-07  5:18               ` Steven Rostedt
2022-12-07 12:48               ` Steven Rostedt
2022-12-08  4:36                 ` Alexei Starovoitov
2022-12-08 14:59                   ` Steven Rostedt
2022-12-11  2:52                   ` Masami Hiramatsu
2022-12-11  7:49                     ` KP Singh
2022-12-11 15:14                       ` Masami Hiramatsu [this message]
2022-12-12 21:39                         ` KP Singh
2022-12-11 17:02                       ` Steven Rostedt
2022-12-12 21:43                         ` KP Singh
2022-12-12  0:53                       ` Masami Hiramatsu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221212001431.ac84a6320cff8d5fa8aa943e@kernel.org \
    --to=mhiramat@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bp@alien8.de \
    --cc=bpf@vger.kernel.org \
    --cc=clm@meta.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@infradead.org \
    --cc=jpoimboe@redhat.com \
    --cc=keescook@chromium.org \
    --cc=kpsingh@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=peterz@infradead.org \
    --cc=revest@chromium.org \
    --cc=rostedt@goodmis.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox