From: Clark Williams <williams@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: David Miller <davem@davemloft.net>,
Sebastian Sewior <bigeasy@linutronix.de>,
daniel@iogearbox.net, bpf@vger.kernel.org, ast@kernel.org,
kafai@fb.com, songliubraving@fb.com, yhs@fb.com,
Peter Zijlstra <peterz@infradead.org>,
Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: Re: [PATCH] BPF: Disable on PREEMPT_RT
Date: Thu, 17 Oct 2019 21:49:17 -0500
Message-ID: <20191017214917.18911f58@tagon>
In-Reply-To: <alpine.DEB.2.21.1910172342090.1869@nanos.tec.linutronix.de>

+acme

On Thu, 17 Oct 2019 23:54:07 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 17 Oct 2019, David Miller wrote:
>
> > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Date: Thu, 17 Oct 2019 17:40:21 +0200
> >
> > > On 2019-10-17 16:53:58 [+0200], Daniel Borkmann wrote:
> > >> On Thu, Oct 17, 2019 at 11:05:01AM +0200, Sebastian Andrzej Siewior wrote:
> > >> > Disable BPF on PREEMPT_RT because
> > >> > - it allocates and frees memory in atomic context
> > >> > - it uses up_read_non_owner()
> > >> > - BPF_PROG_RUN() expects to be invoked in non-preemptible context
> > >>
> > >> For the latter you'd also need to disable seccomp-BPF and everything
> > >> cBPF related as they are /all/ invoked via BPF_PROG_RUN() ...
> > >
> > > I looked at tracing and it depended on BPF_SYSCALL so I assumed they all
> > > do… Now looking for BPF_PROG_RUN() there is PPP_FILTER,
> > > NET_TEAM_MODE_LOADBALANCE and probably more. I didn't find a symbol for
> > > seccomp-BPF.
> > > Would it make sense to override BPF_PROG_RUN() and make each caller fail
> > > instead? Other recommendations?
> >
> > I hope you understand that basically you are disabling any packet sniffing
> > on the system with this patch you are proposing.
> >
> > This means no tcpdump, not wireshark, etc. They will all become
> > non-functional.
> >
> > Turning off BPF just because PREEMPT_RT is enabled is a non-starter it is
> > absolutely essential functionality for a Linux system at this point.
>
> I'm all ears for an alternative solution. Here are the pain points:
>
> #1) BPF disables preemption unconditionally with no way to do a proper RT
> substitution like most other infrastructure in the kernel provides
> via spinlocks or other locking primitives.

As I understand it, BPF programs cannot loop and are limited to 4096 instructions.
Has anyone done any timing to see just how much having preemption off while a
BPF program executes is going to affect us? Are we talking 1us, 50us, or longer?
I wonder if there's some instrumentation we could use to determine the maximum time
spent running a BPF program. Maybe some perf mojo...

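One place where some of these numbers may already exist: kernels from 5.1 on
have a kernel.bpf_stats_enabled sysctl that, when set to 1, accumulates
run_time_ns and run_cnt per program, and bpftool reports both. A minimal sketch,
assuming bpftool is installed and run as root; the helper name avg_run_ns is
made up for illustration. It only yields an average per run, not the worst
case, so it is a first data point rather than an answer to the question.

```python
import json


def avg_run_ns(progs_json: str) -> dict:
    """Map BPF program id -> average nanoseconds per run.

    Expects the JSON array printed by `bpftool -j prog show`. Programs
    without run statistics (stats disabled, or never run) are skipped.
    """
    averages = {}
    for prog in json.loads(progs_json):
        run_cnt = prog.get("run_cnt", 0)
        if run_cnt:
            averages[prog["id"]] = prog.get("run_time_ns", 0) / run_cnt
    return averages


if __name__ == "__main__":
    import subprocess

    # Assumes `sysctl -w kernel.bpf_stats_enabled=1` was done beforehand.
    out = subprocess.run(
        ["bpftool", "-j", "prog", "show"],
        capture_output=True, text=True, check=True,
    ).stdout
    for prog_id, ns in sorted(avg_run_ns(out).items()):
        print(f"prog {prog_id}: {ns:.0f} ns/run on average")
```

Getting a true maximum would need per-invocation timestamps (for example from
tracing), which this sketch does not attempt.
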
>
> #2) BPF does allocations in atomic contexts, which is a dubious decision
> even for non RT. That's related to #1

I guess my question here is, are the allocations done on behalf of an about-to-run
BPF program, or as a result of executing BPF code? Is it something we might be able
to satisfy from a pre-allocated pool rather than kmalloc()? OK, I need to go dive
into BPF a bit deeper.

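To make the pre-allocated pool idea concrete, here is a toy sketch of the
general technique, not of anything BPF does today: all buffers are created up
front, and the hot path only pops from and pushes to a free list. The class
name FixedPool and its methods are hypothetical, and Python is used purely for
illustration; a kernel version would look quite different.

```python
class FixedPool:
    """Fixed number of equally sized buffers, all allocated at setup time."""

    def __init__(self, count: int, size: int):
        self.size = size
        # The only place buffers are created.
        self._free = [bytearray(size) for _ in range(count)]

    def get(self):
        """Hand out a free buffer, or None if the pool is exhausted.

        Never creates a new buffer, so the caller must cope with None.
        """
        return self._free.pop() if self._free else None

    def put(self, buf: bytearray) -> None:
        """Return a buffer previously obtained from get()."""
        if len(buf) != self.size:
            raise ValueError("buffer does not belong to this pool")
        self._free.append(buf)


if __name__ == "__main__":
    pool = FixedPool(count=2, size=64)
    a, b = pool.get(), pool.get()
    print(pool.get())   # pool is empty at this point
    pool.put(a)
    print(pool.get() is a)
```

The trade-off is that exhaustion becomes a visible failure the caller has to
handle, instead of a fallback allocation in a context where allocating is the
problem.
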
>
> #3) BPF uses the up_read_non_owner() hackery which was only invented to
> deal with already existing horrors and not meant to be proliferated.
>
> Yes, I know it's a existing facility ....

I'm sure I'll regret asking this, but why is up_read_non_owner() a horror? I mean,
I get the fundamental wrongness of having someone that's not the owner of a semaphore
performing an 'up' on it, but is there an RT-specific reason that it's bad? Is it
totally a blocker for using BPF with RT or is it something we should fix over time?

>
> TBH, I have no idea how to deal with those things. So the only way forward
> for RT right now is to disable the whole thing.
>
> Clark might have some insight from the product side for you how much that
> impacts usability.
>
> Thanks,
>
> tglx

Clark is only just starting his journey with BPF, so not an expert.

I do think that we (RT) are going to have to co-exist with BPF, if only due to the
increased use of XDP. I also think that other sub-systems will start to
employ BPF for production purposes (as opposed to debug/analysis which is
how we generally look at tracing, packet sniffing, etc.). I think we *have* to
figure out how to co-exist.

Guess my "hey, that looks interesting, think I'll leisurely read up on it" just got
a little less leisurely. I'm out most of the day tomorrow but I'll catch up on email
over the weekend.

Clark
--
The United States Coast Guard
Ruining Natural Selection since 1790

Thread overview: 26+ messages
2019-10-17 9:05 [PATCH] BPF: Disable on PREEMPT_RT Sebastian Andrzej Siewior
2019-10-17 14:53 ` Daniel Borkmann
2019-10-17 15:40 ` Sebastian Andrzej Siewior
2019-10-17 17:25 ` David Miller
2019-10-17 21:54 ` Thomas Gleixner
2019-10-17 22:13 ` David Miller
2019-10-17 23:50 ` Thomas Gleixner
2019-10-17 23:27 ` Alexei Starovoitov
2019-10-18 0:22 ` Thomas Gleixner
2019-10-18 5:52 ` Alexei Starovoitov
2019-10-18 11:28 ` Thomas Gleixner
2019-10-18 12:48 ` Sebastian Sewior
2019-10-18 23:05 ` Alexei Starovoitov
2019-10-20 9:06 ` Thomas Gleixner
2019-10-22 1:43 ` Alexei Starovoitov
2019-10-18 2:49 ` Clark Williams [this message]
2019-10-18 4:57 ` David Miller
2019-10-18 5:54 ` Alexei Starovoitov
2019-10-18 8:38 ` Thomas Gleixner
2019-10-18 12:49 ` Clark Williams
2019-10-18 8:46 ` Thomas Gleixner
2019-10-18 12:43 ` Sebastian Sewior
2019-10-18 12:58 ` Clark Williams
2019-10-17 22:11 ` Thomas Gleixner
2019-10-17 22:23 ` David Miller
2019-10-17 17:26 ` David Miller