From: Clark Williams <williams@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: David Miller <davem@davemloft.net>,
	Sebastian Sewior <bigeasy@linutronix.de>,
	daniel@iogearbox.net, bpf@vger.kernel.org, ast@kernel.org,
	kafai@fb.com, songliubraving@fb.com, yhs@fb.com,
	Peter Zijlstra <peterz@infradead.org>,
	Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: Re: [PATCH] BPF: Disable on PREEMPT_RT
Date: Thu, 17 Oct 2019 21:49:17 -0500
Message-ID: <20191017214917.18911f58@tagon>
In-Reply-To: <alpine.DEB.2.21.1910172342090.1869@nanos.tec.linutronix.de>

+acme

On Thu, 17 Oct 2019 23:54:07 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 17 Oct 2019, David Miller wrote:
> 
> > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Date: Thu, 17 Oct 2019 17:40:21 +0200
> >   
> > > On 2019-10-17 16:53:58 [+0200], Daniel Borkmann wrote:  
> > >> On Thu, Oct 17, 2019 at 11:05:01AM +0200, Sebastian Andrzej Siewior wrote:  
> > >> > Disable BPF on PREEMPT_RT because
> > >> > - it allocates and frees memory in atomic context
> > >> > - it uses up_read_non_owner()
> > >> > - BPF_PROG_RUN() expects to be invoked in non-preemptible context  
> > >> 
> > >> For the latter you'd also need to disable seccomp-BPF and everything
> > >> cBPF related as they are /all/ invoked via BPF_PROG_RUN() ...  
> > > 
> > > I looked at tracing and it depended on BPF_SYSCALL so I assumed they all
> > > do… Now looking for BPF_PROG_RUN() there is PPP_FILTER,
> > > NET_TEAM_MODE_LOADBALANCE and probably more.  I didn't find a symbol for
> > > seccomp-BPF. 
> > > Would it make sense to override BPF_PROG_RUN() and make each caller fail
> > > instead? Other recommendations?  
> > 
> > I hope you understand that basically you are disabling any packet sniffing
> > on the system with this patch you are proposing.
> > 
> > This means no tcpdump, no wireshark, etc.  They will all become
> > non-functional.
> > 
> > Turning off BPF just because PREEMPT_RT is enabled is a non-starter; it is
> > absolutely essential functionality for a Linux system at this point.  
> 
> I'm all ears for an alternative solution. Here are the pain points:
> 
>   #1) BPF disables preemption unconditionally with no way to do a proper RT
>       substitution like most other infrastructure in the kernel provides
>       via spinlocks or other locking primitives.

As I understand it, BPF programs cannot loop and are limited to 4096 instructions.
Has anyone done any timing to see just how much having preemption off while a
BPF program executes is going to affect us? Are we talking 1us, 50us, or longer?
I wonder if there's some instrumentation we could use to determine the maximum time
spent running a BPF program. Maybe some perf mojo...
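
To make the measurement question concrete, here is a minimal userspace sketch of
the kind of bookkeeping it implies: time each invocation and keep the worst case
rather than only an average. This is an illustration under stated assumptions,
not kernel code. The names prog_fn, run_stats, stats_record, timed_run and
noop_prog are hypothetical, and a plain function pointer stands in for a BPF
program invocation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for one BPF program invocation. */
typedef void (*prog_fn)(void *ctx);

/* Running statistics over all timed invocations. */
struct run_stats {
	uint64_t max_ns;   /* worst case seen so far */
	uint64_t total_ns; /* sum of all durations */
	uint64_t runs;     /* number of invocations */
};

/* Monotonic clock in nanoseconds (userspace, POSIX clock_gettime). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Fold one measured duration into the statistics. */
static void stats_record(struct run_stats *s, uint64_t delta_ns)
{
	assert(s != NULL);
	if (delta_ns > s->max_ns)
		s->max_ns = delta_ns;
	s->total_ns += delta_ns;
	s->runs++;
}

/* Time a single invocation of fn and record it. */
static void timed_run(struct run_stats *s, prog_fn fn, void *ctx)
{
	uint64_t start = now_ns();

	fn(ctx);
	stats_record(s, now_ns() - start);
}

/* Trivial example program that does nothing. */
static void noop_prog(void *ctx)
{
	(void)ctx;
}
```

The point of tracking max_ns separately is that for RT the worst case matters,
not the mean: a program that usually takes 1us but occasionally takes 50us
shows up only in the maximum.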

> 
>   #2) BPF does allocations in atomic contexts, which is a dubious decision
>       even for non RT. That's related to #1

I guess my question here is, are the allocations done on behalf of an about-to-run
BPF program, or as a result of executing BPF code?  Is it something we might be able
to satisfy from a pre-allocated pool rather than kmalloc()? Ok, I need to go dive
into BPF a bit deeper.
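
For reference, the "pre-allocated pool" idea can be sketched as a fixed-size free
list: all memory is set aside up front, so allocating and freeing in the hot path
is a couple of pointer operations and never calls into a general-purpose
allocator. This is a userspace illustration only, assuming fixed-size objects.
The names pool, pool_init, pool_alloc, pool_free, POOL_SLOTS and SLOT_SIZE are
hypothetical, and the sketch has no locking, which real kernel code would have
to address.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 8  /* number of objects reserved up front (assumed) */
#define SLOT_SIZE  64 /* size of each object in bytes (assumed) */

/* A slot is either in use (data) or linked into the free list (next). */
union slot {
	union slot *next;
	unsigned char data[SLOT_SIZE];
};

struct pool {
	union slot slots[POOL_SLOTS];
	union slot *free_list;
};

/* Thread every slot onto the free list; done once, outside the hot path. */
static void pool_init(struct pool *p)
{
	p->free_list = NULL;
	for (size_t i = 0; i < POOL_SLOTS; i++) {
		p->slots[i].next = p->free_list;
		p->free_list = &p->slots[i];
	}
}

/* Pop a slot in constant time; returns NULL when the pool is exhausted. */
static void *pool_alloc(struct pool *p)
{
	union slot *s = p->free_list;

	if (s)
		p->free_list = s->next;
	return s;
}

/* Push a slot back in constant time; ptr must come from this pool. */
static void pool_free(struct pool *p, void *ptr)
{
	union slot *s = ptr;

	assert(s >= p->slots && s < p->slots + POOL_SLOTS);
	s->next = p->free_list;
	p->free_list = s;
}
```

The trade-off is visible in pool_alloc(): exhaustion has to be handled by the
caller, so this only works if the number of objects needed can be bounded in
advance.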

> 
>   #3) BPF uses the up_read_non_owner() hackery which was only invented to
>       deal with already existing horrors and not meant to be proliferated.
> 
>       Yes, I know it's an existing facility ....

I'm sure I'll regret asking this, but why is up_read_non_owner() a horror? I mean,
I get the fundamental wrongness of having someone that's not the owner of a semaphore
performing an 'up' on it, but is there an RT-specific reason that it's bad? Is it
totally a blocker for using BPF with RT or is it something we should fix over time?

> 
> TBH, I have no idea how to deal with those things. So the only way forward
> for RT right now is to disable the whole thing.
> 
> Clark might have some insight from the product side for you how much that
> impacts usability.
> 
> Thanks,
> 
> 	tglx


Clark is only just starting his journey with BPF, so not an expert.

I do think that we (RT) are going to have to co-exist with BPF, if only due to the
increased use of XDP. I also think that other sub-systems will start to
employ BPF for production purposes (as opposed to debug/analysis which is
how we generally look at tracing, packet sniffing, etc.). I think we *have* to
figure out how to co-exist. 

Guess my "hey, that looks interesting, think I'll leisurely read up on it" just got
a little less leisurely. I'm out most of the day tomorrow but I'll catch up on email
over the weekend.


Clark

-- 
The United States Coast Guard
Ruining Natural Selection since 1790
