All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	bpf@vger.kernel.org, netdev@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Sebastian Sewior <bigeasy@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Clark Williams <williams@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [RFC patch 14/19] bpf: Use migrate_disable() in hashtab code
Date: Fri, 14 Feb 2020 14:11:26 -0500	[thread overview]
Message-ID: <20200214191126.lbiusetaxecdl3of@localhost> (raw)
In-Reply-To: <20200214161504.325142160@linutronix.de>

On 14-Feb-2020 02:39:31 PM, Thomas Gleixner wrote:
> The required protection is that the caller cannot be migrated to a
> different CPU as these places take either a hash bucket lock or might
> trigger a kprobe inside the memory allocator. Both scenarios can lead to
> deadlocks. The deadlock prevention is per CPU by incrementing a per CPU
> variable which temporarily blocks the invocation of BPF programs from perf
> and kprobes.
> 
> Replace the preempt_disable/enable() pairs with migrate_disable/enable()
> pairs to prepare BPF to work on PREEMPT_RT enabled kernels. On a non-RT
> kernel this maps to preempt_disable/enable(), i.e. no functional change.

Will that _really_ work on RT ?

I'm puzzled about what will happen in the following scenario on RT:

Thread A is preempted within e.g. htab_elem_free_rcu, and Thread B is
scheduled and runs through a bunch of tracepoints. Both are on the
same CPU's runqueue:

CPU 1

Thread A is scheduled
(Thread A) htab_elem_free_rcu()
(Thread A)   migrate disable
(Thread A)   __this_cpu_inc(bpf_prog_active); -> per-cpu variable for
                                               deadlock prevention.
Thread A is preempted
Thread B is scheduled
(Thread B) Runs through various tracepoints:
           trace_call_bpf()
           if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
               -> will skip any instrumentation that happens to be on
                  this CPU until...
Thread B is preempted
Thread A is scheduled
(Thread A)  __this_cpu_dec(bpf_prog_active);
(Thread A)  migrate enable

Having all those events randomly and silently discarded might be quite
unexpected from a user standpoint. This turns the deadlock prevention
mechanism into a random tracepoint-dropping facility, which is
unsettling. One alternative approach we could consider to solve this
is to make this deadlock prevention nesting counter per-thread rather
than per-cpu.

Also, I don't think using __this_cpu_inc() without preempt-disable or
irq off is safe. You'll probably want to move to this_cpu_inc/dec
instead, which can be heavier on some architectures.

Thanks,

Mathieu


> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/bpf/hashtab.c |   12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -698,11 +698,11 @@ static void htab_elem_free_rcu(struct rc
>  	 * we're calling kfree, otherwise deadlock is possible if kprobes
>  	 * are placed somewhere inside of slub
>  	 */
> -	preempt_disable();
> +	migrate_disable();
>  	__this_cpu_inc(bpf_prog_active);
>  	htab_elem_free(htab, l);
>  	__this_cpu_dec(bpf_prog_active);
> -	preempt_enable();
> +	migrate_enable();
>  }
>  
>  static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
> @@ -1327,7 +1327,7 @@ static int
>  	}
>  
>  again:
> -	preempt_disable();
> +	migrate_disable();
>  	this_cpu_inc(bpf_prog_active);
>  	rcu_read_lock();
>  again_nocopy:
> @@ -1347,7 +1347,7 @@ static int
>  		raw_spin_unlock_irqrestore(&b->lock, flags);
>  		rcu_read_unlock();
>  		this_cpu_dec(bpf_prog_active);
> -		preempt_enable();
> +		migrate_enable();
>  		goto after_loop;
>  	}
>  
> @@ -1356,7 +1356,7 @@ static int
>  		raw_spin_unlock_irqrestore(&b->lock, flags);
>  		rcu_read_unlock();
>  		this_cpu_dec(bpf_prog_active);
> -		preempt_enable();
> +		migrate_enable();
>  		kvfree(keys);
>  		kvfree(values);
>  		goto alloc;
> @@ -1406,7 +1406,7 @@ static int
>  
>  	rcu_read_unlock();
>  	this_cpu_dec(bpf_prog_active);
> -	preempt_enable();
> +	migrate_enable();
>  	if (bucket_cnt && (copy_to_user(ukeys + total * key_size, keys,
>  	    key_size * bucket_cnt) ||
>  	    copy_to_user(uvalues + total * value_size, values,
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

  reply	other threads:[~2020-02-14 19:11 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-14 13:39 [RFC patch 00/19] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
2020-02-14 13:39 ` [RFC patch 01/19] sched: Provide migrate_disable/enable() inlines Thomas Gleixner
2020-02-14 13:39 ` [RFC patch 02/19] sched: Provide cant_migrate() Thomas Gleixner
2020-02-20 20:20   ` [tip: sched/rt] " tip-bot2 for Thomas Gleixner
2020-02-14 13:39 ` [RFC patch 03/19] bpf: Update locking comment in hashtab code Thomas Gleixner
2020-02-14 13:39 ` [RFC patch 04/19] bpf/tracing: Remove redundant preempt_disable() in __bpf_trace_run() Thomas Gleixner
2020-02-19 16:54   ` Steven Rostedt
2020-02-19 17:26     ` Thomas Gleixner
2020-02-14 13:39 ` [RFC patch 05/19] perf/bpf: Remove preempt disable around BPF invocation Thomas Gleixner
2020-02-14 13:39 ` [RFC patch 06/19] bpf: Dont iterate over possible CPUs with interrupts disabled Thomas Gleixner
2020-02-14 13:39 ` [RFC patch 07/19] bpf: Provide BPF_PROG_RUN_PIN_ON_CPU() macro Thomas Gleixner
2020-02-14 18:50   ` Mathieu Desnoyers
2020-02-14 19:36     ` Thomas Gleixner
2020-02-14 13:39 ` [RFC patch 08/19] bpf: Replace cant_sleep() with cant_migrate() Thomas Gleixner
2020-02-14 13:39 ` [RFC patch 09/19] bpf: Use BPF_PROG_RUN_PIN_ON_CPU() at simple call sites Thomas Gleixner
2020-02-19  1:39   ` Vinicius Costa Gomes
2020-02-19  9:00     ` Thomas Gleixner
2020-02-19 16:38       ` Alexei Starovoitov
2020-02-21  0:20       ` Kees Cook
2020-02-21 14:00         ` Thomas Gleixner
2020-02-21 14:05           ` Peter Zijlstra
2020-02-21 22:15           ` Kees Cook
2020-02-14 13:39 ` [RFC patch 10/19] trace/bpf: Use migrate disable in trace_call_bpf() Thomas Gleixner
2020-02-14 13:39 ` [RFC patch 11/19] bpf/tests: Use migrate disable instead of preempt disable Thomas Gleixner
2020-02-14 13:39 ` [RFC patch 12/19] bpf: Use migrate_disable/enabe() in trampoline code Thomas Gleixner
2020-02-14 13:39 ` [RFC patch 13/19] bpf: Use migrate_disable/enable in array macros and cgroup/lirc code Thomas Gleixner
2020-02-14 13:39 ` [RFC patch 14/19] bpf: Use migrate_disable() in hashtab code Thomas Gleixner
2020-02-14 19:11   ` Mathieu Desnoyers [this message]
2020-02-14 19:56     ` Thomas Gleixner
2020-02-18 23:36       ` Alexei Starovoitov
2020-02-19  0:49         ` Thomas Gleixner
2020-02-19  1:23           ` Alexei Starovoitov
2020-02-19 15:17         ` Mathieu Desnoyers
2020-02-20  4:19           ` Alexei Starovoitov
2020-02-14 13:39 ` [RFC patch 15/19] bpf: Use migrate_disable() in sys_bpf() Thomas Gleixner
2020-02-14 13:39 ` [RFC patch 16/19] bpf: Factor out hashtab bucket lock operations Thomas Gleixner
2020-02-14 13:39 ` [RFC patch 17/19] bpf: Prepare hashtab locking for PREEMPT_RT Thomas Gleixner
2020-02-14 13:39 ` [RFC patch 18/19] bpf, lpm: Make locking RT friendly Thomas Gleixner
2020-02-14 13:39 ` [RFC patch 19/19] bpf/stackmap: Dont trylock mmap_sem with PREEMPT_RT and interrupts disabled Thomas Gleixner
2020-02-14 17:53 ` [RFC patch 00/19] bpf: Make BPF and PREEMPT_RT co-exist David Miller
2020-02-14 18:36   ` Thomas Gleixner
2020-02-17 12:59     ` [PATCH] bpf: Enforce map preallocation for all instrumentation programs Thomas Gleixner
2020-02-15 20:09 ` [RFC patch 00/19] bpf: Make BPF and PREEMPT_RT co-exist Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200214191126.lbiusetaxecdl3of@localhost \
    --to=mathieu.desnoyers@efficios.com \
    --cc=ast@kernel.org \
    --cc=bigeasy@linutronix.de \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=williams@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.