* [PATCH bpf-next v2] Add notrace to queued_spin_lock_slowpath
@ 2024-04-21 23:43 Siddharth Chintamaneni
2024-04-22 7:47 ` Jiri Olsa
0 siblings, 1 reply; 7+ messages in thread
From: Siddharth Chintamaneni @ 2024-04-21 23:43 UTC (permalink / raw)
To: bpf
Cc: alexei.starovoitov, daniel, olsajiri, andrii, yonghong.song,
miloc, rjsu26, sairoop, djwillia, Siddharth Chintamaneni
This patch prevents deadlocks when multiple bpf
programs are attached to the queued spin lock functions. The issue is
similar to the one already discussed[1] for the bpf_spin_lock helpers.
Adding the notrace macro to the queued spin lock functions
was discussed[2] when bpf_spin_lock was introduced.
[1] https://lore.kernel.org/bpf/CAE5sdEigPnoGrzN8WU7Tx-h-iFuMZgW06qp0KHWtpvoXxf1OAQ@mail.gmail.com/#r
[2] https://lore.kernel.org/all/20190117011629.efxp7abj4bpf5yco@ast-mbp/t/#maf05c4d71f935f3123013b7ed410e4f50e9da82c
Fixes: d83525ca62cf ("bpf: introduce bpf_spin_lock")
Signed-off-by: Siddharth Chintamaneni <sidchintamaneni@vt.edu>
---
kernel/locking/qspinlock.c | 2 +-
.../bpf/prog_tests/tracing_failure.c | 24 +++++++++++++++++++
.../selftests/bpf/progs/tracing_failure.c | 6 +++++
3 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index ebe6b8ec7cb3..4d46538d8399 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -313,7 +313,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
* contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' :
* queue : ^--' :
*/
-void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
+notrace void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
{
struct mcs_spinlock *prev, *next, *node;
u32 old, tail;
diff --git a/tools/testing/selftests/bpf/prog_tests/tracing_failure.c b/tools/testing/selftests/bpf/prog_tests/tracing_failure.c
index a222df765bc3..822ee6c559bc 100644
--- a/tools/testing/selftests/bpf/prog_tests/tracing_failure.c
+++ b/tools/testing/selftests/bpf/prog_tests/tracing_failure.c
@@ -28,10 +28,34 @@ static void test_bpf_spin_lock(bool is_spin_lock)
tracing_failure__destroy(skel);
}
+static void test_queued_spin_lock(void)
+{
+ struct tracing_failure *skel;
+ int err;
+
+ skel = tracing_failure__open();
+ if (!ASSERT_OK_PTR(skel, "tracing_failure__open"))
+ return;
+
+ bpf_program__set_autoload(skel->progs.test_queued_spin_lock, true);
+
+ err = tracing_failure__load(skel);
+ if (!ASSERT_OK(err, "tracing_failure__load"))
+ goto out;
+
+ err = tracing_failure__attach(skel);
+ ASSERT_ERR(err, "tracing_failure__attach");
+
+out:
+ tracing_failure__destroy(skel);
+}
+
void test_tracing_failure(void)
{
if (test__start_subtest("bpf_spin_lock"))
test_bpf_spin_lock(true);
if (test__start_subtest("bpf_spin_unlock"))
test_bpf_spin_lock(false);
+ if (test__start_subtest("queued_spin_lock_slowpath"))
+ test_queued_spin_lock();
}
diff --git a/tools/testing/selftests/bpf/progs/tracing_failure.c b/tools/testing/selftests/bpf/progs/tracing_failure.c
index d41665d2ec8c..2d2e7fc9d4f0 100644
--- a/tools/testing/selftests/bpf/progs/tracing_failure.c
+++ b/tools/testing/selftests/bpf/progs/tracing_failure.c
@@ -18,3 +18,9 @@ int BPF_PROG(test_spin_unlock, struct bpf_spin_lock *lock)
{
return 0;
}
+
+SEC("?fentry/queued_spin_lock_slowpath")
+int BPF_PROG(test_queued_spin_lock, struct qspinlock *lock, u32 val)
+{
+ return 0;
+}
--
2.43.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH bpf-next v2] Add notrace to queued_spin_lock_slowpath
2024-04-21 23:43 [PATCH bpf-next v2] Add notrace to queued_spin_lock_slowpath Siddharth Chintamaneni
@ 2024-04-22 7:47 ` Jiri Olsa
2024-04-22 17:13 ` Alexei Starovoitov
From: Jiri Olsa @ 2024-04-22 7:47 UTC (permalink / raw)
To: Siddharth Chintamaneni
Cc: bpf, alexei.starovoitov, daniel, olsajiri, andrii, yonghong.song,
miloc, rjsu26, sairoop, djwillia, Siddharth Chintamaneni
On Sun, Apr 21, 2024 at 07:43:36PM -0400, Siddharth Chintamaneni wrote:
> This patch is to prevent deadlocks when multiple bpf
> programs are attached to queued_spin_locks functions. This issue is similar
> to what is already discussed[1] before with the spin_lock helpers.
>
> The addition of notrace macro to the queued_spin_locks
> has been discussed[2] when bpf_spin_locks are introduced.
>
> [1] https://lore.kernel.org/bpf/CAE5sdEigPnoGrzN8WU7Tx-h-iFuMZgW06qp0KHWtpvoXxf1OAQ@mail.gmail.com/#r
> [2] https://lore.kernel.org/all/20190117011629.efxp7abj4bpf5yco@ast-mbp/t/#maf05c4d71f935f3123013b7ed410e4f50e9da82c
>
> Fixes: d83525ca62cf ("bpf: introduce bpf_spin_lock")
> Signed-off-by: Siddharth Chintamaneni <sidchintamaneni@vt.edu>
> ---
> kernel/locking/qspinlock.c | 2 +-
> .../bpf/prog_tests/tracing_failure.c | 24 +++++++++++++++++++
> .../selftests/bpf/progs/tracing_failure.c | 6 +++++
> 3 files changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index ebe6b8ec7cb3..4d46538d8399 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -313,7 +313,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
> * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' :
> * queue : ^--' :
> */
> -void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> +notrace void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
we did the same for the bpf spin lock helpers, which is fine, but I wonder
if removing queued_spin_lock_slowpath from the traceable functions could break
some scripts (even though many probably use the contention tracepoints..)
maybe we could keep a list of helpers/kfuncs that can call a spin lock
and deny a bpf program the load/attach to queued_spin_lock_slowpath
if it calls anything from that list
> {
> struct mcs_spinlock *prev, *next, *node;
> u32 old, tail;
> diff --git a/tools/testing/selftests/bpf/prog_tests/tracing_failure.c b/tools/testing/selftests/bpf/prog_tests/tracing_failure.c
> index a222df765bc3..822ee6c559bc 100644
> --- a/tools/testing/selftests/bpf/prog_tests/tracing_failure.c
> +++ b/tools/testing/selftests/bpf/prog_tests/tracing_failure.c
> @@ -28,10 +28,34 @@ static void test_bpf_spin_lock(bool is_spin_lock)
> tracing_failure__destroy(skel);
> }
>
> +static void test_queued_spin_lock(void)
> +{
> + struct tracing_failure *skel;
> + int err;
> +
> + skel = tracing_failure__open();
> + if (!ASSERT_OK_PTR(skel, "tracing_failure__open"))
> + return;
> +
> + bpf_program__set_autoload(skel->progs.test_queued_spin_lock, true);
> +
> + err = tracing_failure__load(skel);
> + if (!ASSERT_OK(err, "tracing_failure__load"))
> + goto out;
> +
> + err = tracing_failure__attach(skel);
> + ASSERT_ERR(err, "tracing_failure__attach");
the test is broken; an fentry program won't load on a notrace function
[root@qemu bpf]# ./test_progs -n 391/3
test_queued_spin_lock:PASS:tracing_failure__open 0 nsec
libbpf: prog 'test_queued_spin_lock': failed to find kernel BTF type ID of 'queued_spin_lock_slowpath': -3
libbpf: prog 'test_queued_spin_lock': failed to prepare load attributes: -3
libbpf: prog 'test_queued_spin_lock': failed to load: -3
libbpf: failed to load object 'tracing_failure'
libbpf: failed to load BPF skeleton 'tracing_failure': -3
test_queued_spin_lock:FAIL:tracing_failure__load unexpected error: -3 (errno 3)
#391/3 tracing_failure/queued_spin_lock_slowpath:FAIL
#391 tracing_failure:FAIL
jirka
> +
> +out:
> + tracing_failure__destroy(skel);
> +}
> +
> void test_tracing_failure(void)
> {
> if (test__start_subtest("bpf_spin_lock"))
> test_bpf_spin_lock(true);
> if (test__start_subtest("bpf_spin_unlock"))
> test_bpf_spin_lock(false);
> + if (test__start_subtest("queued_spin_lock_slowpath"))
> + test_queued_spin_lock();
> }
> diff --git a/tools/testing/selftests/bpf/progs/tracing_failure.c b/tools/testing/selftests/bpf/progs/tracing_failure.c
> index d41665d2ec8c..2d2e7fc9d4f0 100644
> --- a/tools/testing/selftests/bpf/progs/tracing_failure.c
> +++ b/tools/testing/selftests/bpf/progs/tracing_failure.c
> @@ -18,3 +18,9 @@ int BPF_PROG(test_spin_unlock, struct bpf_spin_lock *lock)
> {
> return 0;
> }
> +
> +SEC("?fentry/queued_spin_lock_slowpath")
> +int BPF_PROG(test_queued_spin_lock, struct qspinlock *lock, u32 val)
> +{
> + return 0;
> +}
> --
> 2.43.0
>
* Re: [PATCH bpf-next v2] Add notrace to queued_spin_lock_slowpath
2024-04-22 7:47 ` Jiri Olsa
@ 2024-04-22 17:13 ` Alexei Starovoitov
2024-04-22 23:20 ` Siddharth Chintamaneni
From: Alexei Starovoitov @ 2024-04-22 17:13 UTC (permalink / raw)
To: Jiri Olsa
Cc: Siddharth Chintamaneni, bpf, Daniel Borkmann, Andrii Nakryiko,
Yonghong Song, miloc, rjsu26, sairoop, Dan Williams,
Siddharth Chintamaneni
On Mon, Apr 22, 2024 at 12:47 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Sun, Apr 21, 2024 at 07:43:36PM -0400, Siddharth Chintamaneni wrote:
> > This patch is to prevent deadlocks when multiple bpf
> > programs are attached to queued_spin_locks functions. This issue is similar
> > to what is already discussed[1] before with the spin_lock helpers.
> >
> > The addition of notrace macro to the queued_spin_locks
> > has been discussed[2] when bpf_spin_locks are introduced.
> >
> > [1] https://lore.kernel.org/bpf/CAE5sdEigPnoGrzN8WU7Tx-h-iFuMZgW06qp0KHWtpvoXxf1OAQ@mail.gmail.com/#r
> > [2] https://lore.kernel.org/all/20190117011629.efxp7abj4bpf5yco@ast-mbp/t/#maf05c4d71f935f3123013b7ed410e4f50e9da82c
> >
> > Fixes: d83525ca62cf ("bpf: introduce bpf_spin_lock")
> > Signed-off-by: Siddharth Chintamaneni <sidchintamaneni@vt.edu>
> > ---
> > kernel/locking/qspinlock.c | 2 +-
> > .../bpf/prog_tests/tracing_failure.c | 24 +++++++++++++++++++
> > .../selftests/bpf/progs/tracing_failure.c | 6 +++++
> > 3 files changed, 31 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> > index ebe6b8ec7cb3..4d46538d8399 100644
> > --- a/kernel/locking/qspinlock.c
> > +++ b/kernel/locking/qspinlock.c
> > @@ -313,7 +313,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
> > * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' :
> > * queue : ^--' :
> > */
> > -void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> > +notrace void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>
> we did the same for bpf spin lock helpers, which is fine, but I wonder
> removing queued_spin_lock_slowpath from traceable functions could break
> some scripts (even though many probably use contention tracepoints..)
>
> maybe we could have a list of helpers/kfuncs that could call spin lock
> and deny bpf program to load/attach to queued_spin_lock_slowpath
> if it calls anything from that list
We can filter out many such functions, but the possibility of deadlock
will still exist.
Adding notrace here won't help much,
since there are tracepoints in there (trace_contention_begin/end)
which are quite useful, and we should still allow bpf to use them.
I think the only bulletproof way is to detect deadlocks at runtime.
I'm working on such a "try hard to spin_lock but abort if it deadlocks"
mechanism.
* Re: [PATCH bpf-next v2] Add notrace to queued_spin_lock_slowpath
2024-04-22 17:13 ` Alexei Starovoitov
@ 2024-04-22 23:20 ` Siddharth Chintamaneni
2024-04-22 23:47 ` Alexei Starovoitov
From: Siddharth Chintamaneni @ 2024-04-22 23:20 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Jiri Olsa, bpf, Daniel Borkmann, Andrii Nakryiko, Yonghong Song,
miloc, rjsu26, sairoop, Dan Williams, Siddharth Chintamaneni
On Mon, 22 Apr 2024 at 13:13, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Apr 22, 2024 at 12:47 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Sun, Apr 21, 2024 at 07:43:36PM -0400, Siddharth Chintamaneni wrote:
> > > This patch is to prevent deadlocks when multiple bpf
> > > programs are attached to queued_spin_locks functions. This issue is similar
> > > to what is already discussed[1] before with the spin_lock helpers.
> > >
> > > The addition of notrace macro to the queued_spin_locks
> > > has been discussed[2] when bpf_spin_locks are introduced.
> > >
> > > [1] https://lore.kernel.org/bpf/CAE5sdEigPnoGrzN8WU7Tx-h-iFuMZgW06qp0KHWtpvoXxf1OAQ@mail.gmail.com/#r
> > > [2] https://lore.kernel.org/all/20190117011629.efxp7abj4bpf5yco@ast-mbp/t/#maf05c4d71f935f3123013b7ed410e4f50e9da82c
> > >
> > > Fixes: d83525ca62cf ("bpf: introduce bpf_spin_lock")
> > > Signed-off-by: Siddharth Chintamaneni <sidchintamaneni@vt.edu>
> > > ---
> > > kernel/locking/qspinlock.c | 2 +-
> > > .../bpf/prog_tests/tracing_failure.c | 24 +++++++++++++++++++
> > > .../selftests/bpf/progs/tracing_failure.c | 6 +++++
> > > 3 files changed, 31 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> > > index ebe6b8ec7cb3..4d46538d8399 100644
> > > --- a/kernel/locking/qspinlock.c
> > > +++ b/kernel/locking/qspinlock.c
> > > @@ -313,7 +313,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
> > > * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' :
> > > * queue : ^--' :
> > > */
> > > -void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> > > +notrace void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> >
> > we did the same for bpf spin lock helpers, which is fine, but I wonder
> > removing queued_spin_lock_slowpath from traceable functions could break
> > some scripts (even though many probably use contention tracepoints..)
> >
> > maybe we could have a list of helpers/kfuncs that could call spin lock
> > and deny bpf program to load/attach to queued_spin_lock_slowpath
> > if it calls anything from that list
>
> We can filter out many such functions, but the possibility of deadlock
> will still exist.
> Adding notrace here won't help much,
> since there are tracepoints in there: trace_contention_begin/end
> which are quite useful and we should still allow bpf to use them.
> I think the only bullet proof way is to detect deadlocks at runtime.
> I'm working on such "try hard to spin_lock but abort if it deadlocks."
I agree with the point that marking all these functions notrace will
not resolve the issue. I also found a scenario where BPF programs
easily end up in a deadlock, using the bpf_map_pop_elem and
bpf_map_push_elem helper functions called from two different BPF
programs accessing the same map. Here are some issues raised by syzbot
[2, 3].
I also believe that a BPF program can end up in a deadlock scenario
without any assistance from a second BPF program, as described
above. The runtime solution sounds like a better fit for this
problem, unless there is a BPF program that must definitely run for
the performance or security of the system (like an LSM hook or a
nested scheduling type program, as mentioned here [1]).
In those cases, the user assumes that these BPF programs will always
trigger. So, to address these types of issues, we are currently
working on a helper-function callgraph based approach that gives the
verifier the ability to decide at load time whether to load a program
or not, ensuring that if a BPF program is attached, it will be
triggered.
[1] https://lore.kernel.org/all/a15f6a20-c902-4057-a1a9-8259a225bb8b@linux.dev/
[2] https://lore.kernel.org/lkml/0000000000004aa700061379547e@google.com/
[3] https://lore.kernel.org/lkml/0000000000004c3fc90615f37756@google.com/
PS. We are following a similar approach to solve the stack overflow
problem with nesting.
Thanks,
Siddharth
* Re: [PATCH bpf-next v2] Add notrace to queued_spin_lock_slowpath
2024-04-22 23:20 ` Siddharth Chintamaneni
@ 2024-04-22 23:47 ` Alexei Starovoitov
2024-04-23 23:58 ` Siddharth Chintamaneni
From: Alexei Starovoitov @ 2024-04-22 23:47 UTC (permalink / raw)
To: Siddharth Chintamaneni
Cc: Jiri Olsa, bpf, Daniel Borkmann, Andrii Nakryiko, Yonghong Song,
miloc, rjsu26, sairoop, Dan Williams, Siddharth Chintamaneni
On Mon, Apr 22, 2024 at 4:20 PM Siddharth Chintamaneni
<sidchintamaneni@gmail.com> wrote:
>
> On Mon, 22 Apr 2024 at 13:13, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Apr 22, 2024 at 12:47 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Sun, Apr 21, 2024 at 07:43:36PM -0400, Siddharth Chintamaneni wrote:
> > > > This patch is to prevent deadlocks when multiple bpf
> > > > programs are attached to queued_spin_locks functions. This issue is similar
> > > > to what is already discussed[1] before with the spin_lock helpers.
> > > >
> > > > The addition of notrace macro to the queued_spin_locks
> > > > has been discussed[2] when bpf_spin_locks are introduced.
> > > >
> > > > [1] https://lore.kernel.org/bpf/CAE5sdEigPnoGrzN8WU7Tx-h-iFuMZgW06qp0KHWtpvoXxf1OAQ@mail.gmail.com/#r
> > > > [2] https://lore.kernel.org/all/20190117011629.efxp7abj4bpf5yco@ast-mbp/t/#maf05c4d71f935f3123013b7ed410e4f50e9da82c
> > > >
> > > > Fixes: d83525ca62cf ("bpf: introduce bpf_spin_lock")
> > > > Signed-off-by: Siddharth Chintamaneni <sidchintamaneni@vt.edu>
> > > > ---
> > > > kernel/locking/qspinlock.c | 2 +-
> > > > .../bpf/prog_tests/tracing_failure.c | 24 +++++++++++++++++++
> > > > .../selftests/bpf/progs/tracing_failure.c | 6 +++++
> > > > 3 files changed, 31 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> > > > index ebe6b8ec7cb3..4d46538d8399 100644
> > > > --- a/kernel/locking/qspinlock.c
> > > > +++ b/kernel/locking/qspinlock.c
> > > > @@ -313,7 +313,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
> > > > * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' :
> > > > * queue : ^--' :
> > > > */
> > > > -void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> > > > +notrace void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> > >
> > > we did the same for bpf spin lock helpers, which is fine, but I wonder
> > > removing queued_spin_lock_slowpath from traceable functions could break
> > > some scripts (even though many probably use contention tracepoints..)
> > >
> > > maybe we could have a list of helpers/kfuncs that could call spin lock
> > > and deny bpf program to load/attach to queued_spin_lock_slowpath
> > > if it calls anything from that list
> >
> > We can filter out many such functions, but the possibility of deadlock
> > will still exist.
> > Adding notrace here won't help much,
> > since there are tracepoints in there: trace_contention_begin/end
> > which are quite useful and we should still allow bpf to use them.
> > I think the only bullet proof way is to detect deadlocks at runtime.
> > I'm working on such "try hard to spin_lock but abort if it deadlocks."
>
> I agree with the point that notracing all the functions will not
> resolve the issue. I could also find a scenario where BPF programs
> will end up in a deadlock easily by using bpf_map_pop_elem and
> bpf_map_push_elem helper functions called from two different BPF
> programs accessing the same map. Here are some issues raised by syzbot
> [2, 3].
ringbuf and stackqueue maps should probably be fixed now
similar to hashmap's __this_cpu_inc_return(*(htab->map_locked...)
approach.
Both ringbuf and queue_stack can handle failure to lock.
That will address the issue spotted by these 2 syzbot reports.
Could you work on such patches?
The full run-time solution will take time to land and
may be too big to be backportable.
I'll certainly cc you on the patches when I send them.
> I also believe that a BPF program can end up in a deadlock scenario
> without any assistance from the second BPF program, like described
> above. The runtime solution sounds like a better fit to address this
> problem, unless there is a BPF program that should definitely run for
> the performance or security of the system (like an LSM hook or a
> nested scheduling type program as mentioned here [1]).
Right. Certain bpf progs like tcp-bpf don't even have a recursion
run-time counter, because they have to nest and it's safe to nest.
> In those cases, the user assumes that these BPF programs will always
> trigger. So, to address these types of issues, we are currently
> working on a helper's function callgraph based approach so that the
> verifier gets the ability to make a decision during load time on
> whether to load it or not, ensuring that if a BPF program is attached,
> it will be triggered.
callgraph approach? Could you share more details?
>
> [1] https://lore.kernel.org/all/a15f6a20-c902-4057-a1a9-8259a225bb8b@linux.dev/
> [2] https://lore.kernel.org/lkml/0000000000004aa700061379547e@google.com/
> [3] https://lore.kernel.org/lkml/0000000000004c3fc90615f37756@google.com/
>
> PS. We are following a similar approach to solve the stackoverflow
> problem with nesting.
Not following. The stack overflow issue is being fixed by not using
the kernel stack. So each bpf prog will consume a tiny bit of stack
to save frame pointer, return address, and callee regs.
* Re: [PATCH bpf-next v2] Add notrace to queued_spin_lock_slowpath
2024-04-22 23:47 ` Alexei Starovoitov
@ 2024-04-23 23:58 ` Siddharth Chintamaneni
2024-04-24 18:45 ` Alexei Starovoitov
From: Siddharth Chintamaneni @ 2024-04-23 23:58 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Jiri Olsa, bpf, Daniel Borkmann, Andrii Nakryiko, Yonghong Song,
miloc, rjsu26, sairoop, Dan Williams, Siddharth Chintamaneni
> > I agree with the point that notracing all the functions will not
> > resolve the issue. I could also find a scenario where BPF programs
> > will end up in a deadlock easily by using bpf_map_pop_elem and
> > bpf_map_push_elem helper functions called from two different BPF
> > programs accessing the same map. Here are some issues raised by syzbot
> > [2, 3].
>
> ringbuf and stackqueue maps should probably be fixed now
> similar to hashmap's __this_cpu_inc_return(*(htab->map_locked...)
> approach.
> Both ringbuf and queue_stack can handle failure to lock.
> That will address the issue spotted by these 2 syzbot reports.
> Could you work on such patches?
I've just seen your latest patches related to this. Yes, I will work on the
fixes and send the patches.
> > In those cases, the user assumes that these BPF programs will always
> > trigger. So, to address these types of issues, we are currently
> > working on a helper's function callgraph based approach so that the
> > verifier gets the ability to make a decision during load time on
> > whether to load it or not, ensuring that if a BPF program is attached,
> > it will be triggered.
>
> callgraph approach? Could you share more details?
We are generating a call graph for all the helper functions (including
the indirect call targets) and filtering out the functions, and their
callees, that take locks. So if a BPF program attaches to one of these
lock-taking functions and one of them is reachable from a helper the
program calls, we want to reject the program at load time. This type
of approach may lead to many false positives, but it will adhere to
the principle that "if a BPF program is attached, then it should get
triggered as expected without affecting the kernel." We are planning
to work towards this solution and would love your feedback on it.
> Not following. The stack overflow issue is being fixed by not using
> the kernel stack. So each bpf prog will consume a tiny bit of stack
> to save frame pointer, return address, and callee regs.
IIRC, in the email chain it is mentioned that BPF programs are
going to use a new stack allocated from the heap. I think with a
deep enough call chain of fentry BPF programs, isn't it still
possible to overflow the stack? Also, through our call graph analysis,
we found that some helpers have really deep call chains. If a BPF
program is attached at the deepest point in a helper's call chain,
isn't there still a possibility of overflow? In LPC '23 [1], we
presented a similar idea of stack switching to prevent overflow when
nesting, and later realized that it may give you some extra space and
the ability to nest more, but it does not entirely resolve the issue
(tail calls will even worsen it).
[1] https://lpc.events/event/17/contributions/1595/attachments/1230/2506/LPC2023_slides.pdf
* Re: [PATCH bpf-next v2] Add notrace to queued_spin_lock_slowpath
2024-04-23 23:58 ` Siddharth Chintamaneni
@ 2024-04-24 18:45 ` Alexei Starovoitov
From: Alexei Starovoitov @ 2024-04-24 18:45 UTC (permalink / raw)
To: Siddharth Chintamaneni
Cc: Jiri Olsa, bpf, Daniel Borkmann, Andrii Nakryiko, Yonghong Song,
miloc, rjsu26, sairoop, Dan Williams, Siddharth Chintamaneni
On Tue, Apr 23, 2024 at 4:58 PM Siddharth Chintamaneni
<sidchintamaneni@gmail.com> wrote:
>
> > > I agree with the point that notracing all the functions will not
> > > resolve the issue. I could also find a scenario where BPF programs
> > > will end up in a deadlock easily by using bpf_map_pop_elem and
> > > bpf_map_push_elem helper functions called from two different BPF
> > > programs accessing the same map. Here are some issues raised by syzbot
> > > [2, 3].
> >
> > ringbuf and stackqueue maps should probably be fixed now
> > similar to hashmap's __this_cpu_inc_return(*(htab->map_locked...)
> > approach.
> > Both ringbuf and queue_stack can handle failure to lock.
> > That will address the issue spotted by these 2 syzbot reports.
> > Could you work on such patches?
>
> Just seen your latest patches related to this. Yes, I will work on the
> fixes and send the patches.
Great.
> > > In those cases, the user assumes that these BPF programs will always
> > > trigger. So, to address these types of issues, we are currently
> > > working on a helper's function callgraph based approach so that the
> > > verifier gets the ability to make a decision during load time on
> > > whether to load it or not, ensuring that if a BPF program is attached,
> > > it will be triggered.
> >
> > callgraph approach? Could you share more details?
>
> We are generating a call graph for all the helper functions (including
> the indirect call targets) and trying to filter out the functions and
> their callee's which take locks. So if any BPF program tries to attach
> to these lock-taking functions and contains these lock-taking
> functions inside the helper it is calling, we want to reject it at
> load time. This type of approach may lead to many false positives, but
> it will adhere to the principle that "If a BPF program is attached,
> then it should get triggered as expected without affecting the
> kernel." We are planning to work towards this solution and would love
> your feedback on it.
I can see how you can build a cfg across bpf progs and helpers/kfuncs
that they call, but not across arbitrary attach points in the kernel.
So attaching to qspinlock_slowpath or some tracepoint won't be
recognized in such callgraph. Or am I missing it?
In the past we floated an idea to dual compile the kernel.
Once for native and once for bpf isa, so that all of the kernel code
is analyzable. But there was no strong enough use case to do it.
> > Not following. The stack overflow issue is being fixed by not using
> > the kernel stack. So each bpf prog will consume a tiny bit of stack
> > to save frame pointer, return address, and callee regs.
>
> (IIRC), in the email chain, it is mentioned that BPF programs are
> going to use a new stack allocated from the heap. I think with a
> deeper call chain of fentry BPF programs, isn't it still a possibility
> to overflow the stack?
stack overflow is a possibility even without bpf. That's why the stack
is now virtually mapped with guard pages.
> Also, through our call graph analysis, we found
> that some helpers have really deeper call depths. If a BPF program is
> attached at the deepest point in the helper's call chain, isn't there
> still a possibility to overflow it? In LPC '23 [1], we presented a
> similar idea of stack switching to prevent the overflow in nesting and
> later realized that it may give you some extra space and the ability
> to nest more, but it is not entirely resolving the issue (tail calls
> will even worsen this issue).
The problem with any kind of stack switching is reliability of stack
traces. Kernel panics must be able to walk the stack. Even when
there are bugs. Full stack switch makes it risky.
Then livepatching depends on reliable stack walks.
That's another reason to avoid stack switch.
>
> [1] https://lpc.events/event/17/contributions/1595/attachments/1230/2506/LPC2023_slides.pdf