public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: "Jiayuan Chen" <jiayuan.chen@linux.dev>
To: "Sebastian Andrzej Siewior" <bigeasy@linutronix.de>
Cc: xxx@vger.kernel.org, "Jiayuan Chen" <jiayuan.chen@shopee.com>,
	syzbot+2b3391f44313b3983e91@syzkaller.appspotmail.com,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Stanislav Fomichev" <sdf@fomichev.me>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"Eduard Zingerman" <eddyz87@gmail.com>,
	"Song Liu" <song@kernel.org>,
	"Yonghong Song" <yonghong.song@linux.dev>,
	"KP Singh" <kpsingh@kernel.org>, "Hao Luo" <haoluo@google.com>,
	"Jiri Olsa" <jolsa@kernel.org>,
	"Clark Williams" <clrkwllms@kernel.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Thomas Gleixner" <tglx@kernel.org>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rt-devel@lists.linux.dev
Subject: Re: [PATCH bpf v2] bpf: cpumap: fix race in bq_flush_to_queue on PREEMPT_RT
Date: Fri, 13 Feb 2026 01:37:35 +0000	[thread overview]
Message-ID: <4efdec422a00d95a2bd7d227b8bcab61ebc231ce@linux.dev> (raw)
In-Reply-To: <20260212143344.j3_GaCuV@linutronix.de>

2026/2/12 22:33, "Sebastian Andrzej Siewior" <bigeasy@linutronix.de mailto:bigeasy@linutronix.de?to=%22Sebastian%20Andrzej%20Siewior%22%20%3Cbigeasy%40linutronix.de%3E > wrote:




Thanks for the review, Sebastian.


> There is no spin_lock() otherwise there would be no problem.


Right, will fix the description. The per-CPU bq has no lock at all,
which is the root cause...
> …
> 
> > 
> > Fix this by adding a local_lock_t to xdp_bulk_queue and acquiring it
> >  in bq_enqueue() and __cpu_map_flush(). On non-RT kernels, local_lock
> >  maps to preempt_disable/enable with zero additional overhead. On
> >  PREEMPT_RT, it provides a per-CPU sleeping lock that serializes
> >  access to the bq. Use local_lock_nested_bh() since these paths already
> >  run under local_bh_disable().
> > 
> So you use local_lock_nested_bh() and not local_lock() but you mention
> local_lock. The difference is that the former does not add any
> preempt_disable() on !RT. At this point I am curious how much of this
> was written by you and how much is auto generated. 


Sorry about the inconsistencies. The commit message became messy after
several rounds of editing across versions, and I failed to update all
the descriptions to match the final code.


> > 
> > An alternative approach of snapshotting bq->count and bq->q[] before
> >  releasing the producer_lock was considered, but it requires copying
> >  the entire bq->q[] array on every flush, adding unnecessary overhead.
> > 
> But you still have list_head which is not protected.

The snapshot alternative is incomplete as described. Will drop that paragraph.

> > 
> > To reproduce, insert an mdelay(100) between spin_unlock() and
> >  __list_del_clearprev() in bq_flush_to_queue(), then run reproducer
> >  provided by syzkaller.
> >  
> >  Panic:
> >  ===
> >  BUG: kernel NULL pointer dereference, address: 0000000000000000
> >  #PF: supervisor write access in kernel mode
> >  #PF: error_code(0x0002) - not-present page
> >  PGD 0 P4D 0
> >  Oops: Oops: 0002 [#1] SMP PTI
> >  CPU: 0 UID: 0 PID: 377 Comm: a.out Not tainted 6.19.0+ #21 PREEMPT_RT
> >  RIP: 0010:bq_flush_to_queue+0x145/0x200
> >  Call Trace:
> >  <TASK>
> >  __cpu_map_flush+0x2c/0x70
> >  xdp_do_flush+0x64/0x1b0
> >  xdp_test_run_batch.constprop.0+0x4d4/0x6d0
> >  bpf_test_run_xdp_live+0x24b/0x3e0
> >  bpf_prog_test_run_xdp+0x4a1/0x6e0
> >  __sys_bpf+0x44a/0x2760
> >  __x64_sys_bpf+0x1a/0x30
> >  x64_sys_call+0x146c/0x26e0
> >  do_syscall_64+0xd5/0x5a0
> >  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >  </TASK>
> > 
> This could be omitted. It is obvious once you see it. I somehow missed

Will remove the panic trace.

> this alloc_percpu instance while looking for this kind of bugs.
> Another one is hiding in devmap.c. Mind to take a look? I think I
> skip this entire folder…


I see the same pattern there - xdp_dev_bulk_queue is per-CPU with no preemption
protection in bq_enqueue() and __dev_flush().

> > 
> > Fixes: 3253cb49cbad ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT")
> >  Reported-by: syzbot+2b3391f44313b3983e91@syzkaller.appspotmail.com
> >  Closes: https://lore.kernel.org/all/69369331.a70a0220.38f243.009d.GAE@google.com/T/
> >  Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
> >  Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> >  ---
> >  v1 -> v2: https://lore.kernel.org/bpf/20260211064417.196401-1-jiayuan.chen@linux.dev/
> >  - Use local_lock_nested_bh()/local_unlock_nested_bh() instead of
> >  local_lock()/local_unlock(), since these paths already run under
> >  local_bh_disable(). (Sebastian Andrzej Siewior)
> >  - Replace "Caller must hold bq->bq_lock" comment with
> >  lockdep_assert_held() in bq_flush_to_queue(). (Sebastian Andrzej Siewior)
> >  - Fix Fixes tag to 3253cb49cbad ("softirq: Allow to drop the
> >  softirq-BKL lock on PREEMPT_RT") which is the actual commit that
> >  makes the race possible. (Sebastian Andrzej Siewior)
> >  ---
> >  kernel/bpf/cpumap.c | 17 +++++++++++++++--
> >  1 file changed, 15 insertions(+), 2 deletions(-)
> > 
> The changes below look good.

Thanks!

> Sebastian

      reply	other threads:[~2026-02-13  1:37 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-12  2:36 [PATCH bpf v2] bpf: cpumap: fix race in bq_flush_to_queue on PREEMPT_RT Jiayuan Chen
2026-02-12 14:33 ` Sebastian Andrzej Siewior
2026-02-13  1:37   ` Jiayuan Chen [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4efdec422a00d95a2bd7d227b8bcab61ebc231ce@linux.dev \
    --to=jiayuan.chen@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bigeasy@linutronix.de \
    --cc=bpf@vger.kernel.org \
    --cc=clrkwllms@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=eddyz87@gmail.com \
    --cc=haoluo@google.com \
    --cc=hawk@kernel.org \
    --cc=jiayuan.chen@shopee.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-devel@lists.linux.dev \
    --cc=martin.lau@linux.dev \
    --cc=netdev@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=sdf@fomichev.me \
    --cc=song@kernel.org \
    --cc=syzbot+2b3391f44313b3983e91@syzkaller.appspotmail.com \
    --cc=tglx@kernel.org \
    --cc=xxx@vger.kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox