All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Jiayuan Chen" <jiayuan.chen@linux.dev>
To: "Sebastian Andrzej Siewior" <bigeasy@linutronix.de>
Cc: xxx@vger.kernel.org, "Jiayuan Chen" <jiayuan.chen@shopee.com>,
	syzbot+2b3391f44313b3983e91@syzkaller.appspotmail.com,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Stanislav Fomichev" <sdf@fomichev.me>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"Eduard Zingerman" <eddyz87@gmail.com>,
	"Song Liu" <song@kernel.org>,
	"Yonghong Song" <yonghong.song@linux.dev>,
	"KP Singh" <kpsingh@kernel.org>, "Hao Luo" <haoluo@google.com>,
	"Jiri Olsa" <jolsa@kernel.org>,
	"Clark Williams" <clrkwllms@kernel.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Thomas Gleixner" <tglx@kernel.org>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rt-devel@lists.linux.dev
Subject: Re: [PATCH bpf v2] bpf: cpumap: fix race in bq_flush_to_queue on PREEMPT_RT
Date: Fri, 13 Feb 2026 01:37:35 +0000	[thread overview]
Message-ID: <4efdec422a00d95a2bd7d227b8bcab61ebc231ce@linux.dev> (raw)
In-Reply-To: <20260212143344.j3_GaCuV@linutronix.de>

2026/2/12 22:33, "Sebastian Andrzej Siewior" <bigeasy@linutronix.de mailto:bigeasy@linutronix.de?to=%22Sebastian%20Andrzej%20Siewior%22%20%3Cbigeasy%40linutronix.de%3E > wrote:




Thanks for the review, Sebastian.


> There is no spin_lock() otherwise there would be no problem.


Right, will fix the description. The per-CPU bq has no lock at all,
which is the root cause...
> …
> 
> > 
> > Fix this by adding a local_lock_t to xdp_bulk_queue and acquiring it
> >  in bq_enqueue() and __cpu_map_flush(). On non-RT kernels, local_lock
> >  maps to preempt_disable/enable with zero additional overhead. On
> >  PREEMPT_RT, it provides a per-CPU sleeping lock that serializes
> >  access to the bq. Use local_lock_nested_bh() since these paths already
> >  run under local_bh_disable().
> > 
> So you use local_lock_nested_bh() and not local_lock() but you mention
> local_lock. The difference is that the former does not add any
> preempt_disable() on !RT. At this point I am curious how much of this
> was written by you and how much is auto generated. 


Sorry about the inconsistencies. The commit message became messy after
several rounds of editing across versions, and I failed to update all
the descriptions to match the final code.


> > 
> > An alternative approach of snapshotting bq->count and bq->q[] before
> >  releasing the producer_lock was considered, but it requires copying
> >  the entire bq->q[] array on every flush, adding unnecessary overhead.
> > 
> But you still have list_head which is not protected.

The snapshot alternative is incomplete as described. Will drop that paragraph.

> > 
> > To reproduce, insert an mdelay(100) between spin_unlock() and
> >  __list_del_clearprev() in bq_flush_to_queue(), then run reproducer
> >  provided by syzkaller.
> >  
> >  Panic:
> >  ===
> >  BUG: kernel NULL pointer dereference, address: 0000000000000000
> >  #PF: supervisor write access in kernel mode
> >  #PF: error_code(0x0002) - not-present page
> >  PGD 0 P4D 0
> >  Oops: Oops: 0002 [#1] SMP PTI
> >  CPU: 0 UID: 0 PID: 377 Comm: a.out Not tainted 6.19.0+ #21 PREEMPT_RT
> >  RIP: 0010:bq_flush_to_queue+0x145/0x200
> >  Call Trace:
> >  <TASK>
> >  __cpu_map_flush+0x2c/0x70
> >  xdp_do_flush+0x64/0x1b0
> >  xdp_test_run_batch.constprop.0+0x4d4/0x6d0
> >  bpf_test_run_xdp_live+0x24b/0x3e0
> >  bpf_prog_test_run_xdp+0x4a1/0x6e0
> >  __sys_bpf+0x44a/0x2760
> >  __x64_sys_bpf+0x1a/0x30
> >  x64_sys_call+0x146c/0x26e0
> >  do_syscall_64+0xd5/0x5a0
> >  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >  </TASK>
> > 
> This could be omitted. It is obvious once you see it. I somehow missed

Will remove the panic trace.

> this alloc_percpu instance while looking for this kind of bugs.
> Another one is hiding in devmap.c. Mind to take a look? I think I
> skip this entire folder…


I see the same pattern there - xdp_dev_bulk_queue is per-CPU with no preemption
protection in bq_enqueue() and __dev_flush().

> > 
> > Fixes: 3253cb49cbad ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT")
> >  Reported-by: syzbot+2b3391f44313b3983e91@syzkaller.appspotmail.com
> >  Closes: https://lore.kernel.org/all/69369331.a70a0220.38f243.009d.GAE@google.com/T/
> >  Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
> >  Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> >  ---
> >  v1 -> v2: https://lore.kernel.org/bpf/20260211064417.196401-1-jiayuan.chen@linux.dev/
> >  - Use local_lock_nested_bh()/local_unlock_nested_bh() instead of
> >  local_lock()/local_unlock(), since these paths already run under
> >  local_bh_disable(). (Sebastian Andrzej Siewior)
> >  - Replace "Caller must hold bq->bq_lock" comment with
> >  lockdep_assert_held() in bq_flush_to_queue(). (Sebastian Andrzej Siewior)
> >  - Fix Fixes tag to 3253cb49cbad ("softirq: Allow to drop the
> >  softirq-BKL lock on PREEMPT_RT") which is the actual commit that
> >  makes the race possible. (Sebastian Andrzej Siewior)
> >  ---
> >  kernel/bpf/cpumap.c | 17 +++++++++++++++--
> >  1 file changed, 15 insertions(+), 2 deletions(-)
> > 
> The changes below look good.

Thanks!

> Sebastian

      reply	other threads:[~2026-02-13  1:37 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-12  2:36 [PATCH bpf v2] bpf: cpumap: fix race in bq_flush_to_queue on PREEMPT_RT Jiayuan Chen
2026-02-12 14:33 ` Sebastian Andrzej Siewior
2026-02-13  1:37   ` Jiayuan Chen [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4efdec422a00d95a2bd7d227b8bcab61ebc231ce@linux.dev \
    --to=jiayuan.chen@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bigeasy@linutronix.de \
    --cc=bpf@vger.kernel.org \
    --cc=clrkwllms@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=eddyz87@gmail.com \
    --cc=haoluo@google.com \
    --cc=hawk@kernel.org \
    --cc=jiayuan.chen@shopee.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-devel@lists.linux.dev \
    --cc=martin.lau@linux.dev \
    --cc=netdev@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=sdf@fomichev.me \
    --cc=song@kernel.org \
    --cc=syzbot+2b3391f44313b3983e91@syzkaller.appspotmail.com \
    --cc=tglx@kernel.org \
    --cc=xxx@vger.kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.