public inbox for virtualization@lists.linux-foundation.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Eric Dumazet <edumazet@google.com>
Cc: Simon Schippers <simon.schippers@tu-dortmund.de>,
	willemdebruijn.kernel@gmail.com, jasowang@redhat.com,
	andrew+netdev@lunn.ch, davem@davemloft.net, kuba@kernel.org,
	pabeni@redhat.com, eperezma@redhat.com, leiyang@redhat.com,
	stephen@networkplumber.org, jon@nutanix.com,
	tim.gebauer@tu-dortmund.de, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	virtualization@lists.linux.dev
Subject: Re: [PATCH net-next v8 3/4] ptr_ring: move free-space check into separate helper
Date: Wed, 25 Mar 2026 07:07:28 -0400	[thread overview]
Message-ID: <20260325070252-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CANn89iLHwZ4foPEL2TdQhYx3Ps3zXiwygb=Qr7OTKW2R7Ldixg@mail.gmail.com>

On Thu, Mar 12, 2026 at 03:21:25PM +0100, Eric Dumazet wrote:
> On Thu, Mar 12, 2026 at 2:48 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Mar 12, 2026 at 02:17:16PM +0100, Eric Dumazet wrote:
> > > On Thu, Mar 12, 2026 at 2:06 PM Simon Schippers
> > > <simon.schippers@tu-dortmund.de> wrote:
> > > >
> > > > This patch moves the check for available free space for a new entry into
> > > > a separate function. As a result, __ptr_ring_produce() remains logically
> > > > unchanged, while the new helper allows callers to determine in advance
> > > > whether subsequent __ptr_ring_produce() calls will succeed. This
> > > > information can, for example, be used to temporarily stop producing until
> > > > __ptr_ring_peek() indicates that space is available again.
> > > >
> > > > Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> > > > Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> > > > Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
> > > > ---
> > > >  include/linux/ptr_ring.h | 14 ++++++++++++--
> > > >  1 file changed, 12 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > > > index 534531807d95..a5a3fa4916d3 100644
> > > > --- a/include/linux/ptr_ring.h
> > > > +++ b/include/linux/ptr_ring.h
> > > > @@ -96,6 +96,14 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)
> > > >         return ret;
> > > >  }
> > > >
> > > > +static inline int __ptr_ring_produce_peek(struct ptr_ring *r)
> > > > +{
> > > > +       if (unlikely(!r->size) || r->queue[r->producer])
> > >
> > > I think this should be
> > >
> > >        if (unlikely(!r->size) || READ_ONCE(r->queue[r->producer]))
> > >
> > > And of course:
> > >
> > > @@ -194,7 +194,7 @@ static inline void *__ptr_ring_peek(struct ptr_ring *r)
> > >  static inline bool __ptr_ring_empty(struct ptr_ring *r)
> > >  {
> > >         if (likely(r->size))
> > > -               return !r->queue[READ_ONCE(r->consumer_head)];
> > > +               return !READ_ONCE(r->queue[READ_ONCE(r->consumer_head)]);
> > >         return true;
> > >  }
> >
> >
> > I don't understand why it's necessary. consumer_head etc are
> > all lock protected.
> >
> > queue itself is not but we are only checking it for NULL -
> > it is fine if compiler reads it in many chunks and not all
> > at once.
> >
> > > @@ -256,7 +256,7 @@ static inline void __ptr_ring_zero_tail(struct
> > > ptr_ring *r, int consumer_head)
> > >          * besides the first one until we write out all entries.
> > >          */
> > >         while (likely(head > r->consumer_tail))
> > > -               r->queue[--head] = NULL;
> > > +               WRITE_ONCE(r->queue[--head], NULL);
> > >
> > >         r->consumer_tail = consumer_head;
> > >  }
> > >
> > >
> > > Presumably we should fix this in net tree first.
> >
> >
> > Maybe this one yes but I am not sure at all - KCSAN is happy.
> >
> 
> Hmmm.. what about this trace ?
> 
> BUG: KCSAN: data-race in pfifo_fast_dequeue / pfifo_fast_enqueue
> 
> write to 0xffff88811d5ccc00 of 8 bytes by interrupt on cpu 0:
> __ptr_ring_zero_tail include/linux/ptr_ring.h:259 [inline]
> __ptr_ring_discard_one include/linux/ptr_ring.h:291 [inline]
> __ptr_ring_consume include/linux/ptr_ring.h:311 [inline]
> __skb_array_consume include/linux/skb_array.h:98 [inline]
> pfifo_fast_dequeue+0x770/0x8f0 net/sched/sch_generic.c:770
> dequeue_skb net/sched/sch_generic.c:297 [inline]
> qdisc_restart net/sched/sch_generic.c:402 [inline]
> __qdisc_run+0x189/0xc80 net/sched/sch_generic.c:420
> qdisc_run include/net/pkt_sched.h:120 [inline]
> net_tx_action+0x379/0x590 net/core/dev.c:5793
> handle_softirqs+0xb9/0x280 kernel/softirq.c:622
> do_softirq+0x45/0x60 kernel/softirq.c:523
> __local_bh_enable_ip+0x70/0x80 kernel/softirq.c:450
> local_bh_enable include/linux/bottom_half.h:33 [inline]
> bpf_test_run+0x2db/0x620 net/bpf/test_run.c:426
> bpf_prog_test_run_skb+0x9a4/0xef0 net/bpf/test_run.c:1159
> bpf_prog_test_run+0x204/0x340 kernel/bpf/syscall.c:4721
> __sys_bpf+0x52e/0x7e0 kernel/bpf/syscall.c:6246
> __do_sys_bpf kernel/bpf/syscall.c:6341 [inline]
> __se_sys_bpf kernel/bpf/syscall.c:6339 [inline]
> __x64_sys_bpf+0x41/0x50 kernel/bpf/syscall.c:6339
> x64_sys_call+0x10cb/0x3020 arch/x86/include/generated/asm/syscalls_64.h:322
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0x12c/0x370 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
> read to 0xffff88811d5ccc00 of 8 bytes by task 22632 on cpu 1:
> __ptr_ring_produce include/linux/ptr_ring.h:106 [inline]
> ptr_ring_produce include/linux/ptr_ring.h:129 [inline]
> skb_array_produce include/linux/skb_array.h:44 [inline]
> pfifo_fast_enqueue+0xd5/0x2c0 net/sched/sch_generic.c:741
> dev_qdisc_enqueue net/core/dev.c:4144 [inline]
> __dev_xmit_skb net/core/dev.c:4188 [inline]
> __dev_queue_xmit+0x6a4/0x1f20 net/core/dev.c:4795
> dev_queue_xmit include/linux/netdevice.h:3384 [inline]
> __bpf_tx_skb net/core/filter.c:2153 [inline]
> __bpf_redirect_common net/core/filter.c:2197 [inline]
> __bpf_redirect+0x862/0x990 net/core/filter.c:2204
> ____bpf_clone_redirect net/core/filter.c:2487 [inline]
> bpf_clone_redirect+0x20c/0x290 net/core/filter.c:2450
> bpf_prog_53f18857bc887b09+0x22/0x2a
> bpf_dispatcher_nop_func include/linux/bpf.h:1402 [inline]
> __bpf_prog_run include/linux/filter.h:723 [inline]
> bpf_prog_run include/linux/filter.h:730 [inline]
> bpf_test_run+0x29d/0x620 net/bpf/test_run.c:423
> bpf_prog_test_run_skb+0x9a4/0xef0 net/bpf/test_run.c:1159
> bpf_prog_test_run+0x204/0x340 kernel/bpf/syscall.c:4721
> __sys_bpf+0x52e/0x7e0 kernel/bpf/syscall.c:6246
> __do_sys_bpf kernel/bpf/syscall.c:6341 [inline]
> __se_sys_bpf kernel/bpf/syscall.c:6339 [inline]
> __x64_sys_bpf+0x41/0x50 kernel/bpf/syscall.c:6339
> x64_sys_call+0x10cb/0x3020 arch/x86/include/generated/asm/syscalls_64.h:322
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0x12c/0x370 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
> value changed: 0xffff888104a93a00 -> 0x0000000000000000
> 
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 1 UID: 0 PID: 22632 Comm: syz.0.4135 Tainted: G W syzkaller #0
> PREEMPT(full)
> Tainted: [W]=WARN
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/24/2026


I'm a bit unhappy to demand "ONCE" when it's actually OK if it is
written in any order (e.g. I had some optimizations doing a memcpy
here in mind).

So I wanted to reproduce this, but couldn't. And I got this so
I know KCSAN was enabled:



[  333.381268] kworker/u8:2 (40) used greatest stack depth: 12600 bytes left
Thread 0 done
[  390.104354] ==================================================================
[  390.106378] BUG: KCSAN: data-race in enqueue_hrtimer / hrtimer_interrupt
[  390.108170] 
[  390.108656] write to 0xffff88807dd1c7cc of 1 bytes by interrupt on cpu 1:
[  390.110562]  hrtimer_interrupt+0x3d7/0x400
[  390.111697]  __sysvec_apic_timer_interrupt+0xaf/0x2c0
[  390.113130]  sysvec_apic_timer_interrupt+0x6b/0x80
[  390.114469]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  390.116319]  qdisc_pkt_len_segs_init+0x81/0x370
[  390.117840]  __dev_queue_xmit+0x13e/0x1fe0
[  390.119229]  __bpf_redirect+0x31b/0x5d0
[  390.120572]  bpf_clone_redirect+0x193/0x200
[  390.122127]  bpf_prog_5c0c01093e7cbf07+0x27/0x30
[  390.123762]  bpf_test_run+0x24a/0x520
[  390.125082]  bpf_prog_test_run_skb+0x7f0/0x14d0
[  390.126720]  __sys_bpf+0x1f34/0x3e60
[  390.128077]  __x64_sys_bpf+0x4c/0x70
[  390.129475]  x64_sys_call+0x12eb/0x2520
[  390.130900]  do_syscall_64+0x133/0x530
[  390.132270]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  390.134143] 
[  390.134744] read to 0xffff88807dd1c7cc of 1 bytes by interrupt on cpu 0:
[  390.137213]  enqueue_hrtimer+0x69/0x1c0
[  390.138630]  hrtimer_start_range_ns+0x54b/0x640
[  390.140363]  start_dl_timer+0xb2/0x1a0
[  390.141777]  update_curr_dl_se+0x29c/0x360
[  390.143410]  sched_tick+0xe3/0x2d0
[  390.144773]  update_process_times+0xa2/0x120
[  390.146440]  tick_nohz_handler+0x11a/0x350
[  390.148068]  __hrtimer_run_queues+0x11a/0x680
[  390.149710]  hrtimer_interrupt+0x1f7/0x400
[  390.151244]  __sysvec_apic_timer_interrupt+0xaf/0x2c0
[  390.153143]  sysvec_apic_timer_interrupt+0x6b/0x80
[  390.154604]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  390.156003]  pv_native_safe_halt+0xf/0x20
[  390.157216]  default_idle+0x9/0x10
[  390.158189]  default_idle_call+0x88/0x230
[  390.159369]  do_idle+0x1f9/0x280
[  390.160383]  cpu_startup_entry+0x24/0x30
[  390.161528]  rest_init+0x1be/0x1c0
[  390.162556]  start_kernel+0xa1b/0xa20
[  390.163658]  x86_64_start_reservations+0x24/0x30
[  390.164950]  x86_64_start_kernel+0xd1/0xe0
[  390.166094]  common_startup_64+0x13e/0x148
[  390.167242] 
[  390.167721] Reported by Kernel Concurrency Sanitizer on:
[  390.169462] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-rc2-dirty #352 PREEMPT(full) 
[  390.171946] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-9.fc43 06/10/2025
[  390.174471] ==================================================================


  reply	other threads:[~2026-03-25 11:07 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-12 13:06 [PATCH net-next v8 0/4] tun/tap & vhost-net: apply qdisc backpressure on full ptr_ring to reduce TX drops Simon Schippers
2026-03-12 13:06 ` [PATCH net-next v8 1/4] tun/tap: add ptr_ring consume helper with netdev queue wakeup Simon Schippers
2026-03-24  1:47   ` Jason Wang
2026-03-12 13:06 ` [PATCH net-next v8 2/4] vhost-net: wake queue of tun/tap after ptr_ring consume Simon Schippers
2026-03-12 13:54   ` Michael S. Tsirkin
2026-03-24  1:47   ` Jason Wang
2026-03-12 13:06 ` [PATCH net-next v8 3/4] ptr_ring: move free-space check into separate helper Simon Schippers
2026-03-12 13:17   ` Eric Dumazet
2026-03-12 13:48     ` Michael S. Tsirkin
2026-03-12 14:21       ` Eric Dumazet
2026-03-25 11:07         ` Michael S. Tsirkin [this message]
2026-03-12 13:06 ` [PATCH net-next v8 4/4] tun/tap & vhost-net: avoid ptr_ring tail-drop when a qdisc is present Simon Schippers
2026-03-24  1:47   ` Jason Wang
2026-03-24 10:14     ` Simon Schippers
2026-03-25 14:47       ` Simon Schippers
2026-03-26  2:41         ` Jason Wang
2026-03-12 13:55 ` [PATCH net-next v8 0/4] tun/tap & vhost-net: apply qdisc backpressure on full ptr_ring to reduce TX drops Michael S. Tsirkin
2026-03-13  9:49   ` Simon Schippers
2026-03-13 10:35     ` Michael S. Tsirkin
2026-03-23 21:49 ` Simon Schippers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260325070252-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=eperezma@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=jon@nutanix.com \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=leiyang@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=simon.schippers@tu-dortmund.de \
    --cc=stephen@networkplumber.org \
    --cc=tim.gebauer@tu-dortmund.de \
    --cc=virtualization@lists.linux.dev \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox