All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Eric Dumazet <edumazet@google.com>
Cc: Simon Schippers <simon.schippers@tu-dortmund.de>,
	willemdebruijn.kernel@gmail.com, jasowang@redhat.com,
	andrew+netdev@lunn.ch, davem@davemloft.net, kuba@kernel.org,
	pabeni@redhat.com, eperezma@redhat.com, leiyang@redhat.com,
	stephen@networkplumber.org, jon@nutanix.com,
	tim.gebauer@tu-dortmund.de, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	virtualization@lists.linux.dev
Subject: Re: [PATCH net-next v8 3/4] ptr_ring: move free-space check into separate helper
Date: Wed, 25 Mar 2026 07:07:28 -0400	[thread overview]
Message-ID: <20260325070252-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CANn89iLHwZ4foPEL2TdQhYx3Ps3zXiwygb=Qr7OTKW2R7Ldixg@mail.gmail.com>

On Thu, Mar 12, 2026 at 03:21:25PM +0100, Eric Dumazet wrote:
> On Thu, Mar 12, 2026 at 2:48 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Mar 12, 2026 at 02:17:16PM +0100, Eric Dumazet wrote:
> > > On Thu, Mar 12, 2026 at 2:06 PM Simon Schippers
> > > <simon.schippers@tu-dortmund.de> wrote:
> > > >
> > > > This patch moves the check for available free space for a new entry into
> > > > a separate function. As a result, __ptr_ring_produce() remains logically
> > > > unchanged, while the new helper allows callers to determine in advance
> > > > whether subsequent __ptr_ring_produce() calls will succeed. This
> > > > information can, for example, be used to temporarily stop producing until
> > > > __ptr_ring_peek() indicates that space is available again.
> > > >
> > > > Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> > > > Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> > > > Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
> > > > ---
> > > >  include/linux/ptr_ring.h | 14 ++++++++++++--
> > > >  1 file changed, 12 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > > > index 534531807d95..a5a3fa4916d3 100644
> > > > --- a/include/linux/ptr_ring.h
> > > > +++ b/include/linux/ptr_ring.h
> > > > @@ -96,6 +96,14 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)
> > > >         return ret;
> > > >  }
> > > >
> > > > +static inline int __ptr_ring_produce_peek(struct ptr_ring *r)
> > > > +{
> > > > +       if (unlikely(!r->size) || r->queue[r->producer])
> > >
> > > I think this should be
> > >
> > >        if (unlikely(!r->size) || READ_ONCE(r->queue[r->producer]))
> > >
> > > And of course:
> > >
> > > @@ -194,7 +194,7 @@ static inline void *__ptr_ring_peek(struct ptr_ring *r)
> > >  static inline bool __ptr_ring_empty(struct ptr_ring *r)
> > >  {
> > >         if (likely(r->size))
> > > -               return !r->queue[READ_ONCE(r->consumer_head)];
> > > +               return !READ_ONCE(r->queue[READ_ONCE(r->consumer_head)]);
> > >         return true;
> > >  }
> >
> >
> > I don't understand why it's necessary. consumer_head etc are
> > all lock protected.
> >
> > queue itself is not but we are only checking it for NULL -
> > it is fine if compiler reads it in many chunks and not all
> > at once.
> >
> > > @@ -256,7 +256,7 @@ static inline void __ptr_ring_zero_tail(struct
> > > ptr_ring *r, int consumer_head)
> > >          * besides the first one until we write out all entries.
> > >          */
> > >         while (likely(head > r->consumer_tail))
> > > -               r->queue[--head] = NULL;
> > > +               WRITE_ONCE(r->queue[--head], NULL);
> > >
> > >         r->consumer_tail = consumer_head;
> > >  }
> > >
> > >
> > > Presumably we should fix this in net tree first.
> >
> >
> > Maybe this one yes but I am not sure at all - KCSAN is happy.
> >
> 
> Hmmm.. what about this trace ?
> 
> BUG: KCSAN: data-race in pfifo_fast_dequeue / pfifo_fast_enqueue
> 
> write to 0xffff88811d5ccc00 of 8 bytes by interrupt on cpu 0:
> __ptr_ring_zero_tail include/linux/ptr_ring.h:259 [inline]
> __ptr_ring_discard_one include/linux/ptr_ring.h:291 [inline]
> __ptr_ring_consume include/linux/ptr_ring.h:311 [inline]
> __skb_array_consume include/linux/skb_array.h:98 [inline]
> pfifo_fast_dequeue+0x770/0x8f0 net/sched/sch_generic.c:770
> dequeue_skb net/sched/sch_generic.c:297 [inline]
> qdisc_restart net/sched/sch_generic.c:402 [inline]
> __qdisc_run+0x189/0xc80 net/sched/sch_generic.c:420
> qdisc_run include/net/pkt_sched.h:120 [inline]
> net_tx_action+0x379/0x590 net/core/dev.c:5793
> handle_softirqs+0xb9/0x280 kernel/softirq.c:622
> do_softirq+0x45/0x60 kernel/softirq.c:523
> __local_bh_enable_ip+0x70/0x80 kernel/softirq.c:450
> local_bh_enable include/linux/bottom_half.h:33 [inline]
> bpf_test_run+0x2db/0x620 net/bpf/test_run.c:426
> bpf_prog_test_run_skb+0x9a4/0xef0 net/bpf/test_run.c:1159
> bpf_prog_test_run+0x204/0x340 kernel/bpf/syscall.c:4721
> __sys_bpf+0x52e/0x7e0 kernel/bpf/syscall.c:6246
> __do_sys_bpf kernel/bpf/syscall.c:6341 [inline]
> __se_sys_bpf kernel/bpf/syscall.c:6339 [inline]
> __x64_sys_bpf+0x41/0x50 kernel/bpf/syscall.c:6339
> x64_sys_call+0x10cb/0x3020 arch/x86/include/generated/asm/syscalls_64.h:322
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0x12c/0x370 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
> read to 0xffff88811d5ccc00 of 8 bytes by task 22632 on cpu 1:
> __ptr_ring_produce include/linux/ptr_ring.h:106 [inline]
> ptr_ring_produce include/linux/ptr_ring.h:129 [inline]
> skb_array_produce include/linux/skb_array.h:44 [inline]
> pfifo_fast_enqueue+0xd5/0x2c0 net/sched/sch_generic.c:741
> dev_qdisc_enqueue net/core/dev.c:4144 [inline]
> __dev_xmit_skb net/core/dev.c:4188 [inline]
> __dev_queue_xmit+0x6a4/0x1f20 net/core/dev.c:4795
> dev_queue_xmit include/linux/netdevice.h:3384 [inline]
> __bpf_tx_skb net/core/filter.c:2153 [inline]
> __bpf_redirect_common net/core/filter.c:2197 [inline]
> __bpf_redirect+0x862/0x990 net/core/filter.c:2204
> ____bpf_clone_redirect net/core/filter.c:2487 [inline]
> bpf_clone_redirect+0x20c/0x290 net/core/filter.c:2450
> bpf_prog_53f18857bc887b09+0x22/0x2a
> bpf_dispatcher_nop_func include/linux/bpf.h:1402 [inline]
> __bpf_prog_run include/linux/filter.h:723 [inline]
> bpf_prog_run include/linux/filter.h:730 [inline]
> bpf_test_run+0x29d/0x620 net/bpf/test_run.c:423
> bpf_prog_test_run_skb+0x9a4/0xef0 net/bpf/test_run.c:1159
> bpf_prog_test_run+0x204/0x340 kernel/bpf/syscall.c:4721
> __sys_bpf+0x52e/0x7e0 kernel/bpf/syscall.c:6246
> __do_sys_bpf kernel/bpf/syscall.c:6341 [inline]
> __se_sys_bpf kernel/bpf/syscall.c:6339 [inline]
> __x64_sys_bpf+0x41/0x50 kernel/bpf/syscall.c:6339
> x64_sys_call+0x10cb/0x3020 arch/x86/include/generated/asm/syscalls_64.h:322
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0x12c/0x370 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
> value changed: 0xffff888104a93a00 -> 0x0000000000000000
> 
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 1 UID: 0 PID: 22632 Comm: syz.0.4135 Tainted: G W syzkaller #0
> PREEMPT(full)
> Tainted: [W]=WARN
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/24/2026


I'm a bit unhappy to demand "ONCE" when it's actually OK if it is
written in any order (e.g. I had some optimizations doing a memcpy
here in mind).

So I wanted to reproduce this, but couldn't. And I got this so
I know KCSAN was enabled:



[  333.381268] kworker/u8:2 (40) used greatest stack depth: 12600 bytes left
Thread 0 done
[  390.104354] ==================================================================
[  390.106378] BUG: KCSAN: data-race in enqueue_hrtimer / hrtimer_interrupt
[  390.108170] 
[  390.108656] write to 0xffff88807dd1c7cc of 1 bytes by interrupt on cpu 1:
[  390.110562]  hrtimer_interrupt+0x3d7/0x400
[  390.111697]  __sysvec_apic_timer_interrupt+0xaf/0x2c0
[  390.113130]  sysvec_apic_timer_interrupt+0x6b/0x80
[  390.114469]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  390.116319]  qdisc_pkt_len_segs_init+0x81/0x370
[  390.117840]  __dev_queue_xmit+0x13e/0x1fe0
[  390.119229]  __bpf_redirect+0x31b/0x5d0
[  390.120572]  bpf_clone_redirect+0x193/0x200
[  390.122127]  bpf_prog_5c0c01093e7cbf07+0x27/0x30
[  390.123762]  bpf_test_run+0x24a/0x520
[  390.125082]  bpf_prog_test_run_skb+0x7f0/0x14d0
[  390.126720]  __sys_bpf+0x1f34/0x3e60
[  390.128077]  __x64_sys_bpf+0x4c/0x70
[  390.129475]  x64_sys_call+0x12eb/0x2520
[  390.130900]  do_syscall_64+0x133/0x530
[  390.132270]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  390.134143] 
[  390.134744] read to 0xffff88807dd1c7cc of 1 bytes by interrupt on cpu 0:
[  390.137213]  enqueue_hrtimer+0x69/0x1c0
[  390.138630]  hrtimer_start_range_ns+0x54b/0x640
[  390.140363]  start_dl_timer+0xb2/0x1a0
[  390.141777]  update_curr_dl_se+0x29c/0x360
[  390.143410]  sched_tick+0xe3/0x2d0
[  390.144773]  update_process_times+0xa2/0x120
[  390.146440]  tick_nohz_handler+0x11a/0x350
[  390.148068]  __hrtimer_run_queues+0x11a/0x680
[  390.149710]  hrtimer_interrupt+0x1f7/0x400
[  390.151244]  __sysvec_apic_timer_interrupt+0xaf/0x2c0
[  390.153143]  sysvec_apic_timer_interrupt+0x6b/0x80
[  390.154604]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  390.156003]  pv_native_safe_halt+0xf/0x20
[  390.157216]  default_idle+0x9/0x10
[  390.158189]  default_idle_call+0x88/0x230
[  390.159369]  do_idle+0x1f9/0x280
[  390.160383]  cpu_startup_entry+0x24/0x30
[  390.161528]  rest_init+0x1be/0x1c0
[  390.162556]  start_kernel+0xa1b/0xa20
[  390.163658]  x86_64_start_reservations+0x24/0x30
[  390.164950]  x86_64_start_kernel+0xd1/0xe0
[  390.166094]  common_startup_64+0x13e/0x148
[  390.167242] 
[  390.167721] Reported by Kernel Concurrency Sanitizer on:
[  390.169462] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-rc2-dirty #352 PREEMPT(full) 
[  390.171946] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-9.fc43 06/10/2025
[  390.174471] ==================================================================


  reply	other threads:[~2026-03-25 11:07 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-12 13:06 [PATCH net-next v8 0/4] tun/tap & vhost-net: apply qdisc backpressure on full ptr_ring to reduce TX drops Simon Schippers
2026-03-12 13:06 ` [PATCH net-next v8 1/4] tun/tap: add ptr_ring consume helper with netdev queue wakeup Simon Schippers
2026-03-24  1:47   ` Jason Wang
2026-03-12 13:06 ` [PATCH net-next v8 2/4] vhost-net: wake queue of tun/tap after ptr_ring consume Simon Schippers
2026-03-12 13:54   ` Michael S. Tsirkin
2026-03-24  1:47   ` Jason Wang
2026-03-12 13:06 ` [PATCH net-next v8 3/4] ptr_ring: move free-space check into separate helper Simon Schippers
2026-03-12 13:17   ` Eric Dumazet
2026-03-12 13:48     ` Michael S. Tsirkin
2026-03-12 14:21       ` Eric Dumazet
2026-03-25 11:07         ` Michael S. Tsirkin [this message]
2026-03-12 13:06 ` [PATCH net-next v8 4/4] tun/tap & vhost-net: avoid ptr_ring tail-drop when a qdisc is present Simon Schippers
2026-03-24  1:47   ` Jason Wang
2026-03-24 10:14     ` Simon Schippers
2026-03-25 14:47       ` Simon Schippers
2026-03-26  2:41         ` Jason Wang
2026-03-26 15:30           ` Simon Schippers
2026-03-27  1:13             ` Jason Wang
2026-03-27  8:31               ` Simon Schippers
2026-04-08 19:04                 ` Simon Schippers
2026-04-15 18:27                 ` Simon Schippers
2026-04-16  8:54                 ` Simon Schippers
2026-04-16 10:51                   ` Michael S. Tsirkin
2026-03-27  8:47               ` Michael S. Tsirkin
2026-03-12 13:55 ` [PATCH net-next v8 0/4] tun/tap & vhost-net: apply qdisc backpressure on full ptr_ring to reduce TX drops Michael S. Tsirkin
2026-03-13  9:49   ` Simon Schippers
2026-03-13 10:35     ` Michael S. Tsirkin
2026-03-23 21:49 ` Simon Schippers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260325070252-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=eperezma@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=jon@nutanix.com \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=leiyang@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=simon.schippers@tu-dortmund.de \
    --cc=stephen@networkplumber.org \
    --cc=tim.gebauer@tu-dortmund.de \
    --cc=virtualization@lists.linux.dev \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.