Netdev List
* Re: [PATCH net-next 2/3] tcp: use SKB_DROP_REASON_IP_OUTNOROUTES in tcp_v6_send_response()
From: Kuniyuki Iwashima @ 2026-05-07 22:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
	Simon Horman, Ido Schimmel, David Ahern, netdev, eric.dumazet
In-Reply-To: <20260507084305.2506115-3-edumazet@google.com>

On Thu, May 7, 2026 at 1:43 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Replace a bare kfree_skb() with a modern sk_skb_reason_drop() call,
> and provide IP_OUTNOROUTES drop reason.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>


* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
From: Michael S. Tsirkin @ 2026-05-07 22:48 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Eric Dumazet, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi,
	David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	netdev, eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, kvm, virtualization
In-Reply-To: <afyMCyBvZpzWrLtO@sgarzare-redhat>

On Thu, May 07, 2026 at 02:59:13PM +0200, Stefano Garzarella wrote:
> On Thu, May 07, 2026 at 07:45:10AM -0400, Michael S. Tsirkin wrote:
> > On Thu, May 07, 2026 at 11:09:47AM +0200, Stefano Garzarella wrote:
> > > On Wed, May 06, 2026 at 11:37:45AM -0400, Michael S. Tsirkin wrote:
> > > > On Tue, May 05, 2026 at 06:11:13PM +0200, Stefano Garzarella wrote:
> > > > > On Tue, May 05, 2026 at 07:14:36AM -0700, Eric Dumazet wrote:
> > > > > > On Tue, May 5, 2026 at 6:52 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > > > >
> > > > > > > On Thu, Apr 30, 2026 at 12:26:52PM +0000, Eric Dumazet wrote:
> > > > > > > >virtio_transport_inc_rx_pkt() checks vvs->rx_bytes + len > vvs->buf_alloc.
> > > > > > > >
> > > > > > > >virtio_transport_recv_enqueue() skips coalescing for packets
> > > > > > > >with VIRTIO_VSOCK_SEQ_EOM.
> > > > > > > >
> > > > > > > >If fed with packets with len == 0 and VIRTIO_VSOCK_SEQ_EOM,
> > > > > > > >a very large number of packets can be queued
> > > > > > > >because vvs->rx_bytes stays at 0.
> > > > > > > >
> > > > > > > >Fix this by estimating the skb metadata size:
> > > > > > > >
> > > > > > > >       (Number of skbs in the queue) * SKB_TRUESIZE(0)
> > > > > > > >
> > > > > > > >Fixes: 077706165717 ("virtio/vsock: don't use skbuff state to account credit")
> > > > > > > >Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > > > > > >Cc: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
> > > > > > > >Cc: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > > > >Cc: Stefano Garzarella <sgarzare@redhat.com>
> > > > > > > >Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > >Cc: Jason Wang <jasowang@redhat.com>
> > > > > > > >Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > >Cc: "Eugenio Pérez" <eperezma@redhat.com>
> > > > > > > >Cc: kvm@vger.kernel.org
> > > > > > > >Cc: virtualization@lists.linux.dev
> > > > > > > >---
> > > > > > > > net/vmw_vsock/virtio_transport_common.c | 4 +++-
> > > > > > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > > > >
> > > > > > > >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > > > > > >index 416d533f493d7b07e9c77c43f741d28cfcd0953e..9b8014516f4fb1130ae184635fbba4dfee58bd64 100644
> > > > > > > >--- a/net/vmw_vsock/virtio_transport_common.c
> > > > > > > >+++ b/net/vmw_vsock/virtio_transport_common.c
> > > > > > > >@@ -447,7 +447,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > > > > > > static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
> > > > > > > >                                       u32 len)
> > > > > > > > {
> > > > > > > >-      if (vvs->buf_used + len > vvs->buf_alloc)
> > > > > > > >+      u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0);
> > > > > > > >+
> > > > > > > >+      if (skb_overhead + vvs->buf_used + len > vvs->buf_alloc)
> > > > > > > >               return false;
> > > > > > >
> > > > > > > I'm not sure about this fix, I mean that maybe this is incomplete.
> > > > > > > In virtio-vsock, there is a credit mechanism between the two peers:
> > > > > > > https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-4850003
> > > > > > >
> > > > > > > This takes only the payload into account, so it’s true that this problem
> > > > > > > exists; however, perhaps we should also inform the other peer of a lower
> > > > > > > credit balance, otherwise the other peer will believe it has much more
> > > > > > > credit than it actually does, send a large payload, and then the packet
> > > > > > > will be discarded and the data lost (there are no retransmissions,
> > > > > > > etc.).
> > > > > >
> > > > > > I dunno, perhaps revert 077706165717 ("virtio/vsock: don't use skbuff
> > > > > > state to account credit")
> > > > > > and find a better fix then?
> > > > >
> > > > > IIRC the same issue was there before the commit fixed by that one (commit
> > > > > 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")), so
> > > > > not sure about reverting it TBH.
> > > > >
> > > > > CCing Arseniy and Bobby.
> > > > >
> > > > > >
> > > > > > There is always a discrepancy between skb->len and skb->truesize.
> > > > > > > You will not be able to announce a 1MB window, and accept one million
> > > > > > skb of 1-byte each.
> > > > > >
> > > > > > This kind of contract is broken.
> > > > > >
> > > > >
> > > > > Yep, I agree, but before we start discarding data (and losing it), IMHO we
> > > > > should at least inform the other peer that we're out of space.
> > > > >
> > > > > > @Stefan, @Michael, do you think we can do something in the spec to avoid
> > > > > > this issue and in some way also take the metadata into account in the
> > > > > > credit? I mean, to avoid the 1-byte packet flooding.
> > > > >
> > > > > Thanks,
> > > > > Stefano
> > > >
> > > > Why do we need the metadata? Just don't keep it around if you begin
> > > > running low on memory.
> > > 
> > > > I don't think removing the skbuffs will be easy; we added them for eBPF,
> > > zero-copy, and seqpacket as well.
> > 
> > You do not need to remove them completely.
> > 
> > > For now, we're already doing something:
> > > merging the skbuffs if they don't have EOM set.
> > 
> > 
> > Right that's good. You could go further and merge with EOM too
> > if you stick the info about message boundaries somewhere else.
> 
> This adds a lot of complexity IMO, but we can try.
> 
> Do you have something in mind?

BER is clearly overkill, but here's a POC that Claude made for me,
just to give you an idea. It clearly has a ton of issues;
for example, I dislike how GFP_ATOMIC is handled.
Yet it seems to work fine in light testing.

-->


vsock/virtio: use DWARF ULEB128 to record EOM boundaries, enable cross-EOM skb coalescing

virtio_transport_recv_enqueue() currently refuses to coalesce an
incoming skb with the previous one when the previous skb carries
VIRTIO_VSOCK_SEQ_EOM.  This forces one skb per seqpacket message.
For workloads with many small or zero-byte messages the per-skb
overhead (~960 bytes) dominates, causing unbounded memory growth.

Decouple message boundary tracking from the skb structure: store
boundary offsets in a compact side buffer using DWARF ULEB128
encoding with the EOR flag folded into the low bit, then allow
the data of multiple complete messages to be coalesced into a single
skb.

Cross-EOM coalescing fires only when:
- both the tail skb and the incoming packet carry EOM (complete msgs)
- the incoming packet fits in the tail skb's tailroom
- no BPF psock is attached (read_skb expects one msg per skb)

On allocation failure the code falls back to separate skbs (existing
behaviour).  Credit accounting is unchanged; the boundary buffer is
capped at PAGE_SIZE.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index f91704731057..e36b9ab28372 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -12,6 +12,7 @@
 struct virtio_vsock_skb_cb {
 	bool reply;
 	bool tap_delivered;
+	bool has_boundary_entries;
 	u32 offset;
 };
 
@@ -167,6 +168,12 @@ struct virtio_vsock_sock {
 	u32 buf_used;
 	struct sk_buff_head rx_queue;
 	u32 msg_count;
+
+	/* ULEB128-encoded seqpacket message boundary buffer */
+	u8 *boundary_buf;
+	u32 boundary_len;
+	u32 boundary_alloc;
+	u32 boundary_off;
 };
 
 struct virtio_vsock_pkt_info {
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 416d533f493d..81654f70f72c 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -11,6 +11,7 @@
 #include <linux/sched/signal.h>
 #include <linux/ctype.h>
 #include <linux/list.h>
+#include <linux/skmsg.h>
 #include <linux/virtio_vsock.h>
 #include <uapi/linux/vsockmon.h>
 
@@ -26,6 +27,91 @@
 /* Threshold for detecting small packets to copy */
 #define GOOD_COPY_LEN  128
 
+#define VSOCK_BOUNDARY_BUF_INIT	64
+#define VSOCK_BOUNDARY_BUF_MAX	PAGE_SIZE
+
+/* ULEB128 boundary encoding: value = (msg_len << 1) | eor.
+ * Each byte carries 7 data bits; bit 7 is set on all but the last byte.
+ * Max 5 bytes for a u32 msg_len (33 bits with eor shift).
+ */
+static int vsock_uleb_encode_boundary(u8 *buf, u32 msg_len, bool eor)
+{
+	u64 val = ((u64)msg_len << 1) | eor;
+	int n = 0;
+
+	do {
+		buf[n] = val & 0x7f;
+		val >>= 7;
+		if (val)
+			buf[n] |= 0x80;
+		n++;
+	} while (val);
+
+	return n;
+}
+
+static int vsock_uleb_decode_boundary(const u8 *buf, u32 avail,
+				      u32 *msg_len, bool *eor)
+{
+	u64 val = 0;
+	int shift = 0;
+	int n = 0;
+
+	do {
+		if (n >= avail || shift >= 35)
+			return -EINVAL;
+		val |= (u64)(buf[n] & 0x7f) << shift;
+		shift += 7;
+	} while (buf[n++] & 0x80);
+
+	*eor = val & 1;
+	*msg_len = val >> 1;
+	return n;
+}
+
+static void vsock_boundary_buf_compact(struct virtio_vsock_sock *vvs)
+{
+	if (vvs->boundary_off == 0)
+		return;
+
+	vvs->boundary_len -= vvs->boundary_off;
+	memmove(vvs->boundary_buf, vvs->boundary_buf + vvs->boundary_off,
+		vvs->boundary_len);
+	vvs->boundary_off = 0;
+}
+
+static int vsock_boundary_buf_ensure(struct virtio_vsock_sock *vvs, u32 needed)
+{
+	u32 new_alloc;
+	u8 *new_buf;
+
+	if (vvs->boundary_alloc >= needed)
+		return 0;
+
+	/* Reclaim consumed space before growing */
+	if (vvs->boundary_off) {
+		needed -= vvs->boundary_off;
+		vsock_boundary_buf_compact(vvs);
+		if (vvs->boundary_alloc >= needed)
+			return 0;
+	}
+
+	new_alloc = max(needed, vvs->boundary_alloc ? vvs->boundary_alloc * 2
+						    : VSOCK_BOUNDARY_BUF_INIT);
+	if (new_alloc > VSOCK_BOUNDARY_BUF_MAX)
+		new_alloc = VSOCK_BOUNDARY_BUF_MAX;
+	if (new_alloc < needed)
+		return -ENOMEM;
+
+	new_buf = krealloc(vvs->boundary_buf, new_alloc, GFP_ATOMIC);
+	if (!new_buf)
+		return -ENOMEM;
+
+	vvs->boundary_buf = new_buf;
+	vvs->boundary_alloc = new_alloc;
+	return 0;
+}
+
 static void virtio_transport_cancel_close_work(struct vsock_sock *vsk,
 					       bool cancel_timeout);
 static s64 virtio_transport_has_space(struct virtio_vsock_sock *vvs);
@@ -682,41 +768,74 @@ virtio_transport_seqpacket_do_peek(struct vsock_sock *vsk,
 	total = 0;
 	len = msg_data_left(msg);
 
-	skb_queue_walk(&vvs->rx_queue, skb) {
-		struct virtio_vsock_hdr *hdr;
+	skb = skb_peek(&vvs->rx_queue);
+	if (skb && VIRTIO_VSOCK_SKB_CB(skb)->has_boundary_entries) {
+		u32 msg_len, offset;
+		size_t bytes;
+		bool eor;
+		int ret;
 
-		if (total < len) {
-			size_t bytes;
+		ret = vsock_uleb_decode_boundary(
+			vvs->boundary_buf + vvs->boundary_off,
+			vvs->boundary_len - vvs->boundary_off,
+			&msg_len, &eor);
+		if (ret < 0)
+			goto unlock;
+
+		offset = VIRTIO_VSOCK_SKB_CB(skb)->offset;
+		bytes = min(len, (size_t)msg_len);
+
+		if (bytes) {
 			int err;
 
-			bytes = len - total;
-			if (bytes > skb->len)
-				bytes = skb->len;
-
 			spin_unlock_bh(&vvs->rx_lock);
-
-			/* sk_lock is held by caller so no one else can dequeue.
-			 * Unlock rx_lock since skb_copy_datagram_iter() may sleep.
-			 */
-			err = skb_copy_datagram_iter(skb, VIRTIO_VSOCK_SKB_CB(skb)->offset,
+			err = skb_copy_datagram_iter(skb, offset,
 						     &msg->msg_iter, bytes);
 			if (err)
 				return err;
-
 			spin_lock_bh(&vvs->rx_lock);
 		}
 
-		total += skb->len;
-		hdr = virtio_vsock_hdr(skb);
+		total = msg_len;
+		if (eor)
+			msg->msg_flags |= MSG_EOR;
+	} else {
+		skb_queue_walk(&vvs->rx_queue, skb) {
+			struct virtio_vsock_hdr *hdr;
 
-		if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) {
-			if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOR)
-				msg->msg_flags |= MSG_EOR;
+			if (total < len) {
+				size_t bytes;
+				int err;
 
-			break;
+				bytes = len - total;
+				if (bytes > skb->len)
+					bytes = skb->len;
+
+				spin_unlock_bh(&vvs->rx_lock);
+
+				err = skb_copy_datagram_iter(
+					skb,
+					VIRTIO_VSOCK_SKB_CB(skb)->offset,
+					&msg->msg_iter, bytes);
+				if (err)
+					return err;
+
+				spin_lock_bh(&vvs->rx_lock);
+			}
+
+			total += skb->len;
+			hdr = virtio_vsock_hdr(skb);
+
+			if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) {
+				if (le32_to_cpu(hdr->flags) &
+				    VIRTIO_VSOCK_SEQ_EOR)
+					msg->msg_flags |= MSG_EOR;
+				break;
+			}
 		}
 	}
 
+unlock:
 	spin_unlock_bh(&vvs->rx_lock);
 
 	return total;
@@ -740,57 +859,105 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk,
 	}
 
 	while (!msg_ready) {
-		struct virtio_vsock_hdr *hdr;
-		size_t pkt_len;
-
-		skb = __skb_dequeue(&vvs->rx_queue);
+		skb = skb_peek(&vvs->rx_queue);
 		if (!skb)
 			break;
-		hdr = virtio_vsock_hdr(skb);
-		pkt_len = (size_t)le32_to_cpu(hdr->len);
 
-		if (dequeued_len >= 0) {
+		if (VIRTIO_VSOCK_SKB_CB(skb)->has_boundary_entries) {
 			size_t bytes_to_copy;
+			u32 msg_len, offset;
+			bool eor;
+			int ret;
 
-			bytes_to_copy = min(user_buf_len, pkt_len);
+			ret = vsock_uleb_decode_boundary(
+				vvs->boundary_buf + vvs->boundary_off,
+				vvs->boundary_len - vvs->boundary_off,
+				&msg_len, &eor);
+			if (ret < 0)
+				break;
+			vvs->boundary_off += ret;
 
-			if (bytes_to_copy) {
+			offset = VIRTIO_VSOCK_SKB_CB(skb)->offset;
+			bytes_to_copy = min(user_buf_len, (size_t)msg_len);
+
+			if (bytes_to_copy && dequeued_len >= 0) {
 				int err;
 
-				/* sk_lock is held by caller so no one else can dequeue.
-				 * Unlock rx_lock since skb_copy_datagram_iter() may sleep.
-				 */
 				spin_unlock_bh(&vvs->rx_lock);
-
-				err = skb_copy_datagram_iter(skb, 0,
+				err = skb_copy_datagram_iter(skb, offset,
 							     &msg->msg_iter,
 							     bytes_to_copy);
-				if (err) {
-					/* Copy of message failed. Rest of
-					 * fragments will be freed without copy.
-					 */
-					dequeued_len = err;
-				} else {
-					user_buf_len -= bytes_to_copy;
-				}
-
 				spin_lock_bh(&vvs->rx_lock);
+				if (err)
+					dequeued_len = err;
+				else
+					user_buf_len -= bytes_to_copy;
 			}
 
 			if (dequeued_len >= 0)
-				dequeued_len += pkt_len;
-		}
+				dequeued_len += msg_len;
 
-		if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) {
+			VIRTIO_VSOCK_SKB_CB(skb)->offset += msg_len;
 			msg_ready = true;
 			vvs->msg_count--;
 
-			if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOR)
+			if (eor)
 				msg->msg_flags |= MSG_EOR;
-		}
 
-		virtio_transport_dec_rx_pkt(vvs, pkt_len, pkt_len);
-		kfree_skb(skb);
+			virtio_transport_dec_rx_pkt(vvs, msg_len, msg_len);
+
+			if (VIRTIO_VSOCK_SKB_CB(skb)->offset >= skb->len) {
+				__skb_unlink(skb, &vvs->rx_queue);
+				kfree_skb(skb);
+			}
+
+			if (vvs->boundary_off >= vvs->boundary_len / 2)
+				vsock_boundary_buf_compact(vvs);
+		} else {
+			struct virtio_vsock_hdr *hdr;
+			size_t pkt_len;
+
+			skb = __skb_dequeue(&vvs->rx_queue);
+			if (!skb)
+				break;
+			hdr = virtio_vsock_hdr(skb);
+			pkt_len = (size_t)le32_to_cpu(hdr->len);
+
+			if (dequeued_len >= 0) {
+				size_t bytes_to_copy;
+
+				bytes_to_copy = min(user_buf_len, pkt_len);
+
+				if (bytes_to_copy) {
+					int err;
+
+					spin_unlock_bh(&vvs->rx_lock);
+					err = skb_copy_datagram_iter(
+						skb, 0, &msg->msg_iter,
+						bytes_to_copy);
+					if (err)
+						dequeued_len = err;
+					else
+						user_buf_len -= bytes_to_copy;
+					spin_lock_bh(&vvs->rx_lock);
+				}
+
+				if (dequeued_len >= 0)
+					dequeued_len += pkt_len;
+			}
+
+			if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) {
+				msg_ready = true;
+				vvs->msg_count--;
+
+				if (le32_to_cpu(hdr->flags) &
+				    VIRTIO_VSOCK_SEQ_EOR)
+					msg->msg_flags |= MSG_EOR;
+			}
+
+			virtio_transport_dec_rx_pkt(vvs, pkt_len, pkt_len);
+			kfree_skb(skb);
+		}
 	}
 
 	spin_unlock_bh(&vvs->rx_lock);
@@ -1132,6 +1299,7 @@ void virtio_transport_destruct(struct vsock_sock *vsk)
 
 	virtio_transport_cancel_close_work(vsk, true);
 
+	kfree(vvs->boundary_buf);
 	kfree(vvs);
 	vsk->trans = NULL;
 }
@@ -1224,6 +1392,11 @@ static void virtio_transport_remove_sock(struct vsock_sock *vsk)
 	 * removing it.
 	 */
 	__skb_queue_purge(&vvs->rx_queue);
+	kfree(vvs->boundary_buf);
+	vvs->boundary_buf = NULL;
+	vvs->boundary_len = 0;
+	vvs->boundary_alloc = 0;
+	vvs->boundary_off = 0;
 	vsock_remove_sock(vsk);
 }
 
@@ -1395,23 +1568,62 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
 	    !skb_is_nonlinear(skb)) {
 		struct virtio_vsock_hdr *last_hdr;
 		struct sk_buff *last_skb;
+		bool last_has_eom;
+		bool has_eom;
 
 		last_skb = skb_peek_tail(&vvs->rx_queue);
 		last_hdr = virtio_vsock_hdr(last_skb);
+		last_has_eom = le32_to_cpu(last_hdr->flags) & VIRTIO_VSOCK_SEQ_EOM;
+		has_eom = le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM;
 
-		/* If there is space in the last packet queued, we copy the
-		 * new packet in its buffer. We avoid this if the last packet
-		 * queued has VIRTIO_VSOCK_SEQ_EOM set, because this is
-		 * delimiter of SEQPACKET message, so 'pkt' is the first packet
-		 * of a new message.
-		 */
-		if (skb->len < skb_tailroom(last_skb) &&
-		    !(le32_to_cpu(last_hdr->flags) & VIRTIO_VSOCK_SEQ_EOM)) {
-			memcpy(skb_put(last_skb, skb->len), skb->data, skb->len);
-			free_pkt = true;
-			last_hdr->flags |= hdr->flags;
-			le32_add_cpu(&last_hdr->len, len);
-			goto out;
+		if (skb->len < skb_tailroom(last_skb)) {
+			if (!last_has_eom) {
+				/* Same-message coalescing (existing path) */
+				memcpy(skb_put(last_skb, skb->len),
+				       skb->data, skb->len);
+				free_pkt = true;
+				last_hdr->flags |= hdr->flags;
+				le32_add_cpu(&last_hdr->len, len);
+				goto out;
+			}
+
+			/* Cross-EOM: coalesce complete messages into one skb,
+			 * recording message boundaries in a compact ULEB128 buffer.
+			 * Only when incoming packet also has EOM (complete msg).
+			 */
+			if (has_eom && !sk_psock(sk_vsock(vsk))) {
+				bool prev_eor, cur_eor;
+				u8 tmp[12];
+				int n = 0;
+
+				cur_eor = le32_to_cpu(hdr->flags) &
+					  VIRTIO_VSOCK_SEQ_EOR;
+
+				if (!VIRTIO_VSOCK_SKB_CB(last_skb)->has_boundary_entries) {
+					u32 prev_len = le32_to_cpu(last_hdr->len);
+
+					prev_eor = le32_to_cpu(last_hdr->flags) &
+						   VIRTIO_VSOCK_SEQ_EOR;
+					n += vsock_uleb_encode_boundary(
+						tmp + n, prev_len, prev_eor);
+				}
+				n += vsock_uleb_encode_boundary(
+					tmp + n, len, cur_eor);
+
+				if (!vsock_boundary_buf_ensure(
+					    vvs, vvs->boundary_len + n)) {
+					memcpy(vvs->boundary_buf +
+					       vvs->boundary_len, tmp, n);
+					vvs->boundary_len += n;
+					VIRTIO_VSOCK_SKB_CB(last_skb)->has_boundary_entries = true;
+					memcpy(skb_put(last_skb, skb->len),
+					       skb->data, skb->len);
+					free_pkt = true;
+					last_hdr->flags |= hdr->flags;
+					le32_add_cpu(&last_hdr->len, len);
+					goto out;
+				}
+			}
 		}
 	}
 


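The boundary encoding used by the POC above can be exercised outside the kernel. What follows is a minimal userspace sketch, assuming the same packing as the patch ((msg_len << 1) | eor, seven data bits per byte, bit 7 set on every byte except the last). The names uleb_encode_boundary(), uleb_decode_boundary() and uleb_selftest() are illustrative and do not appear in the patch; the side-buffer growth and compaction logic is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pack (msg_len << 1) | eor as ULEB128.
 * Returns the number of bytes written (1..5 for a 32-bit length).
 */
static int uleb_encode_boundary(uint8_t *buf, uint32_t msg_len, bool eor)
{
	uint64_t val = ((uint64_t)msg_len << 1) | (eor ? 1 : 0);
	int n = 0;

	do {
		buf[n] = val & 0x7f;		/* low 7 bits of what is left */
		val >>= 7;
		if (val)
			buf[n] |= 0x80;		/* continuation: more bytes follow */
		n++;
	} while (val);

	return n;
}

/* Returns the number of bytes consumed, or -1 on truncated or
 * over-long input (more than 5 bytes).
 */
static int uleb_decode_boundary(const uint8_t *buf, size_t avail,
				uint32_t *msg_len, bool *eor)
{
	uint64_t val = 0;
	int shift = 0;
	size_t n = 0;

	do {
		if (n >= avail || shift >= 35)
			return -1;
		val |= (uint64_t)(buf[n] & 0x7f) << shift;
		shift += 7;
	} while (buf[n++] & 0x80);

	*eor = val & 1;
	*msg_len = (uint32_t)(val >> 1);
	return (int)n;
}

/* Hypothetical self-check: two boundaries packed back to back. */
static void uleb_selftest(void)
{
	uint8_t buf[10];
	uint32_t len;
	bool eor;
	int n = 0;

	n += uleb_encode_boundary(buf + n, 0, false);	/* zero-byte message */
	n += uleb_encode_boundary(buf + n, 300, true);	/* 300 bytes, EOR set */
	assert(n == 3);

	assert(uleb_decode_boundary(buf, n, &len, &eor) == 1);
	assert(len == 0 && !eor);
	assert(uleb_decode_boundary(buf + 1, n - 1, &len, &eor) == 2);
	assert(len == 300 && eor);
}
```

Under this packing a zero-byte message costs one byte of boundary state, and any message shorter than 64 bytes also fits in one byte, since the shifted value stays below 128.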

* Re: [PATCH net-next 1/3] net: constify sk_skb_reason_drop() sock parameter
From: Kuniyuki Iwashima @ 2026-05-07 22:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
	Simon Horman, Ido Schimmel, David Ahern, netdev, eric.dumazet
In-Reply-To: <20260507084305.2506115-2-edumazet@google.com>

On Thu, May 7, 2026 at 1:43 AM Eric Dumazet <edumazet@google.com> wrote:
>
> sk_skb_reason_drop() does not change sock parameter, make it
> const so that we can call it from TCP stack without a cast
> on a (const) listener socket.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>


* Re: [PATCH v1 net] tcp: Fix out-of-bounds access for twsk in tcp_ao_established_key().
From: Kuniyuki Iwashima @ 2026-05-07 22:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paul E. McKenney, Neal Cardwell, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Dmitry Safonov, Simon Horman, Kuniyuki Iwashima,
	netdev, Damiano Melotti
In-Reply-To: <CANn89iJawd-gi4qoCZLA_mB6c-gLaqF5dirALS-ZSsFWpEB6Qw@mail.gmail.com>

On Thu, May 7, 2026 at 4:23 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, May 6, 2026 at 10:28 AM Kuniyuki Iwashima <kuniyu@google.com> wrote:
> >
> > lockdep_sock_is_held() was added in tcp_ao_established_key()
> > by the cited commit.
> >
> > It can be called from tcp_v[46]_timewait_ack() with twsk.
> >
> > Since it does not have sk->sk_lock, the lockdep annotation
> > results in out-of-bound access.
> >
> >   $ pahole -C tcp_timewait_sock vmlinux | grep size
> >         /* size: 288, cachelines: 5, members: 8 */
> >   $ pahole -C sock vmlinux | grep sk_lock
> >         socket_lock_t              sk_lock;              /*   440   192 */
> >
> > Let's not use lockdep_sock_is_held() for TCP_TIME_WAIT.
> >
> > Fixes: 6b2d11e2d8fc ("net/tcp: Add missing lockdep annotations for TCP-AO hlist traversals")
> > Reported-by: Damiano Melotti <melotti@google.com>
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> > ---
> >  net/ipv4/tcp_ao.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c
> > index a97cdf3e6af4..e2720233e36b 100644
> > --- a/net/ipv4/tcp_ao.c
> > +++ b/net/ipv4/tcp_ao.c
> > @@ -116,7 +116,9 @@ struct tcp_ao_key *tcp_ao_established_key(const struct sock *sk,
> >  {
> >         struct tcp_ao_key *key;
> >
> > -       hlist_for_each_entry_rcu(key, &ao->head, node, lockdep_sock_is_held(sk)) {
> > +       hlist_for_each_entry_rcu(key, &ao->head, node,
> > +                                sk->sk_state == TCP_TIME_WAIT ||
> > +                                lockdep_sock_is_held(sk)) {
> >                 if ((sndid >= 0 && key->sndid != sndid) ||
> >                     (rcvid >= 0 && key->rcvid != rcvid))
> >                         continue;
>
> I wonder if a better fix would be to change __list_check_rcu() evaluation order.
>
> Otherwise, this would mean that a TIME_WAIT socket would evade RCU
> LOCKDEP checks.

Ah exactly,

>
> If a fix in tcp_ao.c is the way to go, I would suggest the opposite of
> what you did.

will change that way.

Thanks !

>
> diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c
> index a97cdf3e6af4cf1ee1cd8c6361944536056543e6..0a4b38b315fed40899901ea5f66fbdffd776df38 100644
> --- a/net/ipv4/tcp_ao.c
> +++ b/net/ipv4/tcp_ao.c
> @@ -116,7 +116,8 @@ struct tcp_ao_key *tcp_ao_established_key(const struct sock *sk,
>  {
>         struct tcp_ao_key *key;
>
> -       hlist_for_each_entry_rcu(key, &ao->head, node, lockdep_sock_is_held(sk)) {
> +       hlist_for_each_entry_rcu(key, &ao->head, node,
> +                                sk_fullsock(sk) && lockdep_sock_is_held(sk)) {
>                 if ((sndid >= 0 && key->sndid != sndid) ||
>                     (rcvid >= 0 && key->rcvid != rcvid))
>                         continue;

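The difference between the two conditions discussed above can be modelled in userspace. This is a rough sketch, assuming the list check passes when either the caller-supplied condition is true or an RCU read-side section is held. The types mini_sock and full_sock, the state values, and the helpers lock_is_held(), is_fullsock(), check_v1() and check_v2() are all hypothetical stand-ins; is_fullsock() in particular is a simplified placeholder for sk_fullsock().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state values; only the TIME_WAIT distinction matters. */
enum { ST_ESTABLISHED = 1, ST_TIME_WAIT = 6 };

/* Stand-in for a timewait sock: shares the leading fields, has no lock. */
struct mini_sock {
	int state;
};

/* Stand-in for a full sock: the lock sits far past the mini object. */
struct full_sock {
	struct mini_sock common;
	char pad[432];
	int lock_held;
};

/* Reading lock_held through a mini_sock would run off its end. */
static_assert(offsetof(struct full_sock, lock_held) >= sizeof(struct mini_sock),
	      "lock field lies outside the mini object");

static int lock_reads;	/* how many times the lock field was read */

/* Only valid when sk really points into a struct full_sock. */
static bool lock_is_held(const struct mini_sock *sk)
{
	lock_reads++;
	return ((const struct full_sock *)sk)->lock_held;
}

static bool is_fullsock(const struct mini_sock *sk)
{
	return sk->state != ST_TIME_WAIT;
}

/* v1 condition: TIME_WAIT always satisfies the check on its own. */
static bool check_v1(const struct mini_sock *sk, bool rcu_held)
{
	bool cond = sk->state == ST_TIME_WAIT || lock_is_held(sk);

	return cond || rcu_held;
}

/* Suggested condition: TIME_WAIT falls through to the RCU check. */
static bool check_v2(const struct mini_sock *sk, bool rcu_held)
{
	bool cond = is_fullsock(sk) && lock_is_held(sk);

	return cond || rcu_held;
}
```

In both variants short-circuit evaluation keeps lock_is_held() away from the small object. The variants differ only for a TIME_WAIT socket traversed without RCU protection: check_v1() accepts it, while check_v2() reports it.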

* Re: [Intel-wired-lan] [PATCH iwl-next v2 00/10] Add ACL support
From: Jacob Keller @ 2026-05-07 22:22 UTC (permalink / raw)
  To: Marcin Szycik, intel-wired-lan
  Cc: netdev, sandeep.penigalapati, ananth.s, alexander.duyck
In-Reply-To: <20260409120003.2719-1-marcin.szycik@linux.intel.com>

On 4/9/2026 4:59 AM, Marcin Szycik wrote:
> E8xx hardware provides a Ternary Classifier block for implementing
> functions such as ACL (Access Control List). In this series it's simply
> referred to as "ACL".
> 
> Implement ACL filtering. This expands support of network flow classification
> rules for the ethtool ntuple command. ACL filtering allows for an ip or port
> field's optional mask to be specified.
> 
> Example filters:
>   ethtool -N eth0 flow-type tcp4 dst-port 8880 m 0x00ff action 10
>   ethtool -N eth0 flow-type tcp4 src-ip 192.168.0.55 m 0.0.0.255 action -1
> 
> This is a resurrection of an old series from 2020 [1] with several
> improvements, but the fundamental logic unchanged. v1 was almost pulled
> in, but ultimately it was decided to drop it [2] because of unresolved
> issues. One issue was too many defensive NULL checks. Second issue is
> about inconsistency when using multiple input sets. Both are addressed
> in this patchset.
> 
> More about the second issue:
> 
> From [3]:
>> I would argue that you need to have some sort of logic that basically
>> checks to see if you are going to hit the input set issue and falls
>> back and applies the ACL rules. Otherwise you are significantly
>> hampering the usefulness of this filter type. It doesn't make sense
>> that dropping a field will cause a rule to fail to be added, but
>> masking a single bit in some field will make it valid. It would make
>> it a nightmare to use from the user point of view as the rules come
>> across as arbitrary.
> 
> Flow Director (FD) has a hardware limitation where all filters for the same
> packet type must use identical input sets. Previously, attempting to add the
> second filter would fail.
> 
> Patch 10 adds automatic fallback to ACL block when FD cannot accommodate a
> filter due to input set conflicts, which resolves this inconsistency.
> 
> v2:
> * Rebase. Notable conflicts were the removal of ice_status and the addition of
>   libie (which affected AdminQ communication)
> * Reduce the number of defensive NULL checks
> * Use = {} instead of memset for definitions
> * Use kzalloc_obj() instead of plain kzalloc()
> * Move from devm_ to plain allocation for objects that don't require it
> * Move iterator declaration to loop start
> * Move some defines out of structs
> * Fix kdoc (except untouched ice_ethtool_fdir.c functions)
> * Adjust style (err for return variable, spacing, rewrite some comments,
>   commit messages)
> * Remove overly verbose comments
> * Add patches 5, 6, 9 and 10
> * More changes listed in patches (if applicable)
> 
> [1] https://lore.kernel.org/intel-wired-lan/20200914153720.48498-1-anthony.l.nguyen@intel.com
> [2] https://lore.kernel.org/netdev/7192efe4d27c93148b3205e65f37203c89170316.camel@intel.com/#t
> [3] https://lore.kernel.org/netdev/CAKgT0Ucxd5-gvEwWAdbL04ER2o++RX_oekUV3E0rYquEgFKj1w@mail.gmail.com
> 

Marcin wrote to me and mentioned he had a v3 planned. I'm going to drop
this series from the queue while awaiting v3, despite the testing pass that
recently completed. It was apparently already known that a v3 was
imminent, but I lost track of that detail when applying the series.

Thanks,
Jake


* [PATCH net-next 8/8] net/sched: mq_prio: no longer acquire qdisc spinlocks in mqprio_dump_class_stats()
From: Eric Dumazet @ 2026-05-07 22:19 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260507221948.335726-1-edumazet@google.com>

Prepare mqprio_dump_class_stats() for RTNL avoidance.

Use RCU instead of RTNL, and no longer acquire each child's spinlock.

As a bonus we no longer have to release/acquire d->lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_mqprio.c | 35 ++++++++++++++---------------------
 1 file changed, 14 insertions(+), 21 deletions(-)

diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 37756932d4917caa7c3b96dff1999e30623fe953..6eb1db7b5d67548643b3e84f254cc1e034d1e6c7 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -672,9 +672,9 @@ static int mqprio_dump_class(struct Qdisc *sch, unsigned long cl,
 
 static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 				   struct gnet_dump *d)
-	__releases(d->lock)
-	__acquires(d->lock)
 {
+	const struct Qdisc *qdisc;
+
 	if (cl >= TC_H_MIN_PRIORITY) {
 		struct net_device *dev = qdisc_dev(sch);
 		struct netdev_tc_txq tc = dev->tc_to_txq[cl & TC_BITMASK];
@@ -684,44 +684,37 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 		int i;
 
 		gnet_stats_basic_sync_init(&bstats);
-		/* Drop lock here it will be reclaimed before touching
-		 * statistics this is required because the d->lock we
-		 * hold here is the look on dev_queue->qdisc_sleeping
-		 * also acquired below.
-		 */
-		if (d->lock)
-			spin_unlock_bh(d->lock);
 
+		rcu_read_lock();
 		for (i = tc.offset; i < tc.offset + tc.count; i++) {
 			struct netdev_queue *q = netdev_get_tx_queue(dev, i);
-			struct Qdisc *qdisc = rtnl_dereference(q->qdisc);
-
-			spin_lock_bh(qdisc_lock(qdisc));
 
+			qdisc = rcu_dereference(q->qdisc);
 			gnet_stats_add_basic(&bstats, qdisc->cpu_bstats,
 					     &qdisc->bstats, false);
 			gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
 					     &qdisc->qstats);
 			qlen += qdisc_qlen_lockless(qdisc);
-
-			spin_unlock_bh(qdisc_lock(qdisc));
 		}
+		rcu_read_unlock();
+
 		qlen = qlen + qstats.qlen;
 
-		/* Reclaim root sleeping lock before completing stats */
-		if (d->lock)
-			spin_lock_bh(d->lock);
 		if (gnet_stats_copy_basic(d, NULL, &bstats, false) < 0 ||
 		    gnet_stats_copy_queue(d, NULL, &qstats, qlen) < 0)
 			return -1;
 	} else {
 		struct netdev_queue *dev_queue = mqprio_queue_get(sch, cl);
+		int res = 0;
 
-		sch = rtnl_dereference(dev_queue->qdisc_sleeping);
-		if (gnet_stats_copy_basic(d, sch->cpu_bstats,
-					  &sch->bstats, true) < 0 ||
+		rcu_read_lock();
+		qdisc = rcu_dereference(dev_queue->qdisc_sleeping);
+		if (gnet_stats_copy_basic(d, qdisc->cpu_bstats,
+					  &qdisc->bstats, true) < 0 ||
 		    qdisc_qstats_copy(d, sch) < 0)
-			return -1;
+			res = -1;
+		rcu_read_unlock();
+		return res;
 	}
 	return 0;
 }
-- 
2.54.0.563.g4f69b47b94-goog



* [PATCH net-next 7/8] net/sched: mq_prio: no longer acquire qdisc spinlocks in mqprio_dump()
From: Eric Dumazet @ 2026-05-07 22:19 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260507221948.335726-1-edumazet@google.com>

Prepare mqprio_dump() for RTNL avoidance.

Use RCU instead of RTNL, and no longer acquire each child's spinlock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_mqprio.c | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 3b4881c389c535368687454ea268bec892ecb942..37756932d4917caa7c3b96dff1999e30623fe953 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -551,35 +551,46 @@ static int mqprio_dump_tc_entries(struct mqprio_sched *priv,
 
 static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
-	struct net_device *dev = qdisc_dev(sch);
-	struct mqprio_sched *priv = qdisc_priv(sch);
 	struct nlattr *nla = (struct nlattr *)skb_tail_pointer(skb);
+	struct mqprio_sched *priv = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
+	struct gnet_stats_queue qstats = { 0 };
+	struct gnet_stats_basic_sync bstats;
 	struct tc_mqprio_qopt opt = { 0 };
+	const struct Qdisc *qdisc;
 	unsigned int qlen = 0;
-	struct Qdisc *qdisc;
 	unsigned int ntx;
 
-	qlen = 0;
-	gnet_stats_basic_sync_init(&sch->bstats);
-	memset(&sch->qstats, 0, sizeof(sch->qstats));
+	gnet_stats_basic_sync_init(&bstats);
 
 	/* MQ supports lockless qdiscs. However, statistics accounting needs
 	 * to account for all, none, or a mix of locked and unlocked child
 	 * qdiscs. Percpu stats are added to counters in-band and locking
 	 * qdisc totals are added at end.
 	 */
+	rcu_read_lock();
 	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
-		qdisc = rtnl_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
-		spin_lock_bh(qdisc_lock(qdisc));
+		qdisc = rcu_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
 
-		gnet_stats_add_basic(&sch->bstats, qdisc->cpu_bstats,
+		gnet_stats_add_basic(&bstats, qdisc->cpu_bstats,
 				     &qdisc->bstats, false);
-		gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
+		gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
 				     &qdisc->qstats);
 		qlen += qdisc_qlen_lockless(qdisc);
-
-		spin_unlock_bh(qdisc_lock(qdisc));
 	}
+	rcu_read_unlock();
+
+	spin_lock_bh(qdisc_lock(sch));
+	_bstats_set(&sch->bstats, u64_stats_read(&bstats.bytes),
+		    u64_stats_read(&bstats.packets));
+	spin_unlock_bh(qdisc_lock(sch));
+
+	WRITE_ONCE(sch->qstats.qlen, qstats.qlen);
+	WRITE_ONCE(sch->qstats.backlog, qstats.backlog);
+	WRITE_ONCE(sch->qstats.drops, qstats.drops);
+	WRITE_ONCE(sch->qstats.requeues, qstats.requeues);
+	WRITE_ONCE(sch->qstats.overlimits, qstats.overlimits);
+
 	WRITE_ONCE(sch->q.qlen, qlen);
 
 	mqprio_qopt_reconstruct(dev, &opt);
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related

* [PATCH net-next 6/8] net/sched: mq: no longer acquire qdisc spinlocks in dump operations
From: Eric Dumazet @ 2026-05-07 22:19 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260507221948.335726-1-edumazet@google.com>

Prepare mq_dump_common() for RTNL avoidance.

Use RCU instead of RTNL, and no longer acquire each child's spinlock.
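
The new _bstats_set() helper brackets two 64-bit stores with u64_stats_update_begin()/u64_stats_update_end(). The sketch below is a userspace model of that general idea, a sequence counter protecting a pair of values, written with C11 atomics. All `toy_*` names are hypothetical and this is not the kernel implementation: as far as I understand, the kernel only needs a real sequence counter where 64-bit stores are not atomic. The model assumes writers are serialized by the caller, which matches the spin_lock_bh(qdisc_lock(sch)) taken around _bstats_set() in the diff.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of a byte/packet pair guarded by a sequence counter. */
struct toy_bstats {
	atomic_uint seq;          /* odd while a write is in progress */
	_Atomic uint64_t bytes;
	_Atomic uint64_t packets;
};

static void toy_bstats_init(struct toy_bstats *b)
{
	atomic_init(&b->seq, 0);
	atomic_init(&b->bytes, 0);
	atomic_init(&b->packets, 0);
}

/* Writers must be serialized by the caller (e.g. by holding a lock). */
static void toy_bstats_set(struct toy_bstats *b, uint64_t bytes,
			   uint64_t packets)
{
	unsigned int s = atomic_load_explicit(&b->seq, memory_order_relaxed);

	assert((s & 1) == 0);     /* no other write may be in flight */
	atomic_store_explicit(&b->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&b->bytes, bytes, memory_order_relaxed);
	atomic_store_explicit(&b->packets, packets, memory_order_relaxed);
	atomic_store_explicit(&b->seq, s + 2, memory_order_release);
}

/* Lockless reader: retry until both values come from the same write. */
static void toy_bstats_read(struct toy_bstats *b, uint64_t *bytes,
			    uint64_t *packets)
{
	unsigned int s1, s2;

	do {
		s1 = atomic_load_explicit(&b->seq, memory_order_acquire);
		*bytes = atomic_load_explicit(&b->bytes, memory_order_relaxed);
		*packets = atomic_load_explicit(&b->packets, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&b->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);
}
```

A reader that overlaps a write sees either an odd counter or a counter that changed between its two loads, and retries, so it never returns bytes from one update paired with packets from another.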

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sch_generic.h |  9 +++++++++
 net/sched/sch_mq.c        | 30 +++++++++++++++++++++---------
 2 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 3070a717bb98386838f3e8149f34d52572fe208f..bfd1167ed575e5154c52a4491194e17e3998977c 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -952,6 +952,15 @@ static inline void _bstats_update(struct gnet_stats_basic_sync *bstats,
 	u64_stats_update_end(&bstats->syncp);
 }
 
+static inline void _bstats_set(struct gnet_stats_basic_sync *bstats,
+			       u64 bytes, u64 packets)
+{
+	u64_stats_update_begin(&bstats->syncp);
+	u64_stats_set(&bstats->bytes, bytes);
+	u64_stats_set(&bstats->packets, packets);
+	u64_stats_update_end(&bstats->syncp);
+}
+
 static inline void bstats_update(struct gnet_stats_basic_sync *bstats,
 				 const struct sk_buff *skb)
 {
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 4172ec24a43d1c2fe56789986a46da93eb522721..6bb1042e4595b50c6023d3ad81706ad7ab6fe0e5 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -143,30 +143,42 @@ EXPORT_SYMBOL_NS_GPL(mq_attach, "NET_SCHED_INTERNAL");
 void mq_dump_common(struct Qdisc *sch, struct sk_buff *skb)
 {
 	struct net_device *dev = qdisc_dev(sch);
+	struct gnet_stats_queue qstats = { 0 };
+	struct gnet_stats_basic_sync bstats;
+	const struct Qdisc *qdisc;
 	unsigned int qlen = 0;
-	struct Qdisc *qdisc;
 	unsigned int ntx;
 
-	gnet_stats_basic_sync_init(&sch->bstats);
-	memset(&sch->qstats, 0, sizeof(sch->qstats));
+	gnet_stats_basic_sync_init(&bstats);
 
 	/* MQ supports lockless qdiscs. However, statistics accounting needs
 	 * to account for all, none, or a mix of locked and unlocked child
 	 * qdiscs. Percpu stats are added to counters in-band and locking
 	 * qdisc totals are added at end.
 	 */
+	rcu_read_lock();
 	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
-		qdisc = rtnl_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
-		spin_lock_bh(qdisc_lock(qdisc));
+		qdisc = rcu_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
 
-		gnet_stats_add_basic(&sch->bstats, qdisc->cpu_bstats,
+		gnet_stats_add_basic(&bstats, qdisc->cpu_bstats,
 				     &qdisc->bstats, false);
-		gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
+		gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
 				     &qdisc->qstats);
 		qlen += qdisc_qlen_lockless(qdisc);
-
-		spin_unlock_bh(qdisc_lock(qdisc));
 	}
+	rcu_read_unlock();
+
+	spin_lock_bh(qdisc_lock(sch));
+	_bstats_set(&sch->bstats, u64_stats_read(&bstats.bytes),
+		    u64_stats_read(&bstats.packets));
+	spin_unlock_bh(qdisc_lock(sch));
+
+	WRITE_ONCE(sch->qstats.qlen, qstats.qlen);
+	WRITE_ONCE(sch->qstats.backlog, qstats.backlog);
+	WRITE_ONCE(sch->qstats.drops, qstats.drops);
+	WRITE_ONCE(sch->qstats.requeues, qstats.requeues);
+	WRITE_ONCE(sch->qstats.overlimits, qstats.overlimits);
+
 	WRITE_ONCE(sch->q.qlen, qlen);
 }
 EXPORT_SYMBOL_NS_GPL(mq_dump_common, "NET_SCHED_INTERNAL");
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related

* [PATCH net-next 5/8] net/sched: add const qualifiers to gnet_stats helpers
From: Eric Dumazet @ 2026-05-07 22:19 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260507221948.335726-1-edumazet@google.com>

In preparation for lockless qdisc dumps, add const qualifiers to:

- gnet_stats_add_basic()
- gnet_stats_copy_basic()
- gnet_stats_copy_queue()
- gnet_stats_read_basic()
- ___gnet_stats_copy_basic()
- gnet_stats_add_basic_cpu()

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/gen_stats.h | 12 ++++++------
 net/core/gen_stats.c    | 24 ++++++++++++------------
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 7aa2b8e1fb298c4f994a745b114fc4da785ddf4b..6e661b743bc35743de9c211bdf5c24d69be5c0f1 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -47,19 +47,19 @@ int gnet_stats_start_copy_compat(struct sk_buff *skb, int type,
 				 int padattr);
 
 int gnet_stats_copy_basic(struct gnet_dump *d,
-			  struct gnet_stats_basic_sync __percpu *cpu,
-			  struct gnet_stats_basic_sync *b, bool running);
+			  const struct gnet_stats_basic_sync __percpu *cpu,
+			  const struct gnet_stats_basic_sync *b, bool running);
 void gnet_stats_add_basic(struct gnet_stats_basic_sync *bstats,
-			  struct gnet_stats_basic_sync __percpu *cpu,
-			  struct gnet_stats_basic_sync *b, bool running);
+			  const struct gnet_stats_basic_sync __percpu *cpu,
+			  const struct gnet_stats_basic_sync *b, bool running);
 int gnet_stats_copy_basic_hw(struct gnet_dump *d,
 			     struct gnet_stats_basic_sync __percpu *cpu,
 			     struct gnet_stats_basic_sync *b, bool running);
 int gnet_stats_copy_rate_est(struct gnet_dump *d,
 			     struct net_rate_estimator __rcu **ptr);
 int gnet_stats_copy_queue(struct gnet_dump *d,
-			  struct gnet_stats_queue __percpu *cpu_q,
-			  struct gnet_stats_queue *q, __u32 qlen);
+			  const struct gnet_stats_queue __percpu *cpu_q,
+			  const struct gnet_stats_queue *q, __u32 qlen);
 void gnet_stats_add_queue(struct gnet_stats_queue *qstats,
 			  const struct gnet_stats_queue __percpu *cpu_q,
 			  const struct gnet_stats_queue *q);
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 1a2380e74272de8eaf3d4ef453e56105a31e9edf..3b2f9ea2eb072dde792aad5b60cf00dcc2efa76d 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -124,7 +124,7 @@ void gnet_stats_basic_sync_init(struct gnet_stats_basic_sync *b)
 EXPORT_SYMBOL(gnet_stats_basic_sync_init);
 
 static void gnet_stats_add_basic_cpu(struct gnet_stats_basic_sync *bstats,
-				     struct gnet_stats_basic_sync __percpu *cpu)
+				     const struct gnet_stats_basic_sync __percpu *cpu)
 {
 	u64 t_bytes = 0, t_packets = 0;
 	int i;
@@ -147,8 +147,8 @@ static void gnet_stats_add_basic_cpu(struct gnet_stats_basic_sync *bstats,
 }
 
 void gnet_stats_add_basic(struct gnet_stats_basic_sync *bstats,
-			  struct gnet_stats_basic_sync __percpu *cpu,
-			  struct gnet_stats_basic_sync *b, bool running)
+			  const struct gnet_stats_basic_sync __percpu *cpu,
+			  const struct gnet_stats_basic_sync *b, bool running)
 {
 	unsigned int start;
 	u64 bytes = 0;
@@ -172,8 +172,8 @@ void gnet_stats_add_basic(struct gnet_stats_basic_sync *bstats,
 EXPORT_SYMBOL(gnet_stats_add_basic);
 
 static void gnet_stats_read_basic(u64 *ret_bytes, u64 *ret_packets,
-				  struct gnet_stats_basic_sync __percpu *cpu,
-				  struct gnet_stats_basic_sync *b, bool running)
+				  const struct gnet_stats_basic_sync __percpu *cpu,
+				  const struct gnet_stats_basic_sync *b, bool running)
 {
 	unsigned int start;
 
@@ -182,7 +182,7 @@ static void gnet_stats_read_basic(u64 *ret_bytes, u64 *ret_packets,
 		int i;
 
 		for_each_possible_cpu(i) {
-			struct gnet_stats_basic_sync *bcpu = per_cpu_ptr(cpu, i);
+			const struct gnet_stats_basic_sync *bcpu = per_cpu_ptr(cpu, i);
 			unsigned int start;
 			u64 bytes, packets;
 
@@ -209,8 +209,8 @@ static void gnet_stats_read_basic(u64 *ret_bytes, u64 *ret_packets,
 
 static int
 ___gnet_stats_copy_basic(struct gnet_dump *d,
-			 struct gnet_stats_basic_sync __percpu *cpu,
-			 struct gnet_stats_basic_sync *b,
+			 const struct gnet_stats_basic_sync __percpu *cpu,
+			 const struct gnet_stats_basic_sync *b,
 			 int type, bool running)
 {
 	u64 bstats_bytes, bstats_packets;
@@ -258,8 +258,8 @@ ___gnet_stats_copy_basic(struct gnet_dump *d,
  */
 int
 gnet_stats_copy_basic(struct gnet_dump *d,
-		      struct gnet_stats_basic_sync __percpu *cpu,
-		      struct gnet_stats_basic_sync *b,
+		      const struct gnet_stats_basic_sync __percpu *cpu,
+		      const struct gnet_stats_basic_sync *b,
 		      bool running)
 {
 	return ___gnet_stats_copy_basic(d, cpu, b, TCA_STATS_BASIC, running);
@@ -385,8 +385,8 @@ EXPORT_SYMBOL(gnet_stats_add_queue);
  */
 int
 gnet_stats_copy_queue(struct gnet_dump *d,
-		      struct gnet_stats_queue __percpu *cpu_q,
-		      struct gnet_stats_queue *q, __u32 qlen)
+		      const struct gnet_stats_queue __percpu *cpu_q,
+		      const struct gnet_stats_queue *q, __u32 qlen)
 {
 	struct gnet_stats_queue qstats = {0};
 
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related

* [PATCH net-next 4/8] net/sched: add qdisc_qlen_lockless() helper
From: Eric Dumazet @ 2026-05-07 22:19 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260507221948.335726-1-edumazet@google.com>

Used in contexts where the qdisc spinlock is not held.
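
As a rough illustration of how the writer-side and reader-side helpers pair up, here is a userspace sketch. `struct toy_queue` and the `toy_*` functions are hypothetical, and the macros are simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(): they only force a single access of the object and provide no atomicity for read-modify-write and no ordering.

```c
#include <assert.h>

/* Simplified stand-ins (assumption): force exactly one access, nothing more. */
#define READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical queue carrying only a length counter. */
struct toy_queue {
	unsigned int qlen;
};

/* Writer side: callers are assumed to hold the queue's lock, so the plain
 * read of q->qlen cannot race with another writer. */
static inline void toy_qlen_inc(struct toy_queue *q)
{
	WRITE_ONCE(q->qlen, q->qlen + 1);
}

static inline void toy_qlen_dec(struct toy_queue *q)
{
	assert(q->qlen > 0);      /* catch underflow in this sketch */
	WRITE_ONCE(q->qlen, q->qlen - 1);
}

/* Reader side: may run without the lock, so the value is read once and
 * may already be stale when the caller uses it. */
static inline unsigned int toy_qlen_lockless(const struct toy_queue *q)
{
	return READ_ONCE(q->qlen);
}
```

The split mirrors the naming in the patch: the unannotated accessor stays for callers that hold the lock, while the _lockless variant marks call sites where a concurrent writer is expected.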

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sch_generic.h | 7 ++++++-
 net/sched/sch_mq.c        | 2 +-
 net/sched/sch_mqprio.c    | 4 ++--
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d147549169a4d43c80684db2e1815a8a0d6596c6..3070a717bb98386838f3e8149f34d52572fe208f 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -542,6 +542,11 @@ static inline int qdisc_qlen(const struct Qdisc *q)
 	return q->q.qlen;
 }
 
+static inline int qdisc_qlen_lockless(const struct Qdisc *q)
+{
+	return READ_ONCE(q->q.qlen);
+}
+
 static inline void qdisc_qlen_inc(struct Qdisc *q)
 {
 	WRITE_ONCE(q->q.qlen, q->q.qlen + 1);
@@ -561,7 +566,7 @@ static inline int qdisc_qlen_sum(const struct Qdisc *q)
 		for_each_possible_cpu(i)
 			qlen += READ_ONCE(per_cpu_ptr(q->cpu_qstats, i)->qlen);
 	} else {
-		qlen += READ_ONCE(q->q.qlen);
+		qlen += qdisc_qlen_lockless(q);
 	}
 
 	return qlen;
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index ec8c91d3fde04e59daec2aecdb14d6bf50715e15..4172ec24a43d1c2fe56789986a46da93eb522721 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -163,7 +163,7 @@ void mq_dump_common(struct Qdisc *sch, struct sk_buff *skb)
 				     &qdisc->bstats, false);
 		gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
 				     &qdisc->qstats);
-		qlen += qdisc_qlen(qdisc);
+		qlen += qdisc_qlen_lockless(qdisc);
 
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 91a92992cd24ab6c30bf7db2288c08cd493c7bc3..3b4881c389c535368687454ea268bec892ecb942 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -576,7 +576,7 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 				     &qdisc->bstats, false);
 		gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
 				     &qdisc->qstats);
-		qlen += qdisc_qlen(qdisc);
+		qlen += qdisc_qlen_lockless(qdisc);
 
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
@@ -691,7 +691,7 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 					     &qdisc->bstats, false);
 			gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
 					     &qdisc->qstats);
-			qlen += qdisc_qlen(qdisc);
+			qlen += qdisc_qlen_lockless(qdisc);
 
 			spin_unlock_bh(qdisc_lock(qdisc));
 		}
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related

* [PATCH net-next 3/8] net/sched: annotate data-races around sch->qstats.backlog
From: Eric Dumazet @ 2026-05-07 22:19 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260507221948.335726-1-edumazet@google.com>

Add qstats_backlog_sub() and qstats_backlog_add() helpers
and use them instead of open-coding them.

These helpers use WRITE_ONCE() to prevent store-tearing.

Also use WRITE_ONCE() in fq_reset() and qdisc_reset()
when sch->qstats.backlog is cleared.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sch_generic.h | 16 +++++++++++++---
 net/sched/sch_api.c       |  2 +-
 net/sched/sch_cake.c      |  7 +++----
 net/sched/sch_cbs.c       |  2 +-
 net/sched/sch_codel.c     |  2 +-
 net/sched/sch_drr.c       |  2 +-
 net/sched/sch_ets.c       |  2 +-
 net/sched/sch_fq.c        |  2 +-
 net/sched/sch_fq_codel.c  |  4 ++--
 net/sched/sch_fq_pie.c    |  4 ++--
 net/sched/sch_generic.c   |  2 +-
 net/sched/sch_gred.c      |  2 +-
 net/sched/sch_hfsc.c      |  2 +-
 net/sched/sch_htb.c       |  2 +-
 net/sched/sch_netem.c     |  2 +-
 net/sched/sch_prio.c      |  2 +-
 net/sched/sch_qfq.c       |  2 +-
 net/sched/sch_red.c       |  2 +-
 net/sched/sch_sfb.c       |  4 ++--
 net/sched/sch_sfq.c       |  2 +-
 net/sched/sch_tbf.c       |  4 ++--
 21 files changed, 39 insertions(+), 30 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 3893fbb29960d9b32042616b747168b689b355fd..d147549169a4d43c80684db2e1815a8a0d6596c6 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -965,10 +965,15 @@ static inline void qdisc_bstats_update(struct Qdisc *sch,
 	bstats_update(&sch->bstats, skb);
 }
 
+static inline void qstats_backlog_sub(struct Qdisc *sch, u32 val)
+{
+	WRITE_ONCE(sch->qstats.backlog, sch->qstats.backlog - val);
+}
+
 static inline void qdisc_qstats_backlog_dec(struct Qdisc *sch,
 					    const struct sk_buff *skb)
 {
-	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	qstats_backlog_sub(sch, qdisc_pkt_len(skb));
 }
 
 static inline void qdisc_qstats_cpu_backlog_dec(struct Qdisc *sch,
@@ -977,10 +982,15 @@ static inline void qdisc_qstats_cpu_backlog_dec(struct Qdisc *sch,
 	this_cpu_sub(sch->cpu_qstats->backlog, qdisc_pkt_len(skb));
 }
 
+static inline void qstats_backlog_add(struct Qdisc *sch, u32 val)
+{
+	WRITE_ONCE(sch->qstats.backlog, sch->qstats.backlog + val);
+}
+
 static inline void qdisc_qstats_backlog_inc(struct Qdisc *sch,
 					    const struct sk_buff *skb)
 {
-	sch->qstats.backlog += qdisc_pkt_len(skb);
+	qstats_backlog_add(sch, qdisc_pkt_len(skb));
 }
 
 static inline void qdisc_qstats_cpu_backlog_inc(struct Qdisc *sch,
@@ -1304,7 +1314,7 @@ static inline void qdisc_update_stats_at_enqueue(struct Qdisc *sch,
 		qdisc_qstats_cpu_qlen_inc(sch);
 		this_cpu_add(sch->cpu_qstats->backlog, pkt_len);
 	} else {
-		sch->qstats.backlog += pkt_len;
+		qstats_backlog_add(sch, pkt_len);
 		qdisc_qlen_inc(sch);
 	}
 }
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index cefa2d8ac5ec00c78b08b520a11672120d10cdef..3c779e5098efd6602ec4efb0abadb8dac21c4b44 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -806,7 +806,7 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
 			cops->qlen_notify(sch, cl);
 		}
 		WRITE_ONCE(sch->q.qlen, sch->q.qlen - n);
-		sch->qstats.backlog -= len;
+		qstats_backlog_sub(sch, len);
 		__qdisc_qstats_drop(sch, drops);
 	}
 	rcu_read_unlock();
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 7ab75a52f7d1a46d87fc8f7c099c749a5331ccf6..7d59f52a4617b7ca3adaf040457ca8d30aa44be7 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1603,7 +1603,6 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
 	q->buffer_used      -= skb->truesize;
 	WRITE_ONCE(b->tin_backlog, b->tin_backlog - len);
 	WRITE_ONCE(b->backlogs[idx], b->backlogs[idx] - len);
-	sch->qstats.backlog -= len;
 
 	WRITE_ONCE(flow->dropped, flow->dropped + 1);
 	WRITE_ONCE(b->tin_dropped, b->tin_dropped + 1);
@@ -1830,7 +1829,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		}
 
 		/* stats */
-		sch->qstats.backlog += slen;
+		qstats_backlog_add(sch, slen);
 		q->avg_window_bytes += slen;
 		WRITE_ONCE(b->bytes, b->bytes + slen);
 		WRITE_ONCE(b->tin_backlog, b->tin_backlog + slen);
@@ -1867,7 +1866,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
 		/* stats */
 		WRITE_ONCE(b->packets, b->packets + 1);
-		sch->qstats.backlog += len - ack_pkt_len;
+		qstats_backlog_add(sch, len - ack_pkt_len);
 		q->avg_window_bytes += len - ack_pkt_len;
 		WRITE_ONCE(b->bytes, b->bytes + len - ack_pkt_len);
 		WRITE_ONCE(b->tin_backlog, b->tin_backlog + len - ack_pkt_len);
@@ -1985,7 +1984,7 @@ static struct sk_buff *cake_dequeue_one(struct Qdisc *sch)
 		len = qdisc_pkt_len(skb);
 		WRITE_ONCE(b->backlogs[q->cur_flow], b->backlogs[q->cur_flow] - len);
 		WRITE_ONCE(b->tin_backlog, b->tin_backlog - len);
-		sch->qstats.backlog      -= len;
+		qstats_backlog_sub(sch, len);
 		q->buffer_used		 -= skb->truesize;
 		qdisc_qlen_dec(sch);
 
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index a75e58876797952f2218725f6da5cff29f330ae2..2cfa0fd92829ad7eba7454e09dc17eb8f22519b8 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -96,7 +96,7 @@ static int cbs_child_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	if (err != NET_XMIT_SUCCESS)
 		return err;
 
-	sch->qstats.backlog += len;
+	qstats_backlog_add(sch, len);
 	qdisc_qlen_inc(sch);
 
 	return NET_XMIT_SUCCESS;
diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
index 317aae0ec7bd6aedb4bae09b18423c981fed16e7..91dd2e629af8f2d1a29f439a6dbb5c186fa01d33 100644
--- a/net/sched/sch_codel.c
+++ b/net/sched/sch_codel.c
@@ -42,7 +42,7 @@ static struct sk_buff *dequeue_func(struct codel_vars *vars, void *ctx)
 	struct sk_buff *skb = __qdisc_dequeue_head(&sch->q);
 
 	if (skb) {
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		qstats_backlog_sub(sch, qdisc_pkt_len(skb));
 		prefetch(&skb->end); /* we'll need skb_shinfo() */
 	}
 	return skb;
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 925fa0cfd730ce72e45e8983ba02eb913afb1235..3f6687fa9666257952be5d44f9e3460845fe2a40 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -365,7 +365,7 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		cl->deficit = cl->quantum;
 	}
 
-	sch->qstats.backlog += len;
+	qstats_backlog_add(sch, len);
 	qdisc_qlen_inc(sch);
 	return err;
 }
diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index c817e0a6c14653a35f5ebb9de1a5ccc44d1a2f98..1cc559634ed27ce5a6630186a51a8ac8180dad96 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -448,7 +448,7 @@ static int ets_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		cl->deficit = cl->quantum;
 	}
 
-	sch->qstats.backlog += len;
+	qstats_backlog_add(sch, len);
 	qdisc_qlen_inc(sch);
 	return err;
 }
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 1e34ac136b15cf24742f2810d201420cf763021a..796cb8046a902b94952a571b250813c5e557d600 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -802,7 +802,7 @@ static void fq_reset(struct Qdisc *sch)
 	unsigned int idx;
 
 	WRITE_ONCE(sch->q.qlen, 0);
-	sch->qstats.backlog = 0;
+	WRITE_ONCE(sch->qstats.backlog, 0);
 
 	fq_flow_purge(&q->internal);
 
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index cae8483fbb0c4f62f28dba4c15b4426485390bcf..1b1de693d4c64a1f5f4e9e788371829dea91740e 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -177,7 +177,7 @@ static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets,
 	WRITE_ONCE(q->backlogs[idx], q->backlogs[idx] - len);
 	q->memory_usage -= mem;
 	__qdisc_qstats_drop(sch, i);
-	sch->qstats.backlog -= len;
+	qstats_backlog_sub(sch, len);
 	WRITE_ONCE(sch->q.qlen, sch->q.qlen - i);
 	return idx;
 }
@@ -268,7 +268,7 @@ static struct sk_buff *dequeue_func(struct codel_vars *vars, void *ctx)
 			   q->backlogs[flow - q->flows] - qdisc_pkt_len(skb));
 		q->memory_usage -= get_codel_cb(skb)->mem_usage;
 		qdisc_qlen_dec(sch);
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		qdisc_qstats_backlog_dec(sch, skb);
 	}
 	return skb;
 }
diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c
index 0a4eca4ab086ebebbdba17784f12370c301bbac6..72f48fa4010bebbe6be212938b457db21ff3c5a0 100644
--- a/net/sched/sch_fq_pie.c
+++ b/net/sched/sch_fq_pie.c
@@ -184,7 +184,7 @@ static int fq_pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		pkt_len = qdisc_pkt_len(skb);
 		q->stats.packets_in++;
 		q->memory_usage += skb->truesize;
-		sch->qstats.backlog += pkt_len;
+		qstats_backlog_add(sch, pkt_len);
 		qdisc_qlen_inc(sch);
 		flow_queue_add(sel_flow, skb);
 		if (list_empty(&sel_flow->flowchain)) {
@@ -262,7 +262,7 @@ static struct sk_buff *fq_pie_qdisc_dequeue(struct Qdisc *sch)
 	if (flow->head) {
 		skb = dequeue_head(flow);
 		pkt_len = qdisc_pkt_len(skb);
-		sch->qstats.backlog -= pkt_len;
+		qstats_backlog_sub(sch, pkt_len);
 		qdisc_qlen_dec(sch);
 		qdisc_bstats_update(sch, skb);
 	}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index e35d9c58850fa9d82471d64daedfdf8c47e92b68..e8647a5c74af237d20fc73a05b27a03cc8b62427 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1060,7 +1060,7 @@ void qdisc_reset(struct Qdisc *qdisc)
 	__skb_queue_purge(&qdisc->skb_bad_txq);
 
 	WRITE_ONCE(qdisc->q.qlen, 0);
-	qdisc->qstats.backlog = 0;
+	WRITE_ONCE(qdisc->qstats.backlog, 0);
 }
 EXPORT_SYMBOL(qdisc_reset);
 
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 8ae65572162c188cca5ac8f030dc6f2054a7fcd0..fcc1a4c0363624293986f221c70572ce6503e220 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -388,7 +388,7 @@ static int gred_offload_dump_stats(struct Qdisc *sch)
 		bytes += u64_stats_read(&hw_stats->stats.bstats[i].bytes);
 		packets += u64_stats_read(&hw_stats->stats.bstats[i].packets);
 		sch->qstats.qlen += hw_stats->stats.qstats[i].qlen;
-		sch->qstats.backlog += hw_stats->stats.qstats[i].backlog;
+		qstats_backlog_add(sch, hw_stats->stats.qstats[i].backlog);
 		__qdisc_qstats_drop(sch, hw_stats->stats.qstats[i].drops);
 		sch->qstats.requeues += hw_stats->stats.qstats[i].requeues;
 		sch->qstats.overlimits += hw_stats->stats.qstats[i].overlimits;
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index e71a565100edf60881ca7542faa408c5bb1a0984..59409ee2d2ff9279d7439b744030c0e845386de0 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1560,7 +1560,7 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
 		return err;
 	}
 
-	sch->qstats.backlog += len;
+	qstats_backlog_add(sch, len);
 	qdisc_qlen_inc(sch);
 
 	if (first && !cl_in_el_or_vttree(cl)) {
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index c22ccd8eae8c73323ccdf425e62857b3b851d74e..1e600f65c8769a74286c4f060b0d45da9a13eeeb 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -650,7 +650,7 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		htb_activate(q, cl);
 	}
 
-	sch->qstats.backlog += len;
+	qstats_backlog_add(sch, len);
 	qdisc_qlen_inc(sch);
 	return NET_XMIT_SUCCESS;
 }
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 57b12cbca45355c69780614fa87aaf37255d64cc..ddbfea9dd32a7cee381dc82e0291db709ee57f8a 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -750,7 +750,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 				if (err != NET_XMIT_SUCCESS) {
 					if (net_xmit_drop_count(err))
 						qdisc_qstats_drop(sch);
-					sch->qstats.backlog -= pkt_len;
+					qstats_backlog_sub(sch, pkt_len);
 					qdisc_qlen_dec(sch);
 					qdisc_tree_reduce_backlog(sch, 1, pkt_len);
 				}
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index fe42ae3d6b696b2fc47f4d397af32e950eeec194..e4dd56a890725b4c14d6715c96f5b3fa44a8f4f2 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -85,7 +85,7 @@ prio_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
 
 	ret = qdisc_enqueue(skb, qdisc, to_free);
 	if (ret == NET_XMIT_SUCCESS) {
-		sch->qstats.backlog += len;
+		qstats_backlog_add(sch, len);
 		qdisc_qlen_inc(sch);
 		return NET_XMIT_SUCCESS;
 	}
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 195c434aae5f7e03d1a1238ed73bb64b3f04e105..cb56787e1d258c06f2e86959c3b2cfaeb12df1ac 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1264,7 +1264,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	}
 
 	_bstats_update(&cl->bstats, len, gso_segs);
-	sch->qstats.backlog += len;
+	qstats_backlog_add(sch, len);
 	qdisc_qlen_inc(sch);
 
 	agg = cl->agg;
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 0719590dfd73b64d21f71ab00621f64ed0eefc89..d7598214270b8e5b6b818be37f1519f64ad537c4 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -138,7 +138,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	len = qdisc_pkt_len(skb);
 	ret = qdisc_enqueue(skb, child, to_free);
 	if (likely(ret == NET_XMIT_SUCCESS)) {
-		sch->qstats.backlog += len;
+		qstats_backlog_add(sch, len);
 		qdisc_qlen_inc(sch);
 	} else if (net_xmit_drop_count(ret)) {
 		WRITE_ONCE(q->stats.pdrop,
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index efd9251c3add317f3b817f08c732fca0c347bf35..b1d46509427692eeeabcfa19957c83fae3fa306e 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -415,7 +415,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	memcpy(&cb, sfb_skb_cb(skb), sizeof(cb));
 	ret = qdisc_enqueue(skb, child, to_free);
 	if (likely(ret == NET_XMIT_SUCCESS)) {
-		sch->qstats.backlog += len;
+		qstats_backlog_add(sch, len);
 		qdisc_qlen_inc(sch);
 		increment_qlen(&cb, q);
 	} else if (net_xmit_drop_count(ret)) {
@@ -592,7 +592,7 @@ static int sfb_dump(struct Qdisc *sch, struct sk_buff *skb)
 		.penalty_burst = q->penalty_burst,
 	};
 
-	sch->qstats.backlog = q->qdisc->qstats.backlog;
+	WRITE_ONCE(sch->qstats.backlog, READ_ONCE(q->qdisc->qstats.backlog));
 	opts = nla_nest_start_noflag(skb, TCA_OPTIONS);
 	if (opts == NULL)
 		goto nla_put_failure;
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index f9807ee2cf6c72101ce39c4f43bf32c03c0a5f62..758b88f218652704454647f25da270a0254cafcf 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -427,7 +427,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
 		/* We know we have at least one packet in queue */
 		head = slot_dequeue_head(slot);
 		delta = qdisc_pkt_len(head) - qdisc_pkt_len(skb);
-		sch->qstats.backlog -= delta;
+		qstats_backlog_sub(sch, delta);
 		WRITE_ONCE(slot->backlog, slot->backlog - delta);
 		qdisc_drop_reason(head, sch, to_free, QDISC_DROP_FLOW_LIMIT);
 
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 25edf11a7d671fe63878b0995998c5920b86ef74..67c7aaaf8f607e82ad13b7fdf177405a1dd075bb 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -232,7 +232,7 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
 		}
 	}
 	WRITE_ONCE(sch->q.qlen, sch->q.qlen + nb);
-	sch->qstats.backlog += len;
+	qstats_backlog_add(sch, len);
 	if (nb > 0) {
 		qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
 		consume_skb(skb);
@@ -263,7 +263,7 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		return ret;
 	}
 
-	sch->qstats.backlog += len;
+	qstats_backlog_add(sch, len);
 	qdisc_qlen_inc(sch);
 	return NET_XMIT_SUCCESS;
 }
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related

* [PATCH net-next 2/8] net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec()
From: Eric Dumazet @ 2026-05-07 22:19 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260507221948.335726-1-edumazet@google.com>

Helpers to increment or decrement sch->q.qlen, with appropriate
WRITE_ONCE() to prevent store tearing.

Add other WRITE_ONCE() when sch->q.qlen is changed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sch_generic.h | 26 ++++++++++++++++++--------
 net/sched/sch_api.c       |  2 +-
 net/sched/sch_cake.c      |  8 ++++----
 net/sched/sch_cbs.c       |  4 ++--
 net/sched/sch_choke.c     |  8 ++++----
 net/sched/sch_drr.c       |  4 ++--
 net/sched/sch_dualpi2.c   |  6 +++---
 net/sched/sch_etf.c       |  8 ++++----
 net/sched/sch_ets.c       |  4 ++--
 net/sched/sch_fq.c        |  6 +++---
 net/sched/sch_fq_codel.c  |  7 ++++---
 net/sched/sch_fq_pie.c    |  4 ++--
 net/sched/sch_generic.c   | 10 +++++-----
 net/sched/sch_hfsc.c      |  4 ++--
 net/sched/sch_hhf.c       |  7 ++++---
 net/sched/sch_htb.c       |  4 ++--
 net/sched/sch_mq.c        |  5 +++--
 net/sched/sch_mqprio.c    | 18 ++++++++++--------
 net/sched/sch_multiq.c    |  4 ++--
 net/sched/sch_netem.c     | 10 +++++-----
 net/sched/sch_prio.c      |  4 ++--
 net/sched/sch_qfq.c       |  6 +++---
 net/sched/sch_red.c       |  4 ++--
 net/sched/sch_sfb.c       |  4 ++--
 net/sched/sch_sfq.c       |  9 +++++----
 net/sched/sch_skbprio.c   |  4 ++--
 net/sched/sch_taprio.c    |  4 ++--
 net/sched/sch_tbf.c       |  6 +++---
 net/sched/sch_teql.c      |  2 +-
 29 files changed, 104 insertions(+), 88 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index ccfabfac674ef8617faeabd2fcb15daf8a1ea17f..3893fbb29960d9b32042616b747168b689b355fd 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -542,6 +542,16 @@ static inline int qdisc_qlen(const struct Qdisc *q)
 	return q->q.qlen;
 }
 
+static inline void qdisc_qlen_inc(struct Qdisc *q)
+{
+	WRITE_ONCE(q->q.qlen, q->q.qlen + 1);
+}
+
+static inline void qdisc_qlen_dec(struct Qdisc *q)
+{
+	WRITE_ONCE(q->q.qlen, q->q.qlen - 1);
+}
+
 static inline int qdisc_qlen_sum(const struct Qdisc *q)
 {
 	__u32 qlen = q->qstats.qlen;
@@ -549,9 +559,9 @@ static inline int qdisc_qlen_sum(const struct Qdisc *q)
 
 	if (qdisc_is_percpu_stats(q)) {
 		for_each_possible_cpu(i)
-			qlen += per_cpu_ptr(q->cpu_qstats, i)->qlen;
+			qlen += READ_ONCE(per_cpu_ptr(q->cpu_qstats, i)->qlen);
 	} else {
-		qlen += q->q.qlen;
+		qlen += READ_ONCE(q->q.qlen);
 	}
 
 	return qlen;
@@ -1110,7 +1120,7 @@ static inline struct sk_buff *qdisc_dequeue_internal(struct Qdisc *sch, bool dir
 
 	skb = __skb_dequeue(&sch->gso_skb);
 	if (skb) {
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 		qdisc_qstats_backlog_dec(sch, skb);
 		return skb;
 	}
@@ -1266,7 +1276,7 @@ static inline struct sk_buff *qdisc_peek_dequeued(struct Qdisc *sch)
 			__skb_queue_head(&sch->gso_skb, skb);
 			/* it's still part of the queue */
 			qdisc_qstats_backlog_inc(sch, skb);
-			sch->q.qlen++;
+			qdisc_qlen_inc(sch);
 		}
 	}
 
@@ -1283,7 +1293,7 @@ static inline void qdisc_update_stats_at_dequeue(struct Qdisc *sch,
 	} else {
 		qdisc_qstats_backlog_dec(sch, skb);
 		qdisc_bstats_update(sch, skb);
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 	}
 }
 
@@ -1295,7 +1305,7 @@ static inline void qdisc_update_stats_at_enqueue(struct Qdisc *sch,
 		this_cpu_add(sch->cpu_qstats->backlog, pkt_len);
 	} else {
 		sch->qstats.backlog += pkt_len;
-		sch->q.qlen++;
+		qdisc_qlen_inc(sch);
 	}
 }
 
@@ -1311,7 +1321,7 @@ static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
 			qdisc_qstats_cpu_qlen_dec(sch);
 		} else {
 			qdisc_qstats_backlog_dec(sch, skb);
-			sch->q.qlen--;
+			qdisc_qlen_dec(sch);
 		}
 	} else {
 		skb = sch->dequeue(sch);
@@ -1332,7 +1342,7 @@ static inline void __qdisc_reset_queue(struct qdisc_skb_head *qh)
 
 		qh->head = NULL;
 		qh->tail = NULL;
-		qh->qlen = 0;
+		WRITE_ONCE(qh->qlen, 0);
 	}
 }
 
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 6f7847c5536f16e6754954f0a606581e17257361..cefa2d8ac5ec00c78b08b520a11672120d10cdef 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -805,7 +805,7 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
 			cl = cops->find(sch, parentid);
 			cops->qlen_notify(sch, cl);
 		}
-		sch->q.qlen -= n;
+		WRITE_ONCE(sch->q.qlen, sch->q.qlen - n);
 		sch->qstats.backlog -= len;
 		__qdisc_qstats_drop(sch, drops);
 	}
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index d931e8d51f723fdedea9f3f90efceec6e0a070d3..7ab75a52f7d1a46d87fc8f7c099c749a5331ccf6 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1612,7 +1612,7 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
 		cake_advance_shaper(q, b, skb, now, true);
 
 	qdisc_drop_reason(skb, sch, to_free, QDISC_DROP_OVERLIMIT);
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 
 	cake_heapify(q, 0);
 
@@ -1822,7 +1822,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 									  segs);
 			flow_queue_add(flow, segs);
 
-			sch->q.qlen++;
+			qdisc_qlen_inc(sch);
 			numsegs++;
 			slen += segs->len;
 			q->buffer_used += segs->truesize;
@@ -1861,7 +1861,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			qdisc_tree_reduce_backlog(sch, 1, ack_pkt_len);
 			consume_skb(ack);
 		} else {
-			sch->q.qlen++;
+			qdisc_qlen_inc(sch);
 			q->buffer_used      += skb->truesize;
 		}
 
@@ -1987,7 +1987,7 @@ static struct sk_buff *cake_dequeue_one(struct Qdisc *sch)
 		WRITE_ONCE(b->tin_backlog, b->tin_backlog - len);
 		sch->qstats.backlog      -= len;
 		q->buffer_used		 -= skb->truesize;
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 
 		if (q->overflow_timeout)
 			cake_heapify(q, b->overflow_idx[q->cur_flow]);
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index 8c9a0400c8622c652db290796f2dd338eb61799c..a75e58876797952f2218725f6da5cff29f330ae2 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -97,7 +97,7 @@ static int cbs_child_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		return err;
 
 	sch->qstats.backlog += len;
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 
 	return NET_XMIT_SUCCESS;
 }
@@ -168,7 +168,7 @@ static struct sk_buff *cbs_child_dequeue(struct Qdisc *sch, struct Qdisc *child)
 
 	qdisc_qstats_backlog_dec(sch, skb);
 	qdisc_bstats_update(sch, skb);
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 
 	return skb;
 }
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 2875bcdb18a413075c795665e95f9dbbaac45962..73d3e673dc7b16cf2b9ac1d622da280c2ceb064a 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -123,7 +123,7 @@ static void choke_drop_by_idx(struct Qdisc *sch, unsigned int idx,
 	if (idx == q->tail)
 		choke_zap_tail_holes(q);
 
-	--sch->q.qlen;
+	qdisc_qlen_dec(sch);
 	qdisc_qstats_backlog_dec(sch, skb);
 	qdisc_tree_reduce_backlog(sch, 1, qdisc_pkt_len(skb));
 	qdisc_drop(skb, sch, to_free);
@@ -271,7 +271,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	if (sch->q.qlen < q->limit) {
 		q->tab[q->tail] = skb;
 		q->tail = (q->tail + 1) & q->tab_mask;
-		++sch->q.qlen;
+		qdisc_qlen_inc(sch);
 		qdisc_qstats_backlog_inc(sch, skb);
 		return NET_XMIT_SUCCESS;
 	}
@@ -298,7 +298,7 @@ static struct sk_buff *choke_dequeue(struct Qdisc *sch)
 	skb = q->tab[q->head];
 	q->tab[q->head] = NULL;
 	choke_zap_head_holes(q);
-	--sch->q.qlen;
+	qdisc_qlen_dec(sch);
 	qdisc_qstats_backlog_dec(sch, skb);
 	qdisc_bstats_update(sch, skb);
 
@@ -396,7 +396,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt,
 				}
 				dropped += qdisc_pkt_len(skb);
 				qdisc_qstats_backlog_dec(sch, skb);
-				--sch->q.qlen;
+				qdisc_qlen_dec(sch);
 				rtnl_qdisc_drop(skb, sch);
 			}
 			qdisc_tree_reduce_backlog(sch, oqlen - sch->q.qlen, dropped);
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 01335a49e091444747635ee8bc7e22ded504d571..925fa0cfd730ce72e45e8983ba02eb913afb1235 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -366,7 +366,7 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	}
 
 	sch->qstats.backlog += len;
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 	return err;
 }
 
@@ -399,7 +399,7 @@ static struct sk_buff *drr_dequeue(struct Qdisc *sch)
 			bstats_update(&cl->bstats, skb);
 			qdisc_bstats_update(sch, skb);
 			qdisc_qstats_backlog_dec(sch, skb);
-			sch->q.qlen--;
+			qdisc_qlen_dec(sch);
 			return skb;
 		}
 
diff --git a/net/sched/sch_dualpi2.c b/net/sched/sch_dualpi2.c
index 241e6a46bd00e39820f5ba9dc71d559f205a4de0..c6416f09dddd8f170b92e50fb89377a15773c5bf 100644
--- a/net/sched/sch_dualpi2.c
+++ b/net/sched/sch_dualpi2.c
@@ -415,7 +415,7 @@ static int dualpi2_enqueue_skb(struct sk_buff *skb, struct Qdisc *sch,
 		dualpi2_skb_cb(skb)->apply_step = skb_apply_step(skb, q);
 
 		/* Keep the overall qdisc stats consistent */
-		++sch->q.qlen;
+		qdisc_qlen_inc(sch);
 		qdisc_qstats_backlog_inc(sch, skb);
 		++q->packets_in_l;
 		if (!q->l_head_ts)
@@ -530,7 +530,7 @@ static struct sk_buff *dequeue_packet(struct Qdisc *sch,
 		qdisc_qstats_backlog_dec(q->l_queue, skb);
 
 		/* Keep the global queue size consistent */
-		--sch->q.qlen;
+		qdisc_qlen_dec(sch);
 		q->memory_used -= skb->truesize;
 	} else if (c_len) {
 		skb = __qdisc_dequeue_head(&sch->q);
@@ -888,7 +888,7 @@ static int dualpi2_change(struct Qdisc *sch, struct nlattr *opt,
 			 * l_queue on enqueue; qdisc_dequeue_internal()
 			 * handled l_queue, so we further account for sch.
 			 */
-			--sch->q.qlen;
+			qdisc_qlen_dec(sch);
 			qdisc_qstats_backlog_dec(sch, skb);
 			q->memory_used -= skb->truesize;
 			rtnl_qdisc_drop(skb, q->l_queue);
diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index c74d778c32a1eda639650df4d1d103c5338f14e6..ada87a81da6ac4c20e036b5391eb4efe9795ab91 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -189,7 +189,7 @@ static int etf_enqueue_timesortedlist(struct sk_buff *nskb, struct Qdisc *sch,
 	rb_insert_color_cached(&nskb->rbnode, &q->head, leftmost);
 
 	qdisc_qstats_backlog_inc(sch, nskb);
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 
 	/* Now we may need to re-arm the qdisc watchdog for the next packet. */
 	reset_watchdog(sch);
@@ -222,7 +222,7 @@ static void timesortedlist_drop(struct Qdisc *sch, struct sk_buff *skb,
 		qdisc_qstats_backlog_dec(sch, skb);
 		qdisc_drop(skb, sch, &to_free);
 		qdisc_qstats_overlimit(sch);
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 	}
 
 	kfree_skb_list(to_free);
@@ -247,7 +247,7 @@ static void timesortedlist_remove(struct Qdisc *sch, struct sk_buff *skb)
 
 	q->last = skb->tstamp;
 
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 }
 
 static struct sk_buff *etf_dequeue_timesortedlist(struct Qdisc *sch)
@@ -426,7 +426,7 @@ static void timesortedlist_clear(struct Qdisc *sch)
 
 		rb_erase_cached(&skb->rbnode, &q->head);
 		rtnl_kfree_skbs(skb, skb);
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 	}
 }
 
diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index a4b07b661b7756a675d22c0f84f8f0a713cdb7eb..c817e0a6c14653a35f5ebb9de1a5ccc44d1a2f98 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -449,7 +449,7 @@ static int ets_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	}
 
 	sch->qstats.backlog += len;
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 	return err;
 }
 
@@ -458,7 +458,7 @@ ets_qdisc_dequeue_skb(struct Qdisc *sch, struct sk_buff *skb)
 {
 	qdisc_bstats_update(sch, skb);
 	qdisc_qstats_backlog_dec(sch, skb);
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 	return skb;
 }
 
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index f2edcf872981fd8181dfb97a3bc665fd4a869115..1e34ac136b15cf24742f2810d201420cf763021a 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -497,7 +497,7 @@ static void fq_dequeue_skb(struct Qdisc *sch, struct fq_flow *flow,
 	fq_erase_head(sch, flow, skb);
 	skb_mark_not_on_list(skb);
 	qdisc_qstats_backlog_dec(sch, skb);
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 	qdisc_bstats_update(sch, skb);
 }
 
@@ -597,7 +597,7 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	flow_queue_add(f, skb);
 
 	qdisc_qstats_backlog_inc(sch, skb);
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 
 	return NET_XMIT_SUCCESS;
 }
@@ -801,7 +801,7 @@ static void fq_reset(struct Qdisc *sch)
 	struct fq_flow *f;
 	unsigned int idx;
 
-	sch->q.qlen = 0;
+	WRITE_ONCE(sch->q.qlen, 0);
 	sch->qstats.backlog = 0;
 
 	fq_flow_purge(&q->internal);
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index ed42ce62a17f1de9516af90533d16b65657f86cd..cae8483fbb0c4f62f28dba4c15b4426485390bcf 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -178,7 +178,7 @@ static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets,
 	q->memory_usage -= mem;
 	__qdisc_qstats_drop(sch, i);
 	sch->qstats.backlog -= len;
-	sch->q.qlen -= i;
+	WRITE_ONCE(sch->q.qlen, sch->q.qlen - i);
 	return idx;
 }
 
@@ -215,7 +215,8 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	get_codel_cb(skb)->mem_usage = skb->truesize;
 	q->memory_usage += get_codel_cb(skb)->mem_usage;
 	memory_limited = q->memory_usage > q->memory_limit;
-	if (++sch->q.qlen <= sch->limit && !memory_limited)
+	qdisc_qlen_inc(sch);
+	if (sch->q.qlen <= sch->limit && !memory_limited)
 		return NET_XMIT_SUCCESS;
 
 	prev_backlog = sch->qstats.backlog;
@@ -266,7 +267,7 @@ static struct sk_buff *dequeue_func(struct codel_vars *vars, void *ctx)
 		WRITE_ONCE(q->backlogs[flow - q->flows],
 			   q->backlogs[flow - q->flows] - qdisc_pkt_len(skb));
 		q->memory_usage -= get_codel_cb(skb)->mem_usage;
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 		sch->qstats.backlog -= qdisc_pkt_len(skb);
 	}
 	return skb;
diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c
index 7becbf5362b3165bac4517f32887386b01301612..0a4eca4ab086ebebbdba17784f12370c301bbac6 100644
--- a/net/sched/sch_fq_pie.c
+++ b/net/sched/sch_fq_pie.c
@@ -185,7 +185,7 @@ static int fq_pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		q->stats.packets_in++;
 		q->memory_usage += skb->truesize;
 		sch->qstats.backlog += pkt_len;
-		sch->q.qlen++;
+		qdisc_qlen_inc(sch);
 		flow_queue_add(sel_flow, skb);
 		if (list_empty(&sel_flow->flowchain)) {
 			list_add_tail(&sel_flow->flowchain, &q->new_flows);
@@ -263,7 +263,7 @@ static struct sk_buff *fq_pie_qdisc_dequeue(struct Qdisc *sch)
 		skb = dequeue_head(flow);
 		pkt_len = qdisc_pkt_len(skb);
 		sch->qstats.backlog -= pkt_len;
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 		qdisc_bstats_update(sch, skb);
 	}
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index a93321db8fd75d30c61e146c290bbc139c37c913..e35d9c58850fa9d82471d64daedfdf8c47e92b68 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -118,7 +118,7 @@ static inline struct sk_buff *__skb_dequeue_bad_txq(struct Qdisc *q)
 				qdisc_qstats_cpu_qlen_dec(q);
 			} else {
 				qdisc_qstats_backlog_dec(q, skb);
-				q->q.qlen--;
+				qdisc_qlen_dec(q);
 			}
 		} else {
 			skb = SKB_XOFF_MAGIC;
@@ -159,7 +159,7 @@ static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
 		qdisc_qstats_cpu_qlen_inc(q);
 	} else {
 		qdisc_qstats_backlog_inc(q, skb);
-		q->q.qlen++;
+		qdisc_qlen_inc(q);
 	}
 
 	if (lock)
@@ -188,7 +188,7 @@ static inline void dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 		} else {
 			q->qstats.requeues++;
 			qdisc_qstats_backlog_inc(q, skb);
-			q->q.qlen++;
+			qdisc_qlen_inc(q);
 		}
 
 		skb = next;
@@ -294,7 +294,7 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
 				qdisc_qstats_cpu_qlen_dec(q);
 			} else {
 				qdisc_qstats_backlog_dec(q, skb);
-				q->q.qlen--;
+				qdisc_qlen_dec(q);
 			}
 		} else {
 			skb = NULL;
@@ -1059,7 +1059,7 @@ void qdisc_reset(struct Qdisc *qdisc)
 	__skb_queue_purge(&qdisc->gso_skb);
 	__skb_queue_purge(&qdisc->skb_bad_txq);
 
-	qdisc->q.qlen = 0;
+	WRITE_ONCE(qdisc->q.qlen, 0);
 	qdisc->qstats.backlog = 0;
 }
 EXPORT_SYMBOL(qdisc_reset);
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 83b2ca2e37fc82cfebf089e6c0e36f18af939887..e71a565100edf60881ca7542faa408c5bb1a0984 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1561,7 +1561,7 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
 	}
 
 	sch->qstats.backlog += len;
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 
 	if (first && !cl_in_el_or_vttree(cl)) {
 		if (cl->cl_flags & HFSC_RSC)
@@ -1650,7 +1650,7 @@ hfsc_dequeue(struct Qdisc *sch)
 
 	qdisc_bstats_update(sch, skb);
 	qdisc_qstats_backlog_dec(sch, skb);
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 
 	return skb;
 }
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index 96021f52d835b56339509565ca03fe796593e231..1e25b75daae2e5de31bd212dfa1f6d7aea927174 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -360,7 +360,7 @@ static unsigned int hhf_drop(struct Qdisc *sch, struct sk_buff **to_free)
 	if (bucket->head) {
 		struct sk_buff *skb = dequeue_head(bucket);
 
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 		qdisc_qstats_backlog_dec(sch, skb);
 		qdisc_drop(skb, sch, to_free);
 	}
@@ -400,7 +400,8 @@ static int hhf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		}
 		bucket->deficit = weight * q->quantum;
 	}
-	if (++sch->q.qlen <= sch->limit)
+	qdisc_qlen_inc(sch);
+	if (sch->q.qlen <= sch->limit)
 		return NET_XMIT_SUCCESS;
 
 	prev_backlog = sch->qstats.backlog;
@@ -443,7 +444,7 @@ static struct sk_buff *hhf_dequeue(struct Qdisc *sch)
 
 	if (bucket->head) {
 		skb = dequeue_head(bucket);
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 		qdisc_qstats_backlog_dec(sch, skb);
 	}
 
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index eb12381795ce1bb0f3b8c5f502e16ad64c4408c8..c22ccd8eae8c73323ccdf425e62857b3b851d74e 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -651,7 +651,7 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	}
 
 	sch->qstats.backlog += len;
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 	return NET_XMIT_SUCCESS;
 }
 
@@ -951,7 +951,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 ok:
 		qdisc_bstats_update(sch, skb);
 		qdisc_qstats_backlog_dec(sch, skb);
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 		return skb;
 	}
 
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index a0133a7b9d3b09a0d2a6064234c8fdef60dbf955..ec8c91d3fde04e59daec2aecdb14d6bf50715e15 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -143,10 +143,10 @@ EXPORT_SYMBOL_NS_GPL(mq_attach, "NET_SCHED_INTERNAL");
 void mq_dump_common(struct Qdisc *sch, struct sk_buff *skb)
 {
 	struct net_device *dev = qdisc_dev(sch);
+	unsigned int qlen = 0;
 	struct Qdisc *qdisc;
 	unsigned int ntx;
 
-	sch->q.qlen = 0;
 	gnet_stats_basic_sync_init(&sch->bstats);
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
@@ -163,10 +163,11 @@ void mq_dump_common(struct Qdisc *sch, struct sk_buff *skb)
 				     &qdisc->bstats, false);
 		gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
 				     &qdisc->qstats);
-		sch->q.qlen += qdisc_qlen(qdisc);
+		qlen += qdisc_qlen(qdisc);
 
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
+	WRITE_ONCE(sch->q.qlen, qlen);
 }
 EXPORT_SYMBOL_NS_GPL(mq_dump_common, "NET_SCHED_INTERNAL");
 
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 002add5ce9e0ab04a6260495d1bec02983c2a204..91a92992cd24ab6c30bf7db2288c08cd493c7bc3 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -555,10 +555,11 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	struct mqprio_sched *priv = qdisc_priv(sch);
 	struct nlattr *nla = (struct nlattr *)skb_tail_pointer(skb);
 	struct tc_mqprio_qopt opt = { 0 };
+	unsigned int qlen = 0;
 	struct Qdisc *qdisc;
 	unsigned int ntx;
 
-	sch->q.qlen = 0;
+	qlen = 0;
 	gnet_stats_basic_sync_init(&sch->bstats);
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
@@ -575,10 +576,11 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 				     &qdisc->bstats, false);
 		gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
 				     &qdisc->qstats);
-		sch->q.qlen += qdisc_qlen(qdisc);
+		qlen += qdisc_qlen(qdisc);
 
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
+	WRITE_ONCE(sch->q.qlen, qlen);
 
 	mqprio_qopt_reconstruct(dev, &opt);
 	opt.hw = priv->hw_offload;
@@ -663,12 +665,12 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	__acquires(d->lock)
 {
 	if (cl >= TC_H_MIN_PRIORITY) {
-		int i;
-		__u32 qlen;
-		struct gnet_stats_queue qstats = {0};
-		struct gnet_stats_basic_sync bstats;
 		struct net_device *dev = qdisc_dev(sch);
 		struct netdev_tc_txq tc = dev->tc_to_txq[cl & TC_BITMASK];
+		struct gnet_stats_queue qstats = {0};
+		struct gnet_stats_basic_sync bstats;
+		u32 qlen = 0;
+		int i;
 
 		gnet_stats_basic_sync_init(&bstats);
 		/* Drop lock here it will be reclaimed before touching
@@ -689,11 +691,11 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 					     &qdisc->bstats, false);
 			gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
 					     &qdisc->qstats);
-			sch->q.qlen += qdisc_qlen(qdisc);
+			qlen += qdisc_qlen(qdisc);
 
 			spin_unlock_bh(qdisc_lock(qdisc));
 		}
-		qlen = qdisc_qlen(sch) + qstats.qlen;
+		qlen = qlen + qstats.qlen;
 
 		/* Reclaim root sleeping lock before completing stats */
 		if (d->lock)
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 9f822fee113df6562ddac89092357434547a4599..4e465d11e3d75e36b875b66f8c8087c2e15cdad9 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -76,7 +76,7 @@ multiq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
 	ret = qdisc_enqueue(skb, qdisc, to_free);
 	if (ret == NET_XMIT_SUCCESS) {
-		sch->q.qlen++;
+		qdisc_qlen_inc(sch);
 		return NET_XMIT_SUCCESS;
 	}
 	if (net_xmit_drop_count(ret))
@@ -106,7 +106,7 @@ static struct sk_buff *multiq_dequeue(struct Qdisc *sch)
 			skb = qdisc->dequeue(qdisc);
 			if (skb) {
 				qdisc_bstats_update(sch, skb);
-				sch->q.qlen--;
+				qdisc_qlen_dec(sch);
 				return skb;
 			}
 		}
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index bc18e1976b6e07f81f975ceeb35c8b1a5125e8df..57b12cbca45355c69780614fa87aaf37255d64cc 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -416,7 +416,7 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
 		rb_insert_color(&nskb->rbnode, &q->t_root);
 	}
 	q->t_len++;
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 }
 
 /* netem can't properly corrupt a megapacket (like we get from GSO), so instead
@@ -751,19 +751,19 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 					if (net_xmit_drop_count(err))
 						qdisc_qstats_drop(sch);
 					sch->qstats.backlog -= pkt_len;
-					sch->q.qlen--;
+					qdisc_qlen_dec(sch);
 					qdisc_tree_reduce_backlog(sch, 1, pkt_len);
 				}
 				goto tfifo_dequeue;
 			}
-			sch->q.qlen--;
+			qdisc_qlen_dec(sch);
 			goto deliver;
 		}
 
 		if (q->qdisc) {
 			skb = q->qdisc->ops->dequeue(q->qdisc);
 			if (skb) {
-				sch->q.qlen--;
+				qdisc_qlen_dec(sch);
 				goto deliver;
 			}
 		}
@@ -776,7 +776,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 	if (q->qdisc) {
 		skb = q->qdisc->ops->dequeue(q->qdisc);
 		if (skb) {
-			sch->q.qlen--;
+			qdisc_qlen_dec(sch);
 			goto deliver;
 		}
 	}
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 9e2b9a490db23d858b27b7fc073b05a06535b05e..fe42ae3d6b696b2fc47f4d397af32e950eeec194 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -86,7 +86,7 @@ prio_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
 	ret = qdisc_enqueue(skb, qdisc, to_free);
 	if (ret == NET_XMIT_SUCCESS) {
 		sch->qstats.backlog += len;
-		sch->q.qlen++;
+		qdisc_qlen_inc(sch);
 		return NET_XMIT_SUCCESS;
 	}
 	if (net_xmit_drop_count(ret))
@@ -119,7 +119,7 @@ static struct sk_buff *prio_dequeue(struct Qdisc *sch)
 		if (skb) {
 			qdisc_bstats_update(sch, skb);
 			qdisc_qstats_backlog_dec(sch, skb);
-			sch->q.qlen--;
+			qdisc_qlen_dec(sch);
 			return skb;
 		}
 	}
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 699e45873f86145e96abd0d9ca77a6d0ff763b1b..195c434aae5f7e03d1a1238ed73bb64b3f04e105 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1152,12 +1152,12 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 	if (!skb)
 		return NULL;
 
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 
 	skb = agg_dequeue(in_serv_agg, cl, len);
 
 	if (!skb) {
-		sch->q.qlen++;
+		qdisc_qlen_inc(sch);
 		return NULL;
 	}
 
@@ -1265,7 +1265,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
 	_bstats_update(&cl->bstats, len, gso_segs);
 	sch->qstats.backlog += len;
-	++sch->q.qlen;
+	qdisc_qlen_inc(sch);
 
 	agg = cl->agg;
 	/* if the class is active, then done here */
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 4d0e44a2e7c664e1599699d21ef482529ee2b119..0719590dfd73b64d21f71ab00621f64ed0eefc89 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -139,7 +139,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	ret = qdisc_enqueue(skb, child, to_free);
 	if (likely(ret == NET_XMIT_SUCCESS)) {
 		sch->qstats.backlog += len;
-		sch->q.qlen++;
+		qdisc_qlen_inc(sch);
 	} else if (net_xmit_drop_count(ret)) {
 		WRITE_ONCE(q->stats.pdrop,
 			   q->stats.pdrop + 1);
@@ -166,7 +166,7 @@ static struct sk_buff *red_dequeue(struct Qdisc *sch)
 	if (skb) {
 		qdisc_bstats_update(sch, skb);
 		qdisc_qstats_backlog_dec(sch, skb);
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 	} else {
 		if (!red_is_idling(&q->vars))
 			red_start_of_idle_period(&q->vars);
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index d3ee8e5479b35e38b71b0979e78aeadb40eb1655..efd9251c3add317f3b817f08c732fca0c347bf35 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -416,7 +416,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	ret = qdisc_enqueue(skb, child, to_free);
 	if (likely(ret == NET_XMIT_SUCCESS)) {
 		sch->qstats.backlog += len;
-		sch->q.qlen++;
+		qdisc_qlen_inc(sch);
 		increment_qlen(&cb, q);
 	} else if (net_xmit_drop_count(ret)) {
 		WRITE_ONCE(q->stats.childdrop,
@@ -446,7 +446,7 @@ static struct sk_buff *sfb_dequeue(struct Qdisc *sch)
 	if (skb) {
 		qdisc_bstats_update(sch, skb);
 		qdisc_qstats_backlog_dec(sch, skb);
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 		decrement_qlen(skb, q);
 	}
 
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index f39822babf88bee9d52cac9f39637d38ec36994f..f9807ee2cf6c72101ce39c4f43bf32c03c0a5f62 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -302,7 +302,7 @@ static unsigned int sfq_drop(struct Qdisc *sch, struct sk_buff **to_free)
 		len = qdisc_pkt_len(skb);
 		WRITE_ONCE(slot->backlog, slot->backlog - len);
 		sfq_dec(q, x);
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 		qdisc_qstats_backlog_dec(sch, skb);
 		qdisc_drop_reason(skb, sch, to_free, QDISC_DROP_OVERLIMIT);
 		return len;
@@ -456,7 +456,8 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
 		/* We could use a bigger initial quantum for new flows */
 		WRITE_ONCE(slot->allot, q->quantum);
 	}
-	if (++sch->q.qlen <= q->limit)
+	qdisc_qlen_inc(sch);
+	if (sch->q.qlen <= q->limit)
 		return NET_XMIT_SUCCESS;
 
 	qlen = slot->qlen;
@@ -497,7 +498,7 @@ sfq_dequeue(struct Qdisc *sch)
 	skb = slot_dequeue_head(slot);
 	sfq_dec(q, a);
 	qdisc_bstats_update(sch, skb);
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 	qdisc_qstats_backlog_dec(sch, skb);
 	WRITE_ONCE(slot->backlog, slot->backlog - qdisc_pkt_len(skb));
 	/* Is the slot empty? */
@@ -596,7 +597,7 @@ static void sfq_rehash(struct Qdisc *sch)
 			WRITE_ONCE(slot->allot, q->quantum);
 		}
 	}
-	sch->q.qlen -= dropped;
+	WRITE_ONCE(sch->q.qlen, sch->q.qlen - dropped);
 	qdisc_tree_reduce_backlog(sch, dropped, drop_len);
 }
 
diff --git a/net/sched/sch_skbprio.c b/net/sched/sch_skbprio.c
index f485f62ab721ab8cde21230c60514708fb479982..52abfb4015a36408046d96b349497419ab5dacf8 100644
--- a/net/sched/sch_skbprio.c
+++ b/net/sched/sch_skbprio.c
@@ -93,7 +93,7 @@ static int skbprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		if (prio < q->lowest_prio)
 			q->lowest_prio = prio;
 
-		sch->q.qlen++;
+		qdisc_qlen_inc(sch);
 		return NET_XMIT_SUCCESS;
 	}
 
@@ -145,7 +145,7 @@ static struct sk_buff *skbprio_dequeue(struct Qdisc *sch)
 	if (unlikely(!skb))
 		return NULL;
 
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 	qdisc_qstats_backlog_dec(sch, skb);
 	qdisc_bstats_update(sch, skb);
 
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 71b690e1974dad8fbab7e12998e03f86a0847a98..d6b981e5df11cba060c9c92212479c0d5a058f5b 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -574,7 +574,7 @@ static int taprio_enqueue_one(struct sk_buff *skb, struct Qdisc *sch,
 	}
 
 	qdisc_qstats_backlog_inc(sch, skb);
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 
 	return qdisc_enqueue(skb, child, to_free);
 }
@@ -755,7 +755,7 @@ static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq,
 
 	qdisc_bstats_update(sch, skb);
 	qdisc_qstats_backlog_dec(sch, skb);
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 
 	return skb;
 }
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index f2340164f579a25431979e12ec3d23ab828edd16..25edf11a7d671fe63878b0995998c5920b86ef74 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -231,7 +231,7 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
 			len += seg_len;
 		}
 	}
-	sch->q.qlen += nb;
+	WRITE_ONCE(sch->q.qlen, sch->q.qlen + nb);
 	sch->qstats.backlog += len;
 	if (nb > 0) {
 		qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
@@ -264,7 +264,7 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	}
 
 	sch->qstats.backlog += len;
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 	return NET_XMIT_SUCCESS;
 }
 
@@ -309,7 +309,7 @@ static struct sk_buff *tbf_dequeue(struct Qdisc *sch)
 			q->tokens = toks;
 			q->ptokens = ptoks;
 			qdisc_qstats_backlog_dec(sch, skb);
-			sch->q.qlen--;
+			qdisc_qlen_dec(sch);
 			qdisc_bstats_update(sch, skb);
 			return skb;
 		}
diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index ec4039a201a2c2c502bc649fa5f6a0e4feee8fd5..bd10da46f5ddbc53f914648066dab526c8064e55 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -107,7 +107,7 @@ teql_dequeue(struct Qdisc *sch)
 	} else {
 		qdisc_bstats_update(sch, skb);
 	}
-	sch->q.qlen = dat->q.qlen + q->q.qlen;
+	WRITE_ONCE(sch->q.qlen, dat->q.qlen + q->q.qlen);
 	return skb;
 }
 
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related

* [PATCH net-next 1/8] net/sched: add READ_ONCE() in gnet_stats_add_queue[_cpu]
From: Eric Dumazet @ 2026-05-07 22:19 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260507221948.335726-1-edumazet@google.com>

Stats are read locklessly; add READ_ONCE() to prevent load-tearing.

Write side will be handled in separate patches.
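
For readers less familiar with the annotation, here is a minimal userspace
sketch of the reader/writer pairing this patch starts to introduce. It is
not kernel code: the READ_ONCE()/WRITE_ONCE() macros below are simplified
stand-ins (scalar types only, GNU __typeof__ assumed), and struct
queue_stats, stats_add() and stats_enqueue() are hypothetical names made up
for illustration.

```c
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel macros (assumption:
 * scalar fields only; the real kernel definitions do more checking).
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical cut-down counterpart of struct gnet_stats_queue. */
struct queue_stats {
	uint32_t qlen;
	uint32_t backlog;
	uint32_t drops;
};

/* Reader side: fold one set of counters into a snapshot without a lock.
 * Each field is loaded exactly once, so the compiler may not split or
 * repeat the access.
 */
static void stats_add(struct queue_stats *sum, const struct queue_stats *q)
{
	sum->qlen    += READ_ONCE(q->qlen);
	sum->backlog += READ_ONCE(q->backlog);
	sum->drops   += READ_ONCE(q->drops);
}

/* Writer side: still serialized by the owner's lock (not modeled here);
 * the marked store only keeps the update from being torn for readers.
 */
static void stats_enqueue(struct queue_stats *q, uint32_t len)
{
	WRITE_ONCE(q->qlen, q->qlen + 1);
	WRITE_ONCE(q->backlog, q->backlog + len);
}
```

Note that the snapshot is only consistent per field: a reader may observe a
new qlen together with an old backlog, which is normally acceptable for
statistics.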

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/gen_stats.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index b71ccaec0991461333dbe465ee619bca4a06e75b..1a2380e74272de8eaf3d4ef453e56105a31e9edf 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -345,11 +345,11 @@ static void gnet_stats_add_queue_cpu(struct gnet_stats_queue *qstats,
 	for_each_possible_cpu(i) {
 		const struct gnet_stats_queue *qcpu = per_cpu_ptr(q, i);
 
-		qstats->qlen += qcpu->qlen;
-		qstats->backlog += qcpu->backlog;
-		qstats->drops += qcpu->drops;
-		qstats->requeues += qcpu->requeues;
-		qstats->overlimits += qcpu->overlimits;
+		qstats->qlen += READ_ONCE(qcpu->qlen);
+		qstats->backlog += READ_ONCE(qcpu->backlog);
+		qstats->drops += READ_ONCE(qcpu->drops);
+		qstats->requeues += READ_ONCE(qcpu->requeues);
+		qstats->overlimits += READ_ONCE(qcpu->overlimits);
 	}
 }
 
@@ -360,11 +360,11 @@ void gnet_stats_add_queue(struct gnet_stats_queue *qstats,
 	if (cpu) {
 		gnet_stats_add_queue_cpu(qstats, cpu);
 	} else {
-		qstats->qlen += q->qlen;
-		qstats->backlog += q->backlog;
-		qstats->drops += q->drops;
-		qstats->requeues += q->requeues;
-		qstats->overlimits += q->overlimits;
+		qstats->qlen += READ_ONCE(q->qlen);
+		qstats->backlog += READ_ONCE(q->backlog);
+		qstats->drops += READ_ONCE(q->drops);
+		qstats->requeues += READ_ONCE(q->requeues);
+		qstats->overlimits += READ_ONCE(q->overlimits);
 	}
 }
 EXPORT_SYMBOL(gnet_stats_add_queue);
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related

* [PATCH net-next 0/8] net/sched: prepare lockless qdisc dumps
From: Eric Dumazet @ 2026-05-07 22:19 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet

The goal is to no longer acquire RTNL in qdisc dumps.

This series annotates data races and changes mq and mq_prio to
no longer acquire child qdisc spinlocks.
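
As a rough illustration of the direction (a userspace sketch, not the code
from this series): a parent can sum its children's queue lengths with marked
loads into a local variable and publish the total with a single marked
store, without taking any child lock. struct toy_qdisc and
toy_mq_update_qlen() are hypothetical names, and the macros are simplified
stand-ins for the kernel ones (GNU __typeof__ assumed).

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel macros (assumption). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for a qdisc: only the queue length is modeled. */
struct toy_qdisc {
	uint32_t qlen;
};

/* Sum the children's qlen locklessly, then publish the result in the
 * parent with one store. Concurrent readers of parent->qlen see either
 * the old total or the new one, never a partially accumulated value.
 */
static void toy_mq_update_qlen(struct toy_qdisc *parent,
			       struct toy_qdisc *const *children, size_t n)
{
	uint32_t qlen = 0;
	size_t i;

	for (i = 0; i < n; i++)
		qlen += READ_ONCE(children[i]->qlen);

	WRITE_ONCE(parent->qlen, qlen);
}
```

The total is a best-effort snapshot: children may change while the loop
runs, which is the usual trade-off for lockless statistics.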

Eric Dumazet (8):
  net/sched: add READ_ONCE() in gnet_stats_add_queue[_cpu]
  net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec()
  net/sched: annotate data-races around sch->qstats.backlog
  net/sched: add qdisc_qlen_lockless() helper
  net/sched: add const qualifiers to gnet_stats helpers
  net/sched: mq: no longer acquire qdisc spinlocks in dump operations
  net/sched: mq_prio: no longer acquire qdisc spinlocks in mqprio_dump()
  net/sched: mq_prio: no longer acquire qdisc spinlocks in
    mqprio_dump_class_stats()

 include/net/gen_stats.h   | 12 +++---
 include/net/sch_generic.h | 56 ++++++++++++++++++++-----
 net/core/gen_stats.c      | 44 ++++++++++----------
 net/sched/sch_api.c       |  4 +-
 net/sched/sch_cake.c      | 15 ++++---
 net/sched/sch_cbs.c       |  6 +--
 net/sched/sch_choke.c     |  8 ++--
 net/sched/sch_codel.c     |  2 +-
 net/sched/sch_drr.c       |  6 +--
 net/sched/sch_dualpi2.c   |  6 +--
 net/sched/sch_etf.c       |  8 ++--
 net/sched/sch_ets.c       |  6 +--
 net/sched/sch_fq.c        |  8 ++--
 net/sched/sch_fq_codel.c  | 11 ++---
 net/sched/sch_fq_pie.c    |  8 ++--
 net/sched/sch_generic.c   | 12 +++---
 net/sched/sch_gred.c      |  2 +-
 net/sched/sch_hfsc.c      |  6 +--
 net/sched/sch_hhf.c       |  7 ++--
 net/sched/sch_htb.c       |  6 +--
 net/sched/sch_mq.c        | 35 +++++++++++-----
 net/sched/sch_mqprio.c    | 86 +++++++++++++++++++++------------------
 net/sched/sch_multiq.c    |  4 +-
 net/sched/sch_netem.c     | 12 +++---
 net/sched/sch_prio.c      |  6 +--
 net/sched/sch_qfq.c       |  8 ++--
 net/sched/sch_red.c       |  6 +--
 net/sched/sch_sfb.c       |  8 ++--
 net/sched/sch_sfq.c       | 11 ++---
 net/sched/sch_skbprio.c   |  4 +-
 net/sched/sch_taprio.c    |  4 +-
 net/sched/sch_tbf.c       | 10 ++---
 net/sched/sch_teql.c      |  2 +-
 33 files changed, 242 insertions(+), 187 deletions(-)

-- 
2.54.0.563.g4f69b47b94-goog
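
A minimal userspace sketch of the idea in the cover letter above: writers
update the counters under a lock with annotated stores, and the dump path
sums them with annotated loads and no lock. All toy_* and TOY_* names are
hypothetical stand-ins, not the helpers added by this series; the real
READ_ONCE()/WRITE_ONCE() macros live in the kernel headers.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(). */
#define TOY_READ_ONCE(x)     (*(volatile __typeof__(x) *)&(x))
#define TOY_WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical child qdisc: both fields are written under its own lock. */
struct toy_qdisc {
	unsigned int qlen;	/* packets queued */
	unsigned int backlog;	/* bytes queued */
};

/* Writer side: assumed to run with the child's lock held. */
static inline void toy_qlen_inc(struct toy_qdisc *q, unsigned int bytes)
{
	TOY_WRITE_ONCE(q->qlen, q->qlen + 1);
	TOY_WRITE_ONCE(q->backlog, q->backlog + bytes);
}

static inline void toy_qlen_dec(struct toy_qdisc *q, unsigned int bytes)
{
	TOY_WRITE_ONCE(q->qlen, q->qlen - 1);
	TOY_WRITE_ONCE(q->backlog, q->backlog - bytes);
}

/* Reader side: no lock taken, each load is annotated. */
static inline unsigned int toy_qlen_lockless(const struct toy_qdisc *q)
{
	return TOY_READ_ONCE(q->qlen);
}

/* Dump path of a toy "mq" parent: sum children without their locks. */
static inline unsigned int toy_mq_qlen(const struct toy_qdisc *kids, size_t n)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += toy_qlen_lockless(&kids[i]);
	return sum;
}

static inline unsigned int toy_mq_backlog(const struct toy_qdisc *kids, size_t n)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += TOY_READ_ONCE(kids[i].backlog);
	return sum;
}
```

The sums are only a snapshot: a concurrent writer may change a child between
two loads, which is usually acceptable for statistics.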


^ permalink raw reply

* Re: [PATCH net-next 08/12] dt-bindings: net: toshiba,tc965x-dwmac: add TC956x Ethernet bridge
From: Alex Elder @ 2026-05-07 22:17 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, maxime.chevallier,
	rmk+kernel, andersson, konradybcio, robh, krzk+dt, conor+dt,
	linusw, brgl, arnd, gregkh, Daniel Thompson, mohd.anwar,
	a0987203069, alexandre.torgue, ast, boon.khai.ng, chenchuangyu,
	chenhuacai, daniel, hawk, hkallweit1, inochiama, john.fastabend,
	julianbraha, livelycarpet87, matthew.gerlach, mcoquelin.stm32, me,
	prabhakar.mahadev-lad.rj, richardcochran, rohan.g.thomas, sdf,
	siyanteng, weishangjuan, wens, netdev, bpf, linux-arm-msm,
	devicetree, linux-gpio, linux-stm32, linux-arm-kernel,
	linux-kernel
In-Reply-To: <1f34cbce-e2dd-4e80-b136-55d0efa50002@lunn.ch>

On 5/1/26 12:38 PM, Andrew Lunn wrote:
> Why not add subnodes for the ethernet interfaces?

We will define "ethernet" devicetree subnodes of the PCIe functions
in the next version of the series.  Something like what's below.

					-Alex

pci@0,1 {
         compatible = "pci1179,0220";
         reg = <0x50100 0x0 0x0 0x0 0x0>;
         #address-cells = <3>;
         #size-cells = <2>;
         device_type = "pci";
         ranges;

         ethernet {
                 phy-mode = "sgmii";
                 phy-handle = <&tc956x_emac1_phy>;

                 mdio {
                         compatible = "snps,dwmac-mdio";
                         #address-cells = <1>;
                         #size-cells = <0>;

                         tc956x_emac1_phy: ethernet-phy@1c {
                                 compatible = "ethernet-phy-id004d.d101";
				...
			};
		};
	};
};

^ permalink raw reply

* Re: [PATCH v4 0/7] landlock: Add UDP access control support
From: Matthieu Buffet @ 2026-05-07 22:11 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, linux-security-module, Mikhail Ivanov,
	konstantin.meskhidze, Tingmao Wang, netdev
In-Reply-To: <aftfVvru3npQ9kWq@google.com>

Hi Günther,

On 5/6/2026 5:33 PM, Günther Noack wrote:
> For the final revision, I think it would be good to squash the two
> commits that are about LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP.  That
> reduces the chances that someone backports the first but not the
> second to one of the distribution kernels.

I did indeed split the implementation of that access right into two 
commits, 100% to ease reading each part of the change semi-independently 
for reviewers. It can/should indeed be squashed without losing anything.

-- 
Matthieu

^ permalink raw reply

* Re: [PATCH net v2] ice: fix packet corruption due to extraneous page flip
From: Jacob Keller @ 2026-05-07 22:11 UTC (permalink / raw)
  To: John Ousterhout, anthony.l.nguyen, Jakub Kicinski, Paolo Abeni
  Cc: intel-wired-lan, przemyslaw.kitszel, netdev, stable
In-Reply-To: <20260507183843.1457-1-ouster@cs.stanford.edu>

On 5/7/2026 11:38 AM, John Ousterhout wrote:
> Note: major revisions to the ice driver make this patch irrelevant
> for recent versions. It applies to longterm stable versions
> 6.18.27 and 6.12.86; it also seems relevant for 6.6.137, but would
> need modifications for that version. I have not examined earlier
> versions
> 

From this description I take it this only applies to the ice driver
prior to its conversion to page pool?

In that case, I think you need to Cc: stable@vger.kernel.org and include
the relevant versions you intend to target.

I think this case is "unique" since there would not be an upstream
equivalent patch. But that is merely because we removed the faulty code
before it could be fixed.

I'm not 100% sure what method to follow since typical stable rules don't
really like taking patches that don't apply to mainline...

Even with it being somewhat rare to get 0 size packet, it is not
impossible and packet corruption is a Big(TM) deal.
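
For readers unfamiliar with the half-page scheme, here is a toy model of the
rule the patch below introduces. It assumes a 4096-byte page split into two
halves and a flip done by XOR-ing the offset; the toy_* names are
hypothetical and this is not the driver code.

```c
#define TOY_PAGE_SIZE	4096u
#define TOY_HALF_PAGE	(TOY_PAGE_SIZE / 2)

/* Hypothetical Rx buffer: only the offset of the half handed to the NIC. */
struct toy_rx_buf {
	unsigned int page_offset;
};

/* Switch to the other half of the page. */
static void toy_flip(struct toy_rx_buf *buf)
{
	buf->page_offset ^= TOY_HALF_PAGE;
}

/*
 * Recycle a buffer after a descriptor completed with 'size' bytes.
 * A zero-size completion used none of the current half, so that half
 * is reused as-is; flipping would point the NIC at the other half,
 * which may still hold live data from an earlier packet.
 */
static void toy_recycle(struct toy_rx_buf *buf, unsigned int size)
{
	if (size != 0)
		toy_flip(buf);
}
```

With this rule a zero-length completion leaves page_offset unchanged, while
a normal completion alternates between offsets 0 and 2048.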

Thanks,
Jake

> Signed-off-by: John Ousterhout <ouster@cs.stanford.edu>
> ---
>  drivers/net/ethernet/intel/ice/ice_txrx.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
> index 51c459a3e722..081c7a7392b7 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
> @@ -1215,6 +1215,13 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
>  		xdp_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
>  
>  	while (idx != ntc) {
> +		union ice_32b_rx_flex_desc *rx_desc;
> +		unsigned int size;
> +
> +		rx_desc = ICE_RX_DESC(rx_ring, idx);
> +		size = le16_to_cpu(rx_desc->wb.pkt_len) &
> +		       ICE_RX_FLX_DESC_PKT_LEN_M;
> +
>  		buf = &rx_ring->rx_buf[idx];
>  		if (++idx == cnt)
>  			idx = 0;
> @@ -1224,10 +1231,20 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
>  		 * To do this, only adjust pagecnt_bias for fragments up to
>  		 * the total remaining after the XDP program has run.
>  		 */
> -		if (verdict != ICE_XDP_CONSUMED)
> -			ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
> -		else if (i++ <= xdp_frags)
> +		if (verdict != ICE_XDP_CONSUMED) {
> +			/* Don't "flip" the page if size is 0: in this case
> +			 * the data in the current half will not be used so
> +			 * it's OK to reuse that half. And, since the bias
> +			 * didn't get decremented for this half, the page can
> +			 * be returned to the NIC even if the other half is
> +			 * still in use, so flipping the page could cause
> +			 * live packet data to be overwritten.
> +			 */
> +			if (size != 0)
> +				ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
> +		} else if (i++ <= xdp_frags) {
>  			buf->pagecnt_bias++;
> +		}
>  
>  		ice_put_rx_buf(rx_ring, buf);
>  	}


^ permalink raw reply

* Re: [devel-ipsec] [PATCH RFC] xfrm: enforce SPI uniqueness for inbound SAs only
From: Andrew Cagney @ 2026-05-07 22:07 UTC (permalink / raw)
  To: antony.antony
  Cc: Steffen Klassert, Herbert Xu, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Aakash Kumar S, Yan Yan,
	Abed Mohammad Kamaluddin, Nathan Harold, netdev, devel
In-Reply-To: <20260416-alloc-spi-dir-v1-1-145e16477480@secunet.com>

On Thu, 16 Apr 2026 at 01:45, Antony Antony via Devel
<devel@lists.linux-ipsec.org> wrote:
>
> Per RFC 4301 section 4.4.2.1, the SPI is selected by the receiving
> end, which is interpreted as making SPI uniqueness an inbound-only
> requirement.

Yes please!

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH net v2] ice: Fix missing 1's complement negation in GCS raw checksum
From: Jacob Keller @ 2026-05-07 21:56 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Tony Nguyen, Aleksandr Loktionov, kernel-team, Matt Fleming,
	stable, Simon Horman, Przemek Kitszel, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Eric Joyner, Paul Greenwalt, Alice Michael, intel-wired-lan,
	netdev, linux-kernel
In-Reply-To: <afxbZjldi1OC3HmS@matt-Precision-5490>

On 5/7/2026 2:34 AM, Matt Fleming wrote:
> On Mon, May 04, 2026 at 05:10:23PM -0700, Jacob Keller wrote:
>>
>> Hi,
>>
>> Based on your patch description, I assume that you've tested this on
>> real hardware.
>>
>> I dug a little through some of our internal changes history and saw
>> that it looks like the hardware has a register setting in its
>> GL_RDPU_CNTRL register which determines whether the checksum value
>> reported is inverted or not. In E830 hardware, it is supposed to be off
>> (i.e. the checksum value reported already matches the expected setting).
>>
>> Perhaps your device somehow got the GL_RDPU_CNTRL register set to the
>> wrong mode and that results in the swap being necessary. Hmm.
>>
>> I'll ask the team to see if they can confirm this behavior.
> 
> Hi Jake,
> 
> Thanks for digging into this.
> 

I'm still trying to see if I can get someone on our team with hardware
setup to confirm this behavior.

> I read GL_RDPU_CNTRL on our affected E830 and the value is the same on
> both ports of the NIC:
> 
>   0000:c1:00.0: GL_RDPU_CNTRL = 0x0020a275
>   0000:c1:00.1: GL_RDPU_CNTRL = 0x0020a275
> 

Ok. Makes sense.

> Decoding bit 22 (E830_GL_RDPU_CNTRL_CHECKSUM_COMPLETE_INV) gives 0,
> i.e. the hardware is supposedly in "not inverted" mode, which matches
> the default you described.
> 
I wonder if it would actually work if we set the bit to 1. It could be
that there was miscommunication between us and the hardware folks so
what "inverted" means got.. inverted. (pun intended).
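
The decode quoted above can be checked with a few lines of C. This sketch
only assumes what the thread states, namely that the inversion control is
bit 22 of GL_RDPU_CNTRL; the toy_* and TOY_* names are hypothetical.

```c
#include <stdint.h>

/* Bit position taken from the thread; the macro name is made up. */
#define TOY_CSUM_COMPLETE_INV_SHIFT	22u

/* Return 1 if the "checksum complete inverted" bit is set, else 0. */
static inline int toy_csum_inv_enabled(uint32_t gl_rdpu_cntrl)
{
	return (int)((gl_rdpu_cntrl >> TOY_CSUM_COMPLETE_INV_SHIFT) & 1u);
}
```

For the reported value 0x0020a275 the helper returns 0: the highest bit set
in that value is bit 21, so bit 22 is clear.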

> However, looking at the data on the wire I see:
> 
>   - netdev_rx_csum_fault fires ~65 000 times/sec on this host.
>   - bpftrace at fexit:ice_process_skb_fields shows skb->csum =
>     swab16(raw_csum) directly (no negation), e.g. raw_csum=0xfb4f
>     -> skb->csum=0x4ffb.
>   - At fentry:__skb_checksum_complete the upper 16 bits of skb->csum
>     are 0xFFFF on every TCP/UDP packet -- the signature of nf_ip_checksum
>     adding the pseudo-header to a value that was the un-negated raw_csum.
>   - fold2(skb->csum_at_fentry + skb_checksum(skb,0,len,0)) ≈ 0xFFFF
>     for every packet, which means the two values are ones-complement
>     complements of each other, i.e. the driver stored S where the
>     stack expects ~S.
> 
> Negating the checksum makes the failures go away.
> 

Yea. Clearly we're inverting the checksum relative to what the stack wants.
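
A small userspace sketch of the ones-complement arithmetic behind that
observation: a 16-bit sum S and its negation ~S always add up to 0xFFFF
after folding, which is why a driver storing S where the stack expects ~S
produces the signature Matt describes. The toy_* helpers are hypothetical
and are not the kernel's checksum functions.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static inline uint16_t toy_swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static inline uint16_t toy_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones-complement addition of two 16-bit values. */
static inline uint16_t toy_csum_add(uint16_t a, uint16_t b)
{
	return toy_fold((uint32_t)a + b);
}

/* Ones-complement negation. */
static inline uint16_t toy_csum_negate(uint16_t s)
{
	return (uint16_t)~s;
}
```

Using the value from the trace, toy_swab16(0xfb4f) gives 0x4ffb, and
toy_csum_add(0x4ffb, toy_csum_negate(0x4ffb)) gives 0xffff.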

I'm also curious why our validation folks haven't noticed or
complained.. but I think it may be partially because we have to disable
the GCS support when operating TSO offload, so TCP traffic won't be able
to replicate this. Hmm... Oh....

I wonder if the way we distinguish between these modes isn't per-flow
but instead per-device.. so if you have NETIF_F_ALL_TSO enabled you
won't get NETIF_F_HW_CSUM, which means that we'd go through the old
legacy path... Ugh. That might explain how this escaped testing :(

I've queued your v2 to dev-queue, so at the very least we can get some
validation and confirm the behavior.

> Thanks,
> Matt


^ permalink raw reply

* Re: [PATCH net-next v2 4/4] net: phy: Introduce Airoha AN8801/R Gigabit Ethernet PHY driver
From: Andrew Lunn @ 2026-05-07 21:43 UTC (permalink / raw)
  To: Louis-Alexis Eyraud
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	AngeloGioacchino Del Regno, Heiner Kallweit, Russell King,
	kevin-kw.huang, macpaul.lin, matthias.bgg, kernel, netdev,
	devicetree, linux-arm-kernel, linux-mediatek, linux-kernel
In-Reply-To: <2c441d51f6a865ddb6e67b63cd26a651ed3ff058.camel@collabora.com>

> > > +static int an8801r_of_init_leds(struct phy_device *phydev, u8
> > > *led_cfg)
> > > +{
> > > +	struct device *dev = &phydev->mdio.dev;
> > > +	struct device_node *np = dev->of_node;
> > > +	struct device_node *leds;
> > > +	u32 function_enum_idx;
> > > +	int ret;
> > > +
> > > +	if (!np)
> > > +		return 0;
> > > +
> > > +	/* If devicetree is present, leds configuration is
> > > required */
> > > +	leds = of_get_child_by_name(np, "leds");
> > > +	if (!leds)
> > > +		return 0;
> > > +
> > > +	for_each_available_child_of_node_scoped(leds, led) {
> > > +		u32 led_idx;
> > > +
> > > +		ret = of_property_read_u32(led, "reg", &led_idx);
> > > +		if (ret)
> > > +			goto out;
> > > +
> > > +		if (led_idx >= AN8801R_NUM_LEDS) {
> > > +			ret = -EINVAL;
> > > +			goto out;
> > > +		}
> > > +
> > > +		ret = of_property_read_u32(led, "function-
> > > enumerator",
> > > +					   &function_enum_idx);
> > > +		if (ret)
> > > +			function_enum_idx = AN8801R_LED_FN_NONE;
> > > +
> > 
> > What is this doing? Is this documented in the binding?
> The `function-enumerator` property is only documented in the led common
> dt-binding file. The an8801 dt-bindings inherits this property from the
> ethernet-phy dt-bindings.
> 
> We aimed to have this PHY have its led behaviour (how many to enable
> and what their role shall be) configurable using devicetree and not to
> rely on a default configuration, hard-coded in the driver (like the
> air_en8811h driver did) and also make use of the led hardware
> offloading (for functions like 100/1000, activity blinking, and others)
> that this PHY is capable of.

What other drivers do is leave the configuration with its reset
default. They are often sensible. When the netdev trigger loads, it
should ask the LED how it is configured, and the values in sysfs will
reflect it. After that you can change it, via udev rules, etc.

You have to be careful about what you put in DT. DT describes
hardware, not configuration or policy. How the LED blinks is probably
configuration, so it does not belong in DT.

> > > +static int an8801r_read_status(struct phy_device *phydev)
> > > +{
> > > +	int prev_speed, ret;
> > > +	u32 val;
> > > +
> > > +	prev_speed = phydev->speed;
> > > +
> > > +	ret = genphy_read_status(phydev);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	if (phydev->link && prev_speed != phydev->speed) {
> > > +		val = phydev->speed == SPEED_1000 ?
> > > +		      AN8801_BPBUS_LINK_MODE_1000 : 0;
> > > +
> > > +		return an8801_buckpbus_reg_rmw(phydev,
> > > +					      
> > > AN8801_BPBUS_REG_LINK_MODE,
> > > +					      
> > > AN8801_BPBUS_LINK_MODE_1000,
> > > +					       val);
> > > +	};
> > 
> > This is unusual. What is it doing? Please add a comment.
> This call is to ensure that the PHY switches to the expected 1Gbps 
> speed when available. 

So this is an errata workaround? Please add this in a patch of its
own, describe the problem in the commit message, list the errata etc.

     Andrew

^ permalink raw reply

* [PATCH] net: mention the convention for .ndo_setup_tc()
From: David Yang @ 2026-05-07 21:40 UTC (permalink / raw)
  To: netdev
  Cc: David Yang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Andrew Lunn, linux-kernel

qdisc_offload_dump_helper(), which originated from commit 602f3baf2218
("net_sch: red: Add offload ability to RED qdisc"), is designed so that

    Whether RED is being offloaded is being determined every time dump
    action is being called because parent change of this qdisc could
    change its offload state but doesn't require any RED function to be
    called.

and returning -EOPNOTSUPP (for dump queries) does not mean "I don't have
any statistics", but "I don't offload this qdisc anymore". At least two
existing drivers did it wrong, so it is worth mentioning.

Signed-off-by: David Yang <mmyangfl@gmail.com>
---
 include/linux/netdevice.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 744ffa243501..b18a6d917771 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1223,6 +1223,12 @@ struct netdev_net_notifier {
  *	tx queues stopped. This allows the netdevice to perform queue
  *	management safely.
  *
+ *	NB: Returning -EOPNOTSUPP for any command means "this qdisc is not
+ *	offloaded (anymore, offloading may have silently stopped)", and the
+ *	offloading flag is cleared. This is especially true for dump queries
+ *	(e.g. TC_*_STATS commands). If the underlying device does not report any
+ *	statistics but is still offloading, return 0 instead.
+ *
  *	Fiber Channel over Ethernet (FCoE) offload functions.
  * int (*ndo_fcoe_enable)(struct net_device *dev);
  *	Called when the FCoE protocol stack wants to start using LLD for FCoE
-- 
2.53.0
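
A hedged userspace sketch of the convention described in the patch above.
The toy_* types and commands are hypothetical and much simpler than the
real .ndo_setup_tc() interface; the only point illustrated is which return
value a stats query should produce in each state.

```c
#include <errno.h>

/* Hypothetical command set standing in for the TC_* offload commands. */
enum toy_tc_cmd {
	TOY_TC_REPLACE,
	TOY_TC_DESTROY,
	TOY_TC_STATS,
};

/* Hypothetical device state. */
struct toy_dev {
	int offloading;		/* is the qdisc currently offloaded? */
	int has_hw_stats;	/* can the hardware report counters? */
	unsigned long hw_pkts;	/* packets counted by the hardware */
};

static int toy_setup_tc(struct toy_dev *dev, enum toy_tc_cmd cmd,
			unsigned long *pkts)
{
	switch (cmd) {
	case TOY_TC_REPLACE:
		dev->offloading = 1;
		return 0;
	case TOY_TC_DESTROY:
		dev->offloading = 0;
		return 0;
	case TOY_TC_STATS:
		/* Not offloaded (anymore): this is what -EOPNOTSUPP means. */
		if (!dev->offloading)
			return -EOPNOTSUPP;
		/* Offloaded: add counters if there are any ... */
		if (dev->has_hw_stats && pkts)
			*pkts += dev->hw_pkts;
		/* ... and return 0 even when there are none. */
		return 0;
	}
	return -EOPNOTSUPP;
}
```

A device with has_hw_stats == 0 still returns 0 for TOY_TC_STATS while the
qdisc is offloaded, so the caller does not conclude that offloading stopped.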


^ permalink raw reply related

* Re: [PATCH net-next v2 3/4] net: phy: air_phy_lib: Factorize BuckPBus register accessors
From: Andrew Lunn @ 2026-05-07 21:36 UTC (permalink / raw)
  To: Louis-Alexis Eyraud
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	AngeloGioacchino Del Regno, Heiner Kallweit, Russell King,
	kevin-kw.huang, macpaul.lin, matthias.bgg, kernel, netdev,
	devicetree, linux-arm-kernel, linux-mediatek, linux-kernel
In-Reply-To: <9f569ba00b959b701a7a51bcd7347c1a108a15ee.camel@collabora.com>

On Thu, May 07, 2026 at 02:11:54PM +0200, Louis-Alexis Eyraud wrote:
> Hi Andrew,
> 
> On Thu, 2026-03-26 at 13:30 +0100, Andrew Lunn wrote:
> > > @@ -480,8 +287,8 @@ static int en8811h_wait_mcu_ready(struct
> > > phy_device *phydev)
> > >  {
> > >  	int ret, reg_value;
> > >  
> > > -	ret = air_buckpbus_reg_write(phydev, EN8811H_FW_CTRL_1,
> > > -				     EN8811H_FW_CTRL_1_FINISH);
> > > +	ret = air_phy_buckpbus_reg_write(phydev,
> > > EN8811H_FW_CTRL_1,
> > > +					
> > > EN8811H_FW_CTRL_1_FINISH);
> > 
> > Is a rename required? Is the namespace air_buckpbus_ used somewhere
> > else?
> > 
> > 	Andrew
> Sorry for the delay.
> 
> The air_buckpbus_ namespace is only used in the air_en8811h driver.
> It seemed better to me that in the new air_phy_lib, all functions (the
> buckpbus accessors and air_phy_read/write_page functions) started with
> the same prefix. That is the reason I renamed them, even if not
> required.
> 
> As an alternative, to avoid renaming those buckpbus function calls on
> air_en8811h driver and reduce this patch changes, I can add macros at
> the beginning of the file such as:
> ```
> #define air_buckpbus_reg_write(_phydev, _pbus_address, _pbus_data) \
> 	air_phy_buckpbus_reg_write(_phydev, _pbus_address, _pbus_data)

No, don't do this.

If you want to rename them, rename them. But do it in a patch which
only contains a rename. That is easier to review, and more obviously
correct.

	Andrew

^ permalink raw reply

* Re: [PATCH v2] dt-bindings: Fix phandle-array constraints, again
From: Rob Herring (Arm) @ 2026-05-07 21:30 UTC (permalink / raw)
  To: Rob Herring (Arm)
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, Wolfram Sang,
	Andrew Lunn, linux-kernel, linux-mmc, dri-devel, devicetree,
	Johannes Berg, Bjorn Andersson, Krzysztof Wilczyński,
	linux-arm-kernel, Jeff Johnson, linux-pci, Lorenzo Pieralisi,
	linux-usb, Conor Dooley, ath11k, linux-spi, linux-remoteproc,
	Ulf Hansson, Greg Kroah-Hartman, Andi Shyti,
	Manivannan Sadhasivam, David S. Miller, Mathieu Poirier,
	Maxime Ripard, Sylwester Nawrocki, linux-wireless, Mark Brown,
	linux-sound, linux-i2c, Bjorn Helgaas, netdev, ath10k,
	Krzysztof Kozlowski
In-Reply-To: <20260507201749.2605365-1-robh@kernel.org>


On Thu, 07 May 2026 15:16:00 -0500, Rob Herring (Arm) wrote:
> The unfortunately named 'phandle-array' property type is really a matrix
> with phandle and fixed arg cells entries. A matrix property should have 2
> levels of items constraints.
> 
> Acked-by: Mark Brown <broonie@kernel.org>
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
> ---
> v2:
>  - Add proper descriptions for 'qcom,smem-states'. Thanks Krzysztof!
>  - Fix i2c-parent warning
>  - Fix extra blank lines
> ---
>  .../rockchip/rockchip,rk3399-cdn-dp.yaml       |  2 ++
>  .../bindings/i2c/i2c-demux-pinctrl.yaml        |  1 +
>  .../mmc/hisilicon,hi3798cv200-dw-mshc.yaml     |  7 ++++---
>  .../devicetree/bindings/net/qcom,bam-dmux.yaml | 12 ++++++++++++
>  .../devicetree/bindings/net/qcom,ipa.yaml      | 12 ++++++++++++
>  .../bindings/net/wireless/qcom,ath10k.yaml     |  8 +++++++-
>  .../bindings/net/wireless/qcom,ath11k.yaml     |  8 +++++++-
>  .../net/wireless/qcom,ipq5332-wifi.yaml        | 18 ++++++++++++++++++
>  .../bindings/pci/toshiba,tc9563.yaml           |  5 +++--
>  .../remoteproc/qcom,msm8916-mss-pil.yaml       |  6 ++++++
>  .../remoteproc/qcom,msm8996-mss-pil.yaml       |  7 +++++++
>  .../bindings/remoteproc/qcom,pas-common.yaml   |  6 ++++++
>  .../remoteproc/qcom,qcs404-cdsp-pil.yaml       |  6 ++++++
>  .../remoteproc/qcom,sc7180-mss-pil.yaml        |  6 ++++++
>  .../remoteproc/qcom,sc7280-adsp-pil.yaml       |  6 ++++++
>  .../remoteproc/qcom,sc7280-mss-pil.yaml        |  6 ++++++
>  .../remoteproc/qcom,sc7280-wpss-pil.yaml       |  6 ++++++
>  .../remoteproc/qcom,sdm845-adsp-pil.yaml       |  6 ++++++
>  .../bindings/remoteproc/qcom,wcnss-pil.yaml    |  6 ++++++
>  .../devicetree/bindings/sound/samsung,tm2.yaml |  8 ++++++--
>  .../bindings/spi/st,stm32mp25-ospi.yaml        |  5 +++--
>  .../bindings/usb/chipidea,usb2-common.yaml     |  2 ++
>  .../devicetree/bindings/usb/ci-hdrc-usb2.yaml  |  7 ++++---
>  23 files changed, 142 insertions(+), 14 deletions(-)
> 

My bot found errors running 'make dt_binding_check' on your patch:

yamllint warnings/errors:

dtschema/dtc warnings/errors:
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/i2c/i2c-demux-pinctrl.example.dtb: i2c-mux3 (i2c-demux-pinctrl): i2c-parent:0: [2, 3, 4] is too long
	from schema $id: http://devicetree.org/schemas/i2c/i2c-demux-pinctrl.yaml
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/i2c/i2c-demux-pinctrl.example.dtb: i2c-mux3 (i2c-demux-pinctrl): i2c-parent: [[2, 3, 4]] is too short
	from schema $id: http://devicetree.org/schemas/i2c/i2c-demux-pinctrl.yaml

doc reference errors (make refcheckdocs):

See https://patchwork.kernel.org/project/devicetree/patch/20260507201749.2605365-1-robh@kernel.org

The base for the series is generally the latest rc1. A different dependency
should be noted in *this* patch.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit after running the above command yourself. Note
that DT_SCHEMA_FILES can be set to your schema file to speed up checking
your schema. However, it must be unset to test all examples with your schema.


^ permalink raw reply

* [mst-vhost:balloon 6/30] Warning: mm/mempolicy.c:2444 function parameter 'user_addr' not described in '__alloc_pages_mpol'
From: kernel test robot @ 2026-05-07 21:26 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: oe-kbuild-all, kvm, virtualization, netdev

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git balloon
head:   9f56ee36fbf6a6d336dc6a9eaeb4f8a67cb42a31
commit: c4289f5a4e563611a468b4b5379025a4aa4a7c12 [6/30] mm: thread user_addr through page allocator for cache-friendly zeroing
config: powerpc-allmodconfig (https://download.01.org/0day-ci/archive/20260508/202605080515.6jRN5wN7-lkp@intel.com/config)
compiler: powerpc64-linux-gcc (GCC) 15.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260508/202605080515.6jRN5wN7-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605080515.6jRN5wN7-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> Warning: mm/mempolicy.c:2444 function parameter 'user_addr' not described in '__alloc_pages_mpol'
>> Warning: mm/mempolicy.c:2444 expecting prototype for alloc_pages_mpol(). Prototype was for __alloc_pages_mpol() instead
   Warning: mm/mempolicy.c:2547 expecting prototype for vma_alloc_folio(). Prototype was for alloc_frozen_pages() instead

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* [PATCH 3/3] [v5 omap] ARM: dts: omap2: add stlc4560 spi-wireless node
From: Arnd Bergmann @ 2026-05-07 21:24 UTC (permalink / raw)
  To: netdev
  Cc: Arnd Bergmann, Aaro Koskinen, Andreas Kemnade,
	Bartosz Golaszewski, Benoît Cousson, David S. Miller,
	Dmitry Torokhov, Eric Dumazet, Felipe Balbi, Jakub Kicinski,
	Johannes Berg, Kevin Hilman, Krzysztof Kozlowski, Linus Walleij,
	Paolo Abeni, Rob Herring, Roger Quadros, Tony Lindgren,
	linux-wireless, devicetree, linux-kernel, linux-arm-kernel,
	linux-gpio, linux-omap, Krzysztof Kozlowski
In-Reply-To: <20260507212451.3333185-1-arnd@kernel.org>

From: Arnd Bergmann <arnd@arndb.de>

Converted from the platform_device creation in board-n8x0.c.

Link: https://lore.kernel.org/all/20230314163201.955689-1-arnd@kernel.org/
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/boot/dts/ti/omap/omap2.dtsi                |  4 ++++
 arch/arm/boot/dts/ti/omap/omap2420-n8x0-common.dtsi | 12 ++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/arch/arm/boot/dts/ti/omap/omap2.dtsi b/arch/arm/boot/dts/ti/omap/omap2.dtsi
index afabb36a8ac1..fdc1790adf43 100644
--- a/arch/arm/boot/dts/ti/omap/omap2.dtsi
+++ b/arch/arm/boot/dts/ti/omap/omap2.dtsi
@@ -129,6 +129,8 @@ i2c2: i2c@48072000 {
 		};
 
 		mcspi1: spi@48098000 {
+			#address-cells = <1>;
+			#size-cells = <0>;
 			compatible = "ti,omap2-mcspi";
 			ti,hwmods = "mcspi1";
 			reg = <0x48098000 0x100>;
@@ -140,6 +142,8 @@ mcspi1: spi@48098000 {
 		};
 
 		mcspi2: spi@4809a000 {
+			#address-cells = <1>;
+			#size-cells = <0>;
 			compatible = "ti,omap2-mcspi";
 			ti,hwmods = "mcspi2";
 			reg = <0x4809a000 0x100>;
diff --git a/arch/arm/boot/dts/ti/omap/omap2420-n8x0-common.dtsi b/arch/arm/boot/dts/ti/omap/omap2420-n8x0-common.dtsi
index 63b0b4921e4e..fe9dd8bbfc85 100644
--- a/arch/arm/boot/dts/ti/omap/omap2420-n8x0-common.dtsi
+++ b/arch/arm/boot/dts/ti/omap/omap2420-n8x0-common.dtsi
@@ -109,3 +109,15 @@ partition@5 {
 		};
 	};
 };
+
+&mcspi2 {
+	status = "okay";
+
+	wifi@0 {
+		reg = <0>;
+		compatible = "st,stlc4560";
+		spi-max-frequency = <48000000>;
+		interrupts-extended = <&gpio3 23 IRQ_TYPE_EDGE_RISING>;
+		powerdown-gpios = <&gpio4 1 GPIO_ACTIVE_LOW>; /* gpio 97 */
+	};
+};
-- 
2.39.5


^ permalink raw reply related

