Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net v4 1/2] flow_dissector: do not dissect PPPoE PFC frames
From: Simon Horman @ 2026-04-10 17:10 UTC (permalink / raw)
  To: Qingfang Deng
  Cc: linux-ppp, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Guillaume Nault, Wojciech Drewek, Tony Nguyen,
	linux-kernel, netdev, Paul Mackerras, Jaco Kroon, James Carlson,
	Marcin Szycik
In-Reply-To: <20260410033627.93786-1-qingfang.deng@linux.dev>

On Fri, Apr 10, 2026 at 11:36:20AM +0800, Qingfang Deng wrote:
> RFC 2516 Section 7 states that Protocol Field Compression (PFC) is NOT
> RECOMMENDED for PPPoE. In practice, pppd does not support negotiating
> PFC for PPPoE sessions, and the flow dissector driver has assumed an
> uncompressed frame until the blamed commit.
> 
> During the review process of that commit [1], support for PFC is
> suggested. However, having a compressed (1-byte) protocol field means
> the subsequent PPP payload is shifted by one byte, causing 4-byte
> misalignment for the network header and an unaligned access exception
> on some architectures.
> 
> The exception can be reproduced by sending a PPPoE PFC frame to an
> ethernet interface of a MIPS board, with RPS enabled, even if no PPPoE
> session is active on that interface:
> 
> $ 0   : 00000000 80c40000 00000000 85144817
> $ 4   : 00000008 00000100 80a75758 81dc9bb8
> $ 8   : 00000010 8087ae2c 0000003d 00000000
> $12   : 000000e0 00000039 00000000 00000000
> $16   : 85043240 80a75758 81dc9bb8 00006488
> $20   : 0000002f 00000007 85144810 80a70000
> $24   : 81d1bda0 00000000
> $28   : 81dc8000 81dc9aa8 00000000 805ead08
> Hi    : 00009d51
> Lo    : 2163358a
> epc   : 805e91f0 __skb_flow_dissect+0x1b0/0x1b50
> ra    : 805ead08 __skb_get_hash_net+0x74/0x12c
> Status: 11000403        KERNEL EXL IE
> Cause : 40800010 (ExcCode 04)
> BadVA : 85144817
> PrId  : 0001992f (MIPS 1004Kc)
> Call Trace:
> [<805e91f0>] __skb_flow_dissect+0x1b0/0x1b50
> [<805ead08>] __skb_get_hash_net+0x74/0x12c
> [<805ef330>] get_rps_cpu+0x1b8/0x3fc
> [<805fca70>] netif_receive_skb_list_internal+0x324/0x364
> [<805fd120>] napi_complete_done+0x68/0x2a4
> [<8058de5c>] mtk_napi_rx+0x228/0xfec
> [<805fd398>] __napi_poll+0x3c/0x1c4
> [<805fd754>] napi_threaded_poll_loop+0x234/0x29c
> [<805fd848>] napi_threaded_poll+0x8c/0xb0
> [<80053544>] kthread+0x104/0x12c
> [<80002bd8>] ret_from_kernel_thread+0x14/0x1c
> 
> Code: 02d51821  1060045b  00000000 <8c640000> 3084000f  2c820005  144001a2  00042080  8e220000
> 
> To reduce the attack surface and maintain performance, do not process
> PPPoE PFC frames. While at it, avoid byte-swapping at runtime, restoring
> the original behavior.
> 
> [1] https://patch.msgid.link/20220630231016.GA392@debian.home
> Fixes: 46126db9c861 ("flow_dissector: Add PPPoE dissectors")
> Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
> ---
> Changes in v4: no new changes
>  Link to v3: https://lore.kernel.org/r/20260409031107.616630-1-qingfang.deng@linux.dev
> Changes in v3:
>  Make ppp_proto_is_valid() private and fix kdoc warning, avoiding
>  gotchas if some out-of-tree modules use this function.
>  Link to v1: https://lore.kernel.org/r/20260407045743.174446-1-qingfang.deng@linux.dev
> 
>  include/linux/ppp_defs.h  | 13 -------------
>  net/core/flow_dissector.c | 39 +++++++++++++++++++++++----------------
>  2 files changed, 23 insertions(+), 29 deletions(-)
> 
> diff --git a/include/linux/ppp_defs.h b/include/linux/ppp_defs.h
> index b7e57fdbd413..45c0947fa404 100644
> --- a/include/linux/ppp_defs.h
> +++ b/include/linux/ppp_defs.h
> @@ -12,17 +12,4 @@
>  
>  #define PPP_FCS(fcs, c) crc_ccitt_byte(fcs, c)
>  
> -/**
> - * ppp_proto_is_valid - checks if PPP protocol is valid
> - * @proto: PPP protocol
> - *
> - * Assumes proto is not compressed.
> - * Protocol is valid if the value is odd and the least significant bit of the
> - * most significant octet is 0 (see RFC 1661, section 2).
> - */
> -static inline bool ppp_proto_is_valid(u16 proto)
> -{
> -	return !!((proto & 0x0101) == 0x0001);
> -}
> -
>  #endif /* _PPP_DEFS_H_ */
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 1b61bb25ba0e..64b843800370 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -1035,6 +1035,21 @@ static bool is_pppoe_ses_hdr_valid(const struct pppoe_hdr *hdr)
>  	return hdr->ver == 1 && hdr->type == 1 && hdr->code == 0;
>  }
>  
> +/**
> + * ppp_proto_is_valid - checks if PPP protocol is valid
> + * @proto: PPP protocol
> + *
> + * Assumes proto is not compressed.
> + * Protocol is valid if the value is odd and the least significant bit of the
> + * most significant octet is 0 (see RFC 1661, section 2).
> + *
> + * Return: Whether the PPP protocol is valid.
> + */
> +static bool ppp_proto_is_valid(__be16 proto)
> +{
> +	return (proto & htons(0x0101)) == htons(0x0001);
> +}
> +
>  /**
>   * __skb_flow_dissect - extract the flow_keys struct and return it
>   * @net: associated network namespace, derived from @skb if NULL
> @@ -1361,7 +1376,7 @@ bool __skb_flow_dissect(const struct net *net,
>  			struct pppoe_hdr hdr;
>  			__be16 proto;
>  		} *hdr, _hdr;
> -		u16 ppp_proto;
> +		__be16 ppp_proto;

I'm unclear of the relationship between changing the type of ppp_proto
and the problem described in the patch description. And it
is creating a log of churn in this patch. I suggest dropping it.

>  
>  		hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data, hlen, &_hdr);
>  		if (!hdr) {
> @@ -1374,27 +1389,19 @@ bool __skb_flow_dissect(const struct net *net,
>  			break;
>  		}
>  
> -		/* least significant bit of the most significant octet
> -		 * indicates if protocol field was compressed
> -		 */
> -		ppp_proto = ntohs(hdr->proto);
> -		if (ppp_proto & 0x0100) {
> -			ppp_proto = ppp_proto >> 8;
> -			nhoff += PPPOE_SES_HLEN - 1;
> -		} else {
> -			nhoff += PPPOE_SES_HLEN;
> -		}

Could we go for something like this?

		ppp_proto = ntohs(hdr->proto);
		nhoff += PPPOE_SES_HLEN;

		/* Explanation of what is going on */
		if (ppp_proto & 0x0100)
			ppp_proto = some invalid value like 0

> +		ppp_proto = hdr->proto;
> +		nhoff += PPPOE_SES_HLEN;
>  
> -		if (ppp_proto == PPP_IP) {
> +		if (ppp_proto == htons(PPP_IP)) {
>  			proto = htons(ETH_P_IP);
>  			fdret = FLOW_DISSECT_RET_PROTO_AGAIN;
> -		} else if (ppp_proto == PPP_IPV6) {
> +		} else if (ppp_proto == htons(PPP_IPV6)) {
>  			proto = htons(ETH_P_IPV6);
>  			fdret = FLOW_DISSECT_RET_PROTO_AGAIN;
> -		} else if (ppp_proto == PPP_MPLS_UC) {
> +		} else if (ppp_proto == htons(PPP_MPLS_UC)) {
>  			proto = htons(ETH_P_MPLS_UC);
>  			fdret = FLOW_DISSECT_RET_PROTO_AGAIN;
> -		} else if (ppp_proto == PPP_MPLS_MC) {
> +		} else if (ppp_proto == htons(PPP_MPLS_MC)) {
>  			proto = htons(ETH_P_MPLS_MC);
>  			fdret = FLOW_DISSECT_RET_PROTO_AGAIN;
>  		} else if (ppp_proto_is_valid(ppp_proto)) {
> @@ -1412,7 +1419,7 @@ bool __skb_flow_dissect(const struct net *net,
>  							      FLOW_DISSECTOR_KEY_PPPOE,
>  							      target_container);
>  			key_pppoe->session_id = hdr->hdr.sid;
> -			key_pppoe->ppp_proto = htons(ppp_proto);
> +			key_pppoe->ppp_proto = ppp_proto;
>  			key_pppoe->type = htons(ETH_P_PPP_SES);
>  		}
>  		break;
> -- 
> 2.43.0
> 

^ permalink raw reply

* Re: [PATCH net v4 2/2] pppoe: drop PFC frames
From: Simon Horman @ 2026-04-10 17:11 UTC (permalink / raw)
  To: Qingfang Deng
  Cc: linux-ppp, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Guillaume Nault, Breno Leitao,
	Kuniyuki Iwashima, Kees Cook, Sebastian Andrzej Siewior,
	Eric Woudstra, Sam Protsenko, netdev, linux-kernel,
	Paul Mackerras, Jaco Kroon, James Carlson, Wojciech Drewek,
	Marcin Szycik
In-Reply-To: <20260410033627.93786-2-qingfang.deng@linux.dev>

On Fri, Apr 10, 2026 at 11:36:21AM +0800, Qingfang Deng wrote:
> RFC 2516 Section 7 states that Protocol Field Compression (PFC) is NOT
> RECOMMENDED for PPPoE. In practice, pppd does not support negotiating
> PFC for PPPoE sessions, and the current PPPoE driver assumes an
> uncompressed (2-byte) protocol field. However, the generic PPP layer
> function ppp_input() is not aware of the negotiation result, and still
> accepts PFC frames.
> 
> If a peer with a broken implementation or an attacker sends a frame with
> a compressed (1-byte) protocol field, the subsequent PPP payload is
> shifted by one byte. This causes the network header to be 4-byte
> misaligned, which may trigger unaligned access exceptions on some
> architectures.
> 
> To reduce the attack surface, drop PPPoE PFC frames. Introduce
> ppp_skb_is_compressed_proto() helper function to be used in both
> ppp_generic.c and pppoe.c to avoid open-coding.
> 
> Fixes: 7fb1b8ca8fa1 ("ppp: Move PFC decompression to PPP generic layer")
> Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
> ---
> Changes in v4:
>  Update Fixes tag as suggested by AI review
>  Link to v3: https://lore.kernel.org/r/20260409031107.616630-2-qingfang.deng@linux.dev
> Changes in v3:
>  Fix kdoc warning
>  Link to v2: https://lore.kernel.org/r/20260408024245.312732-1-qingfang.deng@linux.dev

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply

* Re: [PATCH net] netrom: do some basic forms of validation on incoming frames
From: David Ranch @ 2026-04-10 16:38 UTC (permalink / raw)
  To: jj, Simon Horman, Greg Kroah-Hartman
  Cc: Jakub Kicinski, netdev, linux-kernel, David S. Miller,
	Eric Dumazet, Paolo Abeni, linux-hams, Yizhe Zhuang, stable,
	Bernard, f6bvp
In-Reply-To: <18e3df62-34f9-4de0-903b-19919d7ae2ca@eastlink.ca>


I agree with John VE1JOT that amateur radio protocols such as AX25, 
NETROM, and ROSE are still very active in the Linux kernel.  This 
discussion makes me wonder how the Linux kernel community judges how 
"active" a given feature / driver / etc is being used in the real world 
before considering deprecation.  If there is an official mechanism to 
get metrics sent from users back to the kernel developer community, 
please let us know and we'll try to get you some one-off or periodic 
metrics.

--David
KI6ZHD
Avid AX.25 and NETROM packet radio on X86 and ARM-based Raspberry Pi
https://www.trinityos.com/HAM/index-ham.html


On 04/10/2026 08:12 AM, jj wrote:
> This is NOT an obsolete protocol..this is in use by amateur radio 
> operators world-wide...we use it for RF comms usually, because what 
> happens if the internet goes "down", we can still provide comms over 
> slower RF links....(plus it's a fun mode)please PLEASE do not 
> drop...and sorry for the noise...
>
> de John VE1JOT
>
> On 2026-04-10 07:28, Simon Horman wrote:
>> On Fri, Apr 10, 2026 at 07:24:36AM +0200, Greg Kroah-Hartman wrote:
>>> On Thu, Apr 09, 2026 at 08:32:35PM -0700, Jakub Kicinski wrote:
>>>> On Thu, 9 Apr 2026 20:03:28 +0100 Simon Horman wrote:
>>>>> I expect that checking skb->len isn't sufficient here
>>>>> and pskb_may_pull needs to be used to ensure that
>>>>> the data is also available in the linear section of the skb.
>>>> Or for simplicity we could also be testing against skb_headlen()
>>>> since we don't expect any legit non-linear frames here? Dunno.
>> Sure, that's find by me if it leads to simpler code than
>> using pskb_may_pull(). Else I'd lean towards pskb_may_pull()
>> as it is a more general approach that feels worth proliferating.
>>
>>> I'll be glad to change this either way, your call.  Given that this is
>>> an obsolete protocol that seems to only be a target for drive-by 
>>> fuzzers
>>> to attack, whatever the simplest thing to do to quiet them up I'll be
>>> glad to implement.
>>>
>>> Or can we just delete this stuff entirely?  :)
>> Deleting sounds good to me.
>> But we likely need a deprecation process.
>> In which case fixing these bugs still makes sense for the short term.
>>


^ permalink raw reply

* Re: [PATCH net] netrom: do some basic forms of validation on incoming frames
From: Dan Carpenter @ 2026-04-10 17:21 UTC (permalink / raw)
  To: David Ranch
  Cc: jj, Simon Horman, Greg Kroah-Hartman, Jakub Kicinski, netdev,
	linux-kernel, David S. Miller, Eric Dumazet, Paolo Abeni,
	linux-hams, Yizhe Zhuang, stable, Bernard, f6bvp
In-Reply-To: <2e0f1dca-56dc-d6e6-f4d4-1de3160e0179@trinnet.net>

On Fri, Apr 10, 2026 at 09:38:25AM -0700, David Ranch wrote:
> 
> I agree with John VE1JOT that amateur radio protocols such as AX25, NETROM,
> and ROSE are still very active in the Linux kernel.  This discussion makes
> me wonder how the Linux kernel community judges how "active" a given feature
> / driver / etc is being used in the real world before considering
> deprecation.  If there is an official mechanism to get metrics sent from
> users back to the kernel developer community, please let us know and we'll
> try to get you some one-off or periodic metrics.
> 

We've had times where it felt like users weren't testing new kernels.

regards,
dan carpenter


^ permalink raw reply

* [GIT PULL] bluetooth-next 2026-04-10
From: Luiz Augusto von Dentz @ 2026-04-10 17:22 UTC (permalink / raw)
  To: davem, kuba; +Cc: linux-bluetooth, netdev

The following changes since commit 42f9b4c6ef19e71d2c7d9bfd3c5037d4fe434ad7:

  tools: ynl: tests: fix leading space on Makefile target (2026-04-09 20:41:40 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git tags/for-net-next-2026-04-10

for you to fetch changes up to 02751aa9eb61cba881dad5a1e2663389b76ea8f2:

  Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling (2026-04-10 10:38:37 -0400)

----------------------------------------------------------------
bluetooth-next pull request for net-next:

core:
 - hci_core: Rate limit the logging of invalid ISO handle
 - hci_sync: make hci_cmd_sync_run_once return -EEXIST if exists
 - hci_event: fix locking in hci_conn_request_evt() with HCI_PROTO_DEFER
 - hci_event: fix potential UAF in SSP passkey handlers
 - HCI: Avoid a couple -Wflex-array-member-not-at-end warnings
 - L2CAP: CoC: Disconnect if received packet size exceeds MPS
 - L2CAP: Add missing chan lock in l2cap_ecred_reconf_rsp
 - L2CAP: Fix printing wrong information if SDU length exceeds MTU
 - SCO: check for codecs->num_codecs == 1 before assigning to sco_pi(sk)->codec

drivers:
 - btusb: MT7922: Add VID/PID 0489/e174
 - btusb: Add Lite-On 04ca:3807 for MediaTek MT7921
 - btusb: Add MT7927 IDs ASUS ROG Crosshair X870E Hero, Lenovo Legion Pro 7
	  16ARX9, Gigabyte Z790 AORUS MASTER X, MSI X870E Ace Max, TP-Link
	  Archer TBE550E, ASUS X870E / ProArt X870E-Creator.
 - btusb: Add MT7902 IDs 13d3/3579, 13d3/3580, 13d3/3594, 13d3/3596, 0e8d/1ede
 - btusb: MediaTek MT7922: Add VID 0489 & PID e11d
 - btintel: Add support for Scorpious Peak2 support
 - btintel: Add support for Scorpious Peak2F support
 - btintel_pcie: Add device id of Scorpius Peak2, Nova Lake-PCD-H
 - btintel_pcie: Add device id of Scorpious2, Nova Lake-PCD-S
 - btmtk: Add reset mechanism if downloading firmware failed
 - btmtk: Add MT6639 (MT7927) Bluetooth support
 - btmtk: fix ISO interface setup for single alt setting
 - btmtk: add MT7902 SDIO support
 - Bluetooth: btmtk: add MT7902 MCU support
 - btbcm: Add entry for BCM4343A2 UART Bluetooth
 - qca: enable pwrseq support for wcn39xx devices
 - hci_qca: Fix BT not getting powered-off on rmmod
 - hci_qca: disable power control for WCN7850 when bt_en is not defined
 - hci_qca: Fix missing wakeup during SSR memdump handling
 - hci_ldisc: Clear HCI_UART_PROTO_INIT on error
 - mmc: sdio: add MediaTek MT7902 SDIO device ID
 - hci_ll: Enable BROKEN_ENHANCED_SETUP_SYNC_CONN for WL183x

----------------------------------------------------------------
Arnd Bergmann (1):
      Bluetooth: btmtk: hide unused btmtk_mt6639_devs[] array

Chris Lu (4):
      Bluetooth: btusb: MT7922: Add VID/PID 0489/e174
      Bluetooth: btmtk: improve mt79xx firmware setup retry flow
      Bluetooth: btmtk: add status check in mt79xx firmware setup
      Bluetooth: btmtk: Add reset mechanism if downloading firmware failed

Christian Eggers (1):
      Bluetooth: L2CAP: CoC: Disconnect if received packet size exceeds MPS

Dmitry Baryshkov (1):
      Bluetooth: qca: enable pwrseq support for WCN39xx devices

Dongyang Jin (1):
      Bluetooth: btbcm: remove done label in btbcm_patchram

Dudu Lu (1):
      Bluetooth: l2cap: Add missing chan lock in l2cap_ecred_reconf_rsp

Dylan Eray (1):
      Bluetooth: btusb: Add Lite-On 04ca:3807 for MediaTek MT7921

Gustavo A. R. Silva (1):
      Bluetooth: hci.h: Avoid a couple -Wflex-array-member-not-at-end warnings

Hans de Goede (2):
      Bluetooth: hci_qca: Fix confusing shutdown() and power_off() naming
      Bluetooth: hci_qca: Fix BT not getting powered-off on rmmod

Javier Tia (8):
      Bluetooth: btmtk: Add MT6639 (MT7927) Bluetooth support
      Bluetooth: btmtk: fix ISO interface setup for single alt setting
      Bluetooth: btusb: Add MT7927 ID for ASUS ROG Crosshair X870E Hero
      Bluetooth: btusb: Add MT7927 ID for Lenovo Legion Pro 7 16ARX9
      Bluetooth: btusb: Add MT7927 ID for Gigabyte Z790 AORUS MASTER X
      Bluetooth: btusb: Add MT7927 ID for MSI X870E Ace Max
      Bluetooth: btusb: Add MT7927 ID for TP-Link Archer TBE550E
      Bluetooth: btusb: Add MT7927 ID for ASUS X870E / ProArt X870E-Creator

Johan Hovold (2):
      Bluetooth: btusb: refactor endpoint lookup
      Bluetooth: btmtk: refactor endpoint lookup

Jonathan Rissanen (1):
      Bluetooth: hci_ldisc: Clear HCI_UART_PROTO_INIT on error

Kamiyama Chiaki (1):
      Bluetooth: btusb: MediaTek MT7922: Add VID 0489 & PID e11d

Kiran K (10):
      Bluetooth: btintel: Add support for hybrid signature for ScP2 onwards
      Bluetooth: btintel: Replace CNVi id with hardware variant
      Bluetooth: btintel: Add support for Scorpious Peak2 support
      Bluetooth: btintel: Add DSBR support for ScP2 onwards
      Bluetooth: btintel_pcie: Add support for exception dump for ScP2
      Bluetooth: btintel: Add support for Scorpious Peak2F support
      Bluetooth: btintel_pcie: Add support for exception dump for ScP2F
      Bluetooth: btintel_pcie: Add device id of Scorpius Peak2, Nova Lake-PCD-H
      Bluetooth: btintel_pcie: Add device id of Scorpious2, Nova Lake-PCD-S
      Bluetooth: btintel_pcie: Align shared DMA memory to 128 bytes

Luiz Augusto von Dentz (2):
      Bluetooth: btintel_pci: Fix btintel_pcie_read_hwexp code style
      Bluetooth: L2CAP: Fix printing wrong information if SDU length exceeds MTU

Lukas Kraft (1):
      bluetooth: btusb: Fix whitespace in btusb.c

Marek Vasut (1):
      Bluetooth: btbcm: Add entry for BCM4343A2 UART Bluetooth

Pauli Virtanen (3):
      Bluetooth: hci_core: Rate limit the logging of invalid ISO handle
      Bluetooth: hci_sync: make hci_cmd_sync_run_once return -EEXIST if exists
      Bluetooth: fix locking in hci_conn_request_evt() with HCI_PROTO_DEFER

Sean Wang (8):
      mmc: sdio: add MediaTek MT7902 SDIO device ID
      Bluetooth: btmtk: add MT7902 MCU support
      Bluetooth: btusb: Add new VID/PID 13d3/3579 for MT7902
      Bluetooth: btusb: Add new VID/PID 13d3/3580 for MT7902
      Bluetooth: btusb: Add new VID/PID 13d3/3594 for MT7902
      Bluetooth: btusb: Add new VID/PID 13d3/3596 for MT7902
      Bluetooth: btusb: Add new VID/PID 0e8d/1ede for MT7902
      Bluetooth: btmtk: add MT7902 SDIO support

Shuai Zhang (2):
      Bluetooth: hci_qca: disable power control for WCN7850 when bt_en is not defined
      Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling

Shuvam Pandey (1):
      Bluetooth: hci_event: fix potential UAF in SSP passkey handlers

Stefan Metzmacher (1):
      Bluetooth: SCO: check for codecs->num_codecs == 1 before assigning to sco_pi(sk)->codec

Stefano Radaelli (1):
      Bluetooth: hci_ll: Enable BROKEN_ENHANCED_SETUP_SYNC_CONN for WL183x

Thorsten Blum (3):
      Bluetooth: btintel_pcie: Replace snprintf("%s") with strscpy
      Bluetooth: btintel_pcie: Use struct_size to improve hci_drv_read_info
      Bluetooth: btintel_pcie: use strscpy to copy plain strings

Vivek Sahu (1):
      Bluetooth: qca: Refactor code on the basis of chipset names

 drivers/bluetooth/btbcm.c        |  11 ++--
 drivers/bluetooth/btintel.c      | 109 ++++++++++++++++++++++++++++------
 drivers/bluetooth/btintel.h      |  20 +++++--
 drivers/bluetooth/btintel_pcie.c | 122 ++++++++++++++++++++++++---------------
 drivers/bluetooth/btintel_pcie.h |   3 -
 drivers/bluetooth/btmtk.c        | 115 ++++++++++++++++++++++++++----------
 drivers/bluetooth/btmtk.h        |   9 ++-
 drivers/bluetooth/btmtksdio.c    |  44 +++++++++-----
 drivers/bluetooth/btqca.c        |  37 ++++++------
 drivers/bluetooth/btusb.c        |  84 +++++++++++++--------------
 drivers/bluetooth/hci_ldisc.c    |   3 +
 drivers/bluetooth/hci_ll.c       |  10 ++++
 drivers/bluetooth/hci_qca.c      |  84 ++++++++++++++++-----------
 include/linux/mmc/sdio_ids.h     |   1 +
 include/net/bluetooth/hci.h      |  16 +++--
 net/bluetooth/hci_conn.c         |   4 +-
 net/bluetooth/hci_core.c         |   4 +-
 net/bluetooth/hci_event.c        |  21 ++++---
 net/bluetooth/hci_sync.c         |   2 +-
 net/bluetooth/l2cap_core.c       |  15 ++++-
 net/bluetooth/sco.c              |   3 +-
 21 files changed, 476 insertions(+), 241 deletions(-)

^ permalink raw reply

* RE: [PATCH net-next] net: phy: call phy_init_hw() in phy resume path
From: Biju Das @ 2026-04-10 17:27 UTC (permalink / raw)
  To: Russell King, Andrew Lunn
  Cc: Heiner Kallweit, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Ovidiu Panait, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Geert Uytterhoeven,
	Prabhakar Mahadev Lad, linux-renesas-soc@vger.kernel.org
In-Reply-To: <adkPXI9pz1RCYpJ8@shell.armlinux.org.uk>

Hi Russell King/ Andrew,

> -----Original Message-----
> From: Russell King <linux@armlinux.org.uk>
> Sent: 10 April 2026 15:55
> Subject: Re: [PATCH net-next] net: phy: call phy_init_hw() in phy resume path
> 
> On Fri, Apr 10, 2026 at 03:51:18PM +0100, Russell King (Oracle) wrote:
> > On Fri, Apr 10, 2026 at 03:29:01PM +0100, Biju wrote:
> > > From: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
> > >
> > > When mac_managed_pm flag is set, mdio_bus_phy_resume() is skipped,
> > > so phy_init_hw(), which performs soft_reset and config_init, is not
> > > called during resume.
> > >
> > > This is inconsistent with the non-mac_managed_pm path, where
> > > mdio_bus_phy_resume() calls phy_init_hw() before phy_resume() on
> > > every resume.
> > >
> > > To align both paths, add a phy_init_hw() call at the top of
> > > __phy_resume(), before invoking the driver's resume callback. This
> > > guarantees the PHY undergoes soft reset and re-initialization
> > > regardless of whether PM is managed by the MAC or the MDIO bus.
> > >
> > > Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
> > > Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
> > > ---
> > >  drivers/net/phy/phy_device.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/drivers/net/phy/phy_device.c
> > > b/drivers/net/phy/phy_device.c index 0edff47478c2..8255f4208d66
> > > 100644
> > > --- a/drivers/net/phy/phy_device.c
> > > +++ b/drivers/net/phy/phy_device.c
> > > @@ -2008,6 +2008,10 @@ int __phy_resume(struct phy_device *phydev)
> > >  	if (!phydrv || !phydrv->resume)
> > >  		return 0;
> > >
> > > +	ret = phy_init_hw(phydev);
> > > +	if (ret)
> > > +		return ret;
> >
> > Do we want to do this even when phydrv->resume is NULL?
> 
> I should've also added (sorry, busy packing) - with it always being called even when phydrv->resume is
> NULL, it means that the call sites to phy_resume() in phylib which are preceeded by a call to
> phy_init_hw() should have that call removed, otherwise we're going to be calling phy_init_hw() twice.
> 
> As the patch currently stands, that's the case when phydrv->resume is populated, and I think we should
> avoid that.
> 
> > Apart from that, looks fine to me - it seems some paths call
> > phy_init_hw() can be called with or without phydev->lock held, and
> > this one will call it with the lock held which seems to be okay.


The new patch will be like this, after moving phy_init_hw() without
phydev->lock held. Please let me know are you ok with this?


diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 0edff47478c2..4a2b19d39373 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -396,10 +396,6 @@ static __maybe_unused int mdio_bus_phy_resume(struct device *dev)
 	WARN_ON(phydev->state != PHY_HALTED && phydev->state != PHY_READY &&
 		phydev->state != PHY_UP);
 
-	ret = phy_init_hw(phydev);
-	if (ret < 0)
-		return ret;
-
 	ret = phy_resume(phydev);
 	if (ret < 0)
 		return ret;
@@ -1857,16 +1853,14 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 	if (dev)
 		netif_carrier_off(phydev->attached_dev);
 
-	/* Do initial configuration here, now that
+	/* Do initial configuration inside phy_init_hw(), now that
 	 * we have certain key parameters
 	 * (dev_flags and interface)
 	 */
-	err = phy_init_hw(phydev);
+	err = phy_resume(phydev);
 	if (err)
 		goto error;
 
-	phy_resume(phydev);
-
 	/**
 	 * If the external phy used by current mac interface is managed by
 	 * another mac interface, so we should create a device link between
@@ -2020,6 +2014,10 @@ int phy_resume(struct phy_device *phydev)
 {
 	int ret;
 
+	ret = phy_init_hw(phydev);
+	if (ret)
+		return ret;
+
 	mutex_lock(&phydev->lock);
 	ret = __phy_resume(phydev);
 	mutex_unlock(&phydev->lock);
--

^ permalink raw reply related

* [PATCH net-next] tcp: add indirect call wrapper in tcp_conn_request()
From: Eric Dumazet @ 2026-04-10 17:49 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Neal Cardwell, Kuniyuki Iwashima, netdev,
	eric.dumazet, Eric Dumazet

Small improvement in SYN processing, to directly call
tcp_v6_init_seq_and_ts_off() or tcp_v4_init_seq_and_ts_off().

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/tcp.h    | 6 ++++++
 net/ipv4/tcp_input.c | 5 ++++-
 net/ipv4/tcp_ipv4.c  | 2 +-
 net/ipv6/tcp_ipv6.c  | 2 +-
 4 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6156d1d068e142f696ec9dfff63e3aaebb0171bc..8c3762a9d6fd4e23ed3e1af7fbc8404ad17b1923 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -3085,4 +3085,10 @@ static inline int tcp_recv_should_stop(struct sock *sk)
 	       signal_pending(current);
 }
 
+INDIRECT_CALLABLE_DECLARE(union tcp_seq_and_ts_off
+			  tcp_v4_init_seq_and_ts_off(const struct net *net,
+						     const struct sk_buff *skb));
+INDIRECT_CALLABLE_DECLARE(union tcp_seq_and_ts_off
+			  tcp_v6_init_seq_and_ts_off(const struct net *net,
+						     const struct sk_buff *skb));
 #endif	/* _TCP_H */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 7171442c3ed7ae62ec45c093cc58ad5a5c978ed9..021f745747c59d8b9e200c5954af7807a4d08866 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -7658,7 +7658,10 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 		goto drop_and_free;
 
 	if (tmp_opt.tstamp_ok || (!want_cookie && !isn))
-		st = af_ops->init_seq_and_ts_off(net, skb);
+		st = INDIRECT_CALL_INET(af_ops->init_seq_and_ts_off,
+					tcp_v6_init_seq_and_ts_off,
+					tcp_v4_init_seq_and_ts_off,
+					net, skb);
 
 	if (tmp_opt.tstamp_ok) {
 		tcp_rsk(req)->req_usec_ts = dst_tcp_usec_ts(dst);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 69ab236072e7142d5ca9d0703d99f02c1e17c738..9309294370a7ba256103813b6e1cb6b79b54cd44 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -105,7 +105,7 @@ static DEFINE_PER_CPU(struct sock_bh_locked, ipv4_tcp_sk) = {
 
 static DEFINE_MUTEX(tcp_exit_batch_mutex);
 
-static union tcp_seq_and_ts_off
+INDIRECT_CALLABLE_SCOPE union tcp_seq_and_ts_off
 tcp_v4_init_seq_and_ts_off(const struct net *net, const struct sk_buff *skb)
 {
 	return secure_tcp_seq_and_ts_off(net,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 8dc3874e8b9252da60f21ad77a5ca834532e650a..4e39e8dd8e1e04d3771455fee65aecb88436a578 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -105,7 +105,7 @@ static void inet6_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
 	}
 }
 
-static union tcp_seq_and_ts_off
+INDIRECT_CALLABLE_SCOPE union tcp_seq_and_ts_off
 tcp_v6_init_seq_and_ts_off(const struct net *net, const struct sk_buff *skb)
 {
 	return secure_tcpv6_seq_and_ts_off(net,
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* Re: clarification: PCI device not getting enumerated
From: Ilpo Järvinen @ 2026-04-10 17:52 UTC (permalink / raw)
  To: Ratheesh Kannoth; +Cc: Bjorn Helgaas, vidyas, bhelgaas, Netdev, LKML, linux-pci
In-Reply-To: <abojekUtFrMxM1RB@rkannoth-OptiPlex-7090>

[-- Attachment #1: Type: text/plain, Size: 6614 bytes --]

On Wed, 18 Mar 2026, Ratheesh Kannoth wrote:

> On 2026-03-18 at 02:36:40, Bjorn Helgaas (helgaas@kernel.org) wrote:
> > On Tue, Mar 17, 2026 at 03:49:07PM +0530, Ratheesh Kannoth wrote:
> > > Hi,
> > >
> > > Below commit breaks PCI enumeration of a marvell PCI endpoint (177d:a0e2).
> > > I am not familiar with PCI code, could you please help in debugging/fixing this ?
> > >
> > > Before the commit, the kernel set PCI_REASSIGN_ALL_BUS. So pcibios_assign_all_busses() was true,
> > > and in pci_scan_bridge_extend() the kernel took the “reassign” path and used EA
> > > (and pci_ea_fixed_busnrs()) to assign bus numbers.
> > >
> > > commit 7246a4520b4bf1494d7d030166a11b5226f6d508
> > > Author: Vidya Sagar <vidyas@nvidia.com>
> > > Date:   Wed May 8 23:11:38 2024 +0530
> > >
> > >     PCI: Use preserve_config in place of pci_flags
> > >
> > >     Use preserve_config in place of checking for PCI_PROBE_ONLY flag to enable
> > >     support for "linux,pci-probe-only" on a per host bridge basis.
> > >
> > >     This also obviates the use of adding PCI_REASSIGN_ALL_BUS flag if
> > >     !PCI_PROBE_ONLY, as pci_assign_unassigned_root_bus_resources() takes care
> > >     of reassigning the resources that are not already claimed.
> >
> > This commit appeared in v6.11.  Apparently on v6.6, 0002:1b:00.0 is
> > detected, and in the v6.12 dmesg below, we don't enumerate it.
> >
> > It looks like 7246a4520b4b can still be reverted cleanly from
> > v7.0-rc1.  Can you collect the complete dmesg logs from v7.0-rc1
> > (where I assume we won't see 0002:1b:00.0) and from v7.0-rc1 with
> > 7246a4520b4b reverted (where I assume we *will* see it) and output of
> > "sudo lspci -vv"?  Use the same kernel config and DT for both, of
> > course.
> Please find dmesg, lspci -vv and device tree output for v7.0-rc1 and v7.0-rc1 with patch reverted
> at bottom of this email. Used same kernel config and DT for both.
> 
> >
> > I don't understand some of the v6.12 dmesg log:
> >
> >   3.187193] pci-host-generic 878000000000.pci: Memory resource size exceeds max for 32 bits
> >   3.313087] pci 0000:00:01.0: PCI bridge to [bus 01] (subtractive decode)
> >   3.319878] pci 0000:00:01.0:   bridge window [mem 0x00000000-0x000fffff]
> >   3.723673] pci 0000:01:00.0: [177d:a075] type 00 class 0x088000 PCIe Endpoint
> >   3.730924] pci 0000:01:00.0: BAR 0 [mem 0x87e0fc000000-0x87e0fc03ffff 64bit]: from Enhanced Allocation, properties 0x0
> >   3.741710] pci 0000:01:00.0: BAR 4 [mem 0x87e0fcf00000-0x87e0fcffffff 64bit]: from Enhanced Allocation, properties 0x0
> >
> >   3.448655] pci 0000:00:0c.0: PCI bridge to [bus 02] (subtractive decode)
> >   3.455454] pci 0000:00:0c.0:   bridge window [mem 0x00000000-0x000fffff]
> >   4.425308] pci 0000:02:00.0: [177d:a075] type 00 class 0x088000 PCIe Endpoint
> >   4.432561] pci 0000:02:00.0: BAR 0 [mem 0x87e0f9000000-0x87e0f903ffff 64bit]: from Enhanced Allocation, properties 0x0
> >
> >   5.744082] pci-host-generic 878020000000.pci: Memory resource size exceeds max for 32 bits
> >   5.799532] pci 0002:00:00.0: PCI bridge to [bus fa] (subtractive decode)
> >   5.806322] pci 0002:00:00.0:   bridge window [mem 0x00000000-0x000fffff]
> >   5.820632] pci 0002:00:01.0: PCI bridge to [bus fb] (subtractive decode)
> >   5.827423] pci 0002:00:01.0:   bridge window [mem 0x00000000-0x000fffff]
> >
> > - The "Memory resource size exceeds max for 32 bits" warnings are
> >   interesting.  Is this a DT error?
>
> I dont see any errors as such. Could you guide me how to locate this issue, or suggest any command to check
> the same.

I'm sorry I only noticed this thread now.

Take a look for "root bus resource" lines in your log. When the size of 
those exceeds 4G, the warning is printed if the mem resource does not have 
IORESOURCE_PREFETCH set (which implies 64-bit window in the old way things 
were spec'ed). That resource printout includes "pref" when the flag is 
present.

For such resource, I think having IORESOURCE_PREFETCH and 
IORESOURCE_MEM_64 would be appropriate. And also for any resource that is 
expected to require >4G address even if size itself is small enough. 
Unfortunately things have not been very consistent when it comes to root 
bus resources.

The resource bits are given in DT along with the address range (but I'm 
not an DT expert by any means so I state something stupid when it comes 
to DTs, take it with a grain of salt :-)):

# pci feild (phys.hi)

npt000ss bbbbbbbb dddddfff rrrrrrrr

n: Relocateable Flag. (0 means relocation, 1 means impossible)
p: Prefetch Flag. (0 means impossible 1 means possible)
t: Aliased Flag.
ss: space code
  00: compositional space
  01: I/O space
  10: 32bit memory space
  11: 64bit memory space

I think your ranges have 0x03... which would seem to say they were 64-bit 
but that flag did not get set (or got cleared) for some reason as there's 
not "64-bit" in the root bus resource printouts.

I suspect the warning goes away if you change 0x03... to 0x43... but you 
also need to pay attention that pci_parse_request_of_pci_ranges() requires 
also one non-prefetchable resource within each set of root bus resources 
so or it will trigger another warning if you change them all.

I'm not sure if it resolves the primary issue.

> > - There should only be one subtractive decode bridge per segment, but
> >   there are many here.
> >
> > - Most of the bridge windows look uninitialized ([mem
> >   0x00000000-0x000fffff]), and we don't seem to reassign them, so BARs
> >   of the downstream devices shouldn't be reachable.
> >
> > - Interesting to see so much "Enhanced Allocation"; that's very
> >   unusual.
> >
> > - I guess your DT must have "linux,pci-probe-only"; seems like that
> No.
> 
> +++ fdt dump from uboot +++
> 
> crb106-pcie> fdt print /soc/pci@878020000000
> pci@878020000000 {
>         compatible = "pci-host-ecam-generic";
>         device_type = "pci";
>         msi-map = <0x00000000 0x00000037 0x00020000 0x00010000>;
>         bus-range = <0x00000000 0x000000ff>;
>         #size-cells = <0x00000002>;
>         #address-cells = <0x00000003>;
>         dma-coherent;
>         linux,pci-domain = <0x00000002>;
>         reg = <0x00008780 0x20000000 0x00000000 0x10000000>;
>         iommu-map = <0x00000000 0x00000039 0x00020000 0x00010000>;
>         ranges = <0x03000000 0x00008400 0x00000000 0x00008400 0x00000000 0x000001f5 0x00000000 0x03000000 0x00000009 0xfc000000 0x00000009 0xfc000000 0x00000000 0x03040000>;


-- 
 i.

^ permalink raw reply

* Re: [PATCH net-next v2] iavf: fix kernel-doc comment style in iavf_ethtool.c
From: Joe Damato @ 2026-04-10 17:53 UTC (permalink / raw)
  To: Aleksandr Loktionov
  Cc: intel-wired-lan, anthony.l.nguyen, netdev, Leszek Pepiak
In-Reply-To: <20260409093020.3808687-1-aleksandr.loktionov@intel.com>

On Thu, Apr 09, 2026 at 11:30:20AM +0200, Aleksandr Loktionov wrote:
> iavf_ethtool.c contains 31 kernel-doc comment blocks using the legacy
> `**/` terminator instead of the correct single `*/`. Two function
> headers also use a colon separator (`iavf_get_channels:`,
> `iavf_set_channels:`) instead of the ` - ` dash required by kernel-doc.
> 
> Additionally several comments embed their return-value descriptions in
> the body paragraph, producing `scripts/kernel-doc -Wreturn` warnings.
> Void functions that incorrectly say "Returns ..." are also rephrased.
> 
> Fix all issues across the full file:
>  - Replace every `**/` terminator with `*/`.
>  - Change `function_name:` doc headers to `function_name -`.
>  - Move inline "Returns ..." sentences into dedicated `Return:` sections
>    for non-void functions (iavf_get_msglevel, iavf_get_rxnfc,
>    iavf_set_channels, iavf_get_rxfh_key_size, iavf_get_rxfh_indir_size,
>    iavf_get_rxfh, iavf_set_rxfh).
>  - Rephrase body descriptions in void functions that incorrectly said
>    "Returns ..." (iavf_get_drvinfo, iavf_get_ringparam, iavf_get_coalesce).
>  - Remove boilerplate body text for iavf_get_rxfh_key_size and
>    iavf_get_rxfh_indir_size; the `Return:` line now conveys the same
>    information without the vague "Returns the table size." sentence.
> 
> Suggested-by: Anthony L. Nguyen <anthony.l.nguyen@intel.com>
> Suggested-by: Leszek Pepiak <leszek.pepiak@intel.com>
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
> v1 -> v2 extending the scope of the changes to whole iavf_ethtool.c file
> ---
>  drivers/net/ethernet/intel/iavf/iavf_ethtool.c | 103 ++++++++++++------------
>  1 file changed, 53 insertions(+), 50 deletions(-)

Reviewed-by: Joe Damato <joe@dama.to>

^ permalink raw reply

* Re: [PATCH net-next v3 4/5] ipv6: mld: encode multicast exponential fields
From: Ujjal Roy @ 2026-04-10 18:06 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Nikolay Aleksandrov, David Ahern, Shuah Khan,
	Andy Roulin, Yong Wang, Petr Machata, Ujjal Roy, bridge, netdev,
	linux-kernel, linux-kselftest
In-Reply-To: <20260407134502.GD849209@shredder>

On Tue, Apr 7, 2026 at 7:15 PM Ido Schimmel <idosch@nvidia.com> wrote:
>
> On Fri, Apr 03, 2026 at 03:00:49PM +0000, Ujjal Roy wrote:
> > In MLD, QQIC and MRC fields are not correctly encoded when
> > generating query packets. Since the receiver of the query
> > interprets these fields using the MLDv2 floating-point
> > decoding logic, any value that exceeds the linear threshold
> > is incorrectly parsed as an exponential value, leading to
> > an incorrect interval calculation.
> >
> > Encode and assign the corresponding protocol fields during
> > query generation. Introduce the logic to dynamically
> > calculate the exponent and mantissa using bit-scan (fls).
> > This ensures QQIC (8-bit) and MRC (16-bit) fields are
> > properly encoded when transmitting query packets with
> > intervals that exceed their respective linear thresholds
> > (128 for QQI; 32768 for MRD).
> >
> > RFC3810: If QQIC >= 128, the QQIC field represents a
> > floating-point value as follows:
> >      0 1 2 3 4 5 6 7
> >     +-+-+-+-+-+-+-+-+
> >     |1| exp | mant  |
> >     +-+-+-+-+-+-+-+-+
> >
> > RFC3810: If Maximum Response Code >= 32768, the Maximum
> > Response Code field represents a floating-point value as
> > follows:
> >      0 1 2 3 4 5 6 7 8 9 A B C D E F
> >     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >     |1| exp |          mant         |
> >     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >
> > Signed-off-by: Ujjal Roy <royujjal@gmail.com>
>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>

Do you mean to include "Reviewed-by:" tag into this commit message or
the entire patchset? I will modify and send v4 once I get the reply.

^ permalink raw reply

* [PATCH v3 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet

We add annotations for data-races, so that most dump methods
can run in parallel with data path.

Then change mq and mqprio to no longer acquire each children
qdisc spinlock.

Next round of patches will wait for linux-7.2.

v2/v3: addressed most sashiko.dev feedbacks.
       I think remaining problems (in red offloads) are minor
       and can be fixed later.

Eric Dumazet (15):
  net/sched: rename qstats_overlimit_inc() to qstats_cpu_overlimit_inc()
  net/sched: add qstats_cpu_drop_inc() helper
  net/sched: add READ_ONCE() in gnet_stats_add_queue[_cpu]
  net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec()
  net/sched: annotate data-races around sch->qstats.backlog
  net/sched: sch_sfb: annotate data-races in sfb_dump_stats()
  net/sched: sch_red: annotate data-races in red_dump_stats()
  net/sched: sch_fq_codel: remove data-races from fq_codel_dump_stats()
  net/sched: sch_pie: annotate data-races in pie_dump_stats()
  net/sched: sch_fq_pie: annotate data-races in fq_pie_dump_stats()
  net_sched: sch_hhf: annotate data-races in hhf_dump_stats()
  net/sched: sch_choke: annotate data-races in choke_dump_stats()
  net/sched: sch_cake: annotate data-races in cake_dump_stats()
  net/sched: mq: no longer acquire qdisc spinlocks in dump operations
  net/sched: taprio: prepare taprio_dump() for RTNL removal

 include/net/act_api.h     |   4 +-
 include/net/gen_stats.h   |   9 +-
 include/net/pie.h         |   2 +-
 include/net/sch_generic.h |  71 +++++--
 net/core/gen_estimator.c  |  24 +--
 net/core/gen_stats.c      |  37 ++--
 net/sched/act_api.c       |   2 +-
 net/sched/act_bpf.c       |   2 +-
 net/sched/act_ife.c       |  12 +-
 net/sched/act_mpls.c      |   2 +-
 net/sched/act_police.c    |   4 +-
 net/sched/act_skbedit.c   |   2 +-
 net/sched/act_skbmod.c    |   2 +-
 net/sched/sch_api.c       |   4 +-
 net/sched/sch_cake.c      | 422 +++++++++++++++++++++-----------------
 net/sched/sch_cbs.c       |   6 +-
 net/sched/sch_choke.c     |  34 +--
 net/sched/sch_codel.c     |   2 +-
 net/sched/sch_drr.c       |   6 +-
 net/sched/sch_dualpi2.c   |   4 +-
 net/sched/sch_etf.c       |   8 +-
 net/sched/sch_ets.c       |   6 +-
 net/sched/sch_fq.c        |   6 +-
 net/sched/sch_fq_codel.c  |  16 +-
 net/sched/sch_fq_pie.c    |  27 +--
 net/sched/sch_generic.c   |   8 +-
 net/sched/sch_gred.c      |   4 +-
 net/sched/sch_hfsc.c      |   6 +-
 net/sched/sch_hhf.c       |  26 +--
 net/sched/sch_htb.c       |   6 +-
 net/sched/sch_mq.c        |  36 ++--
 net/sched/sch_mqprio.c    |  81 ++++----
 net/sched/sch_multiq.c    |   4 +-
 net/sched/sch_netem.c     |  12 +-
 net/sched/sch_pie.c       |  38 ++--
 net/sched/sch_prio.c      |   6 +-
 net/sched/sch_qfq.c       |   8 +-
 net/sched/sch_red.c       |  37 ++--
 net/sched/sch_sfb.c       |  54 +++--
 net/sched/sch_sfq.c       |  11 +-
 net/sched/sch_skbprio.c   |   4 +-
 net/sched/sch_taprio.c    |  46 +++--
 net/sched/sch_tbf.c       |  10 +-
 net/sched/sch_teql.c      |   2 +-
 44 files changed, 624 insertions(+), 489 deletions(-)

-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply

* [PATCH v3 net-next 02/15] net/sched: add qstats_cpu_drop_inc() helper
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260410182257.774311-1-edumazet@google.com>

1) Using this_cpu_inc() is better than going through this_cpu_ptr():

- Single instruction on x86.
- Store tearing prevention.

2) Change tcf_action_update_stats() to use this_cpu_add().

3) Add WRITE_ONCE() to __qdisc_qstats_drop() and qstats_drop_inc().

$ scripts/bloat-o-meter -t vmlinux.1 vmlinux.2
add/remove: 0/0 grow/shrink: 1/10 up/down: 16/-97 (-81)
Function                                     old     new   delta
tcf_ife_act                                 1061    1077     +16
tcf_vlan_act                                 684     676      -8
tcf_skbedit_act                              626     618      -8
tcf_police_act                               725     717      -8
tcf_mpls_act                                1297    1289      -8
tcf_gate_act                                 310     302      -8
tcf_gact_act                                 195     187      -8
tcf_csum_act                                2905    2897      -8
tcf_bpf_act                                  749     741      -8
tcf_action_update_stats                      124     115      -9
tcf_ct_act                                  2154    2130     -24
Total: Before=29739602, After=29739521, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/act_api.h     | 2 +-
 include/net/sch_generic.h | 9 +++++++--
 net/sched/act_api.c       | 2 +-
 net/sched/act_bpf.c       | 2 +-
 net/sched/act_ife.c       | 8 ++++----
 net/sched/act_mpls.c      | 2 +-
 net/sched/act_police.c    | 2 +-
 net/sched/act_skbedit.c   | 2 +-
 net/sched/sch_cake.c      | 2 +-
 net/sched/sch_fq_codel.c  | 2 +-
 net/sched/sch_gred.c      | 2 +-
 11 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 2ec4ef9a5d0c8e9110f92f135cc3c31a38af0479..167435c5615e09f491a05d01ec86b0c9f9f4fd5b 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -241,7 +241,7 @@ static inline void tcf_action_update_bstats(struct tc_action *a,
 static inline void tcf_action_inc_drop_qstats(struct tc_action *a)
 {
 	if (likely(a->cpu_qstats)) {
-		qstats_drop_inc(this_cpu_ptr(a->cpu_qstats));
+		qstats_cpu_drop_inc(a->cpu_qstats);
 		return;
 	}
 	atomic_inc(&a->tcfa_drops);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 3ee383c6fc3f66f1aecd9ebc675fbd143852c150..b22579671e4b4dd04c5dfa810b714daaac74af2a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -996,12 +996,17 @@ static inline void qdisc_qstats_cpu_requeues_inc(struct Qdisc *sch)
 
 static inline void __qdisc_qstats_drop(struct Qdisc *sch, int count)
 {
-	sch->qstats.drops += count;
+	WRITE_ONCE(sch->qstats.drops, sch->qstats.drops + count);
 }
 
 static inline void qstats_drop_inc(struct gnet_stats_queue *qstats)
 {
-	qstats->drops++;
+	WRITE_ONCE(qstats->drops, qstats->drops + 1);
+}
+
+static inline void qstats_cpu_drop_inc(struct gnet_stats_queue __percpu *qstats)
+{
+	this_cpu_inc(qstats->drops);
 }
 
 static inline void qstats_cpu_overlimit_inc(struct gnet_stats_queue __percpu *qstats)
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 332fd9695e54a1fc63bb869c28cacf5f2ed14971..551992683d9e69c247b8d9c613a69e2a897a1e79 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -1578,7 +1578,7 @@ void tcf_action_update_stats(struct tc_action *a, u64 bytes, u64 packets,
 	if (a->cpu_bstats) {
 		_bstats_update(this_cpu_ptr(a->cpu_bstats), bytes, packets);
 
-		this_cpu_ptr(a->cpu_qstats)->drops += drops;
+		this_cpu_add(a->cpu_qstats->drops, drops);
 
 		if (hw)
 			_bstats_update(this_cpu_ptr(a->cpu_bstats_hw),
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index c2b5bc19e09118857d1ef3c4aed566b8225f2e9a..58a074651176730fd1bd370ba8420dfbed0d4e9c 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -76,7 +76,7 @@ TC_INDIRECT_SCOPE int tcf_bpf_act(struct sk_buff *skb,
 		break;
 	case TC_ACT_SHOT:
 		action = filter_res;
-		qstats_drop_inc(this_cpu_ptr(prog->common.cpu_qstats));
+		qstats_cpu_drop_inc(prog->common.cpu_qstats);
 		break;
 	case TC_ACT_UNSPEC:
 		action = prog->tcf_action;
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index e1b825e14900d6f46bbfd1b7f72ab6cd554d8a73..065228026c58eb0f8ff3b3a08758e4ef0d6ea708 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -727,7 +727,7 @@ static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
 
 	tlv_data = ife_decode(skb, &metalen);
 	if (unlikely(!tlv_data)) {
-		qstats_drop_inc(this_cpu_ptr(ife->common.cpu_qstats));
+		qstats_cpu_drop_inc(ife->common.cpu_qstats);
 		return TC_ACT_SHOT;
 	}
 
@@ -740,7 +740,7 @@ static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
 		curr_data = ife_tlv_meta_decode(tlv_data, ifehdr_end, &mtype,
 						&dlen, NULL);
 		if (!curr_data) {
-			qstats_drop_inc(this_cpu_ptr(ife->common.cpu_qstats));
+			qstats_cpu_drop_inc(ife->common.cpu_qstats);
 			return TC_ACT_SHOT;
 		}
 
@@ -755,7 +755,7 @@ static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
 	}
 
 	if (WARN_ON(tlv_data != ifehdr_end)) {
-		qstats_drop_inc(this_cpu_ptr(ife->common.cpu_qstats));
+		qstats_cpu_drop_inc(ife->common.cpu_qstats);
 		return TC_ACT_SHOT;
 	}
 
@@ -821,7 +821,7 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
 	 * so lets be conservative.. */
 	if ((action == TC_ACT_SHOT) || exceed_mtu) {
 drop:
-		qstats_drop_inc(this_cpu_ptr(ife->common.cpu_qstats));
+		qstats_cpu_drop_inc(ife->common.cpu_qstats);
 		return TC_ACT_SHOT;
 	}
 
diff --git a/net/sched/act_mpls.c b/net/sched/act_mpls.c
index 1abfaf9d99f1fce0fe7cafa2a9e35c80a3969ce7..4ea8b2e08c3a4dddfe1670af72a5d487a5219f5e 100644
--- a/net/sched/act_mpls.c
+++ b/net/sched/act_mpls.c
@@ -123,7 +123,7 @@ TC_INDIRECT_SCOPE int tcf_mpls_act(struct sk_buff *skb,
 	return p->action;
 
 drop:
-	qstats_drop_inc(this_cpu_ptr(m->common.cpu_qstats));
+	qstats_cpu_drop_inc(m->common.cpu_qstats);
 	return TC_ACT_SHOT;
 }
 
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 8060f43e4d11c0a26e1475db06b76426f50c5975..b16468a98c55e32260e8d4cb1fe3d771fca65120 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -310,7 +310,7 @@ TC_INDIRECT_SCOPE int tcf_police_act(struct sk_buff *skb,
 	qstats_cpu_overlimit_inc(police->common.cpu_qstats);
 inc_drops:
 	if (ret == TC_ACT_SHOT)
-		qstats_drop_inc(this_cpu_ptr(police->common.cpu_qstats));
+		qstats_cpu_drop_inc(police->common.cpu_qstats);
 end:
 	return ret;
 }
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index a778cdba9258c2c776ee5ba0751cca1b73c984df..bfec6b66841031cd566d0c2bdc3d120cec41e3e4 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -86,7 +86,7 @@ TC_INDIRECT_SCOPE int tcf_skbedit_act(struct sk_buff *skb,
 	return params->action;
 
 err:
-	qstats_drop_inc(this_cpu_ptr(d->common.cpu_qstats));
+	qstats_cpu_drop_inc(d->common.cpu_qstats);
 	return TC_ACT_SHOT;
 }
 
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index ffea9fbd522d8dd3311cbca0a55a3d133eaceae4..30c881dda6a366af40e94f746cf95af83d2f10d2 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1844,7 +1844,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
 		if (ack) {
 			b->ack_drops++;
-			sch->qstats.drops++;
+			qdisc_qstats_drop(sch);
 			ack_pkt_len = qdisc_pkt_len(ack);
 			b->bytes += ack_pkt_len;
 			q->buffer_used += skb->truesize - ack->truesize;
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 2a3d758f67ab43d17128442fd8b51c6ba7775d52..f9ef95e69f72870c8a6a16b7e7911d1287b8e83e 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -176,7 +176,7 @@ static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets,
 	flow->cvars.count += i;
 	q->backlogs[idx] -= len;
 	q->memory_usage -= mem;
-	sch->qstats.drops += i;
+	__qdisc_qstats_drop(sch, i);
 	sch->qstats.backlog -= len;
 	sch->q.qlen -= i;
 	return idx;
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 36d0cafac2063f3ba1133ad9b8fab08ce4468550..8ae65572162c188cca5ac8f030dc6f2054a7fcd0 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -389,7 +389,7 @@ static int gred_offload_dump_stats(struct Qdisc *sch)
 		packets += u64_stats_read(&hw_stats->stats.bstats[i].packets);
 		sch->qstats.qlen += hw_stats->stats.qstats[i].qlen;
 		sch->qstats.backlog += hw_stats->stats.qstats[i].backlog;
-		sch->qstats.drops += hw_stats->stats.qstats[i].drops;
+		__qdisc_qstats_drop(sch, hw_stats->stats.qstats[i].drops);
 		sch->qstats.requeues += hw_stats->stats.qstats[i].requeues;
 		sch->qstats.overlimits += hw_stats->stats.qstats[i].overlimits;
 	}
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v3 net-next 04/15] net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec()
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260410182257.774311-1-edumazet@google.com>

Helpers to increment or decrement sch->q.qlen, with appropriate
WRITE_ONCE() to prevent store tearing.

Add other WRITE_ONCE() when sch->q.qlen is changed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sch_generic.h | 26 ++++++++++++++++++--------
 net/sched/sch_api.c       |  2 +-
 net/sched/sch_cake.c      |  8 ++++----
 net/sched/sch_cbs.c       |  4 ++--
 net/sched/sch_choke.c     |  8 ++++----
 net/sched/sch_drr.c       |  4 ++--
 net/sched/sch_dualpi2.c   |  4 ++--
 net/sched/sch_etf.c       |  8 ++++----
 net/sched/sch_ets.c       |  4 ++--
 net/sched/sch_fq.c        |  6 +++---
 net/sched/sch_fq_codel.c  |  7 ++++---
 net/sched/sch_fq_pie.c    |  4 ++--
 net/sched/sch_generic.c   |  8 ++++----
 net/sched/sch_hfsc.c      |  4 ++--
 net/sched/sch_hhf.c       |  7 ++++---
 net/sched/sch_htb.c       |  4 ++--
 net/sched/sch_mq.c        |  5 +++--
 net/sched/sch_mqprio.c    | 18 ++++++++++--------
 net/sched/sch_multiq.c    |  4 ++--
 net/sched/sch_netem.c     | 10 +++++-----
 net/sched/sch_prio.c      |  4 ++--
 net/sched/sch_qfq.c       |  6 +++---
 net/sched/sch_red.c       |  4 ++--
 net/sched/sch_sfb.c       |  4 ++--
 net/sched/sch_sfq.c       |  9 +++++----
 net/sched/sch_skbprio.c   |  4 ++--
 net/sched/sch_taprio.c    |  4 ++--
 net/sched/sch_tbf.c       |  6 +++---
 net/sched/sch_teql.c      |  2 +-
 29 files changed, 102 insertions(+), 86 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index b22579671e4b4dd04c5dfa810b714daaac74af2a..27d705246dbde99eda02f225dd745f14c73bd830 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -542,6 +542,16 @@ static inline int qdisc_qlen(const struct Qdisc *q)
 	return q->q.qlen;
 }
 
+static inline void qdisc_qlen_inc(struct Qdisc *q)
+{
+	WRITE_ONCE(q->q.qlen, q->q.qlen + 1);
+}
+
+static inline void qdisc_qlen_dec(struct Qdisc *q)
+{
+	WRITE_ONCE(q->q.qlen, q->q.qlen - 1);
+}
+
 static inline int qdisc_qlen_sum(const struct Qdisc *q)
 {
 	__u32 qlen = q->qstats.qlen;
@@ -549,9 +559,9 @@ static inline int qdisc_qlen_sum(const struct Qdisc *q)
 
 	if (qdisc_is_percpu_stats(q)) {
 		for_each_possible_cpu(i)
-			qlen += per_cpu_ptr(q->cpu_qstats, i)->qlen;
+			qlen += READ_ONCE(per_cpu_ptr(q->cpu_qstats, i)->qlen);
 	} else {
-		qlen += q->q.qlen;
+		qlen += READ_ONCE(q->q.qlen);
 	}
 
 	return qlen;
@@ -1110,7 +1120,7 @@ static inline struct sk_buff *qdisc_dequeue_internal(struct Qdisc *sch, bool dir
 
 	skb = __skb_dequeue(&sch->gso_skb);
 	if (skb) {
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 		qdisc_qstats_backlog_dec(sch, skb);
 		return skb;
 	}
@@ -1256,7 +1266,7 @@ static inline struct sk_buff *qdisc_peek_dequeued(struct Qdisc *sch)
 			__skb_queue_head(&sch->gso_skb, skb);
 			/* it's still part of the queue */
 			qdisc_qstats_backlog_inc(sch, skb);
-			sch->q.qlen++;
+			qdisc_qlen_inc(sch);
 		}
 	}
 
@@ -1273,7 +1283,7 @@ static inline void qdisc_update_stats_at_dequeue(struct Qdisc *sch,
 	} else {
 		qdisc_qstats_backlog_dec(sch, skb);
 		qdisc_bstats_update(sch, skb);
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 	}
 }
 
@@ -1285,7 +1295,7 @@ static inline void qdisc_update_stats_at_enqueue(struct Qdisc *sch,
 		this_cpu_add(sch->cpu_qstats->backlog, pkt_len);
 	} else {
 		sch->qstats.backlog += pkt_len;
-		sch->q.qlen++;
+		qdisc_qlen_inc(sch);
 	}
 }
 
@@ -1301,7 +1311,7 @@ static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
 			qdisc_qstats_cpu_qlen_dec(sch);
 		} else {
 			qdisc_qstats_backlog_dec(sch, skb);
-			sch->q.qlen--;
+			qdisc_qlen_dec(sch);
 		}
 	} else {
 		skb = sch->dequeue(sch);
@@ -1322,7 +1332,7 @@ static inline void __qdisc_reset_queue(struct qdisc_skb_head *qh)
 
 		qh->head = NULL;
 		qh->tail = NULL;
-		qh->qlen = 0;
+		WRITE_ONCE(qh->qlen, 0);
 	}
 }
 
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ed869a5ffc7377b7c19e66ae5fc9788e709488da..0dd3efd86393870e9695dddb4a471c5bf854f81e 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -805,7 +805,7 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
 			cl = cops->find(sch, parentid);
 			cops->qlen_notify(sch, cl);
 		}
-		sch->q.qlen -= n;
+		WRITE_ONCE(sch->q.qlen, sch->q.qlen - n);
 		sch->qstats.backlog -= len;
 		__qdisc_qstats_drop(sch, drops);
 	}
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 30c881dda6a366af40e94f746cf95af83d2f10d2..f2d9aded493264f6235d43c7e99db1d4c27408e6 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1605,7 +1605,7 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
 		cake_advance_shaper(q, b, skb, now, true);
 
 	qdisc_drop_reason(skb, sch, to_free, QDISC_DROP_OVERLIMIT);
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 
 	cake_heapify(q, 0);
 
@@ -1815,7 +1815,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 									  segs);
 			flow_queue_add(flow, segs);
 
-			sch->q.qlen++;
+			qdisc_qlen_inc(sch);
 			numsegs++;
 			slen += segs->len;
 			q->buffer_used += segs->truesize;
@@ -1854,7 +1854,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			qdisc_tree_reduce_backlog(sch, 1, ack_pkt_len);
 			consume_skb(ack);
 		} else {
-			sch->q.qlen++;
+			qdisc_qlen_inc(sch);
 			q->buffer_used      += skb->truesize;
 		}
 
@@ -1980,7 +1980,7 @@ static struct sk_buff *cake_dequeue_one(struct Qdisc *sch)
 		b->tin_backlog		 -= len;
 		sch->qstats.backlog      -= len;
 		q->buffer_used		 -= skb->truesize;
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 
 		if (q->overflow_timeout)
 			cake_heapify(q, b->overflow_idx[q->cur_flow]);
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index 8c9a0400c8622c652db290796f2dd338eb61799c..a75e58876797952f2218725f6da5cff29f330ae2 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -97,7 +97,7 @@ static int cbs_child_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		return err;
 
 	sch->qstats.backlog += len;
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 
 	return NET_XMIT_SUCCESS;
 }
@@ -168,7 +168,7 @@ static struct sk_buff *cbs_child_dequeue(struct Qdisc *sch, struct Qdisc *child)
 
 	qdisc_qstats_backlog_dec(sch, skb);
 	qdisc_bstats_update(sch, skb);
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 
 	return skb;
 }
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 94df8e741a979191a06885ad3ee813f12650ff3c..cd0785ad8e74314e6d5c88144ffcf64f286e02dd 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -123,7 +123,7 @@ static void choke_drop_by_idx(struct Qdisc *sch, unsigned int idx,
 	if (idx == q->tail)
 		choke_zap_tail_holes(q);
 
-	--sch->q.qlen;
+	qdisc_qlen_dec(sch);
 	qdisc_qstats_backlog_dec(sch, skb);
 	qdisc_tree_reduce_backlog(sch, 1, qdisc_pkt_len(skb));
 	qdisc_drop(skb, sch, to_free);
@@ -267,7 +267,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	if (sch->q.qlen < q->limit) {
 		q->tab[q->tail] = skb;
 		q->tail = (q->tail + 1) & q->tab_mask;
-		++sch->q.qlen;
+		qdisc_qlen_inc(sch);
 		qdisc_qstats_backlog_inc(sch, skb);
 		return NET_XMIT_SUCCESS;
 	}
@@ -294,7 +294,7 @@ static struct sk_buff *choke_dequeue(struct Qdisc *sch)
 	skb = q->tab[q->head];
 	q->tab[q->head] = NULL;
 	choke_zap_head_holes(q);
-	--sch->q.qlen;
+	qdisc_qlen_dec(sch);
 	qdisc_qstats_backlog_dec(sch, skb);
 	qdisc_bstats_update(sch, skb);
 
@@ -392,7 +392,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt,
 				}
 				dropped += qdisc_pkt_len(skb);
 				qdisc_qstats_backlog_dec(sch, skb);
-				--sch->q.qlen;
+				qdisc_qlen_dec(sch);
 				rtnl_qdisc_drop(skb, sch);
 			}
 			qdisc_tree_reduce_backlog(sch, oqlen - sch->q.qlen, dropped);
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 01335a49e091444747635ee8bc7e22ded504d571..925fa0cfd730ce72e45e8983ba02eb913afb1235 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -366,7 +366,7 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	}
 
 	sch->qstats.backlog += len;
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 	return err;
 }
 
@@ -399,7 +399,7 @@ static struct sk_buff *drr_dequeue(struct Qdisc *sch)
 			bstats_update(&cl->bstats, skb);
 			qdisc_bstats_update(sch, skb);
 			qdisc_qstats_backlog_dec(sch, skb);
-			sch->q.qlen--;
+			qdisc_qlen_dec(sch);
 			return skb;
 		}
 
diff --git a/net/sched/sch_dualpi2.c b/net/sched/sch_dualpi2.c
index fe6f5e8896257674b9f175e01428b89e299a7dda..d093d058decbc577f3b311e1e8513260c167bff0 100644
--- a/net/sched/sch_dualpi2.c
+++ b/net/sched/sch_dualpi2.c
@@ -415,7 +415,7 @@ static int dualpi2_enqueue_skb(struct sk_buff *skb, struct Qdisc *sch,
 		dualpi2_skb_cb(skb)->apply_step = skb_apply_step(skb, q);
 
 		/* Keep the overall qdisc stats consistent */
-		++sch->q.qlen;
+		qdisc_qlen_inc(sch);
 		qdisc_qstats_backlog_inc(sch, skb);
 		++q->packets_in_l;
 		if (!q->l_head_ts)
@@ -530,7 +530,7 @@ static struct sk_buff *dequeue_packet(struct Qdisc *sch,
 		qdisc_qstats_backlog_dec(q->l_queue, skb);
 
 		/* Keep the global queue size consistent */
-		--sch->q.qlen;
+		qdisc_qlen_dec(sch);
 		q->memory_used -= skb->truesize;
 	} else if (c_len) {
 		skb = __qdisc_dequeue_head(&sch->q);
diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index c74d778c32a1eda639650df4d1d103c5338f14e6..ada87a81da6ac4c20e036b5391eb4efe9795ab91 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -189,7 +189,7 @@ static int etf_enqueue_timesortedlist(struct sk_buff *nskb, struct Qdisc *sch,
 	rb_insert_color_cached(&nskb->rbnode, &q->head, leftmost);
 
 	qdisc_qstats_backlog_inc(sch, nskb);
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 
 	/* Now we may need to re-arm the qdisc watchdog for the next packet. */
 	reset_watchdog(sch);
@@ -222,7 +222,7 @@ static void timesortedlist_drop(struct Qdisc *sch, struct sk_buff *skb,
 		qdisc_qstats_backlog_dec(sch, skb);
 		qdisc_drop(skb, sch, &to_free);
 		qdisc_qstats_overlimit(sch);
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 	}
 
 	kfree_skb_list(to_free);
@@ -247,7 +247,7 @@ static void timesortedlist_remove(struct Qdisc *sch, struct sk_buff *skb)
 
 	q->last = skb->tstamp;
 
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 }
 
 static struct sk_buff *etf_dequeue_timesortedlist(struct Qdisc *sch)
@@ -426,7 +426,7 @@ static void timesortedlist_clear(struct Qdisc *sch)
 
 		rb_erase_cached(&skb->rbnode, &q->head);
 		rtnl_kfree_skbs(skb, skb);
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 	}
 }
 
diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index a4b07b661b7756a675d22c0f84f8f0a713cdb7eb..c817e0a6c14653a35f5ebb9de1a5ccc44d1a2f98 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -449,7 +449,7 @@ static int ets_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	}
 
 	sch->qstats.backlog += len;
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 	return err;
 }
 
@@ -458,7 +458,7 @@ ets_qdisc_dequeue_skb(struct Qdisc *sch, struct sk_buff *skb)
 {
 	qdisc_bstats_update(sch, skb);
 	qdisc_qstats_backlog_dec(sch, skb);
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 	return skb;
 }
 
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index f2edcf872981fd8181dfb97a3bc665fd4a869115..1e34ac136b15cf24742f2810d201420cf763021a 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -497,7 +497,7 @@ static void fq_dequeue_skb(struct Qdisc *sch, struct fq_flow *flow,
 	fq_erase_head(sch, flow, skb);
 	skb_mark_not_on_list(skb);
 	qdisc_qstats_backlog_dec(sch, skb);
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 	qdisc_bstats_update(sch, skb);
 }
 
@@ -597,7 +597,7 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	flow_queue_add(f, skb);
 
 	qdisc_qstats_backlog_inc(sch, skb);
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 
 	return NET_XMIT_SUCCESS;
 }
@@ -801,7 +801,7 @@ static void fq_reset(struct Qdisc *sch)
 	struct fq_flow *f;
 	unsigned int idx;
 
-	sch->q.qlen = 0;
+	WRITE_ONCE(sch->q.qlen, 0);
 	sch->qstats.backlog = 0;
 
 	fq_flow_purge(&q->internal);
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index f9ef95e69f72870c8a6a16b7e7911d1287b8e83e..3c60a8ec4682b174ebf0df34f356eb6132356764 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -178,7 +178,7 @@ static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets,
 	q->memory_usage -= mem;
 	__qdisc_qstats_drop(sch, i);
 	sch->qstats.backlog -= len;
-	sch->q.qlen -= i;
+	WRITE_ONCE(sch->q.qlen, sch->q.qlen - i);
 	return idx;
 }
 
@@ -215,7 +215,8 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	get_codel_cb(skb)->mem_usage = skb->truesize;
 	q->memory_usage += get_codel_cb(skb)->mem_usage;
 	memory_limited = q->memory_usage > q->memory_limit;
-	if (++sch->q.qlen <= sch->limit && !memory_limited)
+	qdisc_qlen_inc(sch);
+	if (sch->q.qlen <= sch->limit && !memory_limited)
 		return NET_XMIT_SUCCESS;
 
 	prev_backlog = sch->qstats.backlog;
@@ -265,7 +266,7 @@ static struct sk_buff *dequeue_func(struct codel_vars *vars, void *ctx)
 		skb = dequeue_head(flow);
 		q->backlogs[flow - q->flows] -= qdisc_pkt_len(skb);
 		q->memory_usage -= get_codel_cb(skb)->mem_usage;
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 		sch->qstats.backlog -= qdisc_pkt_len(skb);
 	}
 	return skb;
diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c
index 154c70f489f289066db5d61bb51e58aaf328f16e..dba49d44a5d2412b2deb983bf87428ade7944e51 100644
--- a/net/sched/sch_fq_pie.c
+++ b/net/sched/sch_fq_pie.c
@@ -185,7 +185,7 @@ static int fq_pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		q->stats.packets_in++;
 		q->memory_usage += skb->truesize;
 		sch->qstats.backlog += pkt_len;
-		sch->q.qlen++;
+		qdisc_qlen_inc(sch);
 		flow_queue_add(sel_flow, skb);
 		if (list_empty(&sel_flow->flowchain)) {
 			list_add_tail(&sel_flow->flowchain, &q->new_flows);
@@ -263,7 +263,7 @@ static struct sk_buff *fq_pie_qdisc_dequeue(struct Qdisc *sch)
 		skb = dequeue_head(flow);
 		pkt_len = qdisc_pkt_len(skb);
 		sch->qstats.backlog -= pkt_len;
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 		qdisc_bstats_update(sch, skb);
 	}
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index a93321db8fd75d30c61e146c290bbc139c37c913..32ace8659ab86457cd1b1655810e0f4105149c47 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -118,7 +118,7 @@ static inline struct sk_buff *__skb_dequeue_bad_txq(struct Qdisc *q)
 				qdisc_qstats_cpu_qlen_dec(q);
 			} else {
 				qdisc_qstats_backlog_dec(q, skb);
-				q->q.qlen--;
+				qdisc_qlen_dec(q);
 			}
 		} else {
 			skb = SKB_XOFF_MAGIC;
@@ -159,7 +159,7 @@ static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
 		qdisc_qstats_cpu_qlen_inc(q);
 	} else {
 		qdisc_qstats_backlog_inc(q, skb);
-		q->q.qlen++;
+		qdisc_qlen_inc(q);
 	}
 
 	if (lock)
@@ -188,7 +188,7 @@ static inline void dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 		} else {
 			q->qstats.requeues++;
 			qdisc_qstats_backlog_inc(q, skb);
-			q->q.qlen++;
+			qdisc_qlen_inc(q);
 		}
 
 		skb = next;
@@ -294,7 +294,7 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
 				qdisc_qstats_cpu_qlen_dec(q);
 			} else {
 				qdisc_qstats_backlog_dec(q, skb);
-				q->q.qlen--;
+				qdisc_qlen_dec(q);
 			}
 		} else {
 			skb = NULL;
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 83b2ca2e37fc82cfebf089e6c0e36f18af939887..e71a565100edf60881ca7542faa408c5bb1a0984 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1561,7 +1561,7 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
 	}
 
 	sch->qstats.backlog += len;
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 
 	if (first && !cl_in_el_or_vttree(cl)) {
 		if (cl->cl_flags & HFSC_RSC)
@@ -1650,7 +1650,7 @@ hfsc_dequeue(struct Qdisc *sch)
 
 	qdisc_bstats_update(sch, skb);
 	qdisc_qstats_backlog_dec(sch, skb);
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 
 	return skb;
 }
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index 95e5d9bfd9c8c0cac08e080b8f1e0332e722aa3b..69b6f0a5471cb9a3b7b760144683f2b249091d89 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -359,7 +359,7 @@ static unsigned int hhf_drop(struct Qdisc *sch, struct sk_buff **to_free)
 	if (bucket->head) {
 		struct sk_buff *skb = dequeue_head(bucket);
 
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 		qdisc_qstats_backlog_dec(sch, skb);
 		qdisc_drop(skb, sch, to_free);
 	}
@@ -399,7 +399,8 @@ static int hhf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		}
 		bucket->deficit = weight * q->quantum;
 	}
-	if (++sch->q.qlen <= sch->limit)
+	qdisc_qlen_inc(sch);
+	if (sch->q.qlen <= sch->limit)
 		return NET_XMIT_SUCCESS;
 
 	prev_backlog = sch->qstats.backlog;
@@ -442,7 +443,7 @@ static struct sk_buff *hhf_dequeue(struct Qdisc *sch)
 
 	if (bucket->head) {
 		skb = dequeue_head(bucket);
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 		qdisc_qstats_backlog_dec(sch, skb);
 	}
 
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index eb12381795ce1bb0f3b8c5f502e16ad64c4408c8..c22ccd8eae8c73323ccdf425e62857b3b851d74e 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -651,7 +651,7 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	}
 
 	sch->qstats.backlog += len;
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 	return NET_XMIT_SUCCESS;
 }
 
@@ -951,7 +951,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 ok:
 		qdisc_bstats_update(sch, skb);
 		qdisc_qstats_backlog_dec(sch, skb);
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 		return skb;
 	}
 
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index a0133a7b9d3b09a0d2a6064234c8fdef60dbf955..ec8c91d3fde04e59daec2aecdb14d6bf50715e15 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -143,10 +143,10 @@ EXPORT_SYMBOL_NS_GPL(mq_attach, "NET_SCHED_INTERNAL");
 void mq_dump_common(struct Qdisc *sch, struct sk_buff *skb)
 {
 	struct net_device *dev = qdisc_dev(sch);
+	unsigned int qlen = 0;
 	struct Qdisc *qdisc;
 	unsigned int ntx;
 
-	sch->q.qlen = 0;
 	gnet_stats_basic_sync_init(&sch->bstats);
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
@@ -163,10 +163,11 @@ void mq_dump_common(struct Qdisc *sch, struct sk_buff *skb)
 				     &qdisc->bstats, false);
 		gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
 				     &qdisc->qstats);
-		sch->q.qlen += qdisc_qlen(qdisc);
+		qlen += qdisc_qlen(qdisc);
 
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
+	WRITE_ONCE(sch->q.qlen, qlen);
 }
 EXPORT_SYMBOL_NS_GPL(mq_dump_common, "NET_SCHED_INTERNAL");
 
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 002add5ce9e0ab04a6260495d1bec02983c2a204..91a92992cd24ab6c30bf7db2288c08cd493c7bc3 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -555,10 +555,11 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	struct mqprio_sched *priv = qdisc_priv(sch);
 	struct nlattr *nla = (struct nlattr *)skb_tail_pointer(skb);
 	struct tc_mqprio_qopt opt = { 0 };
+	unsigned int qlen = 0;
 	struct Qdisc *qdisc;
 	unsigned int ntx;
 
-	sch->q.qlen = 0;
+	qlen = 0;
 	gnet_stats_basic_sync_init(&sch->bstats);
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
@@ -575,10 +576,11 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 				     &qdisc->bstats, false);
 		gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
 				     &qdisc->qstats);
-		sch->q.qlen += qdisc_qlen(qdisc);
+		qlen += qdisc_qlen(qdisc);
 
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
+	WRITE_ONCE(sch->q.qlen, qlen);
 
 	mqprio_qopt_reconstruct(dev, &opt);
 	opt.hw = priv->hw_offload;
@@ -663,12 +665,12 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	__acquires(d->lock)
 {
 	if (cl >= TC_H_MIN_PRIORITY) {
-		int i;
-		__u32 qlen;
-		struct gnet_stats_queue qstats = {0};
-		struct gnet_stats_basic_sync bstats;
 		struct net_device *dev = qdisc_dev(sch);
 		struct netdev_tc_txq tc = dev->tc_to_txq[cl & TC_BITMASK];
+		struct gnet_stats_queue qstats = {0};
+		struct gnet_stats_basic_sync bstats;
+		u32 qlen = 0;
+		int i;
 
 		gnet_stats_basic_sync_init(&bstats);
 		/* Drop lock here it will be reclaimed before touching
@@ -689,11 +691,11 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 					     &qdisc->bstats, false);
 			gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
 					     &qdisc->qstats);
-			sch->q.qlen += qdisc_qlen(qdisc);
+			qlen += qdisc_qlen(qdisc);
 
 			spin_unlock_bh(qdisc_lock(qdisc));
 		}
-		qlen = qdisc_qlen(sch) + qstats.qlen;
+		qlen = qlen + qstats.qlen;
 
 		/* Reclaim root sleeping lock before completing stats */
 		if (d->lock)
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 9f822fee113df6562ddac89092357434547a4599..4e465d11e3d75e36b875b66f8c8087c2e15cdad9 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -76,7 +76,7 @@ multiq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
 	ret = qdisc_enqueue(skb, qdisc, to_free);
 	if (ret == NET_XMIT_SUCCESS) {
-		sch->q.qlen++;
+		qdisc_qlen_inc(sch);
 		return NET_XMIT_SUCCESS;
 	}
 	if (net_xmit_drop_count(ret))
@@ -106,7 +106,7 @@ static struct sk_buff *multiq_dequeue(struct Qdisc *sch)
 			skb = qdisc->dequeue(qdisc);
 			if (skb) {
 				qdisc_bstats_update(sch, skb);
-				sch->q.qlen--;
+				qdisc_qlen_dec(sch);
 				return skb;
 			}
 		}
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 20df1c08b1e9d04e9495f1a69eff0dd96049f914..4498dd440a02ea7a089c92ebc005d5064b87e2d2 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -416,7 +416,7 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
 		rb_insert_color(&nskb->rbnode, &q->t_root);
 	}
 	q->t_len++;
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 }
 
 /* netem can't properly corrupt a megapacket (like we get from GSO), so instead
@@ -752,19 +752,19 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 					if (net_xmit_drop_count(err))
 						qdisc_qstats_drop(sch);
 					sch->qstats.backlog -= pkt_len;
-					sch->q.qlen--;
+					qdisc_qlen_dec(sch);
 					qdisc_tree_reduce_backlog(sch, 1, pkt_len);
 				}
 				goto tfifo_dequeue;
 			}
-			sch->q.qlen--;
+			qdisc_qlen_dec(sch);
 			goto deliver;
 		}
 
 		if (q->qdisc) {
 			skb = q->qdisc->ops->dequeue(q->qdisc);
 			if (skb) {
-				sch->q.qlen--;
+				qdisc_qlen_dec(sch);
 				goto deliver;
 			}
 		}
@@ -777,7 +777,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 	if (q->qdisc) {
 		skb = q->qdisc->ops->dequeue(q->qdisc);
 		if (skb) {
-			sch->q.qlen--;
+			qdisc_qlen_dec(sch);
 			goto deliver;
 		}
 	}
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 9e2b9a490db23d858b27b7fc073b05a06535b05e..fe42ae3d6b696b2fc47f4d397af32e950eeec194 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -86,7 +86,7 @@ prio_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
 	ret = qdisc_enqueue(skb, qdisc, to_free);
 	if (ret == NET_XMIT_SUCCESS) {
 		sch->qstats.backlog += len;
-		sch->q.qlen++;
+		qdisc_qlen_inc(sch);
 		return NET_XMIT_SUCCESS;
 	}
 	if (net_xmit_drop_count(ret))
@@ -119,7 +119,7 @@ static struct sk_buff *prio_dequeue(struct Qdisc *sch)
 		if (skb) {
 			qdisc_bstats_update(sch, skb);
 			qdisc_qstats_backlog_dec(sch, skb);
-			sch->q.qlen--;
+			qdisc_qlen_dec(sch);
 			return skb;
 		}
 	}
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 699e45873f86145e96abd0d9ca77a6d0ff763b1b..195c434aae5f7e03d1a1238ed73bb64b3f04e105 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1152,12 +1152,12 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 	if (!skb)
 		return NULL;
 
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 
 	skb = agg_dequeue(in_serv_agg, cl, len);
 
 	if (!skb) {
-		sch->q.qlen++;
+		qdisc_qlen_inc(sch);
 		return NULL;
 	}
 
@@ -1265,7 +1265,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
 	_bstats_update(&cl->bstats, len, gso_segs);
 	sch->qstats.backlog += len;
-	++sch->q.qlen;
+	qdisc_qlen_inc(sch);
 
 	agg = cl->agg;
 	/* if the class is active, then done here */
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index c8d3d09f15e3919d6468964561130bfc79fb215b..61b9064d39f222bdfe5021e93e8172b7ae60c408 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -133,7 +133,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	ret = qdisc_enqueue(skb, child, to_free);
 	if (likely(ret == NET_XMIT_SUCCESS)) {
 		sch->qstats.backlog += len;
-		sch->q.qlen++;
+		qdisc_qlen_inc(sch);
 	} else if (net_xmit_drop_count(ret)) {
 		q->stats.pdrop++;
 		qdisc_qstats_drop(sch);
@@ -159,7 +159,7 @@ static struct sk_buff *red_dequeue(struct Qdisc *sch)
 	if (skb) {
 		qdisc_bstats_update(sch, skb);
 		qdisc_qstats_backlog_dec(sch, skb);
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 	} else {
 		if (!red_is_idling(&q->vars))
 			red_start_of_idle_period(&q->vars);
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 01373866212894c57e7de58706ee464879303955..17b6ce223ad3a6f2d289c3ebe27cce8168c66b2b 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -407,7 +407,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	ret = qdisc_enqueue(skb, child, to_free);
 	if (likely(ret == NET_XMIT_SUCCESS)) {
 		sch->qstats.backlog += len;
-		sch->q.qlen++;
+		qdisc_qlen_inc(sch);
 		increment_qlen(&cb, q);
 	} else if (net_xmit_drop_count(ret)) {
 		q->stats.childdrop++;
@@ -436,7 +436,7 @@ static struct sk_buff *sfb_dequeue(struct Qdisc *sch)
 	if (skb) {
 		qdisc_bstats_update(sch, skb);
 		qdisc_qstats_backlog_dec(sch, skb);
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 		decrement_qlen(skb, q);
 	}
 
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index c3f3181dba5424eb9d26362a1628653bb9392e89..132afe4d923c56e25bcb406a7cc47acef0f0e9b5 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -300,7 +300,7 @@ static unsigned int sfq_drop(struct Qdisc *sch, struct sk_buff **to_free)
 		len = qdisc_pkt_len(skb);
 		slot->backlog -= len;
 		sfq_dec(q, x);
-		sch->q.qlen--;
+		qdisc_qlen_dec(sch);
 		qdisc_qstats_backlog_dec(sch, skb);
 		qdisc_drop_reason(skb, sch, to_free, QDISC_DROP_OVERLIMIT);
 		return len;
@@ -454,7 +454,8 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
 		/* We could use a bigger initial quantum for new flows */
 		slot->allot = q->quantum;
 	}
-	if (++sch->q.qlen <= q->limit)
+	qdisc_qlen_inc(sch);
+	if (sch->q.qlen <= q->limit)
 		return NET_XMIT_SUCCESS;
 
 	qlen = slot->qlen;
@@ -495,7 +496,7 @@ sfq_dequeue(struct Qdisc *sch)
 	skb = slot_dequeue_head(slot);
 	sfq_dec(q, a);
 	qdisc_bstats_update(sch, skb);
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 	qdisc_qstats_backlog_dec(sch, skb);
 	slot->backlog -= qdisc_pkt_len(skb);
 	/* Is the slot empty? */
@@ -594,7 +595,7 @@ static void sfq_rehash(struct Qdisc *sch)
 			slot->allot = q->quantum;
 		}
 	}
-	sch->q.qlen -= dropped;
+	WRITE_ONCE(sch->q.qlen, sch->q.qlen - dropped);
 	qdisc_tree_reduce_backlog(sch, dropped, drop_len);
 }
 
diff --git a/net/sched/sch_skbprio.c b/net/sched/sch_skbprio.c
index f485f62ab721ab8cde21230c60514708fb479982..52abfb4015a36408046d96b349497419ab5dacf8 100644
--- a/net/sched/sch_skbprio.c
+++ b/net/sched/sch_skbprio.c
@@ -93,7 +93,7 @@ static int skbprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		if (prio < q->lowest_prio)
 			q->lowest_prio = prio;
 
-		sch->q.qlen++;
+		qdisc_qlen_inc(sch);
 		return NET_XMIT_SUCCESS;
 	}
 
@@ -145,7 +145,7 @@ static struct sk_buff *skbprio_dequeue(struct Qdisc *sch)
 	if (unlikely(!skb))
 		return NULL;
 
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 	qdisc_qstats_backlog_dec(sch, skb);
 	qdisc_bstats_update(sch, skb);
 
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 8e375281195061da848fb2bfaf79cf125afccac0..885a9bc859166dfb6d20aa0dfbb8f11194e02ba9 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -574,7 +574,7 @@ static int taprio_enqueue_one(struct sk_buff *skb, struct Qdisc *sch,
 	}
 
 	qdisc_qstats_backlog_inc(sch, skb);
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 
 	return qdisc_enqueue(skb, child, to_free);
 }
@@ -755,7 +755,7 @@ static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq,
 
 	qdisc_bstats_update(sch, skb);
 	qdisc_qstats_backlog_dec(sch, skb);
-	sch->q.qlen--;
+	qdisc_qlen_dec(sch);
 
 	return skb;
 }
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index f2340164f579a25431979e12ec3d23ab828edd16..25edf11a7d671fe63878b0995998c5920b86ef74 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -231,7 +231,7 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
 			len += seg_len;
 		}
 	}
-	sch->q.qlen += nb;
+	WRITE_ONCE(sch->q.qlen, sch->q.qlen + nb);
 	sch->qstats.backlog += len;
 	if (nb > 0) {
 		qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
@@ -264,7 +264,7 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	}
 
 	sch->qstats.backlog += len;
-	sch->q.qlen++;
+	qdisc_qlen_inc(sch);
 	return NET_XMIT_SUCCESS;
 }
 
@@ -309,7 +309,7 @@ static struct sk_buff *tbf_dequeue(struct Qdisc *sch)
 			q->tokens = toks;
 			q->ptokens = ptoks;
 			qdisc_qstats_backlog_dec(sch, skb);
-			sch->q.qlen--;
+			qdisc_qlen_dec(sch);
 			qdisc_bstats_update(sch, skb);
 			return skb;
 		}
diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index ec4039a201a2c2c502bc649fa5f6a0e4feee8fd5..bd10da46f5ddbc53f914648066dab526c8064e55 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -107,7 +107,7 @@ teql_dequeue(struct Qdisc *sch)
 	} else {
 		qdisc_bstats_update(sch, skb);
 	}
-	sch->q.qlen = dat->q.qlen + q->q.qlen;
+	WRITE_ONCE(sch->q.qlen, dat->q.qlen + q->q.qlen);
 	return skb;
 }
 
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v3 net-next 05/15] net/sched: annotate data-races around sch->qstats.backlog
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260410182257.774311-1-edumazet@google.com>

Add qstats_backlog_sub() and qstats_backlog_add() helpers
and use them instead of open-coding them.

These helpers use WRITE_ONCE() to prevent store-tearing.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sch_generic.h | 16 +++++++++++++---
 net/sched/sch_api.c       |  2 +-
 net/sched/sch_cake.c      |  8 ++++----
 net/sched/sch_cbs.c       |  2 +-
 net/sched/sch_codel.c     |  2 +-
 net/sched/sch_drr.c       |  2 +-
 net/sched/sch_ets.c       |  2 +-
 net/sched/sch_fq_codel.c  |  4 ++--
 net/sched/sch_fq_pie.c    |  4 ++--
 net/sched/sch_gred.c      |  2 +-
 net/sched/sch_hfsc.c      |  2 +-
 net/sched/sch_htb.c       |  2 +-
 net/sched/sch_netem.c     |  2 +-
 net/sched/sch_prio.c      |  2 +-
 net/sched/sch_qfq.c       |  2 +-
 net/sched/sch_red.c       |  2 +-
 net/sched/sch_sfb.c       |  4 ++--
 net/sched/sch_sfq.c       |  2 +-
 net/sched/sch_tbf.c       |  4 ++--
 19 files changed, 38 insertions(+), 28 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 27d705246dbde99eda02f225dd745f14c73bd830..b0564a39caf4471619b74179a06a0e41e3765d94 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -965,10 +965,15 @@ static inline void qdisc_bstats_update(struct Qdisc *sch,
 	bstats_update(&sch->bstats, skb);
 }
 
+static inline void qstats_backlog_sub(struct Qdisc *sch, u32 val)
+{
+	WRITE_ONCE(sch->qstats.backlog, sch->qstats.backlog - val);
+}
+
 static inline void qdisc_qstats_backlog_dec(struct Qdisc *sch,
 					    const struct sk_buff *skb)
 {
-	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	qstats_backlog_sub(sch, qdisc_pkt_len(skb));
 }
 
 static inline void qdisc_qstats_cpu_backlog_dec(struct Qdisc *sch,
@@ -977,10 +982,15 @@ static inline void qdisc_qstats_cpu_backlog_dec(struct Qdisc *sch,
 	this_cpu_sub(sch->cpu_qstats->backlog, qdisc_pkt_len(skb));
 }
 
+static inline void qstats_backlog_add(struct Qdisc *sch, u32 val)
+{
+	WRITE_ONCE(sch->qstats.backlog, sch->qstats.backlog + val);
+}
+
 static inline void qdisc_qstats_backlog_inc(struct Qdisc *sch,
 					    const struct sk_buff *skb)
 {
-	sch->qstats.backlog += qdisc_pkt_len(skb);
+	qstats_backlog_add(sch, qdisc_pkt_len(skb));
 }
 
 static inline void qdisc_qstats_cpu_backlog_inc(struct Qdisc *sch,
@@ -1294,7 +1304,7 @@ static inline void qdisc_update_stats_at_enqueue(struct Qdisc *sch,
 		qdisc_qstats_cpu_qlen_inc(sch);
 		this_cpu_add(sch->cpu_qstats->backlog, pkt_len);
 	} else {
-		sch->qstats.backlog += pkt_len;
+		qstats_backlog_add(sch, pkt_len);
 		qdisc_qlen_inc(sch);
 	}
 }
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 0dd3efd86393870e9695dddb4a471c5bf854f81e..292bc8bb7a79922a83865ed54083c04ff72742ff 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -806,7 +806,7 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
 			cops->qlen_notify(sch, cl);
 		}
 		WRITE_ONCE(sch->q.qlen, sch->q.qlen - n);
-		sch->qstats.backlog -= len;
+		qstats_backlog_sub(sch, len);
 		__qdisc_qstats_drop(sch, drops);
 	}
 	rcu_read_unlock();
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index f2d9aded493264f6235d43c7e99db1d4c27408e6..32e672820c00a88c6d8fe77a6308405e016525ea 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1596,7 +1596,7 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
 	q->buffer_used      -= skb->truesize;
 	b->backlogs[idx]    -= len;
 	b->tin_backlog      -= len;
-	sch->qstats.backlog -= len;
+	qstats_backlog_sub(sch, len);
 
 	flow->dropped++;
 	b->tin_dropped++;
@@ -1826,7 +1826,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		b->bytes	    += slen;
 		b->backlogs[idx]    += slen;
 		b->tin_backlog      += slen;
-		sch->qstats.backlog += slen;
+		qstats_backlog_add(sch, slen);
 		q->avg_window_bytes += slen;
 
 		qdisc_tree_reduce_backlog(sch, 1-numsegs, len-slen);
@@ -1863,7 +1863,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		b->bytes	    += len - ack_pkt_len;
 		b->backlogs[idx]    += len - ack_pkt_len;
 		b->tin_backlog      += len - ack_pkt_len;
-		sch->qstats.backlog += len - ack_pkt_len;
+		qstats_backlog_add(sch, len - ack_pkt_len);
 		q->avg_window_bytes += len - ack_pkt_len;
 	}
 
@@ -1978,7 +1978,7 @@ static struct sk_buff *cake_dequeue_one(struct Qdisc *sch)
 		len = qdisc_pkt_len(skb);
 		b->backlogs[q->cur_flow] -= len;
 		b->tin_backlog		 -= len;
-		sch->qstats.backlog      -= len;
+		qstats_backlog_sub(sch, len);
 		q->buffer_used		 -= skb->truesize;
 		qdisc_qlen_dec(sch);
 
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index a75e58876797952f2218725f6da5cff29f330ae2..2cfa0fd92829ad7eba7454e09dc17eb8f22519b8 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -96,7 +96,7 @@ static int cbs_child_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	if (err != NET_XMIT_SUCCESS)
 		return err;
 
-	sch->qstats.backlog += len;
+	qstats_backlog_add(sch, len);
 	qdisc_qlen_inc(sch);
 
 	return NET_XMIT_SUCCESS;
diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
index 317aae0ec7bd6aedb4bae09b18423c981fed16e7..91dd2e629af8f2d1a29f439a6dbb5c186fa01d33 100644
--- a/net/sched/sch_codel.c
+++ b/net/sched/sch_codel.c
@@ -42,7 +42,7 @@ static struct sk_buff *dequeue_func(struct codel_vars *vars, void *ctx)
 	struct sk_buff *skb = __qdisc_dequeue_head(&sch->q);
 
 	if (skb) {
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		qstats_backlog_sub(sch, qdisc_pkt_len(skb));
 		prefetch(&skb->end); /* we'll need skb_shinfo() */
 	}
 	return skb;
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 925fa0cfd730ce72e45e8983ba02eb913afb1235..3f6687fa9666257952be5d44f9e3460845fe2a40 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -365,7 +365,7 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		cl->deficit = cl->quantum;
 	}
 
-	sch->qstats.backlog += len;
+	qstats_backlog_add(sch, len);
 	qdisc_qlen_inc(sch);
 	return err;
 }
diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index c817e0a6c14653a35f5ebb9de1a5ccc44d1a2f98..1cc559634ed27ce5a6630186a51a8ac8180dad96 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -448,7 +448,7 @@ static int ets_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		cl->deficit = cl->quantum;
 	}
 
-	sch->qstats.backlog += len;
+	qstats_backlog_add(sch, len);
 	qdisc_qlen_inc(sch);
 	return err;
 }
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 3c60a8ec4682b174ebf0df34f356eb6132356764..3a348be18551033dcf41ce632eb4b563221040fa 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -177,7 +177,7 @@ static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets,
 	q->backlogs[idx] -= len;
 	q->memory_usage -= mem;
 	__qdisc_qstats_drop(sch, i);
-	sch->qstats.backlog -= len;
+	qstats_backlog_sub(sch, len);
 	WRITE_ONCE(sch->q.qlen, sch->q.qlen - i);
 	return idx;
 }
@@ -267,7 +267,7 @@ static struct sk_buff *dequeue_func(struct codel_vars *vars, void *ctx)
 		q->backlogs[flow - q->flows] -= qdisc_pkt_len(skb);
 		q->memory_usage -= get_codel_cb(skb)->mem_usage;
 		qdisc_qlen_dec(sch);
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		qdisc_qstats_backlog_dec(sch, skb);
 	}
 	return skb;
 }
diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c
index dba49d44a5d2412b2deb983bf87428ade7944e51..197f0df0a6eb06ab4ce25eefe01d32a35dbd84af 100644
--- a/net/sched/sch_fq_pie.c
+++ b/net/sched/sch_fq_pie.c
@@ -184,7 +184,7 @@ static int fq_pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		pkt_len = qdisc_pkt_len(skb);
 		q->stats.packets_in++;
 		q->memory_usage += skb->truesize;
-		sch->qstats.backlog += pkt_len;
+		qstats_backlog_add(sch, pkt_len);
 		qdisc_qlen_inc(sch);
 		flow_queue_add(sel_flow, skb);
 		if (list_empty(&sel_flow->flowchain)) {
@@ -262,7 +262,7 @@ static struct sk_buff *fq_pie_qdisc_dequeue(struct Qdisc *sch)
 	if (flow->head) {
 		skb = dequeue_head(flow);
 		pkt_len = qdisc_pkt_len(skb);
-		sch->qstats.backlog -= pkt_len;
+		qstats_backlog_sub(sch, pkt_len);
 		qdisc_qlen_dec(sch);
 		qdisc_bstats_update(sch, skb);
 	}
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 8ae65572162c188cca5ac8f030dc6f2054a7fcd0..fcc1a4c0363624293986f221c70572ce6503e220 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -388,7 +388,7 @@ static int gred_offload_dump_stats(struct Qdisc *sch)
 		bytes += u64_stats_read(&hw_stats->stats.bstats[i].bytes);
 		packets += u64_stats_read(&hw_stats->stats.bstats[i].packets);
 		sch->qstats.qlen += hw_stats->stats.qstats[i].qlen;
-		sch->qstats.backlog += hw_stats->stats.qstats[i].backlog;
+		qstats_backlog_add(sch, hw_stats->stats.qstats[i].backlog);
 		__qdisc_qstats_drop(sch, hw_stats->stats.qstats[i].drops);
 		sch->qstats.requeues += hw_stats->stats.qstats[i].requeues;
 		sch->qstats.overlimits += hw_stats->stats.qstats[i].overlimits;
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index e71a565100edf60881ca7542faa408c5bb1a0984..59409ee2d2ff9279d7439b744030c0e845386de0 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1560,7 +1560,7 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
 		return err;
 	}
 
-	sch->qstats.backlog += len;
+	qstats_backlog_add(sch, len);
 	qdisc_qlen_inc(sch);
 
 	if (first && !cl_in_el_or_vttree(cl)) {
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index c22ccd8eae8c73323ccdf425e62857b3b851d74e..1e600f65c8769a74286c4f060b0d45da9a13eeeb 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -650,7 +650,7 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		htb_activate(q, cl);
 	}
 
-	sch->qstats.backlog += len;
+	qstats_backlog_add(sch, len);
 	qdisc_qlen_inc(sch);
 	return NET_XMIT_SUCCESS;
 }
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 4498dd440a02ea7a089c92ebc005d5064b87e2d2..2a2cdd1e4cc206ba00b8dd1821bef87156050950 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -751,7 +751,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 				if (err != NET_XMIT_SUCCESS) {
 					if (net_xmit_drop_count(err))
 						qdisc_qstats_drop(sch);
-					sch->qstats.backlog -= pkt_len;
+					qstats_backlog_sub(sch, pkt_len);
 					qdisc_qlen_dec(sch);
 					qdisc_tree_reduce_backlog(sch, 1, pkt_len);
 				}
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index fe42ae3d6b696b2fc47f4d397af32e950eeec194..e4dd56a890725b4c14d6715c96f5b3fa44a8f4f2 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -85,7 +85,7 @@ prio_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
 
 	ret = qdisc_enqueue(skb, qdisc, to_free);
 	if (ret == NET_XMIT_SUCCESS) {
-		sch->qstats.backlog += len;
+		qstats_backlog_add(sch, len);
 		qdisc_qlen_inc(sch);
 		return NET_XMIT_SUCCESS;
 	}
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 195c434aae5f7e03d1a1238ed73bb64b3f04e105..cb56787e1d258c06f2e86959c3b2cfaeb12df1ac 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1264,7 +1264,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	}
 
 	_bstats_update(&cl->bstats, len, gso_segs);
-	sch->qstats.backlog += len;
+	qstats_backlog_add(sch, len);
 	qdisc_qlen_inc(sch);
 
 	agg = cl->agg;
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 61b9064d39f222bdfe5021e93e8172b7ae60c408..7db97c96351309bc3e64fa50570a1928f2b2ce55 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -132,7 +132,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	len = qdisc_pkt_len(skb);
 	ret = qdisc_enqueue(skb, child, to_free);
 	if (likely(ret == NET_XMIT_SUCCESS)) {
-		sch->qstats.backlog += len;
+		qstats_backlog_add(sch, len);
 		qdisc_qlen_inc(sch);
 	} else if (net_xmit_drop_count(ret)) {
 		q->stats.pdrop++;
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 17b6ce223ad3a6f2d289c3ebe27cce8168c66b2b..2258567cbcaf70863eace85d347efda882a00145 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -406,7 +406,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	memcpy(&cb, sfb_skb_cb(skb), sizeof(cb));
 	ret = qdisc_enqueue(skb, child, to_free);
 	if (likely(ret == NET_XMIT_SUCCESS)) {
-		sch->qstats.backlog += len;
+		qstats_backlog_add(sch, len);
 		qdisc_qlen_inc(sch);
 		increment_qlen(&cb, q);
 	} else if (net_xmit_drop_count(ret)) {
@@ -582,7 +582,7 @@ static int sfb_dump(struct Qdisc *sch, struct sk_buff *skb)
 		.penalty_burst = q->penalty_burst,
 	};
 
-	sch->qstats.backlog = q->qdisc->qstats.backlog;
+	WRITE_ONCE(sch->qstats.backlog, READ_ONCE(q->qdisc->qstats.backlog));
 	opts = nla_nest_start_noflag(skb, TCA_OPTIONS);
 	if (opts == NULL)
 		goto nla_put_failure;
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 132afe4d923c56e25bcb406a7cc47acef0f0e9b5..76885d813e6b270f00e052395c1f98fc130f3b49 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -425,7 +425,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
 		/* We know we have at least one packet in queue */
 		head = slot_dequeue_head(slot);
 		delta = qdisc_pkt_len(head) - qdisc_pkt_len(skb);
-		sch->qstats.backlog -= delta;
+		qstats_backlog_sub(sch, delta);
 		slot->backlog -= delta;
 		qdisc_drop_reason(head, sch, to_free, QDISC_DROP_FLOW_LIMIT);
 
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 25edf11a7d671fe63878b0995998c5920b86ef74..67c7aaaf8f607e82ad13b7fdf177405a1dd075bb 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -232,7 +232,7 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
 		}
 	}
 	WRITE_ONCE(sch->q.qlen, sch->q.qlen + nb);
-	sch->qstats.backlog += len;
+	qstats_backlog_add(sch, len);
 	if (nb > 0) {
 		qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
 		consume_skb(skb);
@@ -263,7 +263,7 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		return ret;
 	}
 
-	sch->qstats.backlog += len;
+	qstats_backlog_add(sch, len);
 	qdisc_qlen_inc(sch);
 	return NET_XMIT_SUCCESS;
 }
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v3 net-next 06/15] net/sched: sch_sfb: annotate data-races in sfb_dump_stats()
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260410182257.774311-1-edumazet@google.com>

sfb_dump_stats() only runs with RTNL held,
reading fields that can be changed in qdisc fast path.

Add READ_ONCE()/WRITE_ONCE() annotations.

Alternative would be to acquire the qdisc spinlock, but our long-term
goal is to make qdisc dump operations lockless as much as we can.

tc_sfb_xstats fields don't need to be latched atomically,
otherwise this bug would have been caught earlier.

Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_sfb.c | 46 +++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 2258567cbcaf70863eace85d347efda882a00145..315edd7f87fcf1600d69a3a92733ddb9fee55e99 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -202,11 +202,14 @@ static u32 sfb_compute_qlen(u32 *prob_r, u32 *avgpm_r, const struct sfb_sched_da
 	const struct sfb_bucket *b = &q->bins[q->slot].bins[0][0];
 
 	for (i = 0; i < SFB_LEVELS * SFB_NUMBUCKETS; i++) {
-		if (qlen < b->qlen)
-			qlen = b->qlen;
-		totalpm += b->p_mark;
-		if (prob < b->p_mark)
-			prob = b->p_mark;
+		u32 b_qlen = READ_ONCE(b->qlen);
+		u32 b_mark = READ_ONCE(b->p_mark);
+
+		if (qlen < b_qlen)
+			qlen = b_qlen;
+		totalpm += b_mark;
+		if (prob < b_mark)
+			prob = b_mark;
 		b++;
 	}
 	*prob_r = prob;
@@ -295,7 +298,8 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
 	if (unlikely(sch->q.qlen >= q->limit)) {
 		qdisc_qstats_overlimit(sch);
-		q->stats.queuedrop++;
+		WRITE_ONCE(q->stats.queuedrop,
+			   q->stats.queuedrop + 1);
 		goto drop;
 	}
 
@@ -348,7 +352,8 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
 	if (unlikely(minqlen >= q->max)) {
 		qdisc_qstats_overlimit(sch);
-		q->stats.bucketdrop++;
+		WRITE_ONCE(q->stats.bucketdrop,
+			   q->stats.bucketdrop + 1);
 		goto drop;
 	}
 
@@ -374,7 +379,8 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		}
 		if (sfb_rate_limit(skb, q)) {
 			qdisc_qstats_overlimit(sch);
-			q->stats.penaltydrop++;
+			WRITE_ONCE(q->stats.penaltydrop,
+				   q->stats.penaltydrop + 1);
 			goto drop;
 		}
 		goto enqueue;
@@ -390,14 +396,17 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			 * In either case, we want to start dropping packets.
 			 */
 			if (r < (p_min - SFB_MAX_PROB / 2) * 2) {
-				q->stats.earlydrop++;
+				WRITE_ONCE(q->stats.earlydrop,
+					   q->stats.earlydrop + 1);
 				goto drop;
 			}
 		}
 		if (INET_ECN_set_ce(skb)) {
-			q->stats.marked++;
+			WRITE_ONCE(q->stats.marked,
+				   q->stats.marked + 1);
 		} else {
-			q->stats.earlydrop++;
+			WRITE_ONCE(q->stats.earlydrop,
+				   q->stats.earlydrop + 1);
 			goto drop;
 		}
 	}
@@ -410,7 +419,8 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		qdisc_qlen_inc(sch);
 		increment_qlen(&cb, q);
 	} else if (net_xmit_drop_count(ret)) {
-		q->stats.childdrop++;
+		WRITE_ONCE(q->stats.childdrop,
+			   q->stats.childdrop + 1);
 		qdisc_qstats_drop(sch);
 	}
 	return ret;
@@ -599,12 +609,12 @@ static int sfb_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
 {
 	struct sfb_sched_data *q = qdisc_priv(sch);
 	struct tc_sfb_xstats st = {
-		.earlydrop = q->stats.earlydrop,
-		.penaltydrop = q->stats.penaltydrop,
-		.bucketdrop = q->stats.bucketdrop,
-		.queuedrop = q->stats.queuedrop,
-		.childdrop = q->stats.childdrop,
-		.marked = q->stats.marked,
+		.earlydrop = READ_ONCE(q->stats.earlydrop),
+		.penaltydrop = READ_ONCE(q->stats.penaltydrop),
+		.bucketdrop = READ_ONCE(q->stats.bucketdrop),
+		.queuedrop = READ_ONCE(q->stats.queuedrop),
+		.childdrop = READ_ONCE(q->stats.childdrop),
+		.marked = READ_ONCE(q->stats.marked),
 	};
 
 	st.maxqlen = sfb_compute_qlen(&st.maxprob, &st.avgprob, q);
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v3 net-next 07/15] net/sched: sch_red: annotate data-races in red_dump_stats()
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260410182257.774311-1-edumazet@google.com>

red_dump_stats() only runs with RTNL held,
reading fields that can be changed in qdisc fast path.

Add READ_ONCE()/WRITE_ONCE() annotations.

Alternative would be to acquire the qdisc spinlock, but our long-term
goal is to make qdisc dump operations lockless as much as we can.

tc_red_xstats fields don't need to be latched atomically,
otherwise this bug would have been caught earlier.

Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_red.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 7db97c96351309bc3e64fa50570a1928f2b2ce55..268f1ba4520cd74da60e7b1ca08974fcfdced680 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -90,17 +90,20 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	case RED_PROB_MARK:
 		qdisc_qstats_overlimit(sch);
 		if (!red_use_ecn(q)) {
-			q->stats.prob_drop++;
+			WRITE_ONCE(q->stats.prob_drop,
+				   q->stats.prob_drop + 1);
 			goto congestion_drop;
 		}
 
 		if (INET_ECN_set_ce(skb)) {
-			q->stats.prob_mark++;
+			WRITE_ONCE(q->stats.prob_mark,
+				   q->stats.prob_mark + 1);
 			skb = tcf_qevent_handle(&q->qe_mark, sch, skb, to_free, &ret);
 			if (!skb)
 				return NET_XMIT_CN | ret;
 		} else if (!red_use_nodrop(q)) {
-			q->stats.prob_drop++;
+			WRITE_ONCE(q->stats.prob_drop,
+				   q->stats.prob_drop + 1);
 			goto congestion_drop;
 		}
 
@@ -111,17 +114,20 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		reason = QDISC_DROP_OVERLIMIT;
 		qdisc_qstats_overlimit(sch);
 		if (red_use_harddrop(q) || !red_use_ecn(q)) {
-			q->stats.forced_drop++;
+			WRITE_ONCE(q->stats.forced_drop,
+				   q->stats.forced_drop + 1);
 			goto congestion_drop;
 		}
 
 		if (INET_ECN_set_ce(skb)) {
-			q->stats.forced_mark++;
+			WRITE_ONCE(q->stats.forced_mark,
+				   q->stats.forced_mark + 1);
 			skb = tcf_qevent_handle(&q->qe_mark, sch, skb, to_free, &ret);
 			if (!skb)
 				return NET_XMIT_CN | ret;
 		} else if (!red_use_nodrop(q)) {
-			q->stats.forced_drop++;
+			WRITE_ONCE(q->stats.forced_drop,
+				   q->stats.forced_drop + 1);
 			goto congestion_drop;
 		}
 
@@ -135,7 +141,8 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		qstats_backlog_add(sch, len);
 		qdisc_qlen_inc(sch);
 	} else if (net_xmit_drop_count(ret)) {
-		q->stats.pdrop++;
+		WRITE_ONCE(q->stats.pdrop,
+			   q->stats.pdrop + 1);
 		qdisc_qstats_drop(sch);
 	}
 	return ret;
@@ -463,9 +470,13 @@ static int red_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
 		dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED,
 					      &hw_stats_request);
 	}
-	st.early = q->stats.prob_drop + q->stats.forced_drop;
-	st.pdrop = q->stats.pdrop;
-	st.marked = q->stats.prob_mark + q->stats.forced_mark;
+	st.early = READ_ONCE(q->stats.prob_drop) +
+		   READ_ONCE(q->stats.forced_drop);
+
+	st.pdrop = READ_ONCE(q->stats.pdrop);
+
+	st.marked = READ_ONCE(q->stats.prob_mark) +
+		    READ_ONCE(q->stats.forced_mark);
 
 	return gnet_stats_copy_app(d, &st, sizeof(st));
 }
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v3 net-next 08/15] net/sched: sch_fq_codel: remove data-races from fq_codel_dump_stats()
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260410182257.774311-1-edumazet@google.com>

fq_codel_dump_stats() acquires the qdisc spinlock a bit too late.

Move this acquisition before we fill st.qdisc_stats with live data.

Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_fq_codel.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 3a348be18551033dcf41ce632eb4b563221040fa..95769b19a04fb392e14a48353e94fb2ee299565a 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -586,6 +586,8 @@ static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
 	};
 	struct list_head *pos;
 
+	sch_tree_lock(sch);
+
 	st.qdisc_stats.maxpacket = q->cstats.maxpacket;
 	st.qdisc_stats.drop_overlimit = q->drop_overlimit;
 	st.qdisc_stats.ecn_mark = q->cstats.ecn_mark;
@@ -594,7 +596,6 @@ static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
 	st.qdisc_stats.memory_usage  = q->memory_usage;
 	st.qdisc_stats.drop_overmemory = q->drop_overmemory;
 
-	sch_tree_lock(sch);
 	list_for_each(pos, &q->new_flows)
 		st.qdisc_stats.new_flows_len++;
 
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v3 net-next 11/15] net_sched: sch_hhf: annotate data-races in hhf_dump_stats()
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260410182257.774311-1-edumazet@google.com>

hhf_dump_stats() only runs with RTNL held,
reading fields that can be changed in qdisc fast path.

Add READ_ONCE()/WRITE_ONCE() annotations.

Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_hhf.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index 69b6f0a5471cb9a3b7b760144683f2b249091d89..1e25b75daae2e5de31bd212dfa1f6d7aea927174 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -198,7 +198,8 @@ static struct hh_flow_state *seek_list(const u32 hash,
 				return NULL;
 			list_del(&flow->flowchain);
 			kfree(flow);
-			q->hh_flows_current_cnt--;
+			WRITE_ONCE(q->hh_flows_current_cnt,
+				   q->hh_flows_current_cnt - 1);
 		} else if (flow->hash_id == hash) {
 			return flow;
 		}
@@ -226,7 +227,7 @@ static struct hh_flow_state *alloc_new_hh(struct list_head *head,
 	}
 
 	if (q->hh_flows_current_cnt >= q->hh_flows_limit) {
-		q->hh_flows_overlimit++;
+		WRITE_ONCE(q->hh_flows_overlimit, q->hh_flows_overlimit + 1);
 		return NULL;
 	}
 	/* Create new entry. */
@@ -234,7 +235,7 @@ static struct hh_flow_state *alloc_new_hh(struct list_head *head,
 	if (!flow)
 		return NULL;
 
-	q->hh_flows_current_cnt++;
+	WRITE_ONCE(q->hh_flows_current_cnt, q->hh_flows_current_cnt + 1);
 	INIT_LIST_HEAD(&flow->flowchain);
 	list_add_tail(&flow->flowchain, head);
 
@@ -309,7 +310,7 @@ static enum wdrr_bucket_idx hhf_classify(struct sk_buff *skb, struct Qdisc *sch)
 			return WDRR_BUCKET_FOR_NON_HH;
 		flow->hash_id = hash;
 		flow->hit_timestamp = now;
-		q->hh_flows_total_cnt++;
+		WRITE_ONCE(q->hh_flows_total_cnt, q->hh_flows_total_cnt + 1);
 
 		/* By returning without updating counters in q->hhf_arrays,
 		 * we implicitly implement "shielding" (see Optimization O1).
@@ -404,7 +405,7 @@ static int hhf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		return NET_XMIT_SUCCESS;
 
 	prev_backlog = sch->qstats.backlog;
-	q->drop_overlimit++;
+	WRITE_ONCE(q->drop_overlimit, q->drop_overlimit + 1);
 	/* Return Congestion Notification only if we dropped a packet from this
 	 * bucket.
 	 */
@@ -687,10 +688,10 @@ static int hhf_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
 {
 	struct hhf_sched_data *q = qdisc_priv(sch);
 	struct tc_hhf_xstats st = {
-		.drop_overlimit = q->drop_overlimit,
-		.hh_overlimit	= q->hh_flows_overlimit,
-		.hh_tot_count	= q->hh_flows_total_cnt,
-		.hh_cur_count	= q->hh_flows_current_cnt,
+		.drop_overlimit = READ_ONCE(q->drop_overlimit),
+		.hh_overlimit	= READ_ONCE(q->hh_flows_overlimit),
+		.hh_tot_count	= READ_ONCE(q->hh_flows_total_cnt),
+		.hh_cur_count	= READ_ONCE(q->hh_flows_current_cnt),
 	};
 
 	return gnet_stats_copy_app(d, &st, sizeof(st));
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v3 net-next 12/15] net/sched: sch_choke: annotate data-races in choke_dump_stats()
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260410182257.774311-1-edumazet@google.com>

choke_dump_stats() only runs with RTNL held,
reading fields that can be changed in qdisc fast path.

Add READ_ONCE()/WRITE_ONCE() annotations.

Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_choke.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index cd0785ad8e74314e6d5c88144ffcf64f286e02dd..73d3e673dc7b16cf2b9ac1d622da280c2ceb064a 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -229,7 +229,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
 		/* Draw a packet at random from queue and compare flow */
 		if (choke_match_random(q, skb, &idx)) {
-			q->stats.matched++;
+			WRITE_ONCE(q->stats.matched, q->stats.matched + 1);
 			choke_drop_by_idx(sch, idx, to_free);
 			goto congestion_drop;
 		}
@@ -241,11 +241,13 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			qdisc_qstats_overlimit(sch);
 			if (use_harddrop(q) || !use_ecn(q) ||
 			    !INET_ECN_set_ce(skb)) {
-				q->stats.forced_drop++;
+				WRITE_ONCE(q->stats.forced_drop,
+					   q->stats.forced_drop + 1);
 				goto congestion_drop;
 			}
 
-			q->stats.forced_mark++;
+			WRITE_ONCE(q->stats.forced_mark,
+				   q->stats.forced_mark + 1);
 		} else if (++q->vars.qcount) {
 			if (red_mark_probability(p, &q->vars, q->vars.qavg)) {
 				q->vars.qcount = 0;
@@ -253,11 +255,13 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
 				qdisc_qstats_overlimit(sch);
 				if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
-					q->stats.prob_drop++;
+					WRITE_ONCE(q->stats.prob_drop,
+					           q->stats.prob_drop + 1);
 					goto congestion_drop;
 				}
 
-				q->stats.prob_mark++;
+				WRITE_ONCE(q->stats.prob_mark,
+					   q->stats.prob_mark + 1);
 			}
 		} else
 			q->vars.qR = red_random(p);
@@ -272,7 +276,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		return NET_XMIT_SUCCESS;
 	}
 
-	q->stats.pdrop++;
+	WRITE_ONCE(q->stats.pdrop, q->stats.pdrop + 1);
 	return qdisc_drop(skb, sch, to_free);
 
 congestion_drop:
@@ -461,10 +465,12 @@ static int choke_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
 {
 	struct choke_sched_data *q = qdisc_priv(sch);
 	struct tc_choke_xstats st = {
-		.early	= q->stats.prob_drop + q->stats.forced_drop,
-		.marked	= q->stats.prob_mark + q->stats.forced_mark,
-		.pdrop	= q->stats.pdrop,
-		.matched = q->stats.matched,
+		.early	= READ_ONCE(q->stats.prob_drop) +
+			  READ_ONCE(q->stats.forced_drop),
+		.marked	= READ_ONCE(q->stats.prob_mark) +
+			  READ_ONCE(q->stats.forced_mark),
+		.pdrop	= READ_ONCE(q->stats.pdrop),
+		.matched = READ_ONCE(q->stats.matched),
 	};
 
 	return gnet_stats_copy_app(d, &st, sizeof(st));
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v3 net-next 14/15] net/sched: mq: no longer acquire qdisc spinlocks in dump operations
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260410182257.774311-1-edumazet@google.com>

Prepare mq_dump_common(), mqprio_dump() and mqprio_dump_class_stats()
for RTNL avoidance.

Use private variables instead of assuming sch->bstats and sch->qstats
can be used when folding stats from children.

This means the children qdisc spinlocks no longer need to be acquired.

Add qdisc_qlen_lockless() helper, and change gnet_stats_add_basic()
prototype.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/gen_stats.h   |  9 +++--
 include/net/sch_generic.h | 14 ++++++++
 net/core/gen_estimator.c  | 24 ++++++-------
 net/core/gen_stats.c      | 17 +++++-----
 net/sched/sch_mq.c        | 33 +++++++++++-------
 net/sched/sch_mqprio.c    | 71 +++++++++++++++++++--------------------
 6 files changed, 95 insertions(+), 73 deletions(-)

diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 7aa2b8e1fb298c4f994a745b114fc4da785ddf4b..5484b67298e3fe94fe84f0e929799362d21499df 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -21,6 +21,11 @@ struct gnet_stats_basic_sync {
 	struct u64_stats_sync syncp;
 } __aligned(2 * sizeof(u64));
 
+struct gnet_stats {
+	u64	bytes;
+	u64	packets;
+};
+
 struct net_rate_estimator;
 
 struct gnet_dump {
@@ -49,9 +54,9 @@ int gnet_stats_start_copy_compat(struct sk_buff *skb, int type,
 int gnet_stats_copy_basic(struct gnet_dump *d,
 			  struct gnet_stats_basic_sync __percpu *cpu,
 			  struct gnet_stats_basic_sync *b, bool running);
-void gnet_stats_add_basic(struct gnet_stats_basic_sync *bstats,
+void gnet_stats_add_basic(struct gnet_stats *bstats,
 			  struct gnet_stats_basic_sync __percpu *cpu,
-			  struct gnet_stats_basic_sync *b, bool running);
+			  const struct gnet_stats_basic_sync *b, bool running);
 int gnet_stats_copy_basic_hw(struct gnet_dump *d,
 			     struct gnet_stats_basic_sync __percpu *cpu,
 			     struct gnet_stats_basic_sync *b, bool running);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index b0564a39caf4471619b74179a06a0e41e3765d94..92683be33527bb0a5147d095ba08f5f8494933dd 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -542,6 +542,11 @@ static inline int qdisc_qlen(const struct Qdisc *q)
 	return q->q.qlen;
 }
 
+static inline int qdisc_qlen_lockless(const struct Qdisc *q)
+{
+	return READ_ONCE(q->q.qlen);
+}
+
 static inline void qdisc_qlen_inc(struct Qdisc *q)
 {
 	WRITE_ONCE(q->q.qlen, q->q.qlen + 1);
@@ -947,6 +952,15 @@ static inline void _bstats_update(struct gnet_stats_basic_sync *bstats,
 	u64_stats_update_end(&bstats->syncp);
 }
 
+static inline void _bstats_set(struct gnet_stats_basic_sync *bstats,
+			       u64 bytes, u64 packets)
+{
+	u64_stats_update_begin(&bstats->syncp);
+	u64_stats_set(&bstats->bytes, bytes);
+	u64_stats_set(&bstats->packets, packets);
+	u64_stats_update_end(&bstats->syncp);
+}
+
 static inline void bstats_update(struct gnet_stats_basic_sync *bstats,
 				 const struct sk_buff *skb)
 {
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index c34e58c6c3e666743e72978f9a78cf7f95a360c3..40990aee45590f2c56c070b0d28f856fc82d1f55 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -60,9 +60,10 @@ struct net_rate_estimator {
 };
 
 static void est_fetch_counters(struct net_rate_estimator *e,
-			       struct gnet_stats_basic_sync *b)
+			       struct gnet_stats *b)
 {
-	gnet_stats_basic_sync_init(b);
+	b->packets = 0;
+	b->bytes = 0;
 	if (e->stats_lock)
 		spin_lock(e->stats_lock);
 
@@ -76,18 +77,15 @@ static void est_fetch_counters(struct net_rate_estimator *e,
 static void est_timer(struct timer_list *t)
 {
 	struct net_rate_estimator *est = timer_container_of(est, t, timer);
-	struct gnet_stats_basic_sync b;
-	u64 b_bytes, b_packets;
+	struct gnet_stats b;
 	u64 rate, brate;
 
 	est_fetch_counters(est, &b);
-	b_bytes = u64_stats_read(&b.bytes);
-	b_packets = u64_stats_read(&b.packets);
 
-	brate = (b_bytes - est->last_bytes) << (10 - est->intvl_log);
+	brate = (b.bytes - est->last_bytes) << (10 - est->intvl_log);
 	brate = (brate >> est->ewma_log) - (est->avbps >> est->ewma_log);
 
-	rate = (b_packets - est->last_packets) << (10 - est->intvl_log);
+	rate = (b.packets - est->last_packets) << (10 - est->intvl_log);
 	rate = (rate >> est->ewma_log) - (est->avpps >> est->ewma_log);
 
 	preempt_disable_nested();
@@ -97,8 +95,8 @@ static void est_timer(struct timer_list *t)
 	write_seqcount_end(&est->seq);
 	preempt_enable_nested();
 
-	est->last_bytes = b_bytes;
-	est->last_packets = b_packets;
+	est->last_bytes = b.bytes;
+	est->last_packets = b.packets;
 
 	est->next_jiffies += ((HZ/4) << est->intvl_log);
 
@@ -138,7 +136,7 @@ int gen_new_estimator(struct gnet_stats_basic_sync *bstats,
 {
 	struct gnet_estimator *parm = nla_data(opt);
 	struct net_rate_estimator *old, *est;
-	struct gnet_stats_basic_sync b;
+	struct gnet_stats b;
 	int intvl_log;
 
 	if (nla_len(opt) < sizeof(*parm))
@@ -172,8 +170,8 @@ int gen_new_estimator(struct gnet_stats_basic_sync *bstats,
 	est_fetch_counters(est, &b);
 	if (lock)
 		local_bh_enable();
-	est->last_bytes = u64_stats_read(&b.bytes);
-	est->last_packets = u64_stats_read(&b.packets);
+	est->last_bytes = b.bytes;
+	est->last_packets = b.packets;
 
 	if (lock)
 		spin_lock_bh(lock);
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 1a2380e74272de8eaf3d4ef453e56105a31e9edf..14ee7a4e3709ad5c64a158d3c8d1177ada3a32b0 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -123,10 +123,9 @@ void gnet_stats_basic_sync_init(struct gnet_stats_basic_sync *b)
 }
 EXPORT_SYMBOL(gnet_stats_basic_sync_init);
 
-static void gnet_stats_add_basic_cpu(struct gnet_stats_basic_sync *bstats,
+static void gnet_stats_add_basic_cpu(struct gnet_stats *bstats,
 				     struct gnet_stats_basic_sync __percpu *cpu)
 {
-	u64 t_bytes = 0, t_packets = 0;
 	int i;
 
 	for_each_possible_cpu(i) {
@@ -140,19 +139,18 @@ static void gnet_stats_add_basic_cpu(struct gnet_stats_basic_sync *bstats,
 			packets = u64_stats_read(&bcpu->packets);
 		} while (u64_stats_fetch_retry(&bcpu->syncp, start));
 
-		t_bytes += bytes;
-		t_packets += packets;
+		bstats->bytes += bytes;
+		bstats->packets += packets;
 	}
-	_bstats_update(bstats, t_bytes, t_packets);
 }
 
-void gnet_stats_add_basic(struct gnet_stats_basic_sync *bstats,
+void gnet_stats_add_basic(struct gnet_stats *bstats,
 			  struct gnet_stats_basic_sync __percpu *cpu,
-			  struct gnet_stats_basic_sync *b, bool running)
+			  const struct gnet_stats_basic_sync *b, bool running)
 {
 	unsigned int start;
-	u64 bytes = 0;
 	u64 packets = 0;
+	u64 bytes = 0;
 
 	WARN_ON_ONCE((cpu || running) && in_hardirq());
 
@@ -167,7 +165,8 @@ void gnet_stats_add_basic(struct gnet_stats_basic_sync *bstats,
 		packets = u64_stats_read(&b->packets);
 	} while (running && u64_stats_fetch_retry(&b->syncp, start));
 
-	_bstats_update(bstats, bytes, packets);
+	bstats->bytes += bytes;
+	bstats->packets += packets;
 }
 EXPORT_SYMBOL(gnet_stats_add_basic);
 
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index ec8c91d3fde04e59daec2aecdb14d6bf50715e15..0d83e69f2f679988d56920c16acb659d2d1ba636 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -143,30 +143,39 @@ EXPORT_SYMBOL_NS_GPL(mq_attach, "NET_SCHED_INTERNAL");
 void mq_dump_common(struct Qdisc *sch, struct sk_buff *skb)
 {
 	struct net_device *dev = qdisc_dev(sch);
+	struct gnet_stats_queue qstats = { 0 };
+	struct gnet_stats bstats = { 0 };
+	const struct Qdisc *qdisc;
 	unsigned int qlen = 0;
-	struct Qdisc *qdisc;
 	unsigned int ntx;
 
-	gnet_stats_basic_sync_init(&sch->bstats);
-	memset(&sch->qstats, 0, sizeof(sch->qstats));
-
 	/* MQ supports lockless qdiscs. However, statistics accounting needs
 	 * to account for all, none, or a mix of locked and unlocked child
 	 * qdiscs. Percpu stats are added to counters in-band and locking
 	 * qdisc totals are added at end.
 	 */
+	rcu_read_lock();
 	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
-		qdisc = rtnl_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
-		spin_lock_bh(qdisc_lock(qdisc));
+		qdisc = rcu_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
 
-		gnet_stats_add_basic(&sch->bstats, qdisc->cpu_bstats,
-				     &qdisc->bstats, false);
-		gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
+		gnet_stats_add_basic(&bstats, qdisc->cpu_bstats,
+				     &qdisc->bstats, true);
+		gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
 				     &qdisc->qstats);
-		qlen += qdisc_qlen(qdisc);
-
-		spin_unlock_bh(qdisc_lock(qdisc));
+		qlen += qdisc_qlen_lockless(qdisc);
 	}
+	rcu_read_unlock();
+
+	spin_lock_bh(qdisc_lock(sch));
+	_bstats_set(&sch->bstats, bstats.bytes, bstats.packets);
+	spin_unlock_bh(qdisc_lock(sch));
+
+	WRITE_ONCE(sch->qstats.qlen, qstats.qlen);
+	WRITE_ONCE(sch->qstats.backlog, qstats.backlog);
+	WRITE_ONCE(sch->qstats.drops, qstats.drops);
+	WRITE_ONCE(sch->qstats.requeues, qstats.requeues);
+	WRITE_ONCE(sch->qstats.overlimits, qstats.overlimits);
+
 	WRITE_ONCE(sch->q.qlen, qlen);
 }
 EXPORT_SYMBOL_NS_GPL(mq_dump_common, "NET_SCHED_INTERNAL");
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 91a92992cd24ab6c30bf7db2288c08cd493c7bc3..0f58b3a3e99a100df929de110fe0bda1a44cc7d6 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -554,32 +554,40 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	struct net_device *dev = qdisc_dev(sch);
 	struct mqprio_sched *priv = qdisc_priv(sch);
 	struct nlattr *nla = (struct nlattr *)skb_tail_pointer(skb);
+	struct gnet_stats_queue qstats = { 0 };
 	struct tc_mqprio_qopt opt = { 0 };
+	struct gnet_stats bstats = { 0 };
+	const struct Qdisc *qdisc;
 	unsigned int qlen = 0;
-	struct Qdisc *qdisc;
 	unsigned int ntx;
 
-	qlen = 0;
-	gnet_stats_basic_sync_init(&sch->bstats);
-	memset(&sch->qstats, 0, sizeof(sch->qstats));
-
 	/* MQ supports lockless qdiscs. However, statistics accounting needs
 	 * to account for all, none, or a mix of locked and unlocked child
 	 * qdiscs. Percpu stats are added to counters in-band and locking
 	 * qdisc totals are added at end.
 	 */
+	rcu_read_lock();
 	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
-		qdisc = rtnl_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
-		spin_lock_bh(qdisc_lock(qdisc));
+		qdisc = rcu_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
 
-		gnet_stats_add_basic(&sch->bstats, qdisc->cpu_bstats,
-				     &qdisc->bstats, false);
-		gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
+		gnet_stats_add_basic(&bstats, qdisc->cpu_bstats,
+				     &qdisc->bstats, true);
+		gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
 				     &qdisc->qstats);
-		qlen += qdisc_qlen(qdisc);
-
-		spin_unlock_bh(qdisc_lock(qdisc));
+		qlen += qdisc_qlen_lockless(qdisc);
 	}
+	rcu_read_unlock();
+
+	spin_lock_bh(qdisc_lock(sch));
+	_bstats_set(&sch->bstats, bstats.bytes, bstats.packets);
+	spin_unlock_bh(qdisc_lock(sch));
+
+	WRITE_ONCE(sch->qstats.qlen, qstats.qlen);
+	WRITE_ONCE(sch->qstats.backlog, qstats.backlog);
+	WRITE_ONCE(sch->qstats.drops, qstats.drops);
+	WRITE_ONCE(sch->qstats.requeues, qstats.requeues);
+	WRITE_ONCE(sch->qstats.overlimits, qstats.overlimits);
+
 	WRITE_ONCE(sch->q.qlen, qlen);
 
 	mqprio_qopt_reconstruct(dev, &opt);
@@ -661,45 +669,34 @@ static int mqprio_dump_class(struct Qdisc *sch, unsigned long cl,
 
 static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 				   struct gnet_dump *d)
-	__releases(d->lock)
-	__acquires(d->lock)
 {
 	if (cl >= TC_H_MIN_PRIORITY) {
 		struct net_device *dev = qdisc_dev(sch);
 		struct netdev_tc_txq tc = dev->tc_to_txq[cl & TC_BITMASK];
-		struct gnet_stats_queue qstats = {0};
+		struct gnet_stats_queue qstats = { 0 };
 		struct gnet_stats_basic_sync bstats;
+		struct gnet_stats _bstats = { 0 };
 		u32 qlen = 0;
 		int i;
 
-		gnet_stats_basic_sync_init(&bstats);
-		/* Drop lock here it will be reclaimed before touching
-		 * statistics this is required because the d->lock we
-		 * hold here is the look on dev_queue->qdisc_sleeping
-		 * also acquired below.
-		 */
-		if (d->lock)
-			spin_unlock_bh(d->lock);
-
+		rcu_read_lock();
 		for (i = tc.offset; i < tc.offset + tc.count; i++) {
-			struct netdev_queue *q = netdev_get_tx_queue(dev, i);
-			struct Qdisc *qdisc = rtnl_dereference(q->qdisc);
-
-			spin_lock_bh(qdisc_lock(qdisc));
+			const struct netdev_queue *q = netdev_get_tx_queue(dev, i);
+			const struct Qdisc *qdisc = rcu_dereference(q->qdisc);
 
-			gnet_stats_add_basic(&bstats, qdisc->cpu_bstats,
-					     &qdisc->bstats, false);
+			gnet_stats_add_basic(&_bstats, qdisc->cpu_bstats,
+					     &qdisc->bstats, true);
 			gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
 					     &qdisc->qstats);
-			qlen += qdisc_qlen(qdisc);
-
-			spin_unlock_bh(qdisc_lock(qdisc));
+			qlen += qdisc_qlen_lockless(qdisc);
 		}
+		rcu_read_unlock();
+		u64_stats_init(&bstats.syncp);
+		u64_stats_set(&bstats.bytes, _bstats.bytes);
+		u64_stats_set(&bstats.packets, _bstats.packets);
+
 		qlen = qlen + qstats.qlen;
 
-		/* Reclaim root sleeping lock before completing stats */
-		if (d->lock)
-			spin_lock_bh(d->lock);
 		if (gnet_stats_copy_basic(d, NULL, &bstats, false) < 0 ||
 		    gnet_stats_copy_queue(d, NULL, &qstats, qlen) < 0)
 			return -1;
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* Re: [PATCH net] netrom: do some basic forms of validation on incoming frames
From: Dan Cross @ 2026-04-10 18:23 UTC (permalink / raw)
  To: jj
  Cc: Simon Horman, Greg Kroah-Hartman, Jakub Kicinski, netdev,
	linux-kernel, David S. Miller, Eric Dumazet, Paolo Abeni,
	linux-hams, Yizhe Zhuang, stable
In-Reply-To: <18e3df62-34f9-4de0-903b-19919d7ae2ca@eastlink.ca>

On Fri, Apr 10, 2026 at 11:49 AM jj <ve1jot@eastlink.ca> wrote:
> This is NOT an obsolete protocol..this is in use by amateur radio
> operators world-wide...we use it for RF comms usually, because what
> happens if the internet goes "down", we can still provide comms over
> slower RF links....(plus it's a fun mode)please PLEASE do not drop...and
> sorry for the noise...

There are at least three separable issues being conflated here.

One is whether amateur radio operators are using AX.25, NET/ROM, and
ROSE.  They are; that's indisputable.

Another is whether those operators are using the implementation in the
Linux kernel.  Some are (myself included), though many fewer than are
using the protocols generally.

The third is whether preserving the implementation of these in the
kernel is the best mechanism for using those protocols on Linux-based
systems.  For that, I would argue that no, it is not.

Taking just AX.25, the current implementation has known deficiencies:
it is buggy, implements an older version of the protocol, and at best
receives nominal maintenance: notably, the newer networking tools
(`ip`, `ss`, etc) meant as replacements for `netstat`, `route`, and
`ifconfig` have not been updated to incorporate information about the
amateur radio protocols, and recent changes have left them broken for
long stretches of time.  More details are available online, such as at
https://blog.habets.se/2021/11/AX25-user-space.html

There is very little to recommend the kernel implementations, and any
unique functionality they once provided, such as IP over AX.25, can be
done via other means in userspace; e.g., one can use TAP/TUN for IP
over AX.25.

Therefore, it would be better to remove these from the kernel, and
implement them in userspace instead, or use an existing userspace
implementation (e.g., LinBPQ or similar).  Backwards compatibility
with existing Linux applications that expect to use the sockets API
with amateur radio could `LD_PRELOAD` a shim compatibility library
that simulates the current programming interface.  There is simply no
reason to preserve these in the kernel, and bluntly, the
implementation is pure drag at this point.

Note that this doesn't preclude anyone from using AX.25 et al on
Linux, or force dependency on the Internet: it just moves the
implementation of those protocols out of the kernel and into a normal
userspace program, which is arguably easier to maintain and iterate on
for the ham community, anyway.

        - Dan C.
          (KZ2X)

> On 2026-04-10 07:28, Simon Horman wrote:
> > On Fri, Apr 10, 2026 at 07:24:36AM +0200, Greg Kroah-Hartman wrote:
> >> On Thu, Apr 09, 2026 at 08:32:35PM -0700, Jakub Kicinski wrote:
> >>> On Thu, 9 Apr 2026 20:03:28 +0100 Simon Horman wrote:
> >>>> I expect that checking skb->len isn't sufficient here
> >>>> and pskb_may_pull needs to be used to ensure that
> >>>> the data is also available in the linear section of the skb.
> >>> Or for simplicity we could also be testing against skb_headlen()
> >>> since we don't expect any legit non-linear frames here? Dunno.
> > Sure, that's find by me if it leads to simpler code than
> > using pskb_may_pull(). Else I'd lean towards pskb_may_pull()
> > as it is a more general approach that feels worth proliferating.
> >
> >> I'll be glad to change this either way, your call.  Given that this is
> >> an obsolete protocol that seems to only be a target for drive-by fuzzers
> >> to attack, whatever the simplest thing to do to quiet them up I'll be
> >> glad to implement.
> >>
> >> Or can we just delete this stuff entirely?  :)
> > Deleting sounds good to me.
> > But we likely need a deprecation process.
> > In which case fixing these bugs still makes sense for the short term.
> >
>

^ permalink raw reply

* [PATCH v3 net-next 03/15] net/sched: add READ_ONCE() in gnet_stats_add_queue[_cpu]
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260410182257.774311-1-edumazet@google.com>

Stats are read locklessly, add READ_ONCE() to prevent load-stearing.

Write side will be handled in separate patches.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/gen_stats.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index b71ccaec0991461333dbe465ee619bca4a06e75b..1a2380e74272de8eaf3d4ef453e56105a31e9edf 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -345,11 +345,11 @@ static void gnet_stats_add_queue_cpu(struct gnet_stats_queue *qstats,
 	for_each_possible_cpu(i) {
 		const struct gnet_stats_queue *qcpu = per_cpu_ptr(q, i);
 
-		qstats->qlen += qcpu->qlen;
-		qstats->backlog += qcpu->backlog;
-		qstats->drops += qcpu->drops;
-		qstats->requeues += qcpu->requeues;
-		qstats->overlimits += qcpu->overlimits;
+		qstats->qlen += READ_ONCE(qcpu->qlen);
+		qstats->backlog += READ_ONCE(qcpu->backlog);
+		qstats->drops += READ_ONCE(qcpu->drops);
+		qstats->requeues += READ_ONCE(qcpu->requeues);
+		qstats->overlimits += READ_ONCE(qcpu->overlimits);
 	}
 }
 
@@ -360,11 +360,11 @@ void gnet_stats_add_queue(struct gnet_stats_queue *qstats,
 	if (cpu) {
 		gnet_stats_add_queue_cpu(qstats, cpu);
 	} else {
-		qstats->qlen += q->qlen;
-		qstats->backlog += q->backlog;
-		qstats->drops += q->drops;
-		qstats->requeues += q->requeues;
-		qstats->overlimits += q->overlimits;
+		qstats->qlen += READ_ONCE(q->qlen);
+		qstats->backlog += READ_ONCE(q->backlog);
+		qstats->drops += READ_ONCE(q->drops);
+		qstats->requeues += READ_ONCE(q->requeues);
+		qstats->overlimits += READ_ONCE(q->overlimits);
 	}
 }
 EXPORT_SYMBOL(gnet_stats_add_queue);
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v3 net-next 09/15] net/sched: sch_pie: annotate data-races in pie_dump_stats()
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260410182257.774311-1-edumazet@google.com>

pie_dump_stats() only runs with RTNL held,
reading fields that can be changed in qdisc fast path.

Add READ_ONCE()/WRITE_ONCE() annotations.

Alternative would be to acquire the qdisc spinlock, but our long-term
goal is to make qdisc dump operations lockless as much as we can.

tc_pie_xstats fields don't need to be latched atomically,
otherwise this bug would have been caught earlier.

Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/pie.h   |  2 +-
 net/sched/sch_pie.c | 38 +++++++++++++++++++-------------------
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/include/net/pie.h b/include/net/pie.h
index 01cbc66825a40bd21c0a044b1180cbbc346785df..1f3db0c355149b41823a891c9156cac625122031 100644
--- a/include/net/pie.h
+++ b/include/net/pie.h
@@ -104,7 +104,7 @@ static inline void pie_vars_init(struct pie_vars *vars)
 	vars->dq_tstamp = DTIME_INVALID;
 	vars->accu_prob = 0;
 	vars->dq_count = DQCOUNT_INVALID;
-	vars->avg_dq_rate = 0;
+	WRITE_ONCE(vars->avg_dq_rate, 0);
 }
 
 static inline struct pie_skb_cb *get_pie_cb(const struct sk_buff *skb)
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index 16f3f629cb8e4be71431f7e50a278e3c7fdba8d0..fb53fbf0e328571be72b66ba4e75a938e1963422 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -90,7 +90,7 @@ static int pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	bool enqueue = false;
 
 	if (unlikely(qdisc_qlen(sch) >= sch->limit)) {
-		q->stats.overlimit++;
+		WRITE_ONCE(q->stats.overlimit, q->stats.overlimit + 1);
 		goto out;
 	}
 
@@ -104,7 +104,7 @@ static int pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		/* If packet is ecn capable, mark it if drop probability
 		 * is lower than 10%, else drop it.
 		 */
-		q->stats.ecn_mark++;
+		WRITE_ONCE(q->stats.ecn_mark, q->stats.ecn_mark + 1);
 		enqueue = true;
 	}
 
@@ -114,15 +114,15 @@ static int pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		if (!q->params.dq_rate_estimator)
 			pie_set_enqueue_time(skb);
 
-		q->stats.packets_in++;
+		WRITE_ONCE(q->stats.packets_in, q->stats.packets_in + 1);
 		if (qdisc_qlen(sch) > q->stats.maxq)
-			q->stats.maxq = qdisc_qlen(sch);
+			WRITE_ONCE(q->stats.maxq, qdisc_qlen(sch));
 
 		return qdisc_enqueue_tail(skb, sch);
 	}
 
 out:
-	q->stats.dropped++;
+	WRITE_ONCE(q->stats.dropped, q->stats.dropped + 1);
 	q->vars.accu_prob = 0;
 	return qdisc_drop_reason(skb, sch, to_free, reason);
 }
@@ -267,11 +267,11 @@ void pie_process_dequeue(struct sk_buff *skb, struct pie_params *params,
 			count = count / dtime;
 
 			if (vars->avg_dq_rate == 0)
-				vars->avg_dq_rate = count;
+				WRITE_ONCE(vars->avg_dq_rate, count);
 			else
-				vars->avg_dq_rate =
+				WRITE_ONCE(vars->avg_dq_rate,
 				    (vars->avg_dq_rate -
-				     (vars->avg_dq_rate >> 3)) + (count >> 3);
+				     (vars->avg_dq_rate >> 3)) + (count >> 3));
 
 			/* If the queue has receded below the threshold, we hold
 			 * on to the last drain rate calculated, else we reset
@@ -381,7 +381,7 @@ void pie_calculate_probability(struct pie_params *params, struct pie_vars *vars,
 	if (delta > 0) {
 		/* prevent overflow */
 		if (vars->prob < oldprob) {
-			vars->prob = MAX_PROB;
+			WRITE_ONCE(vars->prob, MAX_PROB);
 			/* Prevent normalization error. If probability is at
 			 * maximum value already, we normalize it here, and
 			 * skip the check to do a non-linear drop in the next
@@ -392,7 +392,7 @@ void pie_calculate_probability(struct pie_params *params, struct pie_vars *vars,
 	} else {
 		/* prevent underflow */
 		if (vars->prob > oldprob)
-			vars->prob = 0;
+			WRITE_ONCE(vars->prob, 0);
 	}
 
 	/* Non-linear drop in probability: Reduce drop probability quickly if
@@ -403,7 +403,7 @@ void pie_calculate_probability(struct pie_params *params, struct pie_vars *vars,
 		/* Reduce drop probability to 98.4% */
 		vars->prob -= vars->prob / 64;
 
-	vars->qdelay = qdelay;
+	WRITE_ONCE(vars->qdelay, qdelay);
 	vars->backlog_old = backlog;
 
 	/* We restart the measurement cycle if the following conditions are met
@@ -502,21 +502,21 @@ static int pie_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
 	struct pie_sched_data *q = qdisc_priv(sch);
 	struct tc_pie_xstats st = {
 		.prob		= q->vars.prob << BITS_PER_BYTE,
-		.delay		= ((u32)PSCHED_TICKS2NS(q->vars.qdelay)) /
+		.delay		= ((u32)PSCHED_TICKS2NS(READ_ONCE(q->vars.qdelay))) /
 				   NSEC_PER_USEC,
-		.packets_in	= q->stats.packets_in,
-		.overlimit	= q->stats.overlimit,
-		.maxq		= q->stats.maxq,
-		.dropped	= q->stats.dropped,
-		.ecn_mark	= q->stats.ecn_mark,
+		.packets_in	= READ_ONCE(q->stats.packets_in),
+		.overlimit	= READ_ONCE(q->stats.overlimit),
+		.maxq		= READ_ONCE(q->stats.maxq),
+		.dropped	= READ_ONCE(q->stats.dropped),
+		.ecn_mark	= READ_ONCE(q->stats.ecn_mark),
 	};
 
 	/* avg_dq_rate is only valid if dq_rate_estimator is enabled */
 	st.dq_rate_estimating = q->params.dq_rate_estimator;
 
 	/* unscale and return dq_rate in bytes per sec */
-	if (q->params.dq_rate_estimator)
-		st.avg_dq_rate = q->vars.avg_dq_rate *
+	if (st.dq_rate_estimating)
+		st.avg_dq_rate = READ_ONCE(q->vars.avg_dq_rate) *
 				 (PSCHED_TICKS_PER_SEC) >> PIE_SCALE;
 
 	return gnet_stats_copy_app(d, &st, sizeof(st));
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v3 net-next 01/15] net/sched: rename qstats_overlimit_inc() to qstats_cpu_overlimit_inc()
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260410182257.774311-1-edumazet@google.com>

qstats_overlimit_inc() is only used to increment per cpu overlimits.

It can use this_cpu_inc() to avoid this_cpu_ptr() extra cost
and avoid potential store tearing.

Change qstats_overlimit_inc() name and its argument type.

Also add a WRITE_ONCE() in qdisc_qstats_overlimit() to prevent
store tearing.

$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1
add/remove: 0/0 grow/shrink: 0/5 up/down: 0/-72 (-72)
Function                                     old     new   delta
tcf_skbmod_act                               772     764      -8
tcf_police_act                               733     725      -8
tcf_mirred_to_dev                           1126    1114     -12
tcf_ife_act                                 1077    1061     -16
tcf_mirred_act                              1324    1296     -28
Total: Before=29610901, After=29610829, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/act_api.h     | 2 +-
 include/net/sch_generic.h | 6 +++---
 net/sched/act_ife.c       | 4 ++--
 net/sched/act_police.c    | 2 +-
 net/sched/act_skbmod.c    | 2 +-
 5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index d11b791079302f50c47e174979767e0b24afc59a..2ec4ef9a5d0c8e9110f92f135cc3c31a38af0479 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -250,7 +250,7 @@ static inline void tcf_action_inc_drop_qstats(struct tc_action *a)
 static inline void tcf_action_inc_overlimit_qstats(struct tc_action *a)
 {
 	if (likely(a->cpu_qstats)) {
-		qstats_overlimit_inc(this_cpu_ptr(a->cpu_qstats));
+		qstats_cpu_overlimit_inc(a->cpu_qstats);
 		return;
 	}
 	atomic_inc(&a->tcfa_overlimits);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 5af262ec4bbd2d5021904df127a849e52c26178a..3ee383c6fc3f66f1aecd9ebc675fbd143852c150 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -1004,9 +1004,9 @@ static inline void qstats_drop_inc(struct gnet_stats_queue *qstats)
 	qstats->drops++;
 }
 
-static inline void qstats_overlimit_inc(struct gnet_stats_queue *qstats)
+static inline void qstats_cpu_overlimit_inc(struct gnet_stats_queue __percpu *qstats)
 {
-	qstats->overlimits++;
+	this_cpu_inc(qstats->overlimits);
 }
 
 static inline void qdisc_qstats_drop(struct Qdisc *sch)
@@ -1021,7 +1021,7 @@ static inline void qdisc_qstats_cpu_drop(struct Qdisc *sch)
 
 static inline void qdisc_qstats_overlimit(struct Qdisc *sch)
 {
-	sch->qstats.overlimits++;
+	WRITE_ONCE(sch->qstats.overlimits, sch->qstats.overlimits + 1);
 }
 
 static inline int qdisc_qstats_copy(struct gnet_dump *d, struct Qdisc *sch)
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index d5e8a91bb4eb9f1f1f084e199b5ada4e7f7e7205..e1b825e14900d6f46bbfd1b7f72ab6cd554d8a73 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -750,7 +750,7 @@ static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
 			 */
 			pr_info_ratelimited("Unknown metaid %d dlen %d\n",
 					    mtype, dlen);
-			qstats_overlimit_inc(this_cpu_ptr(ife->common.cpu_qstats));
+			qstats_cpu_overlimit_inc(ife->common.cpu_qstats);
 		}
 	}
 
@@ -814,7 +814,7 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
 		/* abuse overlimits to count when we allow packet
 		 * with no metadata
 		 */
-		qstats_overlimit_inc(this_cpu_ptr(ife->common.cpu_qstats));
+		qstats_cpu_overlimit_inc(ife->common.cpu_qstats);
 		return action;
 	}
 	/* could be stupid policy setup or mtu config
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 12ea9e5a600536b603ea73cc99b4c00381287219..8060f43e4d11c0a26e1475db06b76426f50c5975 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -307,7 +307,7 @@ TC_INDIRECT_SCOPE int tcf_police_act(struct sk_buff *skb,
 	}
 
 inc_overlimits:
-	qstats_overlimit_inc(this_cpu_ptr(police->common.cpu_qstats));
+	qstats_cpu_overlimit_inc(police->common.cpu_qstats);
 inc_drops:
 	if (ret == TC_ACT_SHOT)
 		qstats_drop_inc(this_cpu_ptr(police->common.cpu_qstats));
diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index 23ca46138f040d38de37684439873921bc9c86af..a464b0a3c1b81dba6c28c1141aa38c5c7cad3acb 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -87,7 +87,7 @@ TC_INDIRECT_SCOPE int tcf_skbmod_act(struct sk_buff *skb,
 	return p->action;
 
 drop:
-	qstats_overlimit_inc(this_cpu_ptr(d->common.cpu_qstats));
+	qstats_cpu_overlimit_inc(d->common.cpu_qstats);
 	return TC_ACT_SHOT;
 }
 
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v3 net-next 10/15] net/sched: sch_fq_pie: annotate data-races in fq_pie_dump_stats()
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet
In-Reply-To: <20260410182257.774311-1-edumazet@google.com>

fq_codel_dump_stats() acquires the qdisc spinlock a bit too late.

Move this acquisition before we fill tc_fq_pie_xstats with live data.

Alternative would be to add READ_ONCE() and WRITE_ONCE() annotations,
but the spinlock is needed anyway.

Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_fq_pie.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c
index 197f0df0a6eb06ab4ce25eefe01d32a35dbd84af..72f48fa4010bebbe6be212938b457db21ff3c5a0 100644
--- a/net/sched/sch_fq_pie.c
+++ b/net/sched/sch_fq_pie.c
@@ -509,18 +509,19 @@ static int fq_pie_dump(struct Qdisc *sch, struct sk_buff *skb)
 static int fq_pie_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
 {
 	struct fq_pie_sched_data *q = qdisc_priv(sch);
-	struct tc_fq_pie_xstats st = {
-		.packets_in	= q->stats.packets_in,
-		.overlimit	= q->stats.overlimit,
-		.overmemory	= q->overmemory,
-		.dropped	= q->stats.dropped,
-		.ecn_mark	= q->stats.ecn_mark,
-		.new_flow_count = q->new_flow_count,
-		.memory_usage   = q->memory_usage,
-	};
+	struct tc_fq_pie_xstats st = { 0 };
 	struct list_head *pos;
 
 	sch_tree_lock(sch);
+
+	st.packets_in	= q->stats.packets_in;
+	st.overlimit	= q->stats.overlimit;
+	st.overmemory	= q->overmemory;
+	st.dropped	= q->stats.dropped;
+	st.ecn_mark	= q->stats.ecn_mark;
+	st.new_flow_count = q->new_flow_count;
+	st.memory_usage   = q->memory_usage;
+
 	list_for_each(pos, &q->new_flows)
 		st.new_flows_len++;
 
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox