Netdev List


* Re: [PATCH net] netrom: do some basic forms of validation on incoming frames
From: Simon Horman @ 2026-04-09 19:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: netdev, linux-kernel, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-hams, Yizhe Zhuang, stable
In-Reply-To: <2026040730-untagged-groin-bbb7@gregkh>

On Tue, Apr 07, 2026 at 10:45:31AM +0200, Greg Kroah-Hartman wrote:
> There is a lack of much validation of frame size coming from a
> netrom-based device.  While these devices are "trusted" doing some
> sanity checks is good to at least keep the fuzzing tools happy when they
> stumble across this ancient protocol and light up with a range of bug
> reports.
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Simon Horman <horms@kernel.org>
> Cc: linux-hams@vger.kernel.org
> Assisted-by: gregkh_clanker_2000
> Reviewed-by: Yizhe Zhuang <yizhe@darknavy.com>
> Cc: stable <stable@kernel.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Hi Greg 2000!

I expect that checking skb->len isn't sufficient here
and pskb_may_pull needs to be used to ensure that
the data is also available in the linear section of the skb.
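
To illustrate the distinction, here is a minimal userspace sketch. It is
not kernel code: struct toy_skb, toy_len() and toy_may_pull() are
hypothetical stand-ins for struct sk_buff, skb->len and pskb_may_pull().
It only models the idea that a frame can be long enough in total while
the bytes a parser wants are not yet in the linear area.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for struct sk_buff (hypothetical, for illustration only). */
struct toy_skb {
	unsigned char linear[64];  /* directly addressable header area */
	size_t linear_len;         /* bytes currently valid in linear[] */
	const unsigned char *frag; /* payload held outside the linear area */
	size_t frag_len;           /* bytes remaining in the fragment */
};

/* Total frame length, analogous to skb->len: linear part plus fragment. */
static size_t toy_len(const struct toy_skb *skb)
{
	return skb->linear_len + skb->frag_len;
}

/*
 * Analogous in spirit to pskb_may_pull(): ensure the first 'need' bytes
 * are in the linear area, copying them out of the fragment if necessary.
 * Returns false if the frame is too short or the linear area is too small.
 */
static bool toy_may_pull(struct toy_skb *skb, size_t need)
{
	size_t missing;

	if (need <= skb->linear_len)
		return true;
	if (need > toy_len(skb) || need > sizeof(skb->linear))
		return false;

	missing = need - skb->linear_len;
	memcpy(skb->linear + skb->linear_len, skb->frag, missing);
	skb->linear_len += missing;
	skb->frag += missing;
	skb->frag_len -= missing;
	return true;
}
```

In this model a frame with 2 linear bytes and an 8 byte fragment passes
a "len >= 6" test, yet reading linear[5] is only valid after
toy_may_pull(skb, 6) has succeeded.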

Also, although I'm all for incremental enhancements,
I do suspect that similar problems exist in the call
chain of these functions.

...

^ permalink raw reply

* Re: [PATCH net-next v4] selftests/net: convert so_txtime to drv-net
From: Willem de Bruijn @ 2026-04-09 19:10 UTC (permalink / raw)
  To: Willem de Bruijn, netdev
  Cc: davem, kuba, edumazet, pabeni, horms, linux-kselftest, shuah,
	Willem de Bruijn
In-Reply-To: <20260409164238.661091-1-willemdebruijn.kernel@gmail.com>

Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> In preparation for extending to pacing hardware offload, convert the
> so_txtime.sh test to a drv-net test that can be run against netdevsim
> and real hardware.
> 
> Also update so_txtime.c to not exit on first failure, but run to
> completion and report exit code there. This helps with debugging
> unexpected results, especially when processing multiple packets,
> as in the "reverse_order" testcase.
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>

> +def main() -> None:
> +    """Boilerplate ksft main."""
> +    with NetDrvEpEnv(__file__) as cfg:
> +        # Record original root qdisc
> +        cmd_obj = cmd((f"tc -j qdisc show dev {cfg.ifname} root"))
> +        qdisc_root = json.loads(cmd_obj.stdout)[0].get("kind", None)
> +
> +        ksft_run([test_so_txtime_mono, test_so_txtime_etf], args=(cfg,))
> +
> +        # Restore original root qdisc. If mq, populate with default_qdisc nodes
> +        if (qdisc_root):

I evidently couldn't resist a touch up after running through pylint.

Unnecessary parentheses. Only a warn. But I can resubmit irrespective
of other concerns.

Again, could add a tc helper (in a separate patch) to hide some of the
open coded ugliness too.

> +            cmd(f"tc qdisc replace dev {cfg.ifname} root {qdisc_root}")
> +    ksft_exit()

^ permalink raw reply

* Re: [PATCH v2 net 2/2] net: hamradio: scc: validate bufsize in SIOCSCCSMEM ioctl
From: Joerg Reuter @ 2026-04-09 19:27 UTC (permalink / raw)
  To: Mashiro Chen
  Cc: netdev, andrew+netdev, davem, edumazet, kuba, pabeni, linux-hams,
	linux-kernel, stable
In-Reply-To: <20260409024927.24397-3-mashiro.chen@mailbox.org>

Looks great, thanks!

    73, Joerg

> The SIOCSCCSMEM ioctl copies a scc_mem_config from user space and
> assigns its bufsize field directly to scc->stat.bufsize without any
> range validation:
> 
>   scc->stat.bufsize = memcfg.bufsize;
> 
> If a privileged user (CAP_SYS_RAWIO) sets bufsize to 0, the receive
> interrupt handler later calls dev_alloc_skb(0) and immediately writes
> a KISS type byte via skb_put_u8() into a zero-capacity socket buffer,
> corrupting the adjacent skb_shared_info region.
> 
> Reject bufsize values smaller than 16; this is large enough to hold
> at least one KISS header byte plus useful data.
> 
> Cc: stable@vger.kernel.org
> Cc: linux-hams@vger.kernel.org
Acked-by: Joerg Reuter <jreuter@yaina.de>
> Signed-off-by: Mashiro Chen <mashiro.chen@mailbox.org>
> ---
>  drivers/net/hamradio/scc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/hamradio/scc.c b/drivers/net/hamradio/scc.c
> index ae5048efde686a..8569db4a71401c 100644
> --- a/drivers/net/hamradio/scc.c
> +++ b/drivers/net/hamradio/scc.c
> @@ -1909,6 +1909,8 @@ static int scc_net_siocdevprivate(struct net_device *dev,
>  			if (!capable(CAP_SYS_RAWIO)) return -EPERM;
>  			if (!arg || copy_from_user(&memcfg, arg, sizeof(memcfg)))
>  				return -EINVAL;
> +			if (memcfg.bufsize < 16)
> +				return -EINVAL;
>  			scc->stat.bufsize   = memcfg.bufsize;
>  			return 0;
>  		
> -- 
> 2.53.0
> 

-- 
Joerg Reuter                                    http://yaina.de/jreuter
And I make my way to where the warm scent of soil fills the evening air. 
Everything is waiting quietly out there....                 (Anne Clark)

^ permalink raw reply

* Re: [PATCH] nfc: llcp: fix missing return after LLCP_CLOSED check in recv_hdlc and recv_disc
From: Lekë Hapçiu @ 2026-04-09 19:34 UTC (permalink / raw)
  To: horms
  Cc: netdev, davem, linux-nfc, kuba, krzysztof.kozlowski, stable,
	framemain
In-Reply-To: <20260409164527.GP469338@kernel.org>

Thanks for the pointer. Withdrawing this patch — the existing
submission at:
  https://lore.kernel.org/all/20260408081006.3723-1-qjx1298677004@gmail.com/
covers the same fix.

Lekë

^ permalink raw reply

* Re: [PATCH] nfc: nci: fix OOB heap read in nci_core_init_rsp_packet_v1()
From: Lekë Hapçiu @ 2026-04-09 19:37 UTC (permalink / raw)
  To: horms; +Cc: netdev, kuba, davem, edumazet, pabeni, security, framemain
In-Reply-To: <20260408190505.GK469338@kernel.org>

Thanks for the review. I will address all three points across both
patches (v1 and v2) and send them together as a patchset:

  - add an early skb->len >= sizeof(*rsp_1) guard before any struct
    field access
  - compute the rsp_2 pointer using the raw chip-supplied
    num_supported_rf_interfaces (validated against skb->len), not the
    already-capped ndev value
  - investigate whether pskb_may_pull is needed in the NCI receive path
    before these handlers are called
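
A rough userspace sketch of the first two points, under loud
assumptions: struct toy_rsp_1, struct toy_rsp_2 and toy_find_rsp_2() are
hypothetical and much simpler than the real NCI structures. The sketch
only shows the order of the checks: length guard first, then an offset
derived from the raw count and validated against the buffer length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical first part of a response: fixed header plus a byte array. */
struct toy_rsp_1 {
	uint8_t status;
	uint8_t num_supported_rf_interfaces;
	uint8_t supported_rf_interfaces[]; /* variable length */
};

/* Hypothetical second part, located right after the variable array. */
struct toy_rsp_2 {
	uint8_t max_logical_connections;
	uint8_t manufact_id;
};

/*
 * Return a pointer to the second part, or NULL if the buffer is too
 * short. The raw count from the buffer is used for the offset and is
 * checked against 'len' before anything past the header is touched.
 */
static const struct toy_rsp_2 *toy_find_rsp_2(const uint8_t *buf, size_t len)
{
	const struct toy_rsp_1 *rsp_1;
	size_t off;

	/* Guard before any struct field access. */
	if (len < sizeof(*rsp_1))
		return NULL;

	rsp_1 = (const void *)buf;
	off = sizeof(*rsp_1) + rsp_1->num_supported_rf_interfaces;

	/* The whole second part must fit inside the buffer. */
	if (len < off + sizeof(struct toy_rsp_2))
		return NULL;

	return (const void *)(buf + off);
}
```

With a 6 byte buffer claiming 2 interfaces the second part is found at
offset 4; the same buffer claiming 200 interfaces is rejected.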

Lekë

^ permalink raw reply

* Re: [PATCH net-next v2 3/3] net: mdio: treat PSE EPROBE_DEFER as non-fatal during PHY registration
From: Andrew Lunn @ 2026-04-09 19:54 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Kory Maincent, Carlo Szelinsky, o.rempel, andrew+netdev,
	hkallweit1, kuba, davem, edumazet, pabeni, horms, netdev,
	linux-kernel
In-Reply-To: <adfPAQDiYX6eIjrT@shell.armlinux.org.uk>

On Thu, Apr 09, 2026 at 05:08:33PM +0100, Russell King (Oracle) wrote:
> On Thu, Apr 09, 2026 at 05:34:56PM +0200, Andrew Lunn wrote:
> > I still think we should be deferring probe until we have all the parts
> > available. The question is, how do we actually do that?
> 
> Indeed...
> 
> > We could insist that MACs being used with PSE need to call
> > phylink_connect() in probe, so we can return EPROBE_DEFER. We might
> > actually need a new API method, phylink_connect_probe(). That can call
> > down into phylib, maybe again new API methods, which will not bind
> > genphy, but return EPROBE_DEFER.

I did not say it would be easy...

> How would MACs know whether they should call phylink_connect_probe()
> or phylink_connect_phy() ?

It would not. Anybody with a board using PSE would need to modify the
MAC driver to use phylink_connect_probe(), if they have a slow to load
PSE device.

> What do we do about MAC drivers that are a single driver and device,
> but are made up of several network devices (like Marvell PP2) ?

It would need more care, but it should work. You might end up removing
a perfectly good device because the other one is missing its PHY,
which is not ideal, but hopefully you get there in the end.

> We also have network drivers that provide a MDIO bus for a different
> network device, which makes connecting the PHY harder in the probe
> path.

Yes, we would see such setup doing more deferred probing, but again,
they should get there in the end. The most common systems doing this
are using the FEC. Are there any boards using the FEC and a problematic
PSE?

> Lastly, what do we do where a PHY driver hasn't been configured or
> doesn't exist for the PHY?

I was wondering if we can get from the driver core some idea where we
are in the deferred probing window. If we are 2/3 of the way through
the window, fall back to genphy?

I'm not saying we should change all MAC drivers, or recommend new MAC
drivers connect to the PHY in probe. I just want to offer the option
if you have a problematic PSE or PHY, change the MAC driver.

What we have also said in the past, it is the bootloader's problem to
download firmware into the PHY, or PSE, so that it is ready to go by
the time Linux boots. That would also be the simpler solution here.

    Andrew

^ permalink raw reply

* Re: [PATCH net-next 1/2] selftests: drv-net: Add ntuple (NFC) flow steering test
From: Dimitri Daskalakis @ 2026-04-09 20:28 UTC (permalink / raw)
  To: Jakub Kicinski, Michael Chan
  Cc: David S . Miller, Andrew Lunn, Eric Dumazet, Paolo Abeni,
	Shuah Khan, Willem de Bruijn, Petr Machata, David Wei,
	Chris J Arges, Carolina Jubran, Dimitri Daskalakis, netdev,
	linux-kselftest
In-Reply-To: <20260409085055.0834111c@kernel.org>



On 4/9/26 8:50 AM, Jakub Kicinski wrote:
> On Tue,  7 Apr 2026 09:49:53 -0700 Dimitri Daskalakis wrote:
>> Add a test for ethtool NFC (ntuple) flow steering rules. The test
>> creates an ntuple rule matching on various flow fields and verifies
>> that traffic is steered to the correct queue.
> Hi Michael, how accurate is the stats refresh timer in bnxt?
> This test is seeing ~10% of flakiness on bnxt, fewer packets
> got counted than we sent. Could be something else but I suspect
> the stats just didn't get refreshed. We give it 25% margin right 
> now.
>
> Dimitri, this skips for some drivers because they don't auto-enable
> ntuple filters. Looks like other selftests have the same check and also
> skip in netdev CI. So probably a separate / follow up task but I think
> we need to add code to enable the filters if they were disabled.

Sounds good, I will follow up with enabling ntuple filters across the
selftests.

^ permalink raw reply

* [PATCH v2][next] netfilter: x_tables: Avoid a couple -Wflex-array-member-not-at-end warnings
From: Gustavo A. R. Silva @ 2026-04-09 20:28 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
  Cc: netfilter-devel, coreteam, netdev, linux-kernel,
	Gustavo A. R. Silva, linux-hardening

-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

struct compat_xt_standard_target and struct compat_xt_error_target are
only used in xt_compat_check_entry_offsets(). Remove these structs and
instead define the same memory layout on the stack via flexible struct
compat_xt_entry_target and DEFINE_RAW_FLEX(). Adjust the rest of the
code accordingly.

With these changes, fix the following warnings:

1 net/netfilter/x_tables.c:816:39: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
1 net/netfilter/x_tables.c:811:39: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
---
Changes in v2:
 - Update verdict after (compat_uint_t *)st->data;

v1:
 - Link: https://lore.kernel.org/linux-hardening/adbIKC0cZcK7VcCF@kspp/

 net/netfilter/x_tables.c | 31 ++++++++++++++-----------------
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index b39017c80548..746012196d83 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -817,17 +817,6 @@ int xt_compat_match_to_user(const struct xt_entry_match *m,
 }
 EXPORT_SYMBOL_GPL(xt_compat_match_to_user);
 
-/* non-compat version may have padding after verdict */
-struct compat_xt_standard_target {
-	struct compat_xt_entry_target t;
-	compat_uint_t verdict;
-};
-
-struct compat_xt_error_target {
-	struct compat_xt_entry_target t;
-	char errorname[XT_FUNCTION_MAXNAMELEN];
-};
-
 int xt_compat_check_entry_offsets(const void *base, const char *elems,
 				  unsigned int target_offset,
 				  unsigned int next_offset)
@@ -850,18 +839,26 @@ int xt_compat_check_entry_offsets(const void *base, const char *elems,
 		return -EINVAL;
 
 	if (strcmp(t->u.user.name, XT_STANDARD_TARGET) == 0) {
-		const struct compat_xt_standard_target *st = (const void *)t;
+		DEFINE_RAW_FLEX(const struct compat_xt_entry_target, st, data,
+				sizeof(compat_uint_t));
+		compat_uint_t *verdict;
 
-		if (COMPAT_XT_ALIGN(target_offset + sizeof(*st)) != next_offset)
+		st = (const void *)t;
+		verdict = (compat_uint_t *)st->data;
+
+		if (COMPAT_XT_ALIGN(target_offset + __struct_size(st)) !=
+				next_offset)
 			return -EINVAL;
 
-		if (!verdict_ok(st->verdict))
+		if (!verdict_ok(*verdict))
 			return -EINVAL;
 	} else if (strcmp(t->u.user.name, XT_ERROR_TARGET) == 0) {
-		const struct compat_xt_error_target *et = (const void *)t;
+		DEFINE_RAW_FLEX(const struct compat_xt_entry_target, et, data,
+				XT_FUNCTION_MAXNAMELEN);
+		et = (const void *)t;
 
-		if (!error_tg_ok(t->u.target_size, sizeof(*et),
-				 et->errorname, sizeof(et->errorname)))
+		if (!error_tg_ok(t->u.target_size, __struct_size(et),
+				 et->data, __member_size(et->data)))
 			return -EINVAL;
 	}
 
-- 
2.43.0
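
For readers unfamiliar with the pattern, the following is a userspace
approximation of the idea behind DEFINE_RAW_FLEX(): reserve on-stack
storage for a header plus a fixed number of trailing flexible-array
elements. It is a sketch, not the kernel macro. struct toy_target,
TOY_FLEX_SIZE(), TOY_DEFINE_RAW_FLEX() and toy_demo() are hypothetical
names introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header type ending in a flexible array member. */
struct toy_target {
	uint16_t target_size;
	char name[30];
	unsigned char data[]; /* flexible array member */
};

/* Bytes needed for the header plus 'count' trailing elements. */
#define TOY_FLEX_SIZE(type, member, count) \
	(offsetof(type, member) + (count) * sizeof(((type *)0)->member[0]))

/*
 * Declare a union big enough for header + 'count' elements and a
 * pointer 'name' to the header inside it. The byte array provides the
 * storage; the struct member provides type and alignment.
 */
#define TOY_DEFINE_RAW_FLEX(type, name, member, count)			\
	union {								\
		unsigned char bytes[TOY_FLEX_SIZE(type, member, count)];	\
		type obj;						\
	} name##_u = { 0 };						\
	type *name = &name##_u.obj

/* Fill the four trailing bytes of an on-stack object and sum them. */
static unsigned int toy_demo(void)
{
	TOY_DEFINE_RAW_FLEX(struct toy_target, st, data, 4);
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < 4; i++)
		st->data[i] = (unsigned char)(i + 1);
	for (i = 0; i < 4; i++)
		sum += st->data[i];
	return sum;
}
```

Because the flexible-array struct is no longer embedded in the middle
of another struct, this layout avoids the situation that
-Wflex-array-member-not-at-end complains about.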


^ permalink raw reply related

* [PATCH net 0/2] mlx5 misc fixes 2026-04-09
From: Tariq Toukan @ 2026-04-09 20:28 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Boris Pismenny, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Jianbo Liu, Kees Cook, Lama Kayal, Michal Swiatkowski,
	Gal Pressman, Roy Novich, Roi Dayan, Raed Salem, netdev,
	linux-rdma, linux-kernel, Dragos Tatulea

Hi,

This small patchset provides misc bug fixes from Gal to the mlx5 Eth
driver.

Thanks,
Tariq.

Gal Pressman (2):
  net/mlx5e: Fix features not applied during netdev registration
  net/mlx5e: IPsec, fix ASO poll timeout with read_poll_timeout_atomic()

 .../mellanox/mlx5/core/en_accel/ipsec_offload.c      | 12 ++++--------
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c    |  8 ++++++++
 2 files changed, 12 insertions(+), 8 deletions(-)


base-commit: ebe560ea5f54134279356703e73b7f867c89db13
-- 
2.44.0


^ permalink raw reply

* [PATCH net 1/2] net/mlx5e: Fix features not applied during netdev registration
From: Tariq Toukan @ 2026-04-09 20:28 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Boris Pismenny, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Jianbo Liu, Kees Cook, Lama Kayal, Michal Swiatkowski,
	Gal Pressman, Roy Novich, Roi Dayan, Raed Salem, netdev,
	linux-rdma, linux-kernel, Dragos Tatulea
In-Reply-To: <20260409202852.158059-1-tariqt@nvidia.com>

From: Gal Pressman <gal@nvidia.com>

mlx5e_fix_features() returns early when the netdevice is not present.
This is correct during profile transitions where priv is cleared, but it
also incorrectly blocks feature fixups during register_netdev(), when
the device is also not yet present.

It is not trivial to distinguish between both cases as we cannot use
priv to carry state, and in both cases reg_state == NETREG_REGISTERED.

Force a netdev features update after register_netdev() completes, where
the device is present and fix_features() can actually work.

This is not a pretty solution, as it results in an additional features
update call (register_netdevice() already calls
__netdev_update_features() internally), but it is the simplest,
cleanest, and most robust way I found to fix this issue after multiple
attempts.

This fixes an issue on systems where CQE compression is enabled by
default: RXHASH remains enabled after registration despite the two
features being mutually exclusive.

Fixes: ab4b01bfdaa6 ("net/mlx5e: Verify dev is present for fix features ndo")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b6c12460b54a..0b8b44bbcb9e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -6756,6 +6756,14 @@ static int _mlx5e_probe(struct auxiliary_device *adev)
 		goto err_resume;
 	}

+	/* mlx5e_fix_features() returns early when the device is not present
+	 * to avoid dereferencing cleared priv during profile changes.
+	 * This also causes it to be a no-op during register_netdev(), where
+	 * the device is not yet present.
+	 * Trigger an additional features update that will actually work.
+	 */
+	mlx5e_update_features(netdev);
+
 	mlx5e_dcbnl_init_app(priv);
 	mlx5_core_uplink_netdev_set(mdev, netdev);
 	mlx5e_params_print_info(mdev, &priv->channels.params);
-- 
2.44.0

^ permalink raw reply related

* [PATCH net 2/2] net/mlx5e: IPsec, fix ASO poll timeout with read_poll_timeout_atomic()
From: Tariq Toukan @ 2026-04-09 20:28 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Boris Pismenny, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Jianbo Liu, Kees Cook, Lama Kayal, Michal Swiatkowski,
	Gal Pressman, Roy Novich, Roi Dayan, Raed Salem, netdev,
	linux-rdma, linux-kernel, Dragos Tatulea
In-Reply-To: <20260409202852.158059-1-tariqt@nvidia.com>

From: Gal Pressman <gal@nvidia.com>

The do-while poll loop uses jiffies for its timeout:
  expires = jiffies + msecs_to_jiffies(10);

jiffies is sampled at an arbitrary point within the current tick, so the
first partial tick contributes anywhere from a full tick down to nearly
zero real time. For small msecs_to_jiffies() results this is
significant: the effective poll window can be much shorter than the
requested 10ms, and in the worst case the loop exits after a single
iteration (e.g., when HZ=100), well before the device has delivered the
CQE.

Replace the loop with read_poll_timeout_atomic(), which counts elapsed
time via udelay() accounting rather than jiffies, guaranteeing the full
poll window regardless of HZ.

Additionally, read_poll_timeout_atomic() executes the poll operation one
more time after the timeout has expired, giving the CQE a final chance
to be detected. The old do-while loop could exit without a final poll if
the timeout expired during the udelay() between iterations.

Fixes: 76e463f6508b ("net/mlx5e: Overcome slow response for first IPsec ASO WQE")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../mellanox/mlx5/core/en_accel/ipsec_offload.c      | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
index 05faad5083d9..145677ce9640 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
@@ -1,6 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
 /* Copyright (c) 2017, Mellanox Technologies inc. All rights reserved. */
 
+#include <linux/iopoll.h>
+
 #include "mlx5_core.h"
 #include "en.h"
 #include "ipsec.h"
@@ -592,7 +594,6 @@ int mlx5e_ipsec_aso_query(struct mlx5e_ipsec_sa_entry *sa_entry,
 	struct mlx5_wqe_aso_ctrl_seg *ctrl;
 	struct mlx5e_hw_objs *res;
 	struct mlx5_aso_wqe *wqe;
-	unsigned long expires;
 	u8 ds_cnt;
 	int ret;
 
@@ -614,13 +615,8 @@ int mlx5e_ipsec_aso_query(struct mlx5e_ipsec_sa_entry *sa_entry,
 	mlx5e_ipsec_aso_copy(ctrl, data);
 
 	mlx5_aso_post_wqe(aso->aso, false, &wqe->ctrl);
-	expires = jiffies + msecs_to_jiffies(10);
-	do {
-		ret = mlx5_aso_poll_cq(aso->aso, false);
-		if (ret)
-			/* We are in atomic context */
-			udelay(10);
-	} while (ret && time_is_after_jiffies(expires));
+	read_poll_timeout_atomic(mlx5_aso_poll_cq, ret, !ret, 10,
+				 10 * USEC_PER_MSEC, false, aso->aso, false);
 	if (!ret)
 		memcpy(sa_entry->ctx, aso->ctx, MLX5_ST_SZ_BYTES(ipsec_aso));
 	spin_unlock_bh(&aso->lock);
-- 
2.44.0
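
The timing argument in the commit message can be reproduced with a
small userspace simulation. This is a simplified model, not the kernel
implementation: time is a plain microsecond counter, every poll is
assumed to fail, and jiffies_window_us() and counted_window_us() are
hypothetical helpers. The second function only approximates the
"count the delays" idea attributed to read_poll_timeout_atomic().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the old loop: deadline in jiffies, 10us delay per failed
 * poll. Returns how long (in us) the loop keeps polling when started
 * at 'start_us' with the given HZ.
 */
static uint64_t jiffies_window_us(uint64_t start_us, unsigned int hz,
				  unsigned int timeout_ms)
{
	uint64_t tick_us = 1000000ULL / hz;
	/* Round up to whole ticks, as msecs_to_jiffies() does. */
	uint64_t timeout_jiffies = (timeout_ms * 1000ULL + tick_us - 1) / tick_us;
	uint64_t expires = start_us / tick_us + timeout_jiffies;
	uint64_t now_us = start_us;

	do {
		now_us += 10; /* poll failed, udelay(10) */
	} while (now_us / tick_us < expires); /* time_is_after_jiffies() */

	return now_us - start_us;
}

/*
 * Model of a delay-counting loop: the budget is decremented by the
 * delay actually spent, so the result does not depend on where in a
 * tick the loop starts.
 */
static uint64_t counted_window_us(uint64_t start_us, unsigned int delay_us,
				  uint64_t timeout_us)
{
	int64_t left_us = (int64_t)timeout_us;
	uint64_t now_us = start_us;

	while (left_us >= 0) {
		now_us += delay_us; /* poll failed, udelay(delay_us) */
		left_us -= delay_us;
	}
	return now_us - start_us;
}
```

In this model, with HZ=100 and a start 9990us into a tick, the
jiffies-based loop polls for only 10us, while the counting loop polls
for at least the requested 10000us regardless of the start point.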


^ permalink raw reply related

* Re: [PATCH] MAINTAINERS: Remove Salil Mehta as HiSilicon HNS3/HNS Ethernet maintainer
From: patchwork-bot+netdevbpf @ 2026-04-09 20:30 UTC (permalink / raw)
  To: Salil Mehta; +Cc: davem, netdev, kuba, salil.mehta, shenjian15, shaojijie
In-Reply-To: <20260409000430.7217-1-salil.mehta@huawei.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 9 Apr 2026 01:04:30 +0100 you wrote:
> From: Salil Mehta <salil.mehta@opnsrc.net>
> 
> Closing this chapter and a long wonderful journey with my team, I sign off one
> last time with my Huawei email address. Remove my maintainer entry for the
> HiSilicon HNS and HNS3 10G/100G Ethernet drivers, and add a CREDITS entry for
> my co-authorship and maintenance contributions to these drivers.
> 
> [...]

Here is the summary with links:
  - MAINTAINERS: Remove Salil Mehta as HiSilicon HNS3/HNS Ethernet maintainer
    https://git.kernel.org/netdev/net/c/eb216e422044

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
From: Dave Hansen @ 2026-04-09 20:36 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Pawan Gupta, x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, David Kaplan, Sean Christopherson,
	Borislav Petkov, Dave Hansen, Peter Zijlstra, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, KP Singh, Jiri Olsa,
	David S. Miller, David Laight, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, David Ahern, Martin KaFai Lau, Eduard Zingerman,
	Song Liu, Yonghong Song, John Fastabend, Stanislav Fomichev,
	Hao Luo, Paolo Bonzini, Jonathan Corbet, linux-kernel, kvm,
	Asit Mallick, Tao Zhang, bpf, netdev, linux-doc, chao.gao
In-Reply-To: <CALMp9eRfNsghM_RnDXOs=SJYObfPa5A1aOVDZno_zJ=XotfmRw@mail.gmail.com>

On 4/7/26 17:47, Jim Mattson wrote:
> On Tue, Apr 7, 2026 at 4:41 PM Dave Hansen <dave.hansen@intel.com> wrote:
>> On 4/7/26 16:27, Jim Mattson wrote:
>>> What is your proposed BHI_DIS_S override mechanism, then?
>> Let me make sure I get this right. The desire is to:
>>
>> 1. Have hypervisors lie to guests about the CPU they are running on (for
>>    the benefit of large/diverse migration pools)
>> 2. Have guests be allowed to boot with BHI_DIS_S for performance
>> 3. Have apps in those guests that care about security to opt back in to
>>    BHI_DIS_S for themselves?
> I just want guests on heterogeneous migration pools to properly
> protect themselves from native BHI when running on host kernels at
> least as far back as Linux v6.6.
> 
> To that end, I would be satisfied with using the longer BHB clearing
> sequence when HYPERVISOR is true and BHI_CTRL is false.

If the guests can't get mitigation information from model/family because
the hypervisor is lying (or may lie), then it's on the hypervisor to
figure it out.

I'm not sure we want to just assume that all hypervisors are going to
lie all the time about this.

I kinda think we should just let Pawan's series move forward and then we
can debate the lying hypervisor problem once the series is settled.

^ permalink raw reply

* Re: [PATCH net-next v2 5/5] ethtool: strset: check nla_len overflow
From: Stanislav Fomichev @ 2026-04-09 20:39 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Stanislav Fomichev, Jakub Kicinski, Hangbin Liu, Donald Hunter,
	David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman, netdev,
	linux-kernel
In-Reply-To: <bb2e7087-aa36-4556-8778-b65d11354779@lunn.ch>

On 04/09, Andrew Lunn wrote:
> > I guess... Should we update ethtool.yaml doc to tell the users to prefer
> > ioctl over netlink for strset-get and mention this new EMSGSIZE?
> 
> No. The ioctl is deprecated. It can still be used for drivers which
> need it, but netlink is the preferred method.

I'm with you on deprecating ioctl and pushing for netlink, but I'm not sure
how we can recommend this specific api call if it consistently can return
EMSGSIZE for some devices? Or am I reading this whole series wrong?

^ permalink raw reply

* Re: [PATCH net-next] net/mlx5: Use dma_wmb() for completion queue doorbell updates
From: Tariq Toukan @ 2026-04-09 20:46 UTC (permalink / raw)
  To: lirongqing, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Boris Pismenny, Richard Cochran, Cosmin Ratiu, Dragos Tatulea,
	Carolina Jubran, Kees Cook, Akiva Goldberger, Simon Horman,
	netdev, linux-rdma, linux-kernel, bpf
In-Reply-To: <20260402055206.2311-1-lirongqing@baidu.com>



On 02/04/2026 8:52, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> dma_wmb() barriers are specifically for ordering writes to DMA
> coherent memory that is accessible to both the CPU and DMA capable
> devices.
> 
> The dma_wmb() barrier is lighter than wmb() on some architectures
> because it only ensures ordering for DMA writes, not for all writes
> including MMIO accesses.
> 
> In the MLX5 driver, completion queue (CQ) doorbell records are
> allocated as DMA coherent memory via mlx5_dma_zalloc_coherent_node().
> The CQ update pattern is:
>    1. Update CQ space (device reads via DMA)
>    2. Update doorbell record (device reads via DMA)
>    3. Memory barrier
>    4. Enable more CQEs
> 
> Since only DMA coherent memory accesses are involved (no MMIO accesses
> follow), can safely use dma_wmb() instead of wmb().
> 
> This change improves performance slightly on architectures where
> dma_wmb() is lighter than wmb().
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---

Hi,

Sorry for the delay.
Thanks for your patch.

The idea looks valid.
This is the kind of patch that should go through intensive testing
before acceptance. I'm picking it up for internal testing and will update.

PS: I know you have one more patch [1] pending testing. It looks good so 
far, I'll verify and send an update soon.

Regards,
Tariq

[1] 
https://patchwork.kernel.org/project/netdevbpf/patch/20260317003544.2583-1-lirongqing@baidu.com/

>   drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c    | 2 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 2 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_rx.c     | 2 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_tx.c     | 2 +-
>   drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c | 2 +-
>   drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c   | 2 +-
>   drivers/net/ethernet/mellanox/mlx5/core/wc.c        | 2 +-
>   7 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> index 1b76647..7bd6dfc 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> @@ -259,7 +259,7 @@ static bool mlx5e_ptp_poll_ts_cq(struct mlx5e_cq *cq, int napi_budget)
>   	mlx5_cqwq_update_db_record(cqwq);
>   
>   	/* ensure cq space is freed before enabling more cqes */
> -	wmb();
> +	dma_wmb();
>   
>   	while (metadata_buff_sz > 0)
>   		mlx5e_ptp_metadata_fifo_push(&ptpsq->metadata_freelist,
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> index 80f9fc1..dde8856 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> @@ -805,7 +805,7 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)
>   	mlx5_cqwq_update_db_record(&cq->wq);
>   
>   	/* ensure cq space is freed before enabling more cqes */
> -	wmb();
> +	dma_wmb();
>   
>   	sq->cc = sqcc;
>   	return (i == MLX5E_TX_CQ_POLL_BUDGET);
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index 268e208..f17e7f1 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -2447,7 +2447,7 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
>   	mlx5_cqwq_update_db_record(cqwq);
>   
>   	/* ensure cq space is freed before enabling more cqes */
> -	wmb();
> +	dma_wmb();
>   
>   	return work_done;
>   }
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> index 9f02726..7ba319f 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> @@ -849,7 +849,7 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
>   	mlx5_cqwq_update_db_record(&cq->wq);
>   
>   	/* ensure cq space is freed before enabling more cqes */
> -	wmb();
> +	dma_wmb();
>   
>   	sq->dma_fifo_cc = dma_fifo_cc;
>   	sq->cc = sqcc;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
> index 1f6bde5..1341874 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
> @@ -384,7 +384,7 @@ static inline void mlx5_fpga_conn_cqes(struct mlx5_fpga_conn *conn,
>   
>   	mlx5_fpga_dbg(conn->fdev, "Re-arming CQ with cc# %u\n", conn->cq.wq.cc);
>   	/* ensure cq space is freed before enabling more cqes */
> -	wmb();
> +	dma_wmb();
>   	mlx5_fpga_conn_arm_cq(conn);
>   }
>   
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c
> index 614cd57..8f7a89a 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c
> @@ -421,7 +421,7 @@ int mlx5_aso_poll_cq(struct mlx5_aso *aso, bool with_data)
>   	mlx5_cqwq_update_db_record(&cq->wq);
>   
>   	/* ensure cq space is freed before enabling more cqes */
> -	wmb();
> +	dma_wmb();
>   
>   	if (with_data)
>   		aso->cc += MLX5_ASO_WQEBBS_DATA;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wc.c b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
> index 7d3d4d7..1afbdd19 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/wc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
> @@ -314,7 +314,7 @@ static void mlx5_wc_post_nop(struct mlx5_wc_sq *sq, unsigned int *offset,
>   	/* ensure doorbell record is visible to device before ringing the
>   	 * doorbell
>   	 */
> -	wmb();
> +	dma_wmb();
>   
>   	mlx5_iowrite64_copy(sq, mmio_wqe, sizeof(mmio_wqe), *offset);
>   


^ permalink raw reply

* Re: [RFC net PATCH v1] net: pcs: pcs-mtk-lynxi: fix bpi-r3 serdes configuration
From: Daniel Golle @ 2026-04-09 20:55 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Frank Wunderlich, Chester A. Unal, Felix Fietkau,
	Alexander Couzens, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Matthias Brugger, AngeloGioacchino Del Regno, Frank Wunderlich,
	netdev, linux-kernel, linux-arm-kernel, linux-mediatek
In-Reply-To: <20260409164942.wbmwtkpd5d5zibyy@skbuf>

On Thu, Apr 09, 2026 at 07:49:42PM +0300, Vladimir Oltean wrote:
> I notice Arınc, listed by ./scripts/get_maintainer.pl drivers/net/dsa/mt7530.c,
> and Felix, listed by ./scripts/get_maintainer.pl drivers/net/ethernet/mediatek/mtk_eth_soc.c,
> are not on CC. Maybe they have more info.
> 
> Only the switch port has a chance of having a non-zero default polarity
> setting? (coming from the efuse, if I understood this discussion properly)
> https://lore.kernel.org/netdev/C59EED96-3973-4074-A4D8-C264949D447E@linux.dev/
> The GMAC doesn't?

Yes, vendor SDK uses DT mediatek,pnswap{,-rx,-tx} properties only for the
SoC GMACs. For MT7531 there are **no** strap pins deciding the SerDes
polarity, and also no software-way to override the defaults in the vendor
SDK.

However, the MT7531 datasheet quite clearly states:
Register 000050EC QPHY_WRAP_CTRL -- QPHY wrapper control
Reset value: 0x00000501

BIT 1 RX_BIT_POLARITY -- RX bit polarity control
 1'b0: normal
 1'b1: inverted

BIT 0 TX_BIT_POLARITY -- TX bit polarity control (TX default inversed in MT7531)
 1'b0: normal
 1'b1: inverted

Hence it would be best to just assume the documented default in the driver
as well.

A quick register dump using the BPi-R3 confirms that this applies to *both*
SerDes PCS on MT7531A (port 5 and port 6) equally; both read 0x00000501
after reset.
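
For readers following along, the reset value can be decoded with a few
lines of C. This is only a sketch based on the two bit definitions quoted
above; the macro and helper names are made up for illustration and are not
taken from the driver or the vendor SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as given in the datasheet excerpt quoted above. */
#define QPHY_TX_BIT_POLARITY	(UINT32_C(1) << 0)
#define QPHY_RX_BIT_POLARITY	(UINT32_C(1) << 1)

/* Documented reset value of QPHY_WRAP_CTRL (register 0x50EC). */
#define QPHY_WRAP_CTRL_RESET	UINT32_C(0x00000501)

/* Hypothetical helpers: true when the corresponding lane is inverted. */
static bool qphy_tx_inverted(uint32_t val)
{
	return (val & QPHY_TX_BIT_POLARITY) != 0;
}

static bool qphy_rx_inverted(uint32_t val)
{
	return (val & QPHY_RX_BIT_POLARITY) != 0;
}
```

With the reset value 0x00000501, bit 0 is set and bit 1 is clear, i.e. TX
inverted and RX normal, which matches the "TX default inversed" note.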

^ permalink raw reply

* Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
From: Jim Mattson @ 2026-04-09 21:06 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Pawan Gupta, x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, David Kaplan, Sean Christopherson,
	Borislav Petkov, Dave Hansen, Peter Zijlstra, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, KP Singh, Jiri Olsa,
	David S. Miller, David Laight, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, David Ahern, Martin KaFai Lau, Eduard Zingerman,
	Song Liu, Yonghong Song, John Fastabend, Stanislav Fomichev,
	Hao Luo, Paolo Bonzini, Jonathan Corbet, linux-kernel, kvm,
	Asit Mallick, Tao Zhang, bpf, netdev, linux-doc, chao.gao
In-Reply-To: <410df9f6-69ec-483f-9009-0a9b8c9162a9@intel.com>

On Thu, Apr 9, 2026 at 1:36 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 4/7/26 17:47, Jim Mattson wrote:
> > On Tue, Apr 7, 2026 at 4:41 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >> On 4/7/26 16:27, Jim Mattson wrote:
> >>> What is your proposed BHI_DIS_S override mechanism, then?
> >> Let me make sure I get this right. The desire is to:
> >>
> >> 1. Have hypervisors lie to guests about the CPU they are running on (for
> >>    the benefit of large/diverse migration pools)
> >> 2. Have guests be allowed to boot with BHI_DIS_S for performance
> >> 3. Have apps in those guests that care about security to opt back in to
> >>    BHI_DIS_S for themselves?
> > I just want guests on heterogeneous migration pools to properly
> > protect themselves from native BHI when running on host kernels at
> > least as far back as Linux v6.6.
> >
> > To that end, I would be satisfied with using the longer BHB clearing
> > sequence when HYPERVISOR is true and BHI_CTRL is false.
>
> If the guests can't get mitigation information from model/family because
> the hypervisor is lying (or may lie), then it's on the hypervisor to
> figure it out.
>
> I'm not sure we want to just assume that all hypervisors are going to
> lie all the time about this.

Without any information, that is exactly what we must assume. There is
precedent for this.

In vulnerable_to_its():

        /*
         * If a VMM did not expose ITS_NO, assume that a guest could
         * be running on a vulnerable hardware or may migrate to such
         * hardware.
         */
        if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
                return true;


In cpu_set_bug_bits():

        /*
         * Intel parts with eIBRS are vulnerable to BHI attacks. Parts with
         * BHI_NO still need to use the BHI mitigation to prevent Intra-mode
         * attacks.  When virtualized, eIBRS could be hidden, assume vulnerable.
         */
        if (!cpu_matches(cpu_vuln_whitelist, NO_BHI) &&
            (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) ||
             boot_cpu_has(X86_FEATURE_HYPERVISOR)))
                setup_force_cpu_bug(X86_BUG_BHI);

...and...

        if (c->x86_vendor == X86_VENDOR_AMD) {
                if (!cpu_has(c, X86_FEATURE_TSA_SQ_NO) ||
                    !cpu_has(c, X86_FEATURE_TSA_L1_NO)) {
                        if (cpu_matches(cpu_vuln_blacklist, TSA) ||
                            /* Enable bug on Zen guests to allow for live migration. */
                            (cpu_has(c, X86_FEATURE_HYPERVISOR) && cpu_has(c, X86_FEATURE_ZEN)))
                                setup_force_cpu_bug(X86_BUG_TSA);
                }
        }


In check_null_seg_clears_base():

        /*
         * CPUID bit above wasn't set. If this kernel is still running
         * as a HV guest, then the HV has decided not to advertize
         * that CPUID bit for whatever reason. For example, one
         * member of the migration pool might be vulnerable. Which
         * means, the bug is present: set the BUG flag and return.
         */
        if (cpu_has(c, X86_FEATURE_HYPERVISOR)) {
                set_cpu_bug(c, X86_BUG_NULL_SEG);
                return;
        }

The hypervisor could provide more information so that the guest can
determine when it's safe to use the short sequence, but that's just
icing on the cake. The default, out-of-the-box configuration must be
safe.
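
To make the proposed default concrete, here is a small userspace model of
the selection rule discussed above (long sequence when HYPERVISOR is set
and BHI_CTRL is not). It is a sketch only: the struct, enum and function
names are hypothetical, and native_needs_long stands in for whatever the
kernel would conclude from model/family on bare metal.

```c
#include <assert.h>
#include <stdbool.h>

enum bhb_seq { BHB_SEQ_SHORT, BHB_SEQ_LONG };

/* Hypothetical inputs; the names are illustrative, not kernel symbols. */
struct guest_view {
	bool hypervisor;	/* models X86_FEATURE_HYPERVISOR */
	bool bhi_ctrl;		/* models BHI_CTRL being enumerated */
	bool native_needs_long;	/* assumed result of a model/family lookup */
};

static enum bhb_seq pick_bhb_seq(const struct guest_view *v)
{
	/*
	 * Under a hypervisor that does not expose BHI_CTRL, model/family
	 * may not describe the hardware we run on (or migrate to), so
	 * fall back to the conservative choice.
	 */
	if (v->hypervisor && !v->bhi_ctrl)
		return BHB_SEQ_LONG;

	return v->native_needs_long ? BHB_SEQ_LONG : BHB_SEQ_SHORT;
}
```

A hypervisor that does provide more information would simply take the
second path in this model.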

^ permalink raw reply

* Re: [PATCH net-next 0/7] tcp: restrict rcv_wnd and window_clamp to representable window
From: Simon Baatz @ 2026-04-09 21:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, David Ahern,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Shuah Khan, netdev,
	linux-kernel, linux-kselftest
In-Reply-To: <CANn89iJO5DTWeVVfMYh7y3e9Npsu+FQ_a=W9ZqbMtb_wLeBL7A@mail.gmail.com>

Hi Eric,

On Thu, Apr 09, 2026 at 07:52:03AM -0700, Eric Dumazet wrote:
> On Wed, Apr 8, 2026 at 2:50 PM Simon Baatz via B4 Relay
> <devnull+gmbnomis.gmail.com@kernel.org> wrote:
> >
> > Hi,
> >
> > this series ensures that rcv_wnd and window_clamp do not exceed the
> > maximum window size representable for the connection's window scale
> > factor.
> >
> > This is most visible when TCP window scaling is not used for a
> > connection. In that case, the advertised window is limited to 65535
> > bytes, but rcv_wnd or window_clamp can still grow beyond 65535 when
> > large receive buffers are used. The resulting mismatch breaks
> > calculations that depend on the advertised window, such as the ACK
> > decision in __tcp_ack_snd_check(), and can prevent immediate ACKs.
> >
> > Similar effects may also occur when window scaling is in use, e.g. if
> > the application dynamically adjusts SO_RCVBUF in unusual ways or when
> > the rmem sysctl parameters change during a connection's lifetime.
> >
> > Summary:
> >
> > - Patch 1 keeps rcv_wnd capped by the (window scale-limited)
> >   window_clamp at connection start.
> > - Patch 3 and 6 ensure that window_clamp is limited to the
> >   representable window when it is updated.
> > - The other patches add packetdrill tests to verify the new behavior.
> >
> > A simple iperf test on a virtme-ng VM (Intel i5-7500, 4 cores,
> > loopback) shows a noticeable improvement with window scaling disabled:
> 
> Explain why we should spend time reviewing patches trying to help
> stacks from 2 decades ago,
> risking breaking other usages.
> 
> Almost every time we change the rcvbuf logic, we introduce bugs.

As soon as someone gives me access to a link with a bandwidth delay
product of probably > 500 MB I am happy to provide another set of
benchmarks results:

`./defaults.sh
sysctl -q net.ipv4.tcp_rmem="4096 2147483647 2147483647"`

    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
   +0 > S. 0:0(0) ack 1 win 65535 <mss 1460,nop,nop,sackOK,nop,wscale 14>
   +0 < . 1:1(0) ack 1 win 32792

   +0 accept(3, ..., ...) = 4

   +0 getsockopt(4, IPPROTO_TCP, 10, [1073725440], [4]) = 0
   +0 < P. 1:65001(65000) ack 1 win 32792
   +0 > . 1:1(0) ack 65001 win 65535
   +0 < P. 65001:130001(65000) ack 1 win 32792
   +0 > . 1:1(0) ack 130001 win 65535
   +0 < P. 130001:195001(65000) ack 1 win 32792
   +0 > . 1:1(0) ack 195001 win 65535
   +0 < P. 195001:260001(65000) ack 1 win 32792
   +0 > . 1:1(0) ack 260001 win 65535
   +0 < P. 260001:325001(65000) ack 1 win 32792
   +0 > . 1:1(0) ack 325001 win 65535
   +0 < P. 325001:390001(65000) ack 1 win 32792
   +0 > . 1:1(0) ack 390001 win 65535
   +0 getsockopt(4, IPPROTO_TCP, 10, [2113929215], [4]) = 0
  +.1 %{ assert tcpi_rcv_wnd <= 1073725440, tcpi_rcv_wnd }%

Fails with:

 AssertionError: 1074511872

on a current kernel.
 
So, I think we should spend time reviewing this because currently we
just pretend to clamp the window at its limits.
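
For reference, the limit used in the assertion above is simply the largest
16-bit window field shifted by the negotiated scale factor (wscale 14 in
the SYN-ACK of this script). The arithmetic can be sketched as follows;
the helper names are made up for illustration and this is not the code
from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on the window scale shift count per RFC 7323. */
#define MAX_WSCALE	14u

/* Largest receive window that can be advertised with this scale factor. */
static uint32_t max_representable_window(unsigned int wscale)
{
	if (wscale > MAX_WSCALE)
		wscale = MAX_WSCALE;
	return UINT32_C(65535) << wscale;
}

/* Hypothetical helper: limit a window value to what can be advertised. */
static uint32_t clamp_to_representable(uint32_t wnd, unsigned int wscale)
{
	uint32_t max = max_representable_window(wscale);

	return wnd > max ? max : wnd;
}
```

With wscale 14 the limit is 65535 << 14 = 1073725440, so the 1074511872
reported in the failure is above anything the connection can advertise.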
 
> Not using window scaling in 2026 and expecting "iperf improvement" is
> quite something!

I wondered if providing these numbers was a good idea and apparently
it wasn't.  I just found the difference to be striking.  The only
thing I wanted to demonstrate is that basing our calculations on
bogus window sizes can have real effects.

> Out of curiosity, which legacy product is stuck in the 20th century?

I have half a dozen of these products "stuck in the 20th century" at
home.  They are called IoT devices, and I find it quite something to say
that TCP connections to such devices need not have proper sequence number
acceptability tests according to RFC 9293.  ;-)

- Simon

-- 
Simon Baatz <gmbnomis@gmail.com>

^ permalink raw reply

* Re: [PATCH net-next 0/7] tcp: restrict rcv_wnd and window_clamp to representable window
From: Eric Dumazet @ 2026-04-09 21:28 UTC (permalink / raw)
  To: Simon Baatz
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, David Ahern,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Shuah Khan, netdev,
	linux-kernel, linux-kselftest
In-Reply-To: <adgY-Yn6-OophYXb@gandalf.schnuecks.de>

On Thu, Apr 9, 2026 at 2:24 PM Simon Baatz <gmbnomis@gmail.com> wrote:
>
> Hi Eric,
>
> On Thu, Apr 09, 2026 at 07:52:03AM -0700, Eric Dumazet wrote:
> > On Wed, Apr 8, 2026 at 2:50 PM Simon Baatz via B4 Relay
> > <devnull+gmbnomis.gmail.com@kernel.org> wrote:
> > >
> > > Hi,
> > >
> > > this series ensures that rcv_wnd and window_clamp do not exceed the
> > > maximum window size representable for the connection's window scale
> > > factor.
> > >
> > > This is most visible when TCP window scaling is not used for a
> > > connection. In that case, the advertised window is limited to 65535
> > > bytes, but rcv_wnd or window_clamp can still grow beyond 65535 when
> > > large receive buffers are used. The resulting mismatch breaks
> > > calculations that depend on the advertised window, such as the ACK
> > > decision in __tcp_ack_snd_check(), and can prevent immediate ACKs.
> > >
> > > Similar effects may also occur when window scaling is in use, e.g. if
> > > the application dynamically adjusts SO_RCVBUF in unusual ways or when
> > > the rmem sysctl parameters change during a connection's lifetime.
> > >
> > > Summary:
> > >
> > > - Patch 1 keeps rcv_wnd capped by the (window scale-limited)
> > >   window_clamp at connection start.
> > > - Patch 3 and 6 ensure that window_clamp is limited to the
> > >   representable window when it is updated.
> > > - The other patches add packetdrill tests to verify the new behavior.
> > >
> > > A simple iperf test on a virtme-ng VM (Intel i5-7500, 4 cores,
> > > loopback) shows a noticeable improvement with window scaling disabled:
> >
> > Explain why we should spend time reviewing patches trying to help
> > stacks from 2 decades ago,
> > risking breaking other usages.
> >
> > Almost every time we change the rcvbuf logic, we introduce bugs.
>
> As soon as someone gives me access to a link with a bandwidth delay
> product of probably > 500 MB I am happy to provide another set of
> benchmarks results:
>
> `./defaults.sh
> sysctl -q net.ipv4.tcp_rmem="4096 2147483647 2147483647"`

Please do not do this. Stick to reasonable limits.

You might have missed that we are flooded with bug reports (and buggy patches).
We have very limited time for bugs not proven by real-world conditions.

^ permalink raw reply

* Re: [PATCH 1/2] net: fix skb_ext_total_length() BUILD_BUG_ON with CONFIG_GCOV_PROFILE_ALL
From: Konstantin Khorenko @ 2026-04-09 21:43 UTC (permalink / raw)
  To: Paolo Abeni, David S . Miller, Eric Dumazet, Jakub Kicinski
  Cc: Simon Horman, Thomas Weißschuh, Arnd Bergmann,
	Peter Oberparleiter, Mikhail Zaslonko, netdev, linux-kernel,
	Pavel Tikhomirov, Vasileios Almpanis
In-Reply-To: <4f744383-1dc1-415a-a8da-5fe8f59daa35@redhat.com>

On 4/7/26 09:55, Paolo Abeni wrote:
> On 4/2/26 4:05 PM, Konstantin Khorenko wrote:
...
>>
>> Fixes: 5d21d0a65b57 ("net: generalize calculation of skb extensions length")
>> Fixes: d6e5794b06c0 ("net: avoid build bug in skb extension length calculation")
>>
>> Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
> 
> No empty lines in the tags area.

Sure, will fix.

> Also given the commit description, isn't the introduction of the 5th skb
> extension a better fixes tag?

Well, if we did not have 5d21d0a65b57 ("net: generalize calculation of skb extensions length"), we
wouldn't have a problem even after the 5th skb extension.

On the other hand, yes, the defect reveals itself only after the appearance of the 5th skb extension,
so we can also treat that commit as the culprit.

I will change the Fixes: tag.

>> Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
>> ---
>>   net/core/skbuff.c | 4 +---
>>   1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 0e217041958a..47c7f0ab6e84 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -5145,7 +5145,7 @@ static const u8 skb_ext_type_len[] = {
>>   #endif
>>   };
>>   
>> -static __always_inline unsigned int skb_ext_total_length(void)
>> +static __always_inline __no_profile unsigned int skb_ext_total_length(void)
>>   {
>>   	unsigned int l = SKB_EXT_CHUNKSIZEOF(struct skb_ext);
>>   	int i;
>> @@ -5159,9 +5159,7 @@ static __always_inline unsigned int skb_ext_total_length(void)
>>   static void skb_extensions_init(void)
>>   {
>>   	BUILD_BUG_ON(SKB_EXT_NUM > 8);
>> -#if !IS_ENABLED(CONFIG_KCOV_INSTRUMENT_ALL)
>>   	BUILD_BUG_ON(skb_ext_total_length() > 255);
>> -#endif
> 
> Sashiko notes that there could be still build breakage:
> 
> https://sashiko.dev/#/patchset/20260402140558.1437002-1-khorenko%40virtuozzo.com
> 
> Could you please double check the above?

Sashiko is great!

The concern about KCOV is valid in theory but doesn't apply in practice. Here's why:

__no_profile (__no_profile_instrument_function__) indeed only prevents GCOV profiling counters
(-fprofile-arcs) from being inserted. It has no effect on KCOV instrumentation
(-fsanitize-coverage=trace-pc), which would require __no_sanitize_coverage instead.

However, KCOV instrumentation does not break constant folding in the first place.
I verified this with a standalone test: a __always_inline function with a loop over a const array 
(mimicking skb_ext_total_length()), compiled with different instrumentation flags:

  * No instrumentation: BUILD_BUG_ON passes (constant folded)
  * GCOV (-fprofile-arcs -ftest-coverage -fno-tree-loop-im): BUILD_BUG_ON fails
  * KCOV (-fsanitize-coverage=trace-pc): BUILD_BUG_ON passes (constant folded)
  * GCOV + atomic (-fprofile-arcs -ftest-coverage -fno-tree-loop-im -fprofile-update=atomic): 
BUILD_BUG_ON fails

The difference is in how GCC instruments code. GCOV inserts global counter increments inside the loop 
body. Combined with -fno-tree-loop-im, these counter operations prevent GCC from proving the loop 
result is a compile-time constant.

KCOV only inserts __sanitizer_cov_trace_pc() callbacks at basic block entries - these are opaque 
function calls that don't participate in value computation, so GCC can still see the loop iterates 
over a const array and fold it.
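
For anyone who wants to experiment, a reduced test of this shape could
look like the sketch below. It is not the exact file used for the results
above: the array contents and all names are made up, and it reports
__builtin_constant_p() instead of using BUILD_BUG_ON so that it compiles
under any set of flags.

```c
#include <assert.h>

/* Made-up per-extension sizes standing in for skb_ext_type_len[]. */
static const unsigned char ext_len[] = { 3, 5, 2, 7, 4 };

#define EXT_NUM	(sizeof(ext_len) / sizeof(ext_len[0]))

/* Mimics the shape of skb_ext_total_length(): a loop over a const array. */
static inline __attribute__((always_inline)) unsigned int ext_total_length(void)
{
	unsigned int l = 1;	/* stand-in for the header chunk */
	unsigned int i;

	for (i = 0; i < EXT_NUM; i++)
		l += ext_len[i];

	return l;
}

/*
 * Returns 1 if the compiler managed to fold the loop into a constant at
 * this call site, 0 otherwise. The answer depends on the optimization
 * level and instrumentation flags the file is built with.
 */
static int total_is_compile_time_constant(void)
{
	return __builtin_constant_p(ext_total_length());
}
```

Building this with the flag combinations listed above and checking what
total_is_compile_time_constant() returns is one way to compare them.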

> I think a 'noinline' in skb_extensions_init() would address any
> complaints on patch 2/2

Yes, will add "noinline" to be on the safe side.

--
Konstantin Khorenko

^ permalink raw reply

* [PATCH v2 1/2] net: fix skb_ext_total_length() BUILD_BUG_ON with CONFIG_GCOV_PROFILE_ALL
From: Konstantin Khorenko @ 2026-04-09 21:47 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Thomas Weißschuh, Arnd Bergmann,
	Peter Oberparleiter, Mikhail Zaslonko, netdev, linux-kernel,
	Pavel Tikhomirov, Vasileios Almpanis, Konstantin Khorenko
In-Reply-To: <20260409214736.2651198-1-khorenko@virtuozzo.com>

When CONFIG_GCOV_PROFILE_ALL=y is enabled, the kernel fails to build:

  In file included from <command-line>:
  In function 'skb_extensions_init',
      inlined from 'skb_init' at net/core/skbuff.c:5214:2:
  ././include/linux/compiler_types.h:706:45: error: call to
    '__compiletime_assert_1490' declared with attribute error:
    BUILD_BUG_ON failed: skb_ext_total_length() > 255

CONFIG_GCOV_PROFILE_ALL adds -fprofile-arcs -ftest-coverage
-fno-tree-loop-im to CFLAGS globally. GCC inserts branch profiling
counters into the skb_ext_total_length() loop and, combined with
-fno-tree-loop-im (which disables loop invariant motion), cannot
constant-fold the result.
BUILD_BUG_ON requires a compile-time constant and fails.

The issue manifests in kernels with 5+ SKB extension types enabled
(e.g., after addition of SKB_EXT_CAN, SKB_EXT_PSP). With 4 extensions
GCC can still unroll and fold the loop despite GCOV instrumentation;
with 5+ it gives up.

Mark skb_ext_total_length() with __no_profile to prevent GCOV from
inserting counters into this function. Without counters the loop is
"clean" and GCC can constant-fold it even with -fno-tree-loop-im active.
This allows BUILD_BUG_ON to work correctly while keeping GCOV profiling
for the rest of the kernel.

This also removes the CONFIG_KCOV_INSTRUMENT_ALL preprocessor guard
introduced by d6e5794b06c0. That guard was added as a precaution because
KCOV instrumentation was also suspected of inhibiting constant folding.
However, KCOV uses -fsanitize-coverage=trace-pc, which inserts
lightweight trace callbacks that do not interfere with GCC's constant
folding or loop optimization passes. Only GCOV's -fprofile-arcs combined
with -fno-tree-loop-im actually prevents the compiler from evaluating
the loop at compile time. The guard is therefore unnecessary and can be
safely removed.

Fixes: 96ea3a1e2d31 ("can: add CAN skb extension infrastructure")
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Reviewed-by: Thomas Weissschuh <linux@weissschuh.net>
---
 net/core/skbuff.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 43ee86dcf2ea..59fb4b2bb821 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5142,7 +5142,7 @@ static const u8 skb_ext_type_len[] = {
 #endif
 };

-static __always_inline unsigned int skb_ext_total_length(void)
+static __always_inline __no_profile unsigned int skb_ext_total_length(void)
 {
 	unsigned int l = SKB_EXT_CHUNKSIZEOF(struct skb_ext);
 	int i;
@@ -5156,9 +5156,7 @@ static __always_inline unsigned int skb_ext_total_length(void)
 static void skb_extensions_init(void)
 {
 	BUILD_BUG_ON(SKB_EXT_NUM > 8);
-#if !IS_ENABLED(CONFIG_KCOV_INSTRUMENT_ALL)
 	BUILD_BUG_ON(skb_ext_total_length() > 255);
-#endif

 	skbuff_ext_cache = kmem_cache_create("skbuff_ext_cache",
 					     SKB_EXT_ALIGN_VALUE * skb_ext_total_length(),
-- 
2.43.5

^ permalink raw reply related

* [PATCH v2 0/2] net: fix skb_ext BUILD_BUG_ON failures with GCOV
From: Konstantin Khorenko @ 2026-04-09 21:47 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Thomas Weißschuh, Arnd Bergmann,
	Peter Oberparleiter, Mikhail Zaslonko, netdev, linux-kernel,
	Pavel Tikhomirov, Vasileios Almpanis, Konstantin Khorenko
In-Reply-To: <20260402140558.1437002-1-khorenko@virtuozzo.com>

This mini-series fixes build failures in net/core/skbuff.c when the
kernel is built with CONFIG_GCOV_PROFILE_ALL=y.

This is part of a larger effort to add -fprofile-update=atomic to
global CFLAGS_GCOV (posted earlier as a combined series) [1].

The companion patch:
 - gcov: add -fprofile-update=atomic globally (sent to gcov/kbuild
   maintainers, depends on this series) [2]

There was another companion patch here, but it appears to have been
applied recently: 8b72aa5704c7 ("iommupt/amdv1: mark
amdv1pt_install_leaf_entry as __always_inline").

Patch 1/2 fixes a pre-existing build failure with CONFIG_GCOV_PROFILE_ALL:
GCOV counters prevent GCC from constant-folding the skb_ext_total_length()
loop.  It also removes the CONFIG_KCOV_INSTRUMENT_ALL preprocessor guard
from d6e5794b06c0: that guard was a precaution in case KCOV instrumentation
also prevented constant folding, but KCOV's -fsanitize-coverage=trace-pc
does not interfere with GCC's constant folding (verified experimentally
with GCC 14.2 and GCC 16.0.1), so the guard is unnecessary.

Patch 2/2 is an additional fix needed when -fprofile-update=atomic is
added to CFLAGS_GCOV: __no_profile on the __always_inline function alone
is insufficient because after inlining, the code resides in the caller's
profiled body.  The caller (skb_extensions_init) needs __no_profile and
noinline to prevent re-exposure to GCOV instrumentation.

Changes v1 -> v2:
 - Patch 1/2: expanded the commit message to explain why removing the
   CONFIG_KCOV_INSTRUMENT_ALL guard is safe (KCOV's
   -fsanitize-coverage=trace-pc does not inhibit constant folding,
   unlike GCOV's -fprofile-arcs + -fno-tree-loop-im).
   Changed Fixes tag to point at the commit that introduced the 5th
   SKB extension type which triggered the failure.
   Removed empty lines in tags area.

 - Patch 2/2: added noinline to skb_extensions_init() to prevent
   the compiler from inlining it into skb_init(), which would
   re-expose the function body to GCOV instrumentation.

v1: https://lore.kernel.org/lkml/20260402140558.1437002-1-khorenko@virtuozzo.com/T/#t

[1] https://lore.kernel.org/lkml/20260401142020.1434243-1-khorenko@virtuozzo.com/T/#t
[2] https://lore.kernel.org/lkml/20260402141831.1437357-1-khorenko@virtuozzo.com/T/#t

Tested with:
 - GCC 14.2.1, CONFIG_GCOV_PROFILE_ALL=y
 - GCC 14.2.1, CONFIG_KCOV_INSTRUMENT_ALL=y (GCOV disabled)
 - GCC 16.0.1 20260327 (experimental), CONFIG_GCOV_PROFILE_ALL=y

Konstantin Khorenko (2):
  net: fix skb_ext_total_length() BUILD_BUG_ON with
    CONFIG_GCOV_PROFILE_ALL
  net: add noinline __no_profile to skb_extensions_init() for GCOV
    compatibility

 net/core/skbuff.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

-- 
2.43.5

^ permalink raw reply

* [PATCH v2 2/2] net: add noinline __no_profile to skb_extensions_init() for GCOV compatibility
From: Konstantin Khorenko @ 2026-04-09 21:47 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Thomas Weißschuh, Arnd Bergmann,
	Peter Oberparleiter, Mikhail Zaslonko, netdev, linux-kernel,
	Pavel Tikhomirov, Vasileios Almpanis, Konstantin Khorenko
In-Reply-To: <20260409214736.2651198-1-khorenko@virtuozzo.com>

With -fprofile-update=atomic in global CFLAGS_GCOV, GCC still cannot
constant-fold the skb_ext_total_length() loop when it is inlined into a
profiled caller.  The existing __no_profile on skb_ext_total_length()
itself is insufficient because after __always_inline expansion the code
resides in the caller's body, which still carries GCOV instrumentation.

Mark skb_extensions_init() with __no_profile so the BUILD_BUG_ON checks
can be evaluated at compile time.  Also mark it noinline to prevent the
compiler from inlining it into skb_init() (which lacks __no_profile),
which would re-expose the function body to GCOV instrumentation.

Build-tested with both CONFIG_GCOV_PROFILE_ALL=y and
CONFIG_KCOV_INSTRUMENT_ALL=y.

Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
---
 net/core/skbuff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 59fb4b2bb821..8d75e352e3b1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5153,7 +5153,7 @@ static __always_inline __no_profile unsigned int skb_ext_total_length(void)
 	return l;
 }

-static void skb_extensions_init(void)
+static noinline void __no_profile skb_extensions_init(void)
 {
 	BUILD_BUG_ON(SKB_EXT_NUM > 8);
 	BUILD_BUG_ON(skb_ext_total_length() > 255);
-- 
2.43.5

^ permalink raw reply related

* [PATCH v2 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps
From: Eric Dumazet @ 2026-04-09 21:48 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
	netdev, eric.dumazet, Eric Dumazet

We add annotations for data-races, so that most dump methods
can run in parallel with data path.

Then change mq and mqprio to no longer acquire each child
qdisc's spinlock.

Next round of patches will wait for linux-7.2.

Eric Dumazet (15):
  net/sched: rename qstats_overlimit_inc() to qstats_cpu_overlimit_inc()
  net/sched: add qstats_cpu_drop_inc() helper
  net/sched: add READ_ONCE() in gnet_stats_add_queue[_cpu]
  net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec()
  net/sched: annotate data-races around sch->qstats.backlog
  net/sched: sch_sfb: annotate data-races in sfb_dump_stats()
  net/sched: sch_red: annotate data-races in red_dump_stats()
  net/sched: sch_fq_codel: remove data-races from fq_codel_dump_stats()
  net/sched: sch_pie: annotate data-races in pie_dump_stats()
  net/sched: sch_fq_pie: annotate data-races in fq_pie_dump_stats()
  net_sched: sch_hhf: annotate data-races in hhf_dump_stats()
  net/sched: sch_choke: annotate data-races in choke_dump_stats()
  net/sched: sch_cake: annotate data-races in cake_dump_stats()
  net/sched: mq: no longer acquire qdisc spinlocks in dump operations
  net/sched: taprio: prepare taprio_dump() for RTNL removal

 include/net/act_api.h     |   4 +-
 include/net/gen_stats.h   |   9 +-
 include/net/pie.h         |   2 +-
 include/net/sch_generic.h |  60 +++++--
 net/core/gen_estimator.c  |  24 ++-
 net/core/gen_stats.c      |  37 ++--
 net/sched/act_api.c       |   2 +-
 net/sched/act_bpf.c       |   2 +-
 net/sched/act_ife.c       |  12 +-
 net/sched/act_mpls.c      |   2 +-
 net/sched/act_police.c    |   4 +-
 net/sched/act_skbedit.c   |   2 +-
 net/sched/act_skbmod.c    |   2 +-
 net/sched/sch_api.c       |   4 +-
 net/sched/sch_cake.c      | 353 +++++++++++++++++++++-----------------
 net/sched/sch_cbs.c       |   6 +-
 net/sched/sch_choke.c     |  34 ++--
 net/sched/sch_codel.c     |   2 +-
 net/sched/sch_drr.c       |   6 +-
 net/sched/sch_dualpi2.c   |   4 +-
 net/sched/sch_etf.c       |   8 +-
 net/sched/sch_ets.c       |   6 +-
 net/sched/sch_fq.c        |   4 +-
 net/sched/sch_fq_codel.c  |  14 +-
 net/sched/sch_fq_pie.c    |  27 +--
 net/sched/sch_generic.c   |   8 +-
 net/sched/sch_hfsc.c      |   6 +-
 net/sched/sch_hhf.c       |  26 +--
 net/sched/sch_htb.c       |   6 +-
 net/sched/sch_mq.c        |  34 ++--
 net/sched/sch_mqprio.c    |  79 +++++----
 net/sched/sch_multiq.c    |   4 +-
 net/sched/sch_netem.c     |  12 +-
 net/sched/sch_pie.c       |  38 ++--
 net/sched/sch_prio.c      |   6 +-
 net/sched/sch_qfq.c       |   8 +-
 net/sched/sch_red.c       |  37 ++--
 net/sched/sch_sfb.c       |  54 +++---
 net/sched/sch_sfq.c       |   9 +-
 net/sched/sch_skbprio.c   |   4 +-
 net/sched/sch_taprio.c    |  46 ++---
 net/sched/sch_tbf.c       |  10 +-
 net/sched/sch_teql.c      |   2 +-
 43 files changed, 569 insertions(+), 450 deletions(-)

-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply

* [PATCH v2 net-next 01/15] net/sched: rename qstats_overlimit_inc() to qstats_cpu_overlimit_inc()
From: Eric Dumazet @ 2026-04-09 21:49 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
	netdev, eric.dumazet, Eric Dumazet
In-Reply-To: <20260409214914.3072827-1-edumazet@google.com>

qstats_overlimit_inc() is only used to increment per cpu overlimits.

It can use this_cpu_inc() to avoid this_cpu_ptr() extra cost
and avoid potential store tearing.

Change qstats_overlimit_inc() name and its argument type.

Also add a WRITE_ONCE() in qdisc_qstats_overlimit() to prevent
store tearing.

$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1
add/remove: 0/0 grow/shrink: 0/5 up/down: 0/-72 (-72)
Function                                     old     new   delta
tcf_skbmod_act                               772     764      -8
tcf_police_act                               733     725      -8
tcf_mirred_to_dev                           1126    1114     -12
tcf_ife_act                                 1077    1061     -16
tcf_mirred_act                              1324    1296     -28
Total: Before=29610901, After=29610829, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/act_api.h     | 2 +-
 include/net/sch_generic.h | 6 +++---
 net/sched/act_ife.c       | 4 ++--
 net/sched/act_police.c    | 2 +-
 net/sched/act_skbmod.c    | 2 +-
 5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index d11b791079302f50c47e174979767e0b24afc59a..2ec4ef9a5d0c8e9110f92f135cc3c31a38af0479 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -250,7 +250,7 @@ static inline void tcf_action_inc_drop_qstats(struct tc_action *a)
 static inline void tcf_action_inc_overlimit_qstats(struct tc_action *a)
 {
 	if (likely(a->cpu_qstats)) {
-		qstats_overlimit_inc(this_cpu_ptr(a->cpu_qstats));
+		qstats_cpu_overlimit_inc(a->cpu_qstats);
 		return;
 	}
 	atomic_inc(&a->tcfa_overlimits);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 5af262ec4bbd2d5021904df127a849e52c26178a..3ee383c6fc3f66f1aecd9ebc675fbd143852c150 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -1004,9 +1004,9 @@ static inline void qstats_drop_inc(struct gnet_stats_queue *qstats)
 	qstats->drops++;
 }
 
-static inline void qstats_overlimit_inc(struct gnet_stats_queue *qstats)
+static inline void qstats_cpu_overlimit_inc(struct gnet_stats_queue __percpu *qstats)
 {
-	qstats->overlimits++;
+	this_cpu_inc(qstats->overlimits);
 }
 
 static inline void qdisc_qstats_drop(struct Qdisc *sch)
@@ -1021,7 +1021,7 @@ static inline void qdisc_qstats_cpu_drop(struct Qdisc *sch)
 
 static inline void qdisc_qstats_overlimit(struct Qdisc *sch)
 {
-	sch->qstats.overlimits++;
+	WRITE_ONCE(sch->qstats.overlimits, sch->qstats.overlimits + 1);
 }
 
 static inline int qdisc_qstats_copy(struct gnet_dump *d, struct Qdisc *sch)
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index d5e8a91bb4eb9f1f1f084e199b5ada4e7f7e7205..e1b825e14900d6f46bbfd1b7f72ab6cd554d8a73 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -750,7 +750,7 @@ static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
 			 */
 			pr_info_ratelimited("Unknown metaid %d dlen %d\n",
 					    mtype, dlen);
-			qstats_overlimit_inc(this_cpu_ptr(ife->common.cpu_qstats));
+			qstats_cpu_overlimit_inc(ife->common.cpu_qstats);
 		}
 	}
 
@@ -814,7 +814,7 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
 		/* abuse overlimits to count when we allow packet
 		 * with no metadata
 		 */
-		qstats_overlimit_inc(this_cpu_ptr(ife->common.cpu_qstats));
+		qstats_cpu_overlimit_inc(ife->common.cpu_qstats);
 		return action;
 	}
 	/* could be stupid policy setup or mtu config
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 12ea9e5a600536b603ea73cc99b4c00381287219..8060f43e4d11c0a26e1475db06b76426f50c5975 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -307,7 +307,7 @@ TC_INDIRECT_SCOPE int tcf_police_act(struct sk_buff *skb,
 	}
 
 inc_overlimits:
-	qstats_overlimit_inc(this_cpu_ptr(police->common.cpu_qstats));
+	qstats_cpu_overlimit_inc(police->common.cpu_qstats);
 inc_drops:
 	if (ret == TC_ACT_SHOT)
 		qstats_drop_inc(this_cpu_ptr(police->common.cpu_qstats));
diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index 23ca46138f040d38de37684439873921bc9c86af..a464b0a3c1b81dba6c28c1141aa38c5c7cad3acb 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -87,7 +87,7 @@ TC_INDIRECT_SCOPE int tcf_skbmod_act(struct sk_buff *skb,
 	return p->action;
 
 drop:
-	qstats_overlimit_inc(this_cpu_ptr(d->common.cpu_qstats));
+	qstats_cpu_overlimit_inc(d->common.cpu_qstats);
 	return TC_ACT_SHOT;
 }
 
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related
