* RE: [net-next,v3,04/13] net: renesas: rswitch: add register definitions for vlan support
From: Michael Dege @ 2026-04-09 13:43 UTC
To: Paolo Abeni
Cc: paul@pbarker.dev, niklas.soderlund, kuba@kernel.org,
Yoshihiro Shimoda, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, edumazet@google.com,
linux-renesas-soc@vger.kernel.org, andrew+netdev@lunn.ch,
davem@davemloft.net
In-Reply-To: <20260402125919.234448-1-pabeni@redhat.com>
Hello Paolo,
Thank you for the review comments.
> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com>
> Sent: Thursday, April 2, 2026 2:59 PM
> To: Michael Dege <michael.dege@renesas.com>
> Cc: paul@pbarker.dev; niklas.soderlund <niklas.soderlund@ragnatech.se>; kuba@kernel.org; Yoshihiro
> Shimoda <yoshihiro.shimoda.uh@renesas.com>; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> edumazet@google.com; pabeni@redhat.com; linux-renesas-soc@vger.kernel.org; andrew+netdev@lunn.ch;
> davem@davemloft.net
> Subject: Re: [net-next,v3,04/13] net: renesas: rswitch: add register definitions for vlan support
>
> This is an AI-generated review of your patch. The human sending this email has considered the AI
> review valid, or at least plausible.
>
> For local reproduction steps see
> https://netdev-ai.bots.linux.dev/ai-local.html
> ---
> net: renesas: rswitch: add register definitions for vlan support
>
> This commit adds missing register and bit definitions for vlan support in the Renesas R-Switch driver.
>
> > diff --git a/drivers/net/ethernet/renesas/rswitch.h
> > b/drivers/net/ethernet/renesas/rswitch.h
> > index 340524d995ac..15d3fa48b0e3 100644
> > --- a/drivers/net/ethernet/renesas/rswitch.h
> > +++ b/drivers/net/ethernet/renesas/rswitch.h
>
> [ ... ]
>
> > @@ -847,6 +903,98 @@ enum rswitch_gwca_mode {
> > #define FWMACAGC_MACAGOG BIT(28)
> > #define FWMACAGC_MACDESOG BIT(29)
> >
> > +//FWMACTL0
>
> [ ... ]
>
> > +// FWVLANTEM
> > +#define VLANTUEN GENMASK(28, 16)
> > +#define VLANTEN GENAMSK(12, 0)
> ^^^^^^^^
>
> Is GENAMSK a typo? The correct macro is GENMASK, as used on the line immediately above for VLANTUEN.
> GENAMSK is not defined in the kernel and will cause compilation errors if VLANTEN is used.
Yes, this was a typo. It will be fixed in the next version.
Best regards,
Michael
>
> > +
> > +// FWVLANTS
>
> [ ... ]
* RE: [PATCH net-next v3 04/13] net: renesas: rswitch: add register definitions for vlan support
From: Michael Dege @ 2026-04-09 13:45 UTC
To: Paolo Abeni, Yoshihiro Shimoda, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, niklas.soderlund, Paul Barker
Cc: netdev@vger.kernel.org, linux-renesas-soc@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <9c101981-4a85-48c3-ad0c-c16a362929eb@redhat.com>
Hello Paolo,
Thank you for your comment.
> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com>
> Sent: Thursday, April 2, 2026 3:03 PM
> To: Michael Dege <michael.dege@renesas.com>; Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>;
> Andrew Lunn <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; niklas.soderlund
> <niklas.soderlund@ragnatech.se>; Paul Barker <paul@pbarker.dev>
> Cc: netdev@vger.kernel.org; linux-renesas-soc@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH net-next v3 04/13] net: renesas: rswitch: add register definitions for vlan
> support
>
> On 3/31/26 12:03 PM, Michael Dege wrote:
> @@ -847,6 +903,98 @@ enum rswitch_gwca_mode {
> > #define FWMACAGC_MACAGOG BIT(28)
> > #define FWMACAGC_MACDESOG BIT(29)
> >
> > +//FWMACTL0
>
> Please always use /* */ for comments
Unfortunately, I missed this. It will be fixed in the next version.
Best regards,
Michael
>
> /P
* Re: [PATCH RFC net-next 0/4] improve hw flow offload byte accounting
From: Pablo Neira Ayuso @ 2026-04-09 13:52 UTC
To: Daniel Golle
Cc: Felix Fietkau, John Crispin, Lorenzo Bianconi, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Matthias Brugger, AngeloGioacchino Del Regno, Simon Horman,
Florian Westphal, Phil Sutter, netdev, linux-kernel,
linux-arm-kernel, linux-mediatek, netfilter-devel, coreteam
In-Reply-To: <cover.1775739840.git.daniel@makrotopia.org>
On Thu, Apr 09, 2026 at 02:07:22PM +0100, Daniel Golle wrote:
> Hardware flow counters report raw byte counts whose semantics
> vary by vendor -- some count ingress L2 frames, others egress
> L2, others L3. The nf_flow_table framework currently passes
> these bytes straight to conntrack without conversion, and
> sub-interfaces (VLAN, PPPoE) that are bypassed by hw offload
> never see any counter updates at all.
I see, but isn't that part of the feature itself? Why pretend that these
interfaces are really seeing traffic when they don't? This aspiration
of trying to make all hardware offload fully transparent (when it is not
the case, not to mention the semantic changes in how packet handling is
done compared to the software plane) does not sound convincing to me.
On top of this, this issue also exists in the software plane: devices
that are bypassed do not get their counters bumped.
Maybe if this is really a requirement, then this should address the
issue for software too, but is it worth the effort to add
infrastructure for this purpose?
> This series lets drivers declare what their counters represent,
> so the framework can normalize to L3 for conntrack and
> propagate per-layer stats to encap sub-interfaces.
>
> Questions:
> - Sub-interface stats accesses vlan_dev_priv() directly --
> should there be a generic netdev callback instead?
> - Are there hw offload drivers whose counters do not fit the
> ingress-L2 / egress-L2 / L3 model?
>
> Daniel Golle (4):
> net: flow_offload: let drivers report byte counter semantics
> nf_flow_table: track sub-interface and bridge ifindex in flow tuple
> nf_flow_table: convert hw byte counts and update sub-interface stats
> net: ethernet: mtk_eth_soc: report INGRESS_L2 byte_type in flow stats
>
> .../net/ethernet/mediatek/mtk_ppe_offload.c | 1 +
> include/net/flow_offload.h | 7 +
> include/net/netfilter/nf_flow_table.h | 5 +
> net/netfilter/nf_flow_table_core.c | 2 +
> net/netfilter/nf_flow_table_offload.c | 174 +++++++++++++++++-
> net/netfilter/nf_flow_table_path.c | 8 +
> 6 files changed, 195 insertions(+), 2 deletions(-)
>
> --
> 2.53.0
* Re: [PATCH net-next v11 03/14] net: Add lease info to queue-get response
From: Daniel Borkmann @ 2026-04-09 13:52 UTC
To: Jakub Kicinski
Cc: netdev, bpf, davem, razor, pabeni, willemb, sdf, john.fastabend,
martin.lau, jordan, maciej.fijalkowski, magnus.karlsson, dw, toke,
yangzhenze, wangdongdong.6
In-Reply-To: <731d2749-abf6-403c-a1f4-ffe9c8d1e8ad@iogearbox.net>
On 4/9/26 3:43 PM, Daniel Borkmann wrote:
> On 4/9/26 12:12 AM, Jakub Kicinski wrote:
>> On Wed, 8 Apr 2026 11:09:34 +0200 Daniel Borkmann wrote:
>>>>> +void netif_put_rx_queue_lease_locked(struct net_device *orig_dev,
>>>>> + struct net_device *dev)
>>>>> +{
>>>>> + if (orig_dev != dev)
>>>>> + netdev_unlock(dev);
>>>>> +}
>>>>
>>>> Pretty sure I already complained about these ugly helpers.
>>>> I'll try to find the time tomorrow to come up with something better.
>>>
>>> Ok, sounds good. Happy to adapt if you find something better and then I'll
>>> work this into the series, and also integrate the things mentioned in my
>>> cover letter reply (netkit nl dump + additional tests).
>>
>> Hi! How would you feel about something like the following on top?
>>
>> --->8----------
>>
>> net: remove the netif_get_rx_queue_lease_locked() helpers
>>
>> The netif_get_rx_queue_lease_locked() API hides the locking
>> and the descent onto the leased queue, making the code
>> harder to follow (at least to me). Remove the API and open
>> code the descent a bit. Most of the code now looks like:
>>
>> if (!leased)
>> return __helper(x);
>>
>> hw_rxq = ..
>> netdev_lock(hw_rxq->dev);
>> ret = __helper(x);
>> netdev_unlock(hw_rxq->dev);
>>
>> return ret;
>>
>> Of course if we have more code paths that need the wrapping
>> we may need to revisit. For now, IMHO, having to know what
>> netif_get_rx_queue_lease_locked() does is not worth the 20LoC
>> it saves.
>>
>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> Thanks for looking into it! That looks good to me. I've also retested that
> it still works.
>
> Small nits below: maybe move netif_is_queue_leasee into the
> netdev_rx_queue.h header since it's used outside of core, and it might be
> worth also having the lock assertion in netdev_queue_get_dma_dev.
>
> Do you want me to add your patch on top for a v12 of the series?
>
> Thanks,
> Daniel
>
> include/net/netdev_rx_queue.h | 2 ++
> net/core/dev.h | 1 -
> net/core/netdev_queues.c | 2 ++
> net/xdp/xsk.c | 2 --
> 4 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
> index 9415a94d333d..e9f7f098609b 100644
> --- a/include/net/netdev_rx_queue.h
> +++ b/include/net/netdev_rx_queue.h
> @@ -76,6 +76,8 @@ struct netdev_rx_queue *
> __netif_get_rx_queue_lease(struct net_device **dev, unsigned int *rxq,
> enum netif_lease_dir dir);
>
> +bool netif_is_queue_leasee(const struct net_device *dev);
> +
> int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq);
> void netdev_rx_queue_lease(struct netdev_rx_queue *rxq_dst,
> struct netdev_rx_queue *rxq_src);
> diff --git a/net/core/dev.h b/net/core/dev.h
> index 376bac4a82da..95edb2d4eff8 100644
> --- a/net/core/dev.h
> +++ b/net/core/dev.h
> @@ -101,7 +101,6 @@ int netdev_queue_config_validate(struct net_device *dev, int rxq_idx,
>
> bool netif_rxq_has_mp(struct net_device *dev, unsigned int rxq_idx);
> bool netif_rxq_is_leased(struct net_device *dev, unsigned int rxq_idx);
> -bool netif_is_queue_leasee(const struct net_device *dev);
(Ok, if so, then netif_rxq_is_leased would have to move too...
maybe it's fine as-is then.)
Thanks,
Daniel
* Re: [PATCH v2] net/mlx5: Fix OOB access and stack information leak in PTP event handling
From: Carolina Jubran @ 2026-04-09 13:54 UTC
To: Prathamesh Deshpande
Cc: leon, linux-kernel, linux-rdma, mbloch, netdev, richardcochran,
saeedm, tariqt
In-Reply-To: <20260402003047.24684-1-prathameshdeshpande7@gmail.com>
Hi Prathamesh, thanks for the patch!
On 02/04/2026 3:30, Prathamesh Deshpande wrote:
> In mlx5_pps_event(), several critical issues were identified during
> review by Sashiko:
>
> 1. The 'pin' index from the hardware event was used without bounds
> checking to index 'pin_config' and 'pps_info->start', leading to
> potential out-of-bounds memory access.
> 2. 'ptp_event' was not zero-initialized. Since it contains a union,
> assigning a timestamp partially leaves the 'ts_raw' field with
> uninitialized stack memory, which can leak kernel data or
> corrupt time sync logic in hardpps().
> 3. A NULL 'pin_config' could be dereferenced if initialization failed.
> 4. 'clock->ptp' could be NULL if ptp_clock_register() failed.
>
> Fix these by zero-initializing the event struct, adding a bounds
> check against MAX_PIN_NUM, and adding appropriate NULL guards.
>
> Fixes: 7c39afb394c7 ("net/mlx5: PTP code migration to driver core section")
>
> Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com>
> ---
> v2:
> - Zero-initialize ptp_event to prevent stack information leak [Sashiko].
> - Add bounds check for hardware pin index to prevent OOB access [Sashiko].
> - Add NULL guard for pin_config to handle initialization failures [Sashiko].
> - Add NULL check for clock->ptp as originally intended.
>
> drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
> index bd4e042077af..a4d8c5c39abc 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
> @@ -1164,12 +1164,18 @@ static int mlx5_pps_event(struct notifier_block *nb,
> pps_nb);
> struct mlx5_core_dev *mdev = clock_state->mdev;
> struct mlx5_clock *clock = mdev->clock;
> - struct ptp_clock_event ptp_event;
> + struct ptp_clock_event ptp_event = {};
> struct mlx5_eqe *eqe = data;
> int pin = eqe->data.pps.pin;
> unsigned long flags;
> u64 ns;
>
> + if (!clock->ptp_info.pin_config)
> + return NOTIFY_OK;
> +
> + if (pin < 0 || pin >= MAX_PIN_NUM)
> + return NOTIFY_OK;
pin is defined as u8 in struct mlx5_eqe_pps, so pin < 0 is dead code.
As for the upper bound: in order to receive a PPS event on a pin, the
user must first configure it via mlx5_ptp_enable, which already
validates the index (rq->extts.index >= clock->ptp_info.n_pins returns
-EINVAL). Since the mtpps register only defines capabilities for 8
pins, n_pins cannot exceed MAX_PIN_NUM.
Maybe wrap it with WARN_ON_ONCE instead of silently returning, so that if
future hardware adds support for more pins, we would notice rather than
silently drop events.
> +
> switch (clock->ptp_info.pin_config[pin].func) {
> case PTP_PF_EXTTS:
> ptp_event.index = pin;
> @@ -1185,8 +1191,8 @@ static int mlx5_pps_event(struct notifier_block *nb,
> } else {
> ptp_event.type = PTP_CLOCK_EXTTS;
> }
> - /* TODOL clock->ptp can be NULL if ptp_clock_register fails */
> - ptp_clock_event(clock->ptp, &ptp_event);
> + if (clock->ptp)
> + ptp_clock_event(clock->ptp, &ptp_event);
> break;
> case PTP_PF_PEROUT:
> if (clock->shared) {
* Re: [PATCH net v2] net: phy: fix a return path in get_phy_c45_ids()
From: Russell King (Oracle) @ 2026-04-09 14:18 UTC
To: Charles Perry
Cc: netdev, Andrew Lunn, Heiner Kallweit, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Florian Fainelli,
linux-kernel
In-Reply-To: <20260409133654.3203336-1-charles.perry@microchip.com>
On Thu, Apr 09, 2026 at 06:36:54AM -0700, Charles Perry wrote:
> The return value of phy_c45_probe_present() is stored in "ret", not
> "phy_reg", fix this. "phy_reg" always has a positive value if we reach
> this return path (since it would have returned earlier otherwise), which
> means that the original goal of the patch of not considering -ENODEV
> fatal wasn't achieved.
>
> Fixes: 17b447539408 ("net: phy: c45 scanning: Don't consider -ENODEV fatal")
> Signed-off-by: Charles Perry <charles.perry@microchip.com>
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Thanks!
Note: you don't need to resend just because you've received another r-b.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
* Re: [PATCH RFC net-next 0/4] improve hw flow offload byte accounting
From: Daniel Golle @ 2026-04-09 14:21 UTC
To: Pablo Neira Ayuso
Cc: Felix Fietkau, John Crispin, Lorenzo Bianconi, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Matthias Brugger, AngeloGioacchino Del Regno, Simon Horman,
Florian Westphal, Phil Sutter, netdev, linux-kernel,
linux-arm-kernel, linux-mediatek, netfilter-devel, coreteam
In-Reply-To: <adevKeasLkEB5zZ4@chamomile>
On Thu, Apr 09, 2026 at 03:52:41PM +0200, Pablo Neira Ayuso wrote:
> On Thu, Apr 09, 2026 at 02:07:22PM +0100, Daniel Golle wrote:
> > Hardware flow counters report raw byte counts whose semantics
> > vary by vendor -- some count ingress L2 frames, others egress
> > L2, others L3. The nf_flow_table framework currently passes
> > these bytes straight to conntrack without conversion, and
> > sub-interfaces (VLAN, PPPoE) that are bypassed by hw offload
> > never see any counter updates at all.
>
> I see, but isn't that part of the feature itself? Why pretend that these
> interfaces are really seeing traffic when they don't? This aspiration
> of trying to make all hardware offload fully transparent (when it is not
> the case, not to mention the semantic changes in how packet handling is
> done compared to the software plane) does not sound convincing to me.
Please explain what you mean by offloading not being fully
transparent. If the MAC hardware offloads VLAN encap/decap, for
example, we also maintain the counters correctly (it just so happens);
only the flow-offloading case results in a weird overall picture:
hardware interface counters keep increasing, while encap interfaces
(802.1Q, PPPoE) don't. That makes it confusing and hard to understand
what's happening when only looking at the interface counters (i.e.
"what is all that traffic on my physical WAN interface which isn't
PPPoE? It can't all be the modem's management interface, SNMP, ...").
>
> On top of this, this issue also exists in the software plane: devices
> that are bypassed do not get their counters bumped.
>
> Maybe if this is really a requirement, then this should address the
> issue for software too, but is it worth the effort to add
> infrastructure for this purpose?
To me it would feel more correct to see counters increasing also
for offloaded traffic on software interfaces such as PPPoE or VLAN.
I honestly didn't think about the software fastpath, and yes, I think
it should be addressed there too.
> > This series lets drivers declare what their counters represent,
> > so the framework can normalize to L3 for conntrack and
> > propagate per-layer stats to encap sub-interfaces.
This part could also be seen as an independent fix, as conntrack
stats for the same traffic currently differ between software
offloading (pure L3 bytes) and hardware offloading (L2 ingress bytes
in the case of mtk_ppe).
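A minimal sketch of the L2-to-L3 normalization discussed above, assuming a driver reports either L2 or L3 byte counts: subtracting a fixed per-packet header overhead turns an L2 count into an L3 count comparable to what the software path gives conntrack. The enum, macros and helpers are hypothetical illustrations, not the API proposed by the series. The overhead has to match the frame the counter actually saw (for example, ingress frames on a PPPoE WAN carry the PPPoE header, while egress frames towards the LAN do not).

```c
#include <stdint.h>

/* Hypothetical: what a driver's hardware flow counter measures. */
enum hw_counter_layer {
	HW_CNT_L2,	/* bytes include the Ethernet header as seen on the wire */
	HW_CNT_L3,	/* bytes start at the IP header */
};

#define ETH_HLEN_BYTES		14	/* dst MAC + src MAC + EtherType */
#define VLAN_HLEN_BYTES		4	/* one 802.1Q tag */
#define PPPOE_SES_HLEN_BYTES	8	/* 6-byte PPPoE header + 2-byte PPP protocol */

/* Per-packet bytes between the start of the counted frame and the L3
 * header, for a given (hypothetical) encapsulation stack. */
static uint32_t l2_overhead(int vlan_tags, int pppoe)
{
	return ETH_HLEN_BYTES + vlan_tags * VLAN_HLEN_BYTES +
	       (pppoe ? PPPOE_SES_HLEN_BYTES : 0);
}

/* Convert a raw hardware byte count into L3 bytes for conntrack. */
static uint64_t hw_bytes_to_l3(enum hw_counter_layer layer, uint64_t bytes,
			       uint64_t packets, uint32_t overhead)
{
	uint64_t hdr = packets * overhead;

	if (layer == HW_CNT_L3)
		return bytes;		/* already L3, nothing to strip */
	if (bytes < hdr)
		return 0;		/* inconsistent counters: avoid underflow */
	return bytes - hdr;
}
```

With this convention, ten 1514-byte untagged Ethernet frames (15140 L2 bytes) normalize to 15000 L3 bytes.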
* Re: [PATCH v2] bpf: guard sock_ops rtt_min access with is_locked_tcp_sock
From: Alexei Starovoitov @ 2026-04-09 14:29 UTC
To: Werner Kasselman
Cc: Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, John Fastabend, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Shuah Khan, bpf@vger.kernel.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org, stable@vger.kernel.org
In-Reply-To: <20260409061026.3926858-1-werner@verivus.com>
On Wed, Apr 8, 2026 at 11:10 PM Werner Kasselman <werner@verivus.ai> wrote:
>
> sock_ops_convert_ctx_access() emits guarded reads for tcp_sock-backed
> bpf_sock_ops fields such as snd_cwnd, srtt_us, snd_ssthresh, rcv_nxt,
> snd_nxt, snd_una, mss_cache, ecn_flags, rate_delivered, and
> rate_interval_us. Those accesses go through SOCK_OPS_GET_TCP_SOCK_FIELD(),
> which checks is_locked_tcp_sock before dereferencing sock_ops.sk.
>
> The rtt_min case is different. Because it reads a subfield of
> struct minmax, it uses a custom open-coded load sequence instead of the
> usual helper macro, and that sequence currently dereferences sock_ops.sk
> without checking is_locked_tcp_sock first.
>
> This is unsafe when sock_ops.sk points to a request_sock-backed object
> instead of a locked full tcp_sock. That is reachable not only from the
> SYNACK header option callbacks, but also from other request_sock-backed
> sock_ops callbacks such as BPF_SOCK_OPS_TIMEOUT_INIT,
> BPF_SOCK_OPS_RWND_INIT, and BPF_SOCK_OPS_NEEDS_ECN. In those cases,
> reading ctx->rtt_min makes the generated code treat a request_sock as a
> tcp_sock and read beyond the end of the request_sock allocation.
>
> Fix the rtt_min conversion by adding the same is_locked_tcp_sock guard
> used for the other tcp_sock field reads. Also make the accessed subfield
> explicit by using offsetof(struct minmax_sample, v).
>
> Add a selftest that verifies request_sock-backed sock_ops callbacks see
> ctx->rtt_min as zero after the fix.
>
> Found via AST-based call-graph analysis using sqry.
>
> Fixes: 44f0e43037d3 ("bpf: Add support for reading sk_state and more")
> Cc: stable@vger.kernel.org
> Signed-off-by: Werner Kasselman <werner@verivus.com>
> ---
> net/core/filter.c | 53 +++++++++++++++----
> .../selftests/bpf/prog_tests/tcpbpf_user.c | 9 ++++
> .../selftests/bpf/progs/test_tcpbpf_kern.c | 21 ++++++++
> tools/testing/selftests/bpf/test_tcpbpf.h | 6 +++
> 4 files changed, 79 insertions(+), 10 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 78b548158..5040bf7e4 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -10827,16 +10827,49 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
> case offsetof(struct bpf_sock_ops, rtt_min):
> BUILD_BUG_ON(sizeof_field(struct tcp_sock, rtt_min) !=
> sizeof(struct minmax));
> - BUILD_BUG_ON(sizeof(struct minmax) <
> - sizeof(struct minmax_sample));
> -
> - *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> - struct bpf_sock_ops_kern, sk),
> - si->dst_reg, si->src_reg,
> - offsetof(struct bpf_sock_ops_kern, sk));
> - *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
> - offsetof(struct tcp_sock, rtt_min) +
> - sizeof_field(struct minmax_sample, t));
> + BUILD_BUG_ON(sizeof_field(struct bpf_sock_ops, rtt_min) !=
> + sizeof_field(struct minmax_sample, v));
> + off = offsetof(struct tcp_sock, rtt_min) +
> + offsetof(struct minmax_sample, v);
> +
> + {
> + int fullsock_reg = si->dst_reg, reg = BPF_REG_9, jmp = 2;
> +
please de-claude your patches before posting.
pw-bot: cr
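For reference, a user-space C analogue of the guarded read described in the quoted commit message (the real fix emits BPF instructions rather than C). struct minmax_sample and struct minmax follow include/linux/win_minmax.h; fake_tcp_sock, fake_request_sock and fake_sock_ops_kern are hypothetical stand-ins. The sketch shows that the old offset expression sizeof_field(struct minmax_sample, t) only equals offsetof(struct minmax_sample, v) because v directly follows t, and that a socket that is not a locked full tcp_sock reads back as zero instead of being dereferenced as one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as include/linux/win_minmax.h */
struct minmax_sample {
	uint32_t t;	/* time the sample was taken */
	uint32_t v;	/* sample value */
};

struct minmax {
	struct minmax_sample s[3];
};

/* Hypothetical stand-ins for the kernel socket types. */
struct fake_request_sock { uint32_t small_state; };
struct fake_tcp_sock { uint64_t other_state; struct minmax rtt_min; };

struct fake_sock_ops_kern {
	void *sk;
	int is_locked_tcp_sock;	/* set only when sk is a locked full tcp_sock */
};

/* The old code used the size of t as the offset of v; that is only
 * equivalent because v directly follows t in the structure. */
static_assert(offsetof(struct minmax_sample, v) ==
	      sizeof(((struct minmax_sample *)0)->t),
	      "v must directly follow t");

static uint32_t read_rtt_min(const struct fake_sock_ops_kern *ops)
{
	const struct fake_tcp_sock *tp;

	if (!ops->is_locked_tcp_sock)
		return 0;	/* e.g. request_sock: never treat it as a tcp_sock */
	tp = ops->sk;
	return tp->rtt_min.s[0].v;	/* current windowed minimum RTT */
}
```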
* [GIT PULL] Networking for v7.0-rc8
From: Paolo Abeni @ 2026-04-09 14:32 UTC
To: torvalds; +Cc: kuba, davem, netdev, linux-kernel
Hi Linus!
The following changes since commit f8f5627a8aeab15183eef8930bf75ba88a51622f:
Merge tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2026-04-02 09:57:06 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git net-7.0-rc8
for you to fetch changes up to b4afe3fa76a88ee7d3d8802b43fde89aa02f8e0d:
Merge branch 'net-lan966x-fix-page_pool-error-handling-and-error-paths' (2026-04-09 15:17:25 +0200)
----------------------------------------------------------------
Including fixes from netfilter, IPsec and wireless. This is again
considerably bigger than the old average. No known outstanding
regressions.
Current release - regressions:
- net: increase IP_TUNNEL_RECURSION_LIMIT to 5
- eth: ice: fix PTP timestamping broken by SyncE code on E825C
Current release - new code bugs:
- eth: stmmac: dwmac-motorcomm: fix eFUSE MAC address read failure
Previous releases - regressions:
- core: fix cross-cache free of KFENCE-allocated skb head
- sched: act_csum: validate nested VLAN headers
- rxrpc: fix call removal to use RCU safe deletion
- xfrm:
- wait for RCU readers during policy netns exit
- fix refcount leak in xfrm_migrate_policy_find
- wifi: rt2x00usb: fix devres lifetime
- mptcp: fix slab-use-after-free in __inet_lookup_established
- ipvs: fix NULL deref in ip_vs_add_service error path
- eth: airoha: fix memory leak in airoha_qdma_rx_process()
- eth: lan966x: fix use-after-free and leak in lan966x_fdma_reload()
Previous releases - always broken:
- ipv6: ioam: fix potential NULL dereferences in __ioam6_fill_trace_data()
- ipv4: nexthop: avoid duplicate NHA_HW_STATS_ENABLE on nexthop group dump
- bridge: guard local VLAN-0 FDB helpers against NULL vlan group
- xsk: tailroom reservation and MTU validation
- rxrpc:
- fix to request an ack if window is limited
- fix RESPONSE authenticator parser OOB read
- netfilter: nft_ct: fix use-after-free in timeout object destroy
- batman-adv: hold claim backbone gateways by reference
- eth: stmmac: fix PTP ref clock for Tegra234
- eth: idpf: fix PREEMPT_RT raw/bh spinlock nesting for async VC handling
- eth: ipa: fix GENERIC_CMD register field masks for IPA v5.0+
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
----------------------------------------------------------------
Agalakov Daniil (1):
e1000: check return value of e1000_read_eeprom
Aleksandr Loktionov (1):
ixgbe: stop re-reading flash on every get_drvinfo for e610
Alex Dvoretsky (1):
igb: remove napi_synchronize() in igb_down()
Alexander Koskovich (2):
net: ipa: fix GENERIC_CMD register field masks for IPA v5.0+
net: ipa: fix event ring index not programmed for IPA v5.0+
Alice Mikityanska (1):
l2tp: Drop large packets with UDP encap
Allison Henderson (1):
MAINTAINERS: Update email for Allison Henderson
Alok Tiwari (2):
rxrpc: Fix use of wrong skb when comparing queued RESP challenge serial
rxrpc: Fix rack timer warning to report unexpected mode
Anderson Nascimento (1):
rxrpc: Fix key reference count leak from call->key
Andrea Mayer (2):
seg6: separate dst_cache for input and output paths in seg6 lwtunnel
selftests: seg6: add test for dst_cache isolation in seg6 lwtunnel
Arnd Bergmann (1):
net: fec: make FIXED_PHY dependency unconditional
Chris J Arges (1):
net: increase IP_TUNNEL_RECURSION_LIMIT to 5
Daniel Golle (1):
selftests: net: bridge_vlan_mcast: wait for h1 before querier check
David Carlier (4):
net: altera-tse: fix skb leak on DMA mapping error in tse_start_xmit()
net: lan966x: fix page_pool error handling in lan966x_fdma_rx_alloc_page_pool()
net: lan966x: fix page pool leak in error paths
net: lan966x: fix use-after-free and leak in lan966x_fdma_reload()
David Howells (9):
rxrpc: Fix key quota calculation for multitoken keys
rxrpc: Fix key parsing memleak
rxrpc: Fix anonymous key handling
rxrpc: Fix call removal to use RCU safe deletion
rxrpc: Fix key/keyring checks in setsockopt(RXRPC_SECURITY_KEY/KEYRING)
rxrpc: Fix missing error checks for rxkad encryption/decryption failure
rxrpc: Fix integer overflow in rxgk_verify_response()
rxrpc: Fix leak of rxgk context in rxgk_verify_response()
rxrpc: Fix buffer overread in rxgk_do_verify_authenticator()
Douya Le (1):
rxrpc: Only put the call ref if one was acquired
Emil Tantilov (3):
idpf: fix PREEMPT_RT raw/bh spinlock nesting for async VC handling
idpf: improve locking around idpf_vc_xn_push_free()
idpf: set the payload size before calling the async handler
Eric Dumazet (2):
net: lapbether: handle NETDEV_PRE_TYPE_CHANGE
ipv6: ioam: fix potential NULL dereferences in __ioam6_fill_trace_data()
Fabio Baltieri (1):
net: txgbe: leave space for null terminators on property_entry
Felix Gu (1):
net: mdio: realtek-rtl9300: use scoped device_for_each_child_node loop
Fernando Fernandez Mancera (3):
ipv4: nexthop: avoid duplicate NHA_HW_STATS_ENABLE on nexthop group dump
ipv4: nexthop: allocate skb dynamically in rtm_get_nexthop()
selftests: nft_queue.sh: add a parallel stress test
Florian Westphal (1):
netfilter: nfnetlink_queue: make hash table per queue
Greg Kroah-Hartman (3):
xfrm_user: fix info leak in build_mapping()
xfrm_user: fix info leak in build_report()
net: rfkill: prevent unlimited numbers of rfkill events from being created
Haoze Xie (1):
batman-adv: hold claim backbone gateways by reference
Jakub Kicinski (10):
Merge branch 'net-stmmac-fix-tegra234-mgbe-clock'
Merge branch 'xsk-tailroom-reservation-and-mtu-validation'
net: avoid nul-deref trying to bind mp to incapable device
Merge branch 'seg6-fix-dst_cache-sharing-in-seg6-lwtunnel'
Merge branch 'rxrpc-miscellaneous-fixes'
Merge tag 'nf-26-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Merge tag 'batadv-net-pullrequest-20260408' of https://git.open-mesh.org/linux-merge
Merge tag 'ipsec-2026-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Merge tag 'wireless-2026-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Jiayuan Chen (2):
net: skb: fix cross-cache free of KFENCE-allocated skb head
mptcp: fix slab-use-after-free in __inet_lookup_established
Jiexun Wang (1):
af_unix: read UNIX_DIAG_VFS data under unix_state_lock
Johan Alvarado (1):
net: stmmac: dwmac-motorcomm: fix eFUSE MAC address read failure
Johan Hovold (1):
wifi: rt2x00usb: fix devres lifetime
John Pavlick (1):
net: sfp: add quirks for Hisense and HSGQ GPON ONT SFP modules
Jon Hunter (2):
net: stmmac: Fix PTP ref clock for Tegra234
dt-bindings: net: Fix Tegra234 MGBE PTP clock
Justin Iurman (1):
net: ioam6: fix OOB and missing lock
Keenan Dong (3):
xfrm: account XFRMA_IF_ID in aevent size calculation
rxrpc: fix RESPONSE authenticator parser OOB read
rxrpc: fix oversized RESPONSE authenticator length check
Kohei Enju (1):
ice: ptp: don't WARN when controlling PF is unavailable
Kotlyarov Mihail (1):
xfrm: fix refcount leak in xfrm_migrate_policy_find
Li RongQing (1):
devlink: Fix incorrect skb socket family dumping
Lorenzo Bianconi (1):
net: airoha: Fix memory leak in airoha_qdma_rx_process()
Luxiao Xu (1):
rxrpc: fix reference count leak in rxrpc_server_keyring()
Maciej Fijalkowski (8):
xsk: tighten UMEM headroom validation to account for tailroom and min frame
xsk: respect tailroom for ZC setups
xsk: fix XDP_UMEM_SG_FLAG issues
xsk: validate MTU against usable frame size on bind
selftests: bpf: introduce a common routine for reading procfs
selftests: bpf: fix pkt grow tests
selftests: bpf: have a separate variable for drop test
selftests: bpf: adjust rx_dropped xskxceiver's test to respect tailroom
Marc Dionne (1):
rxrpc: Fix to request an ack if window is limited
Matthieu Baerts (NGI0) (1):
Revert "mptcp: add needs_id for netlink appending addr"
Michael Guralnik (1):
net/mlx5: Update the list of the PCI supported devices
Michal Schmidt (1):
ixgbevf: add missing negotiate_features op to Hyper-V ops table
Muhammad Alifa Ramdhan (1):
net/tls: fix use-after-free in -EBUSY error path of tls_do_encryption
Nikolaos Gkarlis (1):
rtnetlink: add missing netlink_ns_capable() check for peer netns
Oleh Konko (2):
tipc: fix bc_ackers underflow on duplicate GRP_ACK_MSG
rxrpc: Fix RxGK token loading to check bounds
Paolo Abeni (1):
Merge branch 'net-lan966x-fix-page_pool-error-handling-and-error-paths'
Pengpeng Hou (5):
net: qualcomm: qca_uart: report the consumed byte on RX skb allocation failure
nfc: s3fwrn5: allocate rx skb before consuming bytes
wifi: brcmfmac: validate bsscfg indices in IF events
rxrpc: proc: size address buffers for %pISpc output
nfc: pn533: allocate rx skb before consuming bytes
Petr Oros (1):
ice: fix PTP timestamping broken by SyncE code on E825C
Qi Tang (1):
xfrm: hold dev ref until after transport_finish NF_HOOK
Qingfang Deng (1):
MAINTAINERS: orphan PPP over Ethernet driver
Raju Rangoju (1):
MAINTAINERS: Add Prashanth as additional maintainer for amd-xgbe driver
Ren Wei (1):
netfilter: xt_multiport: validate range encoding in checkentry
Ruide Cao (2):
batman-adv: reject oversized global TT response buffers
net: sched: act_csum: validate nested VLAN headers
Stefano Garzarella (1):
vsock/test: fix send_buf()/recv_buf() EINTR handling
Steffen Klassert (1):
xfrm: Wait for RCU readers during policy netns exit
Thomas Fourier (1):
wifi: brcmsmac: Fix dma_free_coherent() size
Tuan Do (1):
netfilter: nft_ct: fix use-after-free in timeout object destroy
Tyllis Xu (1):
net: stmmac: fix integer underflow in chain mode
Wang Jie (1):
rxrpc: only handle RESPONSE during service challenge
Weiming Shi (1):
ipvs: fix NULL deref in ip_vs_add_service error path
Xiang Mei (1):
netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator
Yasuaki Torimaru (1):
xfrm: clear trailing padding in build_polexpire()
Yiqi Sun (1):
ipv4: icmp: fix null-ptr-deref in icmp_build_probe()
Yuqi Xu (1):
rxrpc: reject undecryptable rxkad response tickets
Zhengchuan Liang (2):
net: af_key: zero aligned sockaddr tail in PF_KEY exports
netfilter: ip6t_eui64: reject invalid MAC header for all packets
Zijing Yin (1):
bridge: guard local VLAN-0 FDB helpers against NULL vlan group
.../bindings/net/nvidia,tegra234-mgbe.yaml | 4 +-
MAINTAINERS | 6 +-
drivers/net/ethernet/airoha/airoha_eth.c | 3 +-
drivers/net/ethernet/altera/altera_tse_main.c | 1 +
drivers/net/ethernet/freescale/Kconfig | 2 +-
drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 8 +-
drivers/net/ethernet/intel/ice/ice_ptp.c | 30 ++--
drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 20 ++-
drivers/net/ethernet/intel/idpf/idpf_virtchnl.h | 5 +-
drivers/net/ethernet/intel/igb/igb_main.c | 3 +-
drivers/net/ethernet/intel/ixgbe/devlink/devlink.c | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 13 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 10 ++
drivers/net/ethernet/intel/ixgbevf/vf.c | 7 +
drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 +
.../net/ethernet/microchip/lan966x/lan966x_fdma.c | 28 ++-
drivers/net/ethernet/qualcomm/qca_uart.c | 2 +-
drivers/net/ethernet/stmicro/stmmac/chain_mode.c | 11 +-
.../net/ethernet/stmicro/stmmac/dwmac-motorcomm.c | 8 +
drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c | 19 +-
drivers/net/ethernet/wangxun/txgbe/txgbe_type.h | 8 +-
drivers/net/ipa/reg/gsi_reg-v5.0.c | 9 +-
drivers/net/mdio/mdio-realtek-rtl9300.c | 3 +-
drivers/net/phy/sfp.c | 16 ++
drivers/net/wan/lapbether.c | 13 +-
.../wireless/broadcom/brcm80211/brcmfmac/fweh.c | 5 +
.../net/wireless/broadcom/brcm80211/brcmsmac/dma.c | 2 +-
drivers/net/wireless/ralink/rt2x00/rt2x00usb.c | 2 +-
drivers/nfc/pn533/uart.c | 11 +-
drivers/nfc/s3fwrn5/uart.c | 10 +-
include/net/ip_tunnels.h | 2 +-
include/net/netfilter/nf_conntrack_timeout.h | 1 +
include/net/netfilter/nf_queue.h | 1 -
include/net/xdp_sock.h | 2 +-
include/net/xdp_sock_drv.h | 23 ++-
include/trace/events/rxrpc.h | 4 +-
net/batman-adv/bridge_loop_avoidance.c | 27 ++-
net/batman-adv/translation-table.c | 9 +-
net/bridge/br_fdb.c | 6 +
net/core/netdev_rx_queue.c | 2 +-
net/core/rtnetlink.c | 40 +++--
net/core/skbuff.c | 5 +-
net/devlink/health.c | 2 +-
net/ipv4/icmp.c | 7 +
net/ipv4/nexthop.c | 41 +++--
net/ipv4/xfrm4_input.c | 5 +-
net/ipv6/ioam6.c | 33 ++--
net/ipv6/netfilter/ip6t_eui64.c | 3 +-
net/ipv6/seg6_iptunnel.c | 34 ++--
net/ipv6/xfrm6_input.c | 5 +-
net/key/af_key.c | 52 ++++--
net/l2tp/l2tp_core.c | 5 +
net/mptcp/pm_kernel.c | 24 +--
net/mptcp/protocol.c | 2 +
net/mptcp/protocol.h | 1 +
net/mptcp/subflow.c | 15 +-
net/netfilter/ipvs/ip_vs_ctl.c | 1 -
net/netfilter/nfnetlink_log.c | 8 +-
net/netfilter/nfnetlink_queue.c | 139 +++++----------
net/netfilter/nft_ct.c | 2 +-
net/netfilter/xt_multiport.c | 34 +++-
net/rfkill/core.c | 35 ++--
net/rxrpc/af_rxrpc.c | 6 -
net/rxrpc/ar-internal.h | 2 +-
net/rxrpc/call_object.c | 25 ++-
net/rxrpc/conn_event.c | 19 +-
net/rxrpc/input_rack.c | 2 +-
net/rxrpc/io_thread.c | 3 +-
net/rxrpc/key.c | 40 +++--
net/rxrpc/output.c | 2 +
net/rxrpc/proc.c | 37 ++--
net/rxrpc/rxgk.c | 19 +-
net/rxrpc/rxkad.c | 63 ++++---
net/rxrpc/sendmsg.c | 2 +-
net/rxrpc/server_key.c | 3 +
net/sched/act_csum.c | 6 +-
net/tipc/group.c | 6 +-
net/tls/tls_sw.c | 10 ++
net/unix/diag.c | 21 ++-
net/xdp/xdp_umem.c | 3 +-
net/xdp/xsk.c | 4 +-
net/xdp/xsk_buff_pool.c | 32 +++-
net/xfrm/xfrm_input.c | 18 +-
net/xfrm/xfrm_policy.c | 5 +-
net/xfrm/xfrm_user.c | 14 +-
tools/testing/selftests/bpf/prog_tests/test_xsk.c | 55 +++---
tools/testing/selftests/bpf/prog_tests/test_xsk.h | 23 +++
tools/testing/selftests/bpf/prog_tests/xsk.c | 19 ++
tools/testing/selftests/bpf/progs/xsk_xdp_progs.c | 4 +-
tools/testing/selftests/bpf/xskxceiver.c | 23 +++
tools/testing/selftests/net/Makefile | 1 +
.../selftests/net/forwarding/bridge_vlan_mcast.sh | 1 +
tools/testing/selftests/net/netfilter/nf_queue.c | 50 +++++-
tools/testing/selftests/net/netfilter/nft_queue.sh | 83 +++++++--
tools/testing/selftests/net/srv6_iptunnel_cache.sh | 197 +++++++++++++++++++++
tools/testing/vsock/util.c | 8 +-
97 files changed, 1172 insertions(+), 468 deletions(-)
create mode 100755 tools/testing/selftests/net/srv6_iptunnel_cache.sh
* Re: [PATCH net] l2tp: take a session reference in pppol2tp_ioctl()
From: Simon Horman @ 2026-04-09 14:46 UTC
To: Yiqi Sun; +Cc: jchapman, davem, edumazet, kuba, pabeni, netdev
In-Reply-To: <20260404133245.2391409-1-sunyiqixm@gmail.com>
On Sat, Apr 04, 2026 at 09:32:45PM +0800, Yiqi Sun wrote:
> pppol2tp_ioctl() reads sock->sk->sk_user_data and dereferences the
> returned l2tp_session without taking a reference on it.
>
> Since the ppp socket/session lifetime rework, session teardown runs
> asynchronously and can clear sk_user_data and drop the last session
> reference in parallel with ioctl(). This leaves ioctl() with a stale
> session pointer and can trigger a use-after-free.
>
> Fix this by using pppol2tp_sock_to_session() in pppol2tp_ioctl() and
> dropping the session reference before returning. This matches the
> existing getsockopt/setsockopt paths.
>
> Fixes: c5cbaef992d64 ("l2tp: refactor ppp socket/session relationship")
> Signed-off-by: Yiqi Sun <sunyiqixm@gmail.com>
> ---
> net/l2tp/l2tp_ppp.c | 88 +++++++++++++++++++++++++++------------------
> 1 file changed, 54 insertions(+), 34 deletions(-)
>
> diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
> index ae4543d5597b..e6d7d3537180 100644
> --- a/net/l2tp/l2tp_ppp.c
> +++ b/net/l2tp/l2tp_ppp.c
> @@ -1042,66 +1042,79 @@ static int pppol2tp_tunnel_copy_stats(struct pppol2tp_ioc_stats *stats,
> static int pppol2tp_ioctl(struct socket *sock, unsigned int cmd,
> unsigned long arg)
> {
> + struct sock *sk = sock->sk;
> struct pppol2tp_ioc_stats stats;
> struct l2tp_session *session;
> + int err;
> +
> + err = -ENOTCONN;
> + if (!sk->sk_user_data)
> + goto end;
I think it would be cleaner to simply:
return -ENOTCONN;
> +
> + err = -EBADF;
> + session = pppol2tp_sock_to_session(sk);
> + if (!session)
> + goto end;
And, similarly here.
...
> @@ -1111,15 +1124,22 @@ static int pppol2tp_ioctl(struct socket *sock, unsigned int cmd,
> stats.tunnel_id = session->tunnel->tunnel_id;
> stats.using_ipsec = l2tp_tunnel_uses_xfrm(session->tunnel);
>
> - if (copy_to_user((void __user *)arg, &stats, sizeof(stats)))
> - return -EFAULT;
> + if (copy_to_user((void __user *)arg, &stats, sizeof(stats))) {
> + err = -EFAULT;
> + goto end_put_sess;
> + }
> + err = 0;
> break;
>
> default:
> - return -ENOIOCTLCMD;
> + err = -ENOIOCTLCMD;
I would suggest a goto here.
> + break;
> }
>
And setting err = 0 here, rather than in multiple places above.
> - return 0;
> +end_put_sess:
I think "out_put_session" would be a slightly better name for this label.
> + l2tp_session_put(session);
> +end:
> + return err;
> }
>
> /*****************************************************************************
> --
> 2.34.1
>
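A sketch of the shape suggested in the review above: plain early returns before the session reference is taken, a goto for every exit after it, a single err = 0, and an out_put_session label. It uses hypothetical user-space stand-ins for struct l2tp_session, pppol2tp_sock_to_session() and l2tp_session_put(); copy_ok simulates the result of copy_to_user(). ENOIOCTLCMD is the kernel-internal value 515, defined here only because user-space headers do not export it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the l2tp session and socket. */
struct session { int refcnt; int tunnel_id; };
struct fake_sock { void *user_data; struct session *sess; };

#define CMD_GET_STATS 1
#ifndef ENOIOCTLCMD
#define ENOIOCTLCMD 515	/* kernel-internal errno, not exported to userspace */
#endif

static struct session *sock_to_session(struct fake_sock *sk)
{
	if (sk->sess)
		sk->sess->refcnt++;	/* take a reference */
	return sk->sess;
}

static void session_put(struct session *s)
{
	s->refcnt--;			/* drop the reference */
}

static int do_ioctl(struct fake_sock *sk, unsigned int cmd, int *out,
		    int copy_ok)
{
	struct session *session;
	int err;

	if (!sk->user_data)
		return -ENOTCONN;	/* nothing acquired yet: plain return */

	session = sock_to_session(sk);
	if (!session)
		return -EBADF;

	switch (cmd) {
	case CMD_GET_STATS:
		if (!copy_ok) {		/* copy_to_user() failed */
			err = -EFAULT;
			goto out_put_session;
		}
		*out = session->tunnel_id;
		break;
	default:
		err = -ENOIOCTLCMD;
		goto out_put_session;
	}

	err = 0;			/* single success assignment */
out_put_session:
	session_put(session);
	return err;
}
```

Every path that takes a reference leaves through out_put_session, so the reference count returns to its starting value on success and on each error.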
* Re: [PATCH net-next v11 03/14] net: Add lease info to queue-get response
From: Jakub Kicinski @ 2026-04-09 14:46 UTC
To: Daniel Borkmann
Cc: netdev, bpf, davem, razor, pabeni, willemb, sdf, john.fastabend,
martin.lau, jordan, maciej.fijalkowski, magnus.karlsson, dw, toke,
yangzhenze, wangdongdong.6
In-Reply-To: <5a5feea9-7675-4ccf-aa8e-3a2e476ce8f5@iogearbox.net>
On Thu, 9 Apr 2026 15:52:42 +0200 Daniel Borkmann wrote:
> >> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> > Thanks for looking into it! That looks good to me. I've also retested that
> > it still works.
> >
> > Maybe small nits could be below to move the netif_is_queue_leasee into the
> > netdev_rx_queue.h header since its used outside of core and it might be
> > worth to also have the lock assertion in netdev_queue_get_dma_dev.
I suspected it may irk you :) No strong preference on the placement.
We do include the ../core/dev.h in a couple of places but agreed that
it is slightly ugly.
> > Do you want me to add your patch on top for a v12 of the series?
Yes, please. Let's get it into 7.1.
I think the test has to be reworked, but of the available options,
merging it as is and following up quickly seems like the best one. I've
only set up the container testing in our CI yesterday anyway, so there
may be more things that need changing in the test as we gain experience :S
* Re: [PATCH net-next 0/7] tcp: restrict rcv_wnd and window_clamp to representable window
From: Eric Dumazet @ 2026-04-09 14:52 UTC
To: gmbnomis
Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, David Ahern,
Jakub Kicinski, Paolo Abeni, Simon Horman, Shuah Khan, netdev,
linux-kernel, linux-kselftest
In-Reply-To: <20260408-tcp_rcv_exact_clamp_and_wnd-v1-0-76a6f212e153@gmail.com>
On Wed, Apr 8, 2026 at 2:50 PM Simon Baatz via B4 Relay
<devnull+gmbnomis.gmail.com@kernel.org> wrote:
>
> Hi,
>
> this series ensures that rcv_wnd and window_clamp do not exceed the
> maximum window size representable for the connection's window scale
> factor.
>
> This is most visible when TCP window scaling is not used for a
> connection. In that case, the advertised window is limited to 65535
> bytes, but rcv_wnd or window_clamp can still grow beyond 65535 when
> large receive buffers are used. The resulting mismatch breaks
> calculations that depend on the advertised window, such as the ACK
> decision in __tcp_ack_snd_check(), and can prevent immediate ACKs.
>
> Similar effects may also occur when window scaling is in use, e.g. if
> the application dynamically adjusts SO_RCVBUF in unusual ways or when
> the rmem sysctl parameters change during a connection’s lifetime.
>
> Summary:
>
> - Patch 1 keeps rcv_wnd capped by the (window scale-limited)
> window_clamp at connection start.
> - Patch 3 and 6 ensure that window_clamp is limited to the
> representable window when it is updated.
> - The other patches add packetdrill tests to verify the new behavior.
>
> A simple iperf test on a virtme-ng VM (Intel i5-7500, 4 cores,
> loopback) shows a noticeable improvement with window scaling disabled:
Explain why we should spend time reviewing patches trying to help
stacks from 2 decades ago,
risking breaking other usages.
Almost every time we change the rcvbuf logic, we introduce bugs.
Not using window scaling in 2026 and expecting "iperf improvement" is
quite something!
Out of curiosity, which legacy product is stuck in the 20th century?
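For context, the limit described in the quoted cover letter follows from the 16-bit window field: with a window scale shift of wscale, the largest advertisable window is 65535 << wscale, which is 65535 bytes when scaling is not used. A minimal sketch of clamping a desired window to that limit; the helper names are hypothetical and this is not the code from the series. RFC 7323 caps the shift count at 14.

```c
#include <stdint.h>

#define TCP_MAX_WSCALE 14	/* RFC 7323 limit on the shift count */

/* Largest window a 16-bit window field can express after scaling. */
static uint32_t max_representable_window(unsigned int wscale)
{
	if (wscale > TCP_MAX_WSCALE)
		wscale = TCP_MAX_WSCALE;
	return (uint32_t)UINT16_MAX << wscale;
}

/* Clamp a desired receive window (e.g. derived from a large rcvbuf)
 * so it never exceeds what can actually be advertised. */
static uint32_t clamp_window(uint32_t wanted, unsigned int wscale)
{
	uint32_t max = max_representable_window(wscale);

	return wanted < max ? wanted : max;
}
```

Without scaling, a 1 MiB buffer-derived window clamps to 65535; with a shift of 7 the limit is 8388480 bytes, so the same window passes through unchanged.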
* Re: [PATCH net-next] selftests: net: py: add test case filtering and listing
From: Jakub Kicinski @ 2026-04-09 14:56 UTC
To: Breno Leitao
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
petrm, willemb, linux-kselftest
In-Reply-To: <adedNCKW0WE_FqZK@gmail.com>
On Thu, 9 Apr 2026 05:41:59 -0700 Breno Leitao wrote:
> > + print(f"Usage: {sys.argv[0]} [-h|-l] [-t|-T name]\n"
> > + f"\t-h print help\n"
> > + f"\t-l list all tests\n"
>
> I initially expected the help text to mention "(all or filtered)" based
> on the commit message, but since this option lists all tests
> unconditionally, the current wording is correct.
Ugh, good catch. Not sure how I lost this. It does display a filtered
list. I wanted the filtering to take effect so that one can see what
tests would have been executed with the filters without running them.
Sort of like a --dry-run.
LMK if you have any thoughts on this, otherwise I'll rephrase as:
f"\t-l list tests (filtered, if filters were specified)\n"
--
pw-bot: cr
* [PATCH net-next 0/5] net: reduce sk_filter() (and friends) bloat
From: Eric Dumazet @ 2026-04-09 14:56 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet
Some functions return an error by value and a drop_reason
through an output parameter. This extra parameter can force stack canaries.
A drop_reason is enough and more efficient.
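A minimal sketch contrasting the two calling conventions, using a hypothetical subset of drop reasons (the real type is enum skb_drop_reason, where SKB_NOT_DROPPED_YET is 0). Under -fstack-protector-strong, a caller that passes the address of a local variable, as queue_old() requires, becomes a candidate for a stack canary; returning the reason by value avoids taking that address at all.

```c
#include <stddef.h>

/* Hypothetical subset of drop reasons. */
enum drop_reason {
	NOT_DROPPED_YET = 0,
	DROP_SOCKET_FILTER,
	DROP_SOCKET_RCVBUFF,
};

/* Old style: error by value, reason through an optional pointer. */
static int queue_old(int fail, enum drop_reason *reason)
{
	if (fail) {
		if (reason)
			*reason = DROP_SOCKET_RCVBUFF;
		return -1;
	}
	if (reason)
		*reason = NOT_DROPPED_YET;
	return 0;
}

/* New style: the reason is the only result; 0 means "queued". */
static enum drop_reason queue_new(int fail)
{
	return fail ? DROP_SOCKET_RCVBUFF : NOT_DROPPED_YET;
}
```

Callers of the new form simply test the returned value, as in "reason = f(...); if (reason) drop(reason);".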
This series reduces bloat by 678 bytes on x86_64:
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.final
add/remove: 0/0 grow/shrink: 3/18 up/down: 79/-757 (-678)
Function old new delta
vsock_queue_rcv_skb 50 79 +29
ipmr_cache_report 1290 1315 +25
ip6mr_cache_report 1322 1347 +25
tcp_v6_rcv 3169 3167 -2
packet_rcv_spkt 329 327 -2
unix_dgram_sendmsg 1731 1726 -5
netlink_unicast 957 945 -12
netlink_dump 1372 1359 -13
sk_filter_trim_cap 889 858 -31
netlink_broadcast_filtered 1633 1595 -38
tcp_v4_rcv 3152 3111 -41
raw_rcv_skb 122 80 -42
ping_queue_rcv_skb 109 61 -48
ping_rcv 215 162 -53
rawv6_rcv_skb 278 224 -54
__sk_receive_skb 690 632 -58
raw_rcv 591 527 -64
udpv6_queue_rcv_one_skb 935 869 -66
udp_queue_rcv_one_skb 919 853 -66
tun_net_xmit 1146 1074 -72
sock_queue_rcv_skb_reason 166 76 -90
Total: Before=29722890, After=29722212, chg -0.00%
Future conversions from sock_queue_rcv_skb() to sock_queue_rcv_skb_reason()
can be done later.
Eric Dumazet (5):
net: change sock_queue_rcv_skb_reason() to return a drop_reason
net: always set reason in sk_filter_trim_cap()
net: change sk_filter_reason() to return the reason by value
tcp: change tcp_filter() to return the reason by value
net: change sk_filter_trim_cap() to return a drop_reason by value
drivers/net/tun.c | 8 +++++---
include/linux/filter.h | 15 ++++++++-------
include/net/sock.h | 17 ++++++++++++++---
include/net/tcp.h | 6 +++---
net/can/bcm.c | 5 ++---
net/can/isotp.c | 3 ++-
net/can/j1939/socket.c | 3 ++-
net/can/raw.c | 3 ++-
net/core/filter.c | 30 +++++++++++++-----------------
net/core/sock.c | 29 +++++++++++------------------
net/ipv4/ping.c | 3 ++-
net/ipv4/raw.c | 3 ++-
net/ipv4/tcp_ipv4.c | 6 ++++--
net/ipv4/udp.c | 3 ++-
net/ipv6/raw.c | 3 ++-
net/ipv6/tcp_ipv6.c | 6 ++++--
net/ipv6/udp.c | 3 ++-
net/rose/rose_in.c | 3 +--
18 files changed, 81 insertions(+), 68 deletions(-)
--
2.53.0.1213.gd9a14994de-goog
* [PATCH net-next 1/5] net: change sock_queue_rcv_skb_reason() to return a drop_reason
From: Eric Dumazet @ 2026-04-09 14:56 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet
In-Reply-To: <20260409145625.2306224-1-edumazet@google.com>
Change sock_queue_rcv_skb_reason() to return the drop_reason directly
instead of using a reference.
This is part of an effort to remove stack canaries and reduce bloat.
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 3/7 up/down: 79/-301 (-222)
Function old new delta
vsock_queue_rcv_skb 50 79 +29
ipmr_cache_report 1290 1315 +25
ip6mr_cache_report 1322 1347 +25
packet_rcv_spkt 329 327 -2
sock_queue_rcv_skb_reason 166 128 -38
raw_rcv_skb 122 80 -42
ping_queue_rcv_skb 109 61 -48
ping_rcv 215 162 -53
rawv6_rcv_skb 278 224 -54
raw_rcv 591 527 -64
Total: Before=29722890, After=29722668, chg -0.00%
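The inline sock_queue_rcv_skb() wrapper in the sock.h hunk keeps the old errno contract for existing callers. A minimal user-space sketch of that mapping, with hypothetical enum names standing in for SKB_NOT_DROPPED_YET, SKB_DROP_REASON_SOCKET_RCVBUFF and SKB_DROP_REASON_PROTO_MEM. Any other reason, such as a socket filter drop, maps to -EPERM, which is what sk_filter() reports in that case.

```c
#include <errno.h>

/* Hypothetical stand-ins for the relevant skb_drop_reason values. */
enum drop_reason {
	NOT_DROPPED_YET = 0,
	DROP_SOCKET_FILTER,
	DROP_SOCKET_RCVBUFF,
	DROP_PROTO_MEM,
};

/* Translate a drop reason back into the errno that legacy callers of
 * sock_queue_rcv_skb() expect. */
static int reason_to_errno(enum drop_reason reason)
{
	switch (reason) {
	case DROP_SOCKET_RCVBUFF:
		return -ENOMEM;		/* receive buffer full */
	case DROP_PROTO_MEM:
		return -ENOBUFS;	/* protocol memory pressure */
	case NOT_DROPPED_YET:
		return 0;		/* skb was queued */
	default:
		return -EPERM;		/* e.g. a socket filter rejected the skb */
	}
}
```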
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/sock.h | 17 ++++++++++++++---
net/can/bcm.c | 5 ++---
net/can/isotp.c | 3 ++-
net/can/j1939/socket.c | 3 ++-
net/can/raw.c | 3 ++-
net/core/sock.c | 20 ++++++--------------
net/ipv4/ping.c | 3 ++-
net/ipv4/raw.c | 3 ++-
net/ipv6/raw.c | 3 ++-
9 files changed, 34 insertions(+), 26 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 7d51ac9e7d9a87a7f6f0453a3d3e2c6ed34dc151..5831a4d1ebe77e3d6f568d208fafb072f9635242 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2502,12 +2502,23 @@ int __sk_queue_drop_skb(struct sock *sk, struct sk_buff_head *sk_queue,
struct sk_buff *skb));
int __sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
-int sock_queue_rcv_skb_reason(struct sock *sk, struct sk_buff *skb,
- enum skb_drop_reason *reason);
+enum skb_drop_reason
+sock_queue_rcv_skb_reason(struct sock *sk, struct sk_buff *skb);
static inline int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
{
- return sock_queue_rcv_skb_reason(sk, skb, NULL);
+ enum skb_drop_reason drop_reason = sock_queue_rcv_skb_reason(sk, skb);
+
+ switch (drop_reason) {
+ case SKB_DROP_REASON_SOCKET_RCVBUFF:
+ return -ENOMEM;
+ case SKB_DROP_REASON_PROTO_MEM:
+ return -ENOBUFS;
+ case 0:
+ return 0;
+ default:
+ return -EPERM;
+ }
}
int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb);
diff --git a/net/can/bcm.c b/net/can/bcm.c
index fd9fa072881e22ced725fa77dd096dea07fb73a6..d6291381afb09c36b30557f1d1a328c9ed0579f7 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -363,7 +363,6 @@ static void bcm_send_to_user(struct bcm_op *op, struct bcm_msg_head *head,
struct sockaddr_can *addr;
struct sock *sk = op->sk;
unsigned int datalen = head->nframes * op->cfsiz;
- int err;
unsigned int *pflags;
enum skb_drop_reason reason;
@@ -420,8 +419,8 @@ static void bcm_send_to_user(struct bcm_op *op, struct bcm_msg_head *head,
addr->can_family = AF_CAN;
addr->can_ifindex = op->rx_ifindex;
- err = sock_queue_rcv_skb_reason(sk, skb, &reason);
- if (err < 0) {
+ reason = sock_queue_rcv_skb_reason(sk, skb);
+ if (reason) {
struct bcm_sock *bo = bcm_sk(sk);
sk_skb_reason_drop(sk, skb, reason);
diff --git a/net/can/isotp.c b/net/can/isotp.c
index 2770f43f4951884658d54ac90bd1e0ae21c24102..c48b4a818297e2a1348a2b64016d0f4ff613e683 100644
--- a/net/can/isotp.c
+++ b/net/can/isotp.c
@@ -291,7 +291,8 @@ static void isotp_rcv_skb(struct sk_buff *skb, struct sock *sk)
addr->can_family = AF_CAN;
addr->can_ifindex = skb->dev->ifindex;
- if (sock_queue_rcv_skb_reason(sk, skb, &reason) < 0)
+ reason = sock_queue_rcv_skb_reason(sk, skb);
+ if (reason)
sk_skb_reason_drop(sk, skb, reason);
}
diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c
index 0502b030d23851652be252f1342861332ce97367..50a598ef5fd4a5f5e007816a341e04ddbcc724e6 100644
--- a/net/can/j1939/socket.c
+++ b/net/can/j1939/socket.c
@@ -333,7 +333,8 @@ static void j1939_sk_recv_one(struct j1939_sock *jsk, struct sk_buff *oskb)
if (skb->sk)
skcb->msg_flags |= MSG_DONTROUTE;
- if (sock_queue_rcv_skb_reason(&jsk->sk, skb, &reason) < 0)
+ reason = sock_queue_rcv_skb_reason(&jsk->sk, skb);
+ if (reason)
sk_skb_reason_drop(&jsk->sk, skb, reason);
}
diff --git a/net/can/raw.c b/net/can/raw.c
index eee244ffc31ecc0e1cc1aae29cd1d13a4e6b54ca..56c95c768778accaec42cb998c7d679a42c85894 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -207,7 +207,8 @@ static void raw_rcv(struct sk_buff *oskb, void *data)
if (oskb->sk == sk)
*pflags |= MSG_CONFIRM;
- if (sock_queue_rcv_skb_reason(sk, skb, &reason) < 0)
+ reason = sock_queue_rcv_skb_reason(sk, skb);
+ if (reason)
sk_skb_reason_drop(sk, skb, reason);
}
diff --git a/net/core/sock.c b/net/core/sock.c
index e821b95e00151ab6b4be89209abcbaa494234433..d39a4d6ccafd9e03f4e82482d3f3e46ce5d58771 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -520,32 +520,24 @@ int __sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
}
EXPORT_SYMBOL(__sock_queue_rcv_skb);
-int sock_queue_rcv_skb_reason(struct sock *sk, struct sk_buff *skb,
- enum skb_drop_reason *reason)
+enum skb_drop_reason
+sock_queue_rcv_skb_reason(struct sock *sk, struct sk_buff *skb)
{
enum skb_drop_reason drop_reason;
int err;
err = sk_filter_reason(sk, skb, &drop_reason);
if (err)
- goto out;
+ return drop_reason;
err = __sock_queue_rcv_skb(sk, skb);
switch (err) {
case -ENOMEM:
- drop_reason = SKB_DROP_REASON_SOCKET_RCVBUFF;
- break;
+ return SKB_DROP_REASON_SOCKET_RCVBUFF;
case -ENOBUFS:
- drop_reason = SKB_DROP_REASON_PROTO_MEM;
- break;
- default:
- drop_reason = SKB_NOT_DROPPED_YET;
- break;
+ return SKB_DROP_REASON_PROTO_MEM;
}
-out:
- if (reason)
- *reason = drop_reason;
- return err;
+ return SKB_NOT_DROPPED_YET;
}
EXPORT_SYMBOL(sock_queue_rcv_skb_reason);
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index bda245c808938cf2a26270bcd83c74898e5b36dd..1273d1028ed9ca91734481f65652bfc43efd039a 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -935,7 +935,8 @@ static enum skb_drop_reason __ping_queue_rcv_skb(struct sock *sk,
pr_debug("ping_queue_rcv_skb(sk=%p,sk->num=%d,skb=%p)\n",
inet_sk(sk), inet_sk(sk)->inet_num, skb);
- if (sock_queue_rcv_skb_reason(sk, skb, &reason) < 0) {
+ reason = sock_queue_rcv_skb_reason(sk, skb);
+ if (reason) {
sk_skb_reason_drop(sk, skb, reason);
pr_debug("ping_queue_rcv_skb -> failed\n");
return reason;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 34859e537b4926f15996dd1a684ae59a55a1643a..319428bf06bb89932c0b4295a0f96c275f8ecab1 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -300,7 +300,8 @@ static int raw_rcv_skb(struct sock *sk, struct sk_buff *skb)
/* Charge it to the socket. */
ipv4_pktinfo_prepare(sk, skb, true);
- if (sock_queue_rcv_skb_reason(sk, skb, &reason) < 0) {
+ reason = sock_queue_rcv_skb_reason(sk, skb);
+ if (reason) {
sk_skb_reason_drop(sk, skb, reason);
return NET_RX_DROP;
}
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 0ac7046911000d30056e0d1f49a58964c61308cf..3cc58698cbbd3a16cf0145e0afff7a6cec8dc56f 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -369,7 +369,8 @@ static inline int rawv6_rcv_skb(struct sock *sk, struct sk_buff *skb)
/* Charge it to the socket. */
skb_dst_drop(skb);
- if (sock_queue_rcv_skb_reason(sk, skb, &reason) < 0) {
+ reason = sock_queue_rcv_skb_reason(sk, skb);
+ if (reason) {
sk_skb_reason_drop(sk, skb, reason);
return NET_RX_DROP;
}
--
2.53.0.1213.gd9a14994de-goog
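The reworked sock_queue_rcv_skb_reason() above replaces the old errno-plus-out-parameter contract with a single return value. A filter verdict is passed straight through. -ENOMEM and -ENOBUFS from __sock_queue_rcv_skb() become SKB_DROP_REASON_SOCKET_RCVBUFF and SKB_DROP_REASON_PROTO_MEM, and any other return value is treated as success. The userspace sketch below models only that control flow. The mock_* names and enum values are hypothetical stand-ins, not kernel definitions. The one property taken from the patch is that "not dropped" is 0, which is what lets every converted caller test `if (reason)`.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for enum skb_drop_reason; only "0 = not dropped" matters. */
enum mock_drop_reason {
	MOCK_NOT_DROPPED_YET = 0,
	MOCK_DROP_SOCKET_FILTER,
	MOCK_DROP_SOCKET_RCVBUFF,
	MOCK_DROP_PROTO_MEM,
};

/*
 * Models sock_queue_rcv_skb_reason() after the patch (hypothetical helper).
 * filter_verdict: what the socket filter decided (0 = accept).
 * queue_err:      what __sock_queue_rcv_skb() would return (0 or -errno).
 */
static enum mock_drop_reason
mock_queue_rcv_skb_reason(enum mock_drop_reason filter_verdict, int queue_err)
{
	if (filter_verdict)
		return filter_verdict;	/* dropped by the filter: pass it through */

	switch (queue_err) {
	case -ENOMEM:
		return MOCK_DROP_SOCKET_RCVBUFF;	/* receive buffer full */
	case -ENOBUFS:
		return MOCK_DROP_PROTO_MEM;		/* protocol memory limit hit */
	}
	return MOCK_NOT_DROPPED_YET;	/* queued: the caller must not free it */
}

/* Caller pattern used by the CAN, ping and raw hunks. Returns 1 if dropped. */
static int mock_rcv(enum mock_drop_reason filter_verdict, int queue_err,
		    enum mock_drop_reason *last_drop)
{
	enum mock_drop_reason reason;

	assert(last_drop);
	reason = mock_queue_rcv_skb_reason(filter_verdict, queue_err);
	if (reason) {
		*last_drop = reason;	/* stands in for sk_skb_reason_drop() */
		return 1;
	}
	return 0;
}
```

The callers no longer need a separate `err` variable: the reason is both the success/failure signal and the value handed to the drop tracepoint.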
* [PATCH net-next 2/5] net: always set reason in sk_filter_trim_cap()
From: Eric Dumazet @ 2026-04-09 14:56 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet
In-Reply-To: <20260409145625.2306224-1-edumazet@google.com>
sk_filter_trim_cap() will soon return the drop reason by value.
Make sure *reason is cleared when no error is returned,
to ease this conversion.
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-7 (-7)
Function                        old    new  delta
sk_filter_trim_cap              889    882     -7
Total: Before=29722668, After=29722661, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/filter.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index cf2113af4bc9ae7c976d7c55f3092643e1d875b6..5569d83b8be06dc1fe64ddff2ae338acd1622ed7 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -121,7 +121,7 @@ EXPORT_SYMBOL_GPL(copy_bpf_fprog_from_user);
* @sk: sock associated with &sk_buff
* @skb: buffer to filter
* @cap: limit on how short the eBPF program may trim the packet
- * @reason: record drop reason on errors (negative return value)
+ * @reason: record drop reason
*
* Run the eBPF program and then cut skb->data to correct size returned by
* the program. If pkt_len is 0 we toss packet. If skb->len is smaller
@@ -168,11 +168,10 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb,
pkt_len = bpf_prog_run_save_cb(filter->prog, skb);
skb->sk = save_sk;
err = pkt_len ? pskb_trim(skb, max(cap, pkt_len)) : -EPERM;
- if (err)
- *reason = SKB_DROP_REASON_SOCKET_FILTER;
}
rcu_read_unlock();
+ *reason = err ? SKB_DROP_REASON_SOCKET_FILTER : 0;
return err;
}
EXPORT_SYMBOL(sk_filter_trim_cap);
--
2.53.0.1213.gd9a14994de-goog
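Before this patch, *reason was written only on the error paths (the removed `if (err) *reason = SKB_DROP_REASON_SOCKET_FILTER;`), so code that read the reason without checking the int return could see an uninitialized or stale value. The sketch below uses hypothetical mock_* names rather than kernel types. It shows why writing the out-parameter unconditionally is what makes a by-value wrapper safe: the wrapper can return the reason without consulting the error code at all.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for enum skb_drop_reason values. */
enum mock_drop_reason {
	MOCK_NOT_DROPPED_YET = 0,
	MOCK_DROP_SOCKET_FILTER,
};

/*
 * Out-parameter style, loosely modelled on sk_filter_trim_cap() after this
 * patch. pkt_len is the (mock) socket filter verdict: 0 means "toss it".
 * Trimming is not modelled; the point is that *reason is always written.
 */
static int mock_filter_trim_cap(unsigned int pkt_len,
				enum mock_drop_reason *reason)
{
	int err = pkt_len ? 0 : -EPERM;

	*reason = err ? MOCK_DROP_SOCKET_FILTER : MOCK_NOT_DROPPED_YET;
	return err;
}

/*
 * Hypothetical by-value wrapper of the kind the series adds next. It is
 * only well-defined because the callee never leaves *reason unset.
 */
static enum mock_drop_reason mock_filter_reason(unsigned int pkt_len)
{
	enum mock_drop_reason reason;
	int err = mock_filter_trim_cap(pkt_len, &reason);

	/* Invariant this patch establishes: error <=> non-zero reason. */
	assert((err != 0) == (reason != MOCK_NOT_DROPPED_YET));
	return reason;
}
```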
* [PATCH net-next 3/5] net: change sk_filter_reason() to return the reason by value
From: Eric Dumazet @ 2026-04-09 14:56 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet
In-Reply-To: <20260409145625.2306224-1-edumazet@google.com>
sk_filter_trim_cap() will soon return the reason by value;
do the same for sk_filter_reason().
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-21 (-21)
Function                        old    new  delta
sock_queue_rcv_skb_reason       128    126     -2
tun_net_xmit                   1146   1127    -19
Total: Before=29722661, After=29722640, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
drivers/net/tun.c | 8 +++++---
include/linux/filter.h | 9 ++++++---
net/core/sock.c | 4 ++--
3 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index c492fda6fc15a79c13f56cb15dc273331b854422..b183189f185354051bded95f43bd77ee4f7cde24 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1031,9 +1031,11 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
goto drop;
}
- if (tfile->socket.sk->sk_filter &&
- sk_filter_reason(tfile->socket.sk, skb, &drop_reason))
- goto drop;
+ if (tfile->socket.sk->sk_filter) {
+ drop_reason = sk_filter_reason(tfile->socket.sk, skb);
+ if (drop_reason)
+ goto drop;
+ }
len = run_ebpf_filter(tun, skb, len);
if (len == 0) {
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 44d7ae95ddbccd0ba72740b5547e91e1990686f2..59931e5810b4fcff5788616a3875767421dba3bc 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1102,10 +1102,13 @@ static inline int sk_filter(struct sock *sk, struct sk_buff *skb)
return sk_filter_trim_cap(sk, skb, 1, &ignore_reason);
}
-static inline int sk_filter_reason(struct sock *sk, struct sk_buff *skb,
- enum skb_drop_reason *reason)
+static inline enum skb_drop_reason
+sk_filter_reason(struct sock *sk, struct sk_buff *skb)
{
- return sk_filter_trim_cap(sk, skb, 1, reason);
+ enum skb_drop_reason drop_reason;
+
+ sk_filter_trim_cap(sk, skb, 1, &drop_reason);
+ return drop_reason;
}
struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err);
diff --git a/net/core/sock.c b/net/core/sock.c
index d39a4d6ccafd9e03f4e82482d3f3e46ce5d58771..1ffcb15d0fc5e39201aab24616d40a37aa41c823 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -526,8 +526,8 @@ sock_queue_rcv_skb_reason(struct sock *sk, struct sk_buff *skb)
enum skb_drop_reason drop_reason;
int err;
- err = sk_filter_reason(sk, skb, &drop_reason);
- if (err)
+ drop_reason = sk_filter_reason(sk, skb);
+ if (drop_reason)
return drop_reason;
err = __sock_queue_rcv_skb(sk, skb);
--
2.53.0.1213.gd9a14994de-goog
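At this stage sk_filter_trim_cap() still reports through an out-parameter, and sk_filter_reason() becomes a thin inline adapter that returns the reason by value. Callers such as tun_net_xmit() then branch on the reason alone. A hypothetical userspace model of that adapter-plus-caller shape might look like the following; the struct mock_sock fields are invented for illustration and do not mirror struct sock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for enum skb_drop_reason values. */
enum mock_drop_reason {
	MOCK_NOT_DROPPED_YET = 0,
	MOCK_DROP_SOCKET_FILTER,
};

/* Invented socket model: only what this sketch needs. */
struct mock_sock {
	bool has_filter;	/* plays the role of sk->sk_filter != NULL */
	bool filter_accepts;	/* verdict the attached filter would give */
};

/* Still out-parameter based, as sk_filter_trim_cap() is at this point. */
static int mock_filter_trim_cap(const struct mock_sock *sk,
				enum mock_drop_reason *reason)
{
	*reason = sk->filter_accepts ? MOCK_NOT_DROPPED_YET
				     : MOCK_DROP_SOCKET_FILTER;
	return *reason ? -EPERM : 0;
}

/* Adapter with the same shape as the new sk_filter_reason(). */
static inline enum mock_drop_reason
mock_filter_reason(const struct mock_sock *sk)
{
	enum mock_drop_reason drop_reason;

	mock_filter_trim_cap(sk, &drop_reason);	/* int result ignored */
	return drop_reason;
}

/* Caller shaped like the tun_net_xmit() hunk; returns the drop reason. */
static enum mock_drop_reason mock_xmit(const struct mock_sock *sk)
{
	enum mock_drop_reason drop_reason = MOCK_NOT_DROPPED_YET;

	assert(sk);
	if (sk->has_filter) {
		drop_reason = mock_filter_reason(sk);
		if (drop_reason)
			goto drop;
	}
	return MOCK_NOT_DROPPED_YET;	/* would continue transmitting */
drop:
	return drop_reason;
}
```

The guard on the filter pointer is kept from the original tun code; only the way the verdict is consumed changes.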
* [PATCH net-next 4/5] tcp: change tcp_filter() to return the reason by value
From: Eric Dumazet @ 2026-04-09 14:56 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet
In-Reply-To: <20260409145625.2306224-1-edumazet@google.com>
sk_filter_trim_cap() will soon return the reason by value;
do the same for tcp_filter().
Note:
tcp_filter() is no longer inlined. The following patch will inline it again.
$ scripts/bloat-o-meter -t vmlinux.4 vmlinux.5
add/remove: 2/0 grow/shrink: 0/2 up/down: 186/-43 (143)
Function                        old    new  delta
tcp_filter                        -    154   +154
__pfx_tcp_filter                  -     32    +32
tcp_v4_rcv                     3152   3143     -9
tcp_v6_rcv                     3169   3135    -34
Total: Before=29722640, After=29722783, chg +0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/tcp.h | 8 +++++---
net/ipv4/tcp_ipv4.c | 6 ++++--
net/ipv6/tcp_ipv6.c | 6 ++++--
3 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6156d1d068e142f696ec9dfff63e3aaebb0171bc..098e52269a04cb8938812a8f43caf11f9d5c68a3 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1683,12 +1683,14 @@ static inline bool tcp_checksum_complete(struct sk_buff *skb)
bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb,
enum skb_drop_reason *reason);
-static inline int tcp_filter(struct sock *sk, struct sk_buff *skb,
- enum skb_drop_reason *reason)
+static inline enum skb_drop_reason
+tcp_filter(struct sock *sk, struct sk_buff *skb)
{
const struct tcphdr *th = (const struct tcphdr *)skb->data;
+ enum skb_drop_reason reason;
- return sk_filter_trim_cap(sk, skb, __tcp_hdrlen(th), reason);
+ sk_filter_trim_cap(sk, skb, __tcp_hdrlen(th), &reason);
+ return reason;
}
void tcp_set_state(struct sock *sk, int state);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 69ab236072e7142d5ca9d0703d99f02c1e17c738..e2da3246a641e24328985cf558c322211df02b84 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2164,7 +2164,8 @@ int tcp_v4_rcv(struct sk_buff *skb)
}
refcounted = true;
nsk = NULL;
- if (!tcp_filter(sk, skb, &drop_reason)) {
+ drop_reason = tcp_filter(sk, skb);
+ if (!drop_reason) {
th = (const struct tcphdr *)skb->data;
iph = ip_hdr(skb);
tcp_v4_fill_cb(skb, iph, th);
@@ -2225,7 +2226,8 @@ int tcp_v4_rcv(struct sk_buff *skb)
nf_reset_ct(skb);
- if (tcp_filter(sk, skb, &drop_reason))
+ drop_reason = tcp_filter(sk, skb);
+ if (drop_reason)
goto discard_and_relse;
th = (const struct tcphdr *)skb->data;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 8dc3874e8b9252da60f21ad77a5ca834532e650a..d64d28e9842f7db69389034ac2ecdc76f405d379 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1794,7 +1794,8 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
}
refcounted = true;
nsk = NULL;
- if (!tcp_filter(sk, skb, &drop_reason)) {
+ drop_reason = tcp_filter(sk, skb);
+ if (!drop_reason) {
th = (const struct tcphdr *)skb->data;
hdr = ipv6_hdr(skb);
tcp_v6_fill_cb(skb, hdr, th);
@@ -1855,7 +1856,8 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
nf_reset_ct(skb);
- if (tcp_filter(sk, skb, &drop_reason))
+ drop_reason = tcp_filter(sk, skb);
+ if (drop_reason)
goto discard_and_relse;
th = (const struct tcphdr *)skb->data;
--
2.53.0.1213.gd9a14994de-goog
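The bloat-o-meter summary in the tcp_filter() patch can be checked by hand. All of the growth comes from the two new out-of-line symbols, while both callers shrink, which matches the note that tcp_filter() is temporarily no longer inlined:

```latex
\begin{aligned}
\text{up}   &= \underbrace{154}_{\texttt{tcp\_filter}} + \underbrace{32}_{\texttt{\_\_pfx\_tcp\_filter}} = 186 \\
\text{down} &= \underbrace{-9}_{\texttt{tcp\_v4\_rcv}} + \underbrace{(-34)}_{\texttt{tcp\_v6\_rcv}} = -43 \\
\text{net}  &= 186 - 43 = 143, \qquad 29722640 + 143 = 29722783
\end{aligned}
```

Patch 5/5 then removes both symbols again (-154 and -32 in its own bloat-o-meter output).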
* [PATCH net-next 5/5] net: change sk_filter_trim_cap() to return a drop_reason by value
From: Eric Dumazet @ 2026-04-09 14:56 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet
In-Reply-To: <20260409145625.2306224-1-edumazet@google.com>
The current return value can be replaced with the drop_reason,
reducing kernel bloat:
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/2 grow/shrink: 1/11 up/down: 32/-603 (-571)
Function                        old    new  delta
tcp_v6_rcv                     3135   3167    +32
unix_dgram_sendmsg             1731   1726     -5
netlink_unicast                 957    945    -12
netlink_dump                   1372   1359    -13
sk_filter_trim_cap              882    858    -24
tcp_v4_rcv                     3143   3111    -32
__pfx_tcp_filter                 32      -    -32
netlink_broadcast_filtered     1633   1595    -38
sock_queue_rcv_skb_reason       126     76    -50
tun_net_xmit                   1127   1074    -53
__sk_receive_skb                690    632    -58
udpv6_queue_rcv_one_skb         935    869    -66
udp_queue_rcv_one_skb           919    853    -66
tcp_filter                      154      -   -154
Total: Before=29722783, After=29722212, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/filter.h | 14 ++++++--------
include/net/tcp.h | 4 +---
net/core/filter.c | 31 ++++++++++++++-----------------
net/core/sock.c | 5 +++--
net/ipv4/udp.c | 3 ++-
net/ipv6/udp.c | 3 ++-
net/rose/rose_in.c | 3 +--
7 files changed, 29 insertions(+), 34 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 59931e5810b4fcff5788616a3875767421dba3bc..5ac08aa70123cf97ab91dea7e11e47b210a42d4a 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1092,23 +1092,21 @@ bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
return set_memory_rox((unsigned long)hdr, hdr->size >> PAGE_SHIFT);
}
-int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap,
- enum skb_drop_reason *reason);
+enum skb_drop_reason
+sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap);
static inline int sk_filter(struct sock *sk, struct sk_buff *skb)
{
- enum skb_drop_reason ignore_reason;
+ enum skb_drop_reason drop_reason;
- return sk_filter_trim_cap(sk, skb, 1, &ignore_reason);
+ drop_reason = sk_filter_trim_cap(sk, skb, 1);
+ return drop_reason ? -EPERM : 0;
}
static inline enum skb_drop_reason
sk_filter_reason(struct sock *sk, struct sk_buff *skb)
{
- enum skb_drop_reason drop_reason;
-
- sk_filter_trim_cap(sk, skb, 1, &drop_reason);
- return drop_reason;
+ return sk_filter_trim_cap(sk, skb, 1);
}
struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err);
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 098e52269a04cb8938812a8f43caf11f9d5c68a3..49f45bcff917942e993c627dd3d0017369186f67 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1687,10 +1687,8 @@ static inline enum skb_drop_reason
tcp_filter(struct sock *sk, struct sk_buff *skb)
{
const struct tcphdr *th = (const struct tcphdr *)skb->data;
- enum skb_drop_reason reason;
- sk_filter_trim_cap(sk, skb, __tcp_hdrlen(th), &reason);
- return reason;
+ return sk_filter_trim_cap(sk, skb, __tcp_hdrlen(th));
}
void tcp_set_state(struct sock *sk, int state);
diff --git a/net/core/filter.c b/net/core/filter.c
index 5569d83b8be06dc1fe64ddff2ae338acd1622ed7..bf9c37b27646943e3a6fdad2fadf00f5e1ea8244 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -121,20 +121,20 @@ EXPORT_SYMBOL_GPL(copy_bpf_fprog_from_user);
* @sk: sock associated with &sk_buff
* @skb: buffer to filter
* @cap: limit on how short the eBPF program may trim the packet
- * @reason: record drop reason
*
* Run the eBPF program and then cut skb->data to correct size returned by
* the program. If pkt_len is 0 we toss packet. If skb->len is smaller
* than pkt_len we keep whole skb->data. This is the socket level
* wrapper to bpf_prog_run. It returns 0 if the packet should
- * be accepted or -EPERM if the packet should be tossed.
+ * be accepted or a drop_reason if the packet should be tossed.
*
*/
-int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb,
- unsigned int cap, enum skb_drop_reason *reason)
+enum skb_drop_reason
+sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap)
{
- int err;
+ enum skb_drop_reason drop_reason;
struct sk_filter *filter;
+ int err;
/*
* If the skb was allocated from pfmemalloc reserves, only
@@ -143,21 +143,17 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb,
*/
if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC)) {
NET_INC_STATS(sock_net(sk), LINUX_MIB_PFMEMALLOCDROP);
- *reason = SKB_DROP_REASON_PFMEMALLOC;
- return -ENOMEM;
+ return SKB_DROP_REASON_PFMEMALLOC;
}
err = BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb);
- if (err) {
- *reason = SKB_DROP_REASON_SOCKET_FILTER;
- return err;
- }
+ if (err)
+ return SKB_DROP_REASON_SOCKET_FILTER;
err = security_sock_rcv_skb(sk, skb);
- if (err) {
- *reason = SKB_DROP_REASON_SECURITY_HOOK;
- return err;
- }
+ if (err)
+ return SKB_DROP_REASON_SECURITY_HOOK;
+ drop_reason = 0;
rcu_read_lock();
filter = rcu_dereference(sk->sk_filter);
if (filter) {
@@ -168,11 +164,12 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb,
pkt_len = bpf_prog_run_save_cb(filter->prog, skb);
skb->sk = save_sk;
err = pkt_len ? pskb_trim(skb, max(cap, pkt_len)) : -EPERM;
+ if (err)
+ drop_reason = SKB_DROP_REASON_SOCKET_FILTER;
}
rcu_read_unlock();
- *reason = err ? SKB_DROP_REASON_SOCKET_FILTER : 0;
- return err;
+ return drop_reason;
}
EXPORT_SYMBOL(sk_filter_trim_cap);
diff --git a/net/core/sock.c b/net/core/sock.c
index 1ffcb15d0fc5e39201aab24616d40a37aa41c823..367fd7bad4ac2e6557dc73519ac0c04debb43cb3 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -544,11 +544,12 @@ EXPORT_SYMBOL(sock_queue_rcv_skb_reason);
int __sk_receive_skb(struct sock *sk, struct sk_buff *skb,
const int nested, unsigned int trim_cap, bool refcounted)
{
- enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED;
+ enum skb_drop_reason reason;
int rc = NET_RX_SUCCESS;
int err;
- if (sk_filter_trim_cap(sk, skb, trim_cap, &reason))
+ reason = sk_filter_trim_cap(sk, skb, trim_cap);
+ if (reason)
goto discard_and_relse;
skb->dev = NULL;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ab415de32443c5d32eedca5f093d5d96681f6b48..2fddc7b6b7172045286a8a0902f8bcf41aaca7c4 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2392,7 +2392,8 @@ static int udp_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
udp_lib_checksum_complete(skb))
goto csum_error;
- if (sk_filter_trim_cap(sk, skb, sizeof(struct udphdr), &drop_reason))
+ drop_reason = sk_filter_trim_cap(sk, skb, sizeof(struct udphdr));
+ if (drop_reason)
goto drop;
udp_csum_pull_header(skb);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index d7cf4c9508b2c5c92753eddf8de1717f52347fbf..3fac9cb47ae00fe26c60ba2aee61748b4a241221 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -853,7 +853,8 @@ static int udpv6_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
udp_lib_checksum_complete(skb))
goto csum_error;
- if (sk_filter_trim_cap(sk, skb, sizeof(struct udphdr), &drop_reason))
+ drop_reason = sk_filter_trim_cap(sk, skb, sizeof(struct udphdr));
+ if (drop_reason)
goto drop;
udp_csum_pull_header(skb);
diff --git a/net/rose/rose_in.c b/net/rose/rose_in.c
index 0276b393f0e530ea2d8f689a3dd95364849910ac..3aff3c2d45a956a5c791beb5c2d5c4e4d7063d6a 100644
--- a/net/rose/rose_in.c
+++ b/net/rose/rose_in.c
@@ -101,7 +101,6 @@ static int rose_state2_machine(struct sock *sk, struct sk_buff *skb, int framety
*/
static int rose_state3_machine(struct sock *sk, struct sk_buff *skb, int frametype, int ns, int nr, int q, int d, int m)
{
- enum skb_drop_reason dr; /* ignored */
struct rose_sock *rose = rose_sk(sk);
int queued = 0;
@@ -163,7 +162,7 @@ static int rose_state3_machine(struct sock *sk, struct sk_buff *skb, int framety
rose_frames_acked(sk, nr);
if (ns == rose->vr) {
rose_start_idletimer(sk);
- if (!sk_filter_trim_cap(sk, skb, ROSE_MIN_LEN, &dr) &&
+ if (!sk_filter_trim_cap(sk, skb, ROSE_MIN_LEN) &&
__sock_queue_rcv_skb(sk, skb) == 0) {
rose->vr = (rose->vr + 1) % ROSE_MODULUS;
queued = 1;
--
2.53.0.1213.gd9a14994de-goog
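With sk_filter_trim_cap() returning the reason directly, each early exit simply returns its own reason, and the int-returning sk_filter() becomes a compatibility shim that maps any non-zero reason to -EPERM. The sketch below models that control flow in userspace. The struct mock_rx fields are hypothetical inputs standing in for the pfmemalloc check, the cgroup BPF ingress hook, the LSM hook and the attached socket filter; skb trimming is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the enum skb_drop_reason values in the hunk. */
enum mock_drop_reason {
	MOCK_NOT_DROPPED_YET = 0,
	MOCK_DROP_PFMEMALLOC,
	MOCK_DROP_SOCKET_FILTER,
	MOCK_DROP_SECURITY_HOOK,
};

/* Hypothetical inputs for the checks sk_filter_trim_cap() performs. */
struct mock_rx {
	bool pfmemalloc_without_memalloc_sock;
	int cgroup_err;		/* BPF_CGROUP_RUN_PROG_INET_INGRESS() result */
	int security_err;	/* security_sock_rcv_skb() result */
	bool has_filter;	/* sk->sk_filter attached? */
	unsigned int pkt_len;	/* filter verdict; 0 means drop */
};

/* By-value shape of the new sk_filter_trim_cap(): one reason per exit. */
static enum mock_drop_reason mock_filter_trim_cap(const struct mock_rx *rx)
{
	assert(rx);
	if (rx->pfmemalloc_without_memalloc_sock)
		return MOCK_DROP_PFMEMALLOC;
	if (rx->cgroup_err)
		return MOCK_DROP_SOCKET_FILTER;
	if (rx->security_err)
		return MOCK_DROP_SECURITY_HOOK;
	if (rx->has_filter && rx->pkt_len == 0)
		return MOCK_DROP_SOCKET_FILTER;
	return MOCK_NOT_DROPPED_YET;
}

/* Same idea as the updated sk_filter(): an int API for existing callers. */
static int mock_filter(const struct mock_rx *rx)
{
	return mock_filter_trim_cap(rx) ? -EPERM : 0;
}
```

One consequence visible in the filter.h and filter.c hunks: sk_filter() now reports every drop as -EPERM, whereas previously the pfmemalloc path returned -ENOMEM and the cgroup and LSM hooks passed their own error through. Callers that only test the result for zero, such as the rose_state3_machine() hunk, need no change beyond dropping the out-parameter.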
* Re: [PATCH net-next] selftests: net: py: add test case filtering and listing
From: Breno Leitao @ 2026-04-09 14:58 UTC
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
petrm, willemb, linux-kselftest
In-Reply-To: <20260409075617.0b22db0a@kernel.org>
On Thu, Apr 09, 2026 at 07:56:17AM -0700, Jakub Kicinski wrote:
> On Thu, 9 Apr 2026 05:41:59 -0700 Breno Leitao wrote:
> > > + print(f"Usage: {sys.argv[0]} [-h|-l] [-t|-T name]\n"
> > > + f"\t-h print help\n"
> > > + f"\t-l list all tests\n"
> >
> > I initially expected the help text to mention "(all or filtered)" based
> > on the commit message, but since this option lists all tests
> > unconditionally, the current wording is correct.
>
> Ugh, good catch. Not sure how I lost this. It does display a filtered
> list. I wanted the filtering to take effect so that one can see what
> tests would have been executed with the filters without running them.
> Sort of like a --dry-run.
>
> LMK if you have any thoughts on this, otherwise I'll rephrase as:
>
> f"\t-l list tests (filtered, if filters were specified)\n"
Ack, the above is fine.
* Re: [PATCH net-next v3] selftests/net: convert so_txtime to drv-net
From: Willem de Bruijn @ 2026-04-09 15:01 UTC
To: Willem de Bruijn, Jakub Kicinski, Willem de Bruijn
Cc: netdev, davem, edumazet, pabeni, horms, Willem de Bruijn
In-Reply-To: <willemdebruijn.kernel.1e324d665bb85@gmail.com>
Willem de Bruijn wrote:
> Jakub Kicinski wrote:
> > On Sun, 5 Apr 2026 22:49:22 -0400 Willem de Bruijn wrote:
> > > +@ksft_variants(_test_variants_mono())
> > > +def test_so_txtime_mono(cfg, ipver, args_tx, args_rx):
> > > + """Run all variants of monotonic (fq) tests."""
> > > + cmd(f"tc qdisc replace dev {cfg.ifname} root fq")
> > > + test_so_txtime(cfg, "mono", ipver, args_tx, args_rx, False)
> > > +
> > > +
> > > +def _test_variants_etf():
> > > + for ipver in ["4", "6"]:
> > > + for testcase in [
> > > + ["no_delay", "a,-1", "a,-1", True],
> > > + ["zero_delay", "a,0", "a,0", True],
> > > + ["one_pkt", "a,10", "a,10", False],
> > > + ["in_order", "a,10,b,20", "a,10,b,20", False],
> > > + ["reverse_order", "a,20,b,10", "b,10,a,20", False],
> > > + ]:
> > > + name = f"_v{ipver}_{testcase[0]}"
> >
> > nit: looking at the results in NIPA:
> > https://netdev-ctrl.bots.linux.dev/logs/vmksft/net-drv/results/593442/5-so-txtime-py/stdout
> > the leading _ seems unnecessary?
> >
> > > + yield KsftNamedVariant(
> > > + name, ipver, testcase[1], testcase[2], testcase[3]
> > > + )
> > > +
> > > +
> > > +@ksft_variants(_test_variants_etf())
> > > +def test_so_txtime_etf(cfg, ipver, args_tx, args_rx, expect_fail):
> > > + """Run all variants of etf tests."""
> > > + try:
> > > + # ETF does not support change, so remove and re-add it instead.
> > > + cmd_prefix = f"tc qdisc replace dev {cfg.ifname} root"
> > > + cmd(f"{cmd_prefix} pfifo_fast")
> > > + cmd(f"{cmd_prefix} etf clockid CLOCK_TAI delta 400000")
> > > + except Exception as e:
> > > + raise KsftSkipEx("tc does not support qdisc etf. skipping") from e
> > > +
> > > + test_so_txtime(cfg, "tai", ipver, args_tx, args_rx, expect_fail)
> >
> > I _think_ we'll leave ETF installed on the device after the test?
> > That seems not super great. As we discussed before rebuilding the
> > whole hierarchy will be tedious but we could at least replace with
> > mq on exit and let it put whatever the default qdisc is as its leaves?
>
> Good point. We can not set mq on netkit. It fails netif_is_multiqueue
> in mq_init_common. I'll do the following.
>
> @@ -81,6 +81,8 @@ def main() -> None:
> """Boilerplate ksft main."""
> with NetDrvEpEnv(__file__) as cfg:
> ksft_run([test_so_txtime_mono, test_so_txtime_etf], args=(cfg,))
> + if not cfg._ns:
> + cmd(f"tc qdisc replace dev {cfg.ifname} root mq")
> ksft_exit()
Actually, looking at a private field is not a good idea.
> Alternatively could record the root qdisc at the start of the test and
> restore that.
This should work:
 def main() -> None:
     """Boilerplate ksft main."""
     with NetDrvEpEnv(__file__) as cfg:
+        # Record original root qdisc
+        cmd_obj = cmd(f"tc -j qdisc show dev {cfg.ifname} root")
+        qdisc_root = json.loads(cmd_obj.stdout)[0].get("kind", None)
+
         ksft_run([test_so_txtime_mono, test_so_txtime_etf], args=(cfg,))
+
+        # Restore original root qdisc. If mq, populate with default_qdisc nodes
+        if qdisc_root:
+            cmd(f"tc qdisc replace dev {cfg.ifname} root {qdisc_root}")
     ksft_exit()
Do we want to add a tc command helper, similar to ip, bpftool, etc.?