Netdev List


* Re: [PATCH net-next v2 4/4] selftests: net: getsockopt_iter: add raw ICMP_FILTER coverage
From: Stanislav Fomichev @ 2026-06-30 18:20 UTC
  To: Breno Leitao
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Willem de Bruijn, Shuah Khan, netdev, linux-kernel,
	linux-kselftest, kernel-team
In-Reply-To: <20260630-getsockopt_phase2-v2-4-193335f3d4d1@debian.org>

On 06/30, Breno Leitao wrote:
> Exercise the raw getsockopt path now backed by sockopt_t. ICMP_FILTER
> returns a fixed-size struct and, unlike the int/u64 options already
> covered, clamps the length down to the user buffer on a short read
> instead of failing, so check that semantic explicitly along with the
> exact and oversized cases, the -EOPNOTSUPP path on a non-ICMP raw
> socket, and an unknown optname.
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
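
The clamp-on-short-read semantic described in the quoted commit message can be modeled in plain userspace C. This is only a sketch under stated assumptions: model_icmp_filter and model_get_filter() are hypothetical names, the negative-length check is an assumption of the model, and nothing here is the selftest or the kernel helper itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a fixed-size option: one 32-bit filter mask. */
struct model_icmp_filter {
	uint32_t data;
};

/*
 * Model of a getsockopt helper that clamps instead of failing:
 * copies min(*optlen, sizeof(filter)) bytes and reports that length.
 * Returns 0 on success or a negative errno (assumption of this model).
 */
static int model_get_filter(const struct model_icmp_filter *filt,
			    void *optval, int *optlen)
{
	int len;

	assert(filt && optval && optlen);

	len = *optlen;
	if (len < 0)
		return -EINVAL;

	/* Oversized buffer: clamp down to the option size. */
	if ((size_t)len > sizeof(*filt))
		len = (int)sizeof(*filt);

	/* Short buffer: copy only what fits, do not fail. */
	memcpy(optval, filt, (size_t)len);
	*optlen = len;
	return 0;
}
```

A caller that passes a 2-byte buffer gets 2 bytes back, and a caller that passes an 8-byte buffer gets the length clamped to the 4-byte struct, which matches the short, exact, and oversized cases the commit message lists.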


* Re: [PATCH net-next v2 3/4] ipv4: raw: convert do_raw_getsockopt to sockopt_t
From: Stanislav Fomichev @ 2026-06-30 18:20 UTC
  To: Breno Leitao
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Willem de Bruijn, Shuah Khan, netdev, linux-kernel,
	linux-kselftest, kernel-team
In-Reply-To: <20260630-getsockopt_phase2-v2-3-193335f3d4d1@debian.org>

On 06/30, Breno Leitao wrote:
> Continue converting the proto-layer getsockopt callbacks to the sockopt_t
> interface, switching do_raw_getsockopt() and its raw_geticmpfilter()
> helper to take a sockopt_t.
> 
> The thin raw_getsockopt() wrapper keeps its __user signature for now: it
> builds a user-backed sockopt_t with sockopt_init_user(), calls the helper,
> and writes the returned length back to optlen. The helper uses
> copy_to_iter() instead of copy_to_user(). No functional change.
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>

Acked-by: Stanislav Fomichev <sdf@fomichev.me>


* Re: [PATCH net-next v2 2/4] udp: convert udp_lib_getsockopt to sockopt_t
From: Stanislav Fomichev @ 2026-06-30 18:20 UTC
  To: Breno Leitao
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Willem de Bruijn, Shuah Khan, netdev, linux-kernel,
	linux-kselftest, kernel-team
In-Reply-To: <20260630-getsockopt_phase2-v2-2-193335f3d4d1@debian.org>

On 06/30, Breno Leitao wrote:
> In preparation for converting the proto-layer getsockopt callbacks to the
> sockopt_t interface, switch udp_lib_getsockopt() to take a sockopt_t.
> 
> The thin udp_getsockopt()/udpv6_getsockopt() wrappers keep their __user
> signature for now: they build a user-backed sockopt_t with
> sockopt_init_user(), call the helper, and write the returned length back
> to optlen. The helper uses copy_to_iter() instead of copy_to_user().
> No functional change.
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>

Acked-by: Stanislav Fomichev <sdf@fomichev.me>


* Re: [PATCH net-next v2 1/4] net: add sockopt_init_user() for getsockopt conversion
From: Stanislav Fomichev @ 2026-06-30 18:19 UTC
  To: Breno Leitao
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Willem de Bruijn, Shuah Khan, netdev, linux-kernel,
	linux-kselftest, kernel-team
In-Reply-To: <20260630-getsockopt_phase2-v2-1-193335f3d4d1@debian.org>

On 06/30, Breno Leitao wrote:
> Add a helper that initializes a user-backed sockopt_t from the (optval,
> optlen) __user pair passed to a getsockopt() callback.
> 
> It is used by transitional __user getsockopt wrappers while the
> proto-layer getsockopt callbacks are converted to take a sockopt_t, and
> is removed once the conversion is complete.
> 
> The goal is to help convert the leaf functions. Example:
> 
>  sock_common_getsockopt(... char __user *optval, int __user *optlen)
>       → udp_getsockopt(sk, level, optname, optval__user, optlen__user)
>                → udp_lib_getsockopt(sk, level, optname, &opt)   /* needs a sockopt_t */
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
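
The transitional wrapper pattern from the quoted commit message (build an option descriptor from the optval/optlen pair, call the leaf helper, write the length back) can be sketched in userspace C. Everything below is a hypothetical model: model_sockopt, model_sockopt_init(), model_leaf_getsockopt() and MODEL_OPT_INT are invented for illustration, the real sockopt_t layout and sockopt_init_user() signature are not shown in this thread, and the leaf's short-buffer behavior is an arbitrary choice of the model.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MODEL_OPT_INT 1	/* hypothetical option name */

/* Hypothetical stand-in for sockopt_t: a destination buffer and its size. */
struct model_sockopt {
	void *buf;
	int len;
};

static void model_sockopt_init(struct model_sockopt *opt, void *optval,
			       int optlen)
{
	opt->buf = optval;
	opt->len = optlen;
}

/* Leaf helper: returns the number of bytes written or a negative errno. */
static int model_leaf_getsockopt(int optname, struct model_sockopt *opt)
{
	int val = 42;	/* arbitrary option value for the model */

	if (optname != MODEL_OPT_INT)
		return -ENOPROTOOPT;
	if (opt->len < (int)sizeof(val))
		return -EINVAL;

	memcpy(opt->buf, &val, sizeof(val));
	return (int)sizeof(val);
}

/* Wrapper: keeps the (optval, optlen) signature and adapts to the leaf. */
static int model_getsockopt(int optname, void *optval, int *optlen)
{
	struct model_sockopt opt;
	int ret;

	assert(optval && optlen);

	model_sockopt_init(&opt, optval, *optlen);
	ret = model_leaf_getsockopt(optname, &opt);
	if (ret < 0)
		return ret;

	*optlen = ret;	/* write the returned length back */
	return 0;
}
```

Only the wrapper knows about the pointer pair, so in this model the leaf can be converted first and the wrapper dropped later without touching the leaf again.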


* Re: [PATCH net] net/smc: fix UAF in smc_cdc_rx_handler() by pinning the socket
From: Xiang Mei @ 2026-06-30 18:18 UTC
  To: Sidraya Jayagond
  Cc: D . Wythe, Dust Li, Wenjia Zhang, Mahanta Jambigi, Tony Lu,
	Wen Gu, netdev, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Hans Wippel, linux-rdma, linux-s390,
	Weiming Shi
In-Reply-To: <cbd7c6c1-15e9-4a9a-aaca-4cbb5bd157c7@linux.ibm.com>

On Mon, Jun 29, 2026 at 10:40 PM Sidraya Jayagond <sidraya@linux.ibm.com> wrote:
>
>
>
> On 27/06/26 7:19 am, Xiang Mei wrote:
> > smc_cdc_rx_handler() looks up the connection by token under the link
> > group's conns_lock, drops the lock, and then dereferences conn and the
> > smc_sock derived from it, ending in sock_hold(&smc->sk) inside
> > smc_cdc_msg_recv(). No reference is held across the lock release.
> >
> > The only reference pinning the socket while the connection is
> > discoverable in the link group is taken in smc_lgr_register_conn()
> > (sock_hold) and dropped in __smc_lgr_unregister_conn() (sock_put), both
> > under conns_lock. Once the handler drops conns_lock, a concurrent
> > close() -> smc_release() -> smc_conn_free() -> smc_lgr_unregister_conn()
> > can drop that reference and free the smc_sock, so the handler's later
> > sock_hold() runs on freed memory:
> >
> >   WARNING: lib/refcount.c:25 at refcount_warn_saturate
> >   Workqueue: rxe_wq do_work
> >    refcount_warn_saturate (lib/refcount.c:25)
> >    smc_cdc_msg_recv (net/smc/smc_cdc.c:430)
> >    smc_cdc_rx_handler (net/smc/smc_cdc.c:502)
> >    smc_wr_rx_tasklet_fn (net/smc/smc_wr.c:445)
> >    tasklet_action_common (kernel/softirq.c:938)
> >    handle_softirqs (kernel/softirq.c:622)
> >   Kernel panic - not syncing: panic_on_warn set
> >
> > Only SMC-R is affected. The SMC-D receive tasklet is stopped by
> > tasklet_kill(&conn->rx_tsklet) in smc_conn_free() before the connection
> > is unregistered, so it cannot run concurrently with the free.
> >
> > Take the socket reference while still holding conns_lock, so the
> > registration reference can no longer be the last one, and drop it once
> > the handler is done.
> >
> > Fixes: d7b0e37c1ac1 ("net/smc: restructure CDC message reception")
> > Reported-by: Weiming Shi <bestswngs@gmail.com>
> > Assisted-by: Claude:claude-opus-4-8
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> > ---
> >  net/smc/smc_cdc.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
> > index 619b3bab3824..b809139d7e87 100644
> > --- a/net/smc/smc_cdc.c
> > +++ b/net/smc/smc_cdc.c
> > @@ -483,21 +483,27 @@ static void smc_cdc_rx_handler(struct ib_wc *wc, void *buf)
> >       lgr = smc_get_lgr(link);
> >       read_lock_bh(&lgr->conns_lock);
> >       conn = smc_lgr_find_conn(ntohl(cdc->token), lgr);
> > +     if (conn && !conn->out_of_sync)
> > +             sock_hold(&container_of(conn, struct smc_sock, conn)->sk);
> > +     else
> > +             conn = NULL;
> >       read_unlock_bh(&lgr->conns_lock);
> > -     if (!conn || conn->out_of_sync)
> > +     if (!conn)
> >               return;
> >       smc = container_of(conn, struct smc_sock, conn);
> >
>
> Fix looks correct.
> A few nits on the implementation:
> container_of() is called twice for the same conn. The conn = NULL
> sentinel and the second post unlock check can also be dropped. Flip the
> condition, early return inside the lock, compute smc once:
>
>         if (!conn || conn->out_of_sync) {
>                 read_unlock_bh(&lgr->conns_lock);
>                 return;
>         }
>         smc = container_of(conn, struct smc_sock, conn);
>         sock_hold(&smc->sk);
>         read_unlock_bh(&lgr->conns_lock);
>
> Also, please initialize smc = NULL at declaration. It's not a bug now,
> since the early return guards it, but it makes the code refactor-safe.
>

Thanks so much for the review!

Both suggestions are good. v2 takes the reference under conns_lock and
returns early inside the lock, and smc is now initialized to NULL at
declaration.

Will send v2.

Xiang


> >       if (cdc->prod_flags.failover_validation) {
> >               smc_cdc_msg_validate(smc, cdc, link);
> > -             return;
> > +             goto out;
> >       }
> >       if (smc_cdc_before(ntohs(cdc->seqno),
> >                          conn->local_rx_ctrl.seqno))
> >               /* received seqno is old */
> > -             return;
> > +             goto out;
> >
> >       smc_cdc_msg_recv(smc, cdc);
> > +out:
> > +     sock_put(&smc->sk);
> >  }
> >
> >  static struct smc_wr_rx_handler smc_cdc_rx_handlers[] = {
>
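
The shape agreed on in this thread, looking up the connection and taking the reference while the lookup lock is still held, can be modeled in userspace C. This is a sketch under stated assumptions: all model_* names are hypothetical, a pthread mutex stands in for conns_lock, an atomic counter stands in for the socket refcount, and a freed flag replaces the real free so the lifetime can be observed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_CONNS 4

struct model_conn {
	unsigned int token;
	bool out_of_sync;
	atomic_int refcnt;	/* stands in for the socket refcount */
	bool freed;		/* set instead of actually freeing */
};

struct model_lgr {
	pthread_mutex_t conns_lock;
	struct model_conn *conns[MODEL_MAX_CONNS];
};

static void model_hold(struct model_conn *c)
{
	atomic_fetch_add(&c->refcnt, 1);
}

static void model_put(struct model_conn *c)
{
	/* fetch_sub returns the old value: 1 means this was the last ref. */
	if (atomic_fetch_sub(&c->refcnt, 1) == 1)
		c->freed = true;
}

/* Registration takes its own reference under the lock. */
static void model_register(struct model_lgr *lgr, size_t slot,
			   struct model_conn *c)
{
	assert(slot < MODEL_MAX_CONNS);
	pthread_mutex_lock(&lgr->conns_lock);
	model_hold(c);
	lgr->conns[slot] = c;
	pthread_mutex_unlock(&lgr->conns_lock);
}

/* Unregistration drops the registration reference under the lock. */
static void model_unregister(struct model_lgr *lgr, size_t slot)
{
	struct model_conn *c;

	assert(slot < MODEL_MAX_CONNS);
	pthread_mutex_lock(&lgr->conns_lock);
	c = lgr->conns[slot];
	lgr->conns[slot] = NULL;
	if (c)
		model_put(c);
	pthread_mutex_unlock(&lgr->conns_lock);
}

/* Look up by token and pin the object before the lock is released. */
static struct model_conn *model_find_and_hold(struct model_lgr *lgr,
					      unsigned int token)
{
	struct model_conn *c = NULL;
	size_t i;

	pthread_mutex_lock(&lgr->conns_lock);
	for (i = 0; i < MODEL_MAX_CONNS; i++) {
		if (lgr->conns[i] && lgr->conns[i]->token == token) {
			c = lgr->conns[i];
			break;
		}
	}
	if (!c || c->out_of_sync) {
		pthread_mutex_unlock(&lgr->conns_lock);
		return NULL;
	}
	model_hold(c);	/* registration ref cannot be the last one now */
	pthread_mutex_unlock(&lgr->conns_lock);
	return c;
}
```

Because the extra reference is taken before the unlock, a concurrent model_unregister() can only drop the registration reference, so in this model the object stays alive until the handler's own model_put().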


* Re: [PATCH v6 1/9] block: partitions: of: Skip child nodes without reg property
From: Rob Herring @ 2026-06-30 18:02 UTC
  To: Loic Poulain
  Cc: Ulf Hansson, Krzysztof Kozlowski, Conor Dooley, Bjorn Andersson,
	Konrad Dybcio, Jens Axboe, Johannes Berg, Jeff Johnson,
	Bartosz Golaszewski, Marcel Holtmann, Luiz Augusto von Dentz,
	Balakrishna Godavarthi, Rocky Liao, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Srinivas Kandagatla,
	Andrew Lunn, Heiner Kallweit, Russell King, Saravana Kannan,
	Christian Marangi, linux-mmc, devicetree, linux-kernel,
	linux-arm-msm, linux-block, linux-wireless, ath10k,
	linux-bluetooth, netdev, daniel, stable, Bartosz Golaszewski
In-Reply-To: <20260629-block-as-nvmem-v6-1-f02513dcd46d@oss.qualcomm.com>

On Mon, Jun 29, 2026 at 10:55:20AM +0200, Loic Poulain wrote:
> Child nodes of a fixed-partitions node are not necessarily partition
> entries, for example an nvmem-layout node has no reg property. The
> current code passes a NULL reg pointer and uninitialized len to the
> length check, which can result in a kernel panic or silent failure to
> register any partitions.

That does not sound right to me. A fixed-partitions node should only be 
defining partitions with address ranges. I would expect a partition node 
could be nvmem-layout, but not the whole address range. If you wanted 
the latter, then just do:

partitions {
  ...
};

nvmem-layout {
  ...
};

Rob


* Re: [PATCH net v2] net: qualcomm: rmnet: validate MAP frame length before ingress parsing
From: Xiang Mei @ 2026-06-30 17:43 UTC
  To: subash.a.kasiviswanathan
  Cc: sean.tranchetti, netdev, andrew+netdev, davem, edumazet, kuba,
	pabeni, linux-kernel, bestswngs
In-Reply-To: <20260630174110.2003121-1-xmei5@asu.edu>

Thanks so much for your refactor. It's better! I took it for v2 with
two small changes:

- Dropped the goto err1/err0; the helper has no cleanup, so it just returns
  0 or packet_len directly. (err0 was the success path anyway.)

- The no-agg caller would leak the skb when validation fails, since
  __rmnet_map_ingress_handler() is the one that owns and frees it, so the
  caller now frees it itself:

	if (rmnet_map_validate_packet_len(skb, port))
		__rmnet_map_ingress_handler(skb, port);
	else
		kfree_skb(skb);

I also bounds-check the MAP header up front (skb->len < sizeof(*maph)),
since the no-agg path can get a frame shorter than the header before
maph->pkt_len is read.

v2 sent with you as Suggested-by. Tested with the original repro: OOB gone
on the no-agg path, valid frames and the deagg path unaffected.

Xiang

On Tue, Jun 30, 2026 at 10:41 AM Xiang Mei <xmei5@asu.edu> wrote:
>
> When ingress deaggregation is disabled, rmnet_map_ingress_handler() passes
> the skb straight to __rmnet_map_ingress_handler(), skipping the length
> validation that rmnet_map_deaggregate() performs on the aggregated path.
> The parser then dereferences the MAP header and csum header/trailer based on
> the on-wire pkt_len without checking skb->len, so a short frame is read out
> of bounds:
>
>   BUG: KASAN: slab-out-of-bounds in rmnet_map_checksum_downlink_packet
>   Read of size 1 at addr ffff88801118ed00 by task exploit/147
>   Call Trace:
>    ...
>    rmnet_map_checksum_downlink_packet (drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:413)
>    __rmnet_map_ingress_handler (drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c:96)
>    rmnet_rx_handler (drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c:129)
>    __netif_receive_skb_core.constprop.0 (net/core/dev.c:6089)
>    netif_receive_skb (net/core/dev.c:6460)
>    tun_get_user (drivers/net/tun.c:1955)
>    tun_chr_write_iter (drivers/net/tun.c:2001)
>    vfs_write (fs/read_write.c:688)
>    ksys_write (fs/read_write.c:740)
>    do_syscall_64 (arch/x86/entry/syscall_64.c:94)
>    ...
>
> Factor that validation out of rmnet_map_deaggregate() into
> rmnet_map_validate_packet_len() and run it on the no-aggregation path too.
> The MAP header is bounds-checked first, since this path can receive a frame
> shorter than the header.
>
> Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
> Reported-by: Weiming Shi <bestswngs@gmail.com>
> Suggested-by: Subash Abhinov Kasiviswanathan <subash.a.kasiviswanathan@oss.qualcomm.com>
> Signed-off-by: Xiang Mei <xmei5@asu.edu>
> ---
> v2: Validate on the no-aggregation path by reusing the deaggregation
>     length checks (factored into rmnet_map_validate_packet_len()) instead
>     of adding separate pskb_may_pull() guards in __rmnet_map_ingress_handler().
>
>  .../ethernet/qualcomm/rmnet/rmnet_handlers.c  |  5 +-
>  .../net/ethernet/qualcomm/rmnet/rmnet_map.h   |  1 +
>  .../ethernet/qualcomm/rmnet/rmnet_map_data.c  | 72 ++++++++++---------
>  3 files changed, 45 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
> index 9f3479500f85..d055a2628d8c 100644
> --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
> +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
> @@ -126,7 +126,10 @@ rmnet_map_ingress_handler(struct sk_buff *skb,
>
>                 consume_skb(skb);
>         } else {
> -               __rmnet_map_ingress_handler(skb, port);
> +               if (rmnet_map_validate_packet_len(skb, port))
> +                       __rmnet_map_ingress_handler(skb, port);
> +               else
> +                       kfree_skb(skb);
>         }
>  }
>
> diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
> index b70284095568..60ca8b780c88 100644
> --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
> +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
> @@ -59,5 +59,6 @@ void rmnet_map_tx_aggregate_init(struct rmnet_port *port);
>  void rmnet_map_tx_aggregate_exit(struct rmnet_port *port);
>  void rmnet_map_update_ul_agg_config(struct rmnet_port *port, u32 size,
>                                     u32 count, u32 time);
> +u32 rmnet_map_validate_packet_len(struct sk_buff *skb, struct rmnet_port *port);
>
>  #endif /* _RMNET_MAP_H_ */
> diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
> index 8b4640c5d61e..305ae15ae8f3 100644
> --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
> +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
> @@ -333,54 +333,62 @@ struct rmnet_map_header *rmnet_map_add_map_header(struct sk_buff *skb,
>         return map_header;
>  }
>
> -/* Deaggregates a single packet
> - * A whole new buffer is allocated for each portion of an aggregated frame.
> - * Caller should keep calling deaggregate() on the source skb until 0 is
> - * returned, indicating that there are no more packets to deaggregate. Caller
> - * is responsible for freeing the original skb.
> - */
> -struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb,
> -                                     struct rmnet_port *port)
> +u32 rmnet_map_validate_packet_len(struct sk_buff *skb, struct rmnet_port *port)
>  {
>         struct rmnet_map_v5_csum_header *next_hdr = NULL;
>         struct rmnet_map_header *maph;
>         void *data = skb->data;
> -       struct sk_buff *skbn;
> -       u8 nexthdr_type;
>         u32 packet_len;
>
> -       if (skb->len == 0)
> -               return NULL;
> +       if (skb->len < sizeof(*maph))
> +               return 0;
>
>         maph = (struct rmnet_map_header *)skb->data;
> +
> +       /* Some hardware can send us empty frames. Catch them */
> +       if (!maph->pkt_len)
> +               return 0;
> +
>         packet_len = ntohs(maph->pkt_len) + sizeof(*maph);
>
>         if (port->data_format & RMNET_FLAGS_INGRESS_MAP_CKSUMV4) {
>                 packet_len += sizeof(struct rmnet_map_dl_csum_trailer);
> -       } else if (port->data_format & RMNET_FLAGS_INGRESS_MAP_CKSUMV5) {
> -               if (!(maph->flags & MAP_CMD_FLAG)) {
> -                       packet_len += sizeof(*next_hdr);
> -                       if (maph->flags & MAP_NEXT_HEADER_FLAG)
> -                               next_hdr = data + sizeof(*maph);
> -                       else
> -                               /* Mapv5 data pkt without csum hdr is invalid */
> -                               return NULL;
> -               }
> +       } else if ((port->data_format & RMNET_FLAGS_INGRESS_MAP_CKSUMV5) &&
> +                  !(maph->flags & MAP_CMD_FLAG)) {
> +               /* Mapv5 data pkt without csum hdr is invalid */
> +               if (!(maph->flags & MAP_NEXT_HEADER_FLAG))
> +                       return 0;
> +
> +               packet_len += sizeof(*next_hdr);
> +               next_hdr = data + sizeof(*maph);
>         }
>
> -       if (((int)skb->len - (int)packet_len) < 0)
> -               return NULL;
> +       if (skb->len < packet_len)
> +               return 0;
>
> -       /* Some hardware can send us empty frames. Catch them */
> -       if (!maph->pkt_len)
> -               return NULL;
> +       if (next_hdr &&
> +           u8_get_bits(next_hdr->header_info, MAPV5_HDRINFO_HDR_TYPE_FMASK) !=
> +           RMNET_MAP_HEADER_TYPE_CSUM_OFFLOAD)
> +               return 0;
>
> -       if (next_hdr) {
> -               nexthdr_type = u8_get_bits(next_hdr->header_info,
> -                                          MAPV5_HDRINFO_HDR_TYPE_FMASK);
> -               if (nexthdr_type != RMNET_MAP_HEADER_TYPE_CSUM_OFFLOAD)
> -                       return NULL;
> -       }
> +       return packet_len;
> +}
> +
> +/* Deaggregates a single packet
> + * A whole new buffer is allocated for each portion of an aggregated frame.
> + * Caller should keep calling deaggregate() on the source skb until 0 is
> + * returned, indicating that there are no more packets to deaggregate. Caller
> + * is responsible for freeing the original skb.
> + */
> +struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb,
> +                                     struct rmnet_port *port)
> +{
> +       struct sk_buff *skbn;
> +       u32 packet_len;
> +
> +       packet_len = rmnet_map_validate_packet_len(skb, port);
> +       if (!packet_len)
> +               return NULL;
>
>         skbn = alloc_skb(packet_len + RMNET_MAP_DEAGGR_SPACING, GFP_ATOMIC);
>         if (!skbn)
> --
> 2.43.0
>


* [PATCH net v2] net: qualcomm: rmnet: validate MAP frame length before ingress parsing
From: Xiang Mei @ 2026-06-30 17:41 UTC
  To: subash.a.kasiviswanathan, sean.tranchetti, netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, linux-kernel,
	bestswngs, Xiang Mei

When ingress deaggregation is disabled, rmnet_map_ingress_handler() passes
the skb straight to __rmnet_map_ingress_handler(), skipping the length
validation that rmnet_map_deaggregate() performs on the aggregated path.
The parser then dereferences the MAP header and csum header/trailer based on
the on-wire pkt_len without checking skb->len, so a short frame is read out
of bounds:

  BUG: KASAN: slab-out-of-bounds in rmnet_map_checksum_downlink_packet
  Read of size 1 at addr ffff88801118ed00 by task exploit/147
  Call Trace:
   ...
   rmnet_map_checksum_downlink_packet (drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:413)
   __rmnet_map_ingress_handler (drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c:96)
   rmnet_rx_handler (drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c:129)
   __netif_receive_skb_core.constprop.0 (net/core/dev.c:6089)
   netif_receive_skb (net/core/dev.c:6460)
   tun_get_user (drivers/net/tun.c:1955)
   tun_chr_write_iter (drivers/net/tun.c:2001)
   vfs_write (fs/read_write.c:688)
   ksys_write (fs/read_write.c:740)
   do_syscall_64 (arch/x86/entry/syscall_64.c:94)
   ...

Factor that validation out of rmnet_map_deaggregate() into
rmnet_map_validate_packet_len() and run it on the no-aggregation path too.
The MAP header is bounds-checked first, since this path can receive a frame
shorter than the header.

Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Suggested-by: Subash Abhinov Kasiviswanathan <subash.a.kasiviswanathan@oss.qualcomm.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
---
v2: Validate on the no-aggregation path by reusing the deaggregation
    length checks (factored into rmnet_map_validate_packet_len()) instead
    of adding separate pskb_may_pull() guards in __rmnet_map_ingress_handler().

 .../ethernet/qualcomm/rmnet/rmnet_handlers.c  |  5 +-
 .../net/ethernet/qualcomm/rmnet/rmnet_map.h   |  1 +
 .../ethernet/qualcomm/rmnet/rmnet_map_data.c  | 72 ++++++++++---------
 3 files changed, 45 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
index 9f3479500f85..d055a2628d8c 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
@@ -126,7 +126,10 @@ rmnet_map_ingress_handler(struct sk_buff *skb,
 
 		consume_skb(skb);
 	} else {
-		__rmnet_map_ingress_handler(skb, port);
+		if (rmnet_map_validate_packet_len(skb, port))
+			__rmnet_map_ingress_handler(skb, port);
+		else
+			kfree_skb(skb);
 	}
 }
 
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
index b70284095568..60ca8b780c88 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
@@ -59,5 +59,6 @@ void rmnet_map_tx_aggregate_init(struct rmnet_port *port);
 void rmnet_map_tx_aggregate_exit(struct rmnet_port *port);
 void rmnet_map_update_ul_agg_config(struct rmnet_port *port, u32 size,
 				    u32 count, u32 time);
+u32 rmnet_map_validate_packet_len(struct sk_buff *skb, struct rmnet_port *port);
 
 #endif /* _RMNET_MAP_H_ */
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
index 8b4640c5d61e..305ae15ae8f3 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
@@ -333,54 +333,62 @@ struct rmnet_map_header *rmnet_map_add_map_header(struct sk_buff *skb,
 	return map_header;
 }
 
-/* Deaggregates a single packet
- * A whole new buffer is allocated for each portion of an aggregated frame.
- * Caller should keep calling deaggregate() on the source skb until 0 is
- * returned, indicating that there are no more packets to deaggregate. Caller
- * is responsible for freeing the original skb.
- */
-struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb,
-				      struct rmnet_port *port)
+u32 rmnet_map_validate_packet_len(struct sk_buff *skb, struct rmnet_port *port)
 {
 	struct rmnet_map_v5_csum_header *next_hdr = NULL;
 	struct rmnet_map_header *maph;
 	void *data = skb->data;
-	struct sk_buff *skbn;
-	u8 nexthdr_type;
 	u32 packet_len;
 
-	if (skb->len == 0)
-		return NULL;
+	if (skb->len < sizeof(*maph))
+		return 0;
 
 	maph = (struct rmnet_map_header *)skb->data;
+
+	/* Some hardware can send us empty frames. Catch them */
+	if (!maph->pkt_len)
+		return 0;
+
 	packet_len = ntohs(maph->pkt_len) + sizeof(*maph);
 
 	if (port->data_format & RMNET_FLAGS_INGRESS_MAP_CKSUMV4) {
 		packet_len += sizeof(struct rmnet_map_dl_csum_trailer);
-	} else if (port->data_format & RMNET_FLAGS_INGRESS_MAP_CKSUMV5) {
-		if (!(maph->flags & MAP_CMD_FLAG)) {
-			packet_len += sizeof(*next_hdr);
-			if (maph->flags & MAP_NEXT_HEADER_FLAG)
-				next_hdr = data + sizeof(*maph);
-			else
-				/* Mapv5 data pkt without csum hdr is invalid */
-				return NULL;
-		}
+	} else if ((port->data_format & RMNET_FLAGS_INGRESS_MAP_CKSUMV5) &&
+		   !(maph->flags & MAP_CMD_FLAG)) {
+		/* Mapv5 data pkt without csum hdr is invalid */
+		if (!(maph->flags & MAP_NEXT_HEADER_FLAG))
+			return 0;
+
+		packet_len += sizeof(*next_hdr);
+		next_hdr = data + sizeof(*maph);
 	}
 
-	if (((int)skb->len - (int)packet_len) < 0)
-		return NULL;
+	if (skb->len < packet_len)
+		return 0;
 
-	/* Some hardware can send us empty frames. Catch them */
-	if (!maph->pkt_len)
-		return NULL;
+	if (next_hdr &&
+	    u8_get_bits(next_hdr->header_info, MAPV5_HDRINFO_HDR_TYPE_FMASK) !=
+	    RMNET_MAP_HEADER_TYPE_CSUM_OFFLOAD)
+		return 0;
 
-	if (next_hdr) {
-		nexthdr_type = u8_get_bits(next_hdr->header_info,
-					   MAPV5_HDRINFO_HDR_TYPE_FMASK);
-		if (nexthdr_type != RMNET_MAP_HEADER_TYPE_CSUM_OFFLOAD)
-			return NULL;
-	}
+	return packet_len;
+}
+
+/* Deaggregates a single packet
+ * A whole new buffer is allocated for each portion of an aggregated frame.
+ * Caller should keep calling deaggregate() on the source skb until 0 is
+ * returned, indicating that there are no more packets to deaggregate. Caller
+ * is responsible for freeing the original skb.
+ */
+struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb,
+				      struct rmnet_port *port)
+{
+	struct sk_buff *skbn;
+	u32 packet_len;
+
+	packet_len = rmnet_map_validate_packet_len(skb, port);
+	if (!packet_len)
+		return NULL;
 
 	skbn = alloc_skb(packet_len + RMNET_MAP_DEAGGR_SPACING, GFP_ATOMIC);
 	if (!skbn)
-- 
2.43.0
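
The ordering of the length checks in the patch above (header bounds first, then the empty-frame check, then the on-wire length against the buffer) can be modeled on a plain byte buffer. This is a sketch under stated assumptions: model_map_validate_len() and MODEL_MAP_HDR_LEN are hypothetical, the header is assumed to be 4 bytes with a big-endian pkt_len in the last two, the trailer size is passed in by the caller rather than taken from the driver, and the MAPv5 next-header checks are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAP_HDR_LEN 4u	/* assumed header size for this model */

/*
 * Returns the total frame length (header + payload + trailer) when the
 * buffer really contains that many bytes, or 0 when the frame is invalid.
 */
static uint32_t model_map_validate_len(const uint8_t *frame, size_t frame_len,
				       uint32_t trailer_len)
{
	uint32_t pkt_len, total;

	assert(frame != NULL);

	/* Never read pkt_len unless the whole header is present. */
	if (frame_len < MODEL_MAP_HDR_LEN)
		return 0;

	/* pkt_len is big-endian on the wire (bytes 2 and 3 in this model). */
	pkt_len = ((uint32_t)frame[2] << 8) | frame[3];
	if (pkt_len == 0)
		return 0;	/* empty frame */

	total = pkt_len + MODEL_MAP_HDR_LEN + trailer_len;

	/* The on-wire length must not exceed what was actually received. */
	if (frame_len < total)
		return 0;

	return total;
}
```

A zero return plays the same role as in the patch: the caller treats it as "do not parse this frame" and is then responsible for freeing the buffer.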



* Re: [PATCH net v3] octeontx2-pf: check DMAC extraction support before filtering
From: Harshitha Ramamurthy @ 2026-06-30 17:38 UTC
  To: nshettyj
  Cc: netdev, linux-kernel, sgoutham, gakula, sbhatta, hkelam,
	bbhushan2, andrew+netdev, davem, edumazet, kuba, pabeni, naveenm,
	tduszynski, sumang
In-Reply-To: <20260630162622.2667086-1-nshettyj@marvell.com>

On Tue, Jun 30, 2026 at 9:26 AM <nshettyj@marvell.com> wrote:
>
> From: Suman Ghosh <sumang@marvell.com>
>
> Currently, configuring a VF MAC address via the PF (e.g., 'ip link
> set <pf> vf 0 mac <mac>') blindly attempts to install a DMAC-based
> hardware filter. However, the hardware parser profile might not
> support DMAC extraction.
>
> Check if the hardware parsing profile supports DMAC extraction
> before adding the filter. Additionally, emit a warning message
> to inform the operator if the MAC filter installation fails due
> to missing DMAC extraction support.
>
> Fixes: f0c2982aaf98 ("octeontx2-pf: Add support for SR-IOV management functions")
> Signed-off-by: Suman Ghosh <sumang@marvell.com>
> Signed-off-by: Nitin Shetty J <nshettyj@marvell.com>
>
> ---
> v3:
>  - Update config->mac only after hardware programming succeeds in
>    otx2_set_vf_mac().
> v2:
>  - Move the DMAC extraction check from otx2_set_vf_mac() into
>    otx2_do_set_vf_mac() which already holds pf->mbox.lock, so all
>    mbox operations are under a single lock/unlock pair. All error
>    paths now use the existing goto-out pattern, eliminating the
>    scattered mutex_unlock() + return calls from v1.
>  - Return -EOPNOTSUPP instead of 0 when DMAC extraction is not
>    supported, so the caller gets an explicit error rather than a
>    silent success.
> ---
>  .../ethernet/marvell/octeontx2/nic/otx2_pf.c  | 44 ++++++++++++++++---
>  1 file changed, 38 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> index b63df5737ff2..697570765957 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> @@ -2517,10 +2517,43 @@ EXPORT_SYMBOL(otx2_config_hwtstamp_set);
>
>  static int otx2_do_set_vf_mac(struct otx2_nic *pf, int vf, const u8 *mac)
>  {
> +       struct npc_get_field_status_req *freq;
> +       struct npc_get_field_status_rsp *frsp;
>         struct npc_install_flow_req *req;
>         int err;
>
>         mutex_lock(&pf->mbox.lock);
> +
> +       /* Skip installing the DMAC filter if the hardware parser profile
> +        * does not support DMAC extraction.
> +        */
> +       freq = otx2_mbox_alloc_msg_npc_get_field_status(&pf->mbox);
> +       if (!freq) {
> +               err = -ENOMEM;
> +               goto out;
> +       }
> +
> +       freq->field = NPC_DMAC;
> +       if (otx2_sync_mbox_msg(&pf->mbox)) {
> +               err = -EINVAL;
> +               goto out;
> +       }
> +
> +       frsp = (struct npc_get_field_status_rsp *)otx2_mbox_get_rsp
> +              (&pf->mbox.mbox, 0, &freq->hdr);
> +       if (IS_ERR(frsp)) {
> +               err = PTR_ERR(frsp);
> +               goto out;
> +       }
> +
> +       if (!frsp->enable) {
> +               netdev_warn(pf->netdev,
> +                           "VF %d MAC filter not installed: DMAC extraction not supported by parser profile\n",
> +                           vf);
> +               err = -EOPNOTSUPP;
> +               goto out;
> +       }
> +
>         req = otx2_mbox_alloc_msg_npc_install_flow(&pf->mbox);
>         if (!req) {
>                 err = -ENOMEM;
> @@ -2559,13 +2592,12 @@ static int otx2_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
>         if (!is_valid_ether_addr(mac))
>                 return -EINVAL;
>
> -       config = &pf->vf_configs[vf];
> -       ether_addr_copy(config->mac, mac);
> -
>         ret = otx2_do_set_vf_mac(pf, vf, mac);
> -       if (ret == 0)
> -               dev_info(&pdev->dev,
> -                        "Load/Reload VF driver\n");
> +       if (ret == 0) {
> +               config = &pf->vf_configs[vf];
> +               ether_addr_copy(config->mac, mac);
> +               dev_info(&pdev->dev, "Load/Reload VF driver\n");
> +       }

This is a valid change, and it would have been nice to mention it in
the commit message. Either way:

Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>

>
>         return ret;
>  }
> --
> 2.48.1
>
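
The v3 change reviewed here, updating the cached MAC only after hardware programming succeeds, is a commit-on-success pattern that can be sketched in userspace C. The names below (model_vf_config, model_program_hw(), model_set_vf_mac()) are hypothetical, and the dmac_supported flag is a stand-in for the mailbox query in the patch, not the driver's API.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MODEL_ETH_ALEN 6

/* Hypothetical cached per-VF state. */
struct model_vf_config {
	unsigned char mac[MODEL_ETH_ALEN];
};

/* Stand-in for the hardware programming step, which may fail. */
static int model_program_hw(int dmac_supported)
{
	return dmac_supported ? 0 : -EOPNOTSUPP;
}

/* Program the hardware first; touch the cached config only on success. */
static int model_set_vf_mac(struct model_vf_config *cfg,
			    const unsigned char *mac, int dmac_supported)
{
	int ret;

	assert(cfg && mac);

	ret = model_program_hw(dmac_supported);
	if (ret == 0)
		memcpy(cfg->mac, mac, MODEL_ETH_ALEN);

	return ret;
}
```

With this ordering a failed request leaves the cached address unchanged, so in the model the software state never claims a MAC that the hardware did not accept.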


* Re: [PATCH net-next v1 0/2] Reuse threaded NAPI kthread across napi_del()/napi_add().
From: Mina Almasry @ 2026-06-30 17:38 UTC
  To: Jakub Kicinski, Harshitha Ramamurthy, Jordan Rhee
  Cc: Shuhao Tan, David S . Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, Andrew Lunn, Shuah Khan, Samiullah Khawaja,
	Kuniyuki Iwashima, netdev, linux-kernel, linux-kselftest
In-Reply-To: <20260629181955.00e63b61@kernel.org>

On Mon, Jun 29, 2026 at 6:19 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 29 Jun 2026 17:47:03 -0700 Shuhao Tan wrote:
> > > Send a netdev Netlink notification when NAPI is re-created and
> > > let the userspace re-apply the settings?
> >
> > It feels surprising that the userspace needs to reconfigure thread
> > properties when changing NIC configurations unrelated to threading.
> > Another downside is that when userspace configures NIC configurations
> > in quick succession, re-application becomes messy because a previous
> > re-application might still be in progress when the thread is gone.
>
> Can you explain more about your deployment and system configuration
> flow? We may be adding micro optimizations when the problem is that
> we recreate the NAPIs in the first place.

We have an AF_XDP application with extremely low latency and jitter
requirements running on our servers. Sami developed busypolling
threaded napi for them. Since it's an AF_XDP application, they attach
their umem to specific RX queues, and then configure threaded NAPI
busypolling to achieve low latency. That involves using the Netlink
API to set the threaded/busypolling property, grabbing the kthread
PID, and setting some properties on the kthread. Concretely, something
like:

```
  local napi_id
  napi_id=$(call_ynl --output-json --do queue-get \
    --json "{\"ifindex\": ${ifindex}, \"id\": ${q_id}, \"type\": \"rx\"}" | \
    jq -r '."napi-id"')

  echo "Enabling busypolling for queue ${q_id} (NAPI ${napi_id}) on CPU ${cpu}"
  call_ynl --do napi-set \
    --json "{\"id\": \"${napi_id}\", \"threaded\": \"busy-poll\"}" >/dev/null

  local napi_kthread_pid
  napi_kthread_pid=$(call_ynl --do napi-get --output-json \
    --json "{\"id\": \"${napi_id}\"}" | jq -r '."pid" // empty')

  taskset -pc "${cpu}" "${napi_kthread_pid}" >/dev/null
```

The bug is that the taskset configuration disappears when the user
runs an unrelated ethtool command. Yes, the root cause is that an
unrelated ethtool config change on GVE goes through
gve_adjust_config(), which recreates the NAPIs. My understanding is
that recreating NAPIs on ethtool commands is WAI and standard upstream
driver behavior; is that not correct? If we held onto the same NAPIs
during driver reconfigs,
this issue would indeed be fixed. Is holding onto the same NAPI during
driver reconfigurations an appropriate/feasible fix here?
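
To make the failure mode concrete, here is a minimal Python sketch of
the pinning step, equivalent in spirit to the taskset call above. It is
an illustration only: pin_napi_kthread() is a hypothetical helper name,
and the PID is assumed to come from the napi-get reply. The affinity
is attached to the PID, so once the kthread is destroyed and a new one
is created, the setting is gone unless something re-applies it.

```python
import os


def pin_napi_kthread(pid: int, cpu: int) -> set:
    """Pin the task `pid` to a single CPU and return the resulting mask."""
    # Same effect as `taskset -pc <cpu> <pid>`; Linux-only call.
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)
```

Passing pid 0 applies the mask to the calling thread, which is handy
for trying the helper without a real NAPI kthread.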

Other ideas, FWIW:

1. We could add an optional netlink argument, "remember_napi_kthread,"
that customizes this behavior for users who want to remember thread
config.

2. We add an optional netlink argument: "bind_napi_kthread_to_cpu"
that not only sets threaded but also binds the created thread to a
specific cpu set, and then include that cpu set as part of the
napi_config so that it's remembered on reconfigs.

But I think the best options are (a) not recreating napis during
driver configs or (b) this patch to park/unpark the thread, TBH.

-- 
Thanks,
Mina

^ permalink raw reply

* RE: [PATCH net-next v6 12/15] onsemi: s2500: Add driver support for TS2500 MAC-PHY
From: Selvamani Rajagopal @ 2026-06-30 17:36 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Andrew Lunn, Piergiorgio Beruto, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, Parthiban Veerasooran, Richard Cochran, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Simon Horman, Jonathan Corbet,
	Shuah Khan, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	devicetree@vger.kernel.org, linux-doc@vger.kernel.org, Jerry Ray
In-Reply-To: <akP3jrbFLBfS5UqV@monoceros>

> -----Original Message-----
> From: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
> Sent: Tuesday, June 30, 2026 10:08 AM
> Subject: Re: [PATCH net-next v6 12/15] onsemi: s2500: Add driver support for TS2500
> MAC-PHY
> 
> > +static const struct spi_device_id s2500_ids[] = {
> > +	{ "s2500" },
> > +	{}
> 
> (i.e. use a named initializer, a space between { and } and no empty line
> before MODULE_DEVICE_TABLE()).
> 
> Also the driver should probably have a
> 
> 	MODULE_DEVICE_TABLE(of, s2500_of_match);
> 
> Best regards
> Uwe

Thanks for your review. I will take care of your three comments
(space between { and }, no empty line, missing MODULE_DEVICE_TABLE macro).


^ permalink raw reply

* Re: [PATCH v2 net-next] ipv4: fib: fix route re-dump in inet_dump_fib() on multi-batch dump
From: Ido Schimmel @ 2026-06-30 17:24 UTC (permalink / raw)
  To: Pengfei Zhang
  Cc: dsahern, davem, edumazet, kuba, pabeni, horms, netdev,
	linux-kernel, chenzhangqi, baohua, zhangpengfei16
In-Reply-To: <20260630084220.2711025-1-zhangfeionline@gmail.com>

On Tue, Jun 30, 2026 at 04:42:20PM +0800, Pengfei Zhang wrote:
> inet_dump_fib() saves its progress in cb->args[1] as a positional
> index within the current hash chain.  Between batches, a concurrent
> fib_new_table() can insert a new table at the chain head, shifting
> all existing entries.  On resume the saved index lands on a different
> table, causing already-dumped tables to be re-dumped and the
> originally suspended table to restart from the beginning.
> 
> Fix by storing tb->tb_id in cb->args[1] instead of a positional
> index, mirroring the fix applied to inet6_dump_fib() in commit
> 9facb861dc6b ("ipv6: fib6: fix NULL deref in fib6_walk_continue()
> on multi-batch dump").
> 
> Signed-off-by: Pengfei Zhang <zhangfeionline@gmail.com>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>

^ permalink raw reply

* Re: [PATCH net-next v9 2/6] dt-bindings: ethernet: eswin: add EIC7700 eth1 RX clock inversion variant
From: Conor Dooley @ 2026-06-30 17:14 UTC (permalink / raw)
  To: lizhi2
  Cc: devicetree, andrew+netdev, davem, edumazet, kuba, robh, krzk+dt,
	conor+dt, netdev, pabeni, mcoquelin.stm32, alexandre.torgue,
	rmk+kernel, pjw, palmer, aou, alex, linux-riscv, linux-stm32,
	linux-arm-kernel, linux-kernel, maxime.chevallier, ningyu, linmin,
	pinkesh.vaghela, pritesh.patel, weishangjuan, horms, lee, wens
In-Reply-To: <20260630063239.1158-1-lizhi2@eswincomputing.com>

Acked-by: Conor Dooley <conor.dooley@microchip.com>
pw-bot: not-applicable

^ permalink raw reply

* Re: [PATCH bpf-next v5 1/3] bpf: Add BPF_FIB_LOOKUP_VLAN flag to bpf_fib_lookup() helper
From: David Ahern @ 2026-06-30 17:13 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Avinash Duduskar, ast, daniel,
	andrii
  Cc: eddyz87, memxor, martin.lau, song, yonghong.song, jolsa, emil,
	john.fastabend, sdf, davem, edumazet, kuba, pabeni, horms, shuah,
	hawk, yatsenko, leon.hwang, kpsingh, a.s.protopopov, ameryhung,
	rongtao, eyal.birger, bpf, netdev, linux-kernel, linux-kselftest
In-Reply-To: <874iik2ew4.fsf@toke.dk>

On 6/30/26 10:04 AM, Toke Høiland-Jørgensen wrote:
> David Ahern <dsahern@kernel.org> writes:
> 
>> On 6/30/26 4:00 AM, Toke Høiland-Jørgensen wrote:
>>>> It does not make sense to require a flag to get lookup output. vlan
>>>> proto of 0 is not valid, so it is a clear indication that the vlan
>>>> output parameters were not set during the lookup.
>>>
>>> Okay, so we could just unconditionally set the VLAN fields, but if we
>>> start rewriting the ifindex that would be a change of the existing
>>> behaviour that could break existing applications, no?
>>
>> Consistently dealing with upper devices is one of the reasons I never
>> sent patches for vlan support.
>>
>> xdp support is at the driver layer for real (physical) devices. The fib
>> lookup is going to return the vlan device index - a virtual device.
>> Support for xdp should not be propagated to virtual devices; it goes
>> against the intent of xdp. Any trip down this path will have to decide
>> how to handle vlan-in-vlan use cases. Where is the line drawn for fast
>> networking?
> 
> Right, which is why we need building blocks that makes it possible for
> XDP programs to do the right thing in the BPF code :)
> 
> A helper that resolves the parent could be used for stacked VLANs as
> well (just calling the helper multiple times).
> 
>>> Specifically, if an XDP application has a table of the interfaces it
>>> forwards between, today they'd get a VLAN interface ifindex, which would
>>> not be in that table, and the application would return XDP_PASS. Whereas
>>> if we change the ifindex and populate the VLAN tag, suddenly the
>>> interface would be in the table, but because the application doesn't
>>> read the returned VLAN tag, it will end up sending packets out without
>>> tagging them, leading to broken forwarding.
>>
>> I have not followed developments over the past few years. Does XDP have
>> support for vlan acceleration in the Tx path now? You really want that
>> to deal with vlans and not replicating s/w processing in ebpf.
> 
> It does not, no. There's TX metadata for AF_XDP, but VLAN support is not
> in there (see include/uapi/linux/if_xdp.h).
> 
> Doesn't mean software VLAN handling can't be useful, though; there are
> use cases other than the very high end where XDP can speed things up
> even if it has to write a VLAN tag or two...
> 
>>> So if we don't want the flag, we'd need some other mechanism to resolve
>>> the parent ifindex, AFAICT? Maybe a xdp_get_parent_ifindex() kfunc, say?
>>> That could also be made generic for other stacked interface types, I
>>> suppose.
>>>
>>> WDYT?
>>
>> dealing with stacked devices is hard :-)
>>
>> What is the return is a bond device or a vlan on a bond device?
> 
> Well, bond devices have XDP support, so you can just redirect to those :)
> 
> But yeah, each type of stacked device would need to pass different
> information through to the XDP program, and the program would need to
> support those. Building a single XDP program that supports all of them
> will require quite a bit of code, and would probably not perform super
> well. But most deployments have distinct subsets of features they need,
> so this does not have to be a blocker, IMO?
> 

Seems to me the fib_lookup for xdp needs to return the bottom device,
not the vlan device, for forwarding to work. That's why I added the
fields to the struct. That allows the program to push the vlan header if
required. My preference (dream?) was that the Tx path had support to tell
the redirect the vlan and the h/w added it on send.
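
As a rough byte-level illustration of what "push the vlan header"
involves (in Python rather than BPF, so this is not what an XDP program
would literally run), the sketch below inserts an 802.1Q tag into an
Ethernet frame. push_vlan() is a hypothetical helper, and the vlan id
is assumed to come from the lookup result.

```python
import struct

ETH_P_8021Q = 0x8100  # 802.1Q TPID


def push_vlan(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Insert an 802.1Q tag after the destination and source MAC addresses."""
    if not 0 <= vid <= 0xFFF or not 0 <= pcp <= 7:
        raise ValueError("vid must fit in 12 bits, pcp in 3 bits")
    tci = (pcp << 13) | vid  # DEI bit left at 0
    tag = struct.pack("!HH", ETH_P_8021Q, tci)
    # The original EtherType ends up right after the 4-byte tag.
    return frame[:12] + tag + frame[12:]
```

A stacked (vlan-in-vlan) case would call the helper once per tag,
innermost first.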

But really, once stacked devices come into play, I just wanted to make
sure thought is given to different use cases. As you know, the lookup
struct is hard bound to 64B and it is trying to cover a lot of use cases.


^ permalink raw reply

* Re: [PATCH net-next] selftests: drv-net: toeplitz: cap the Rx queue count
From: Willem de Bruijn @ 2026-06-30 17:11 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
	shuah, willemb, noren, gal, linux-kselftest
In-Reply-To: <20260629234354.2154541-1-kuba@kernel.org>

Jakub Kicinski wrote:
> The RPS test needs a free CPU within the first RPS_MAX_CPUS (16)
> cores. This is easily violated if the NIC or env allocates the
> IRQs to cores linearly.
> 
> Cap the Rx queues at 8, we don't need more. This makes the test
> pass on CX7 in NIPA.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Willem de Bruijn <willemb@google.com>

> ---
> CC: shuah@kernel.org
> CC: willemb@google.com
> CC: noren@nvidia.com
> CC: gal@nvidia.com
> CC: linux-kselftest@vger.kernel.org
> ---
>  .../selftests/drivers/net/hw/toeplitz.py      | 22 +++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/tools/testing/selftests/drivers/net/hw/toeplitz.py b/tools/testing/selftests/drivers/net/hw/toeplitz.py
> index cd7e080e6f84..571732198b93 100755
> --- a/tools/testing/selftests/drivers/net/hw/toeplitz.py
> +++ b/tools/testing/selftests/drivers/net/hw/toeplitz.py
> @@ -21,6 +21,8 @@ from lib.py import ksft_variants, KsftNamedVariant, KsftSkipEx, KsftFailEx
>  ETH_RSS_HASH_TOP = 1
>  # Must match RPS_MAX_CPUS in toeplitz.c
>  RPS_MAX_CPUS = 16
> +# Cap Rx queues so IRQ pinning leaves free CPUs in the RPS_MAX_CPUS range
> +QUEUE_CAP = 8
>  
>  
>  def _check_rps_and_rfs_not_configured(cfg):
> @@ -48,6 +50,25 @@ RPS_MAX_CPUS = 16
>          return int(data)
>  
>  
> +def _cap_queue_count(cfg):
> +    ehdr = {"header": {"dev-index": cfg.ifindex}}
> +    chans = cfg.ethnl.channels_get(ehdr)
> +
> +    config = {}
> +    restore = {}
> +    for key in ("combined-count", "rx-count"):

This assumes that combined and rx are not set at the same time.
SGTM, not expected in real devices. But technically they could be.
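
To illustrate the corner case, here is a standalone sketch of the same
capping loop as a pure function. plan_channel_cap() is a hypothetical
name and the channel dicts are made-up inputs, not real device output.

```python
QUEUE_CAP = 8


def plan_channel_cap(chans: dict, cap: int = QUEUE_CAP):
    """Return (config, restore) dicts, mirroring the loop in the patch."""
    config, restore = {}, {}
    for key in ("combined-count", "rx-count"):
        cur = chans.get(key, 0)
        if cur > cap:
            config[key] = cap
            restore[key] = cur
    return config, restore


# Only combined channels reported: capped to QUEUE_CAP.
print(plan_channel_cap({"combined-count": 32}))
# Both reported: each key is capped on its own, so the device could
# still end up with more than QUEUE_CAP Rx queues in total.
print(plan_channel_cap({"combined-count": 16, "rx-count": 12}))
```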

> +        cur = chans.get(key, 0)
> +        if cur > QUEUE_CAP:
> +            config[key] = QUEUE_CAP
> +            restore[key] = cur
> +
> +    if not config:
> +        return
> +
> +    cfg.ethnl.channels_set(ehdr | config)
> +    defer(cfg.ethnl.channels_set, ehdr | restore)
> +
> +
>  def _get_irq_cpus(cfg):
>      """
>      Read the list of IRQs for the device Rx queues.
> @@ -177,6 +198,7 @@ RPS_MAX_CPUS = 16
>      ]
>  
>      if grp:
> +        _cap_queue_count(cfg)
>          _check_rps_and_rfs_not_configured(cfg)
>      if grp == "rss":
>          irq_cpus = ",".join([str(x) for x in _get_irq_cpus(cfg)])
> -- 
> 2.54.0
> 



^ permalink raw reply

* Re: [PATCH net-next v6 12/15] onsemi: s2500: Add driver support for TS2500 MAC-PHY
From: Uwe Kleine-König @ 2026-06-30 17:08 UTC (permalink / raw)
  To: Selvamani.Rajagopal
  Cc: Andrew Lunn, Piergiorgio Beruto, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, Parthiban Veerasooran, Richard Cochran, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Simon Horman, Jonathan Corbet,
	Shuah Khan, netdev, linux-kernel, devicetree, linux-doc,
	Jerry Ray
In-Reply-To: <20260629-s2500-mac-phy-support-v6-12-18ce79500371@onsemi.com>

Hello,

On Mon, Jun 29, 2026 at 10:23:42AM -0700, Selvamani Rajagopal via B4 Relay wrote:
> +static const struct spi_device_id s2500_ids[] = {
> +	{ "s2500" },
> +	{}
> +};
> +
> +MODULE_DEVICE_TABLE(spi, s2500_ids);

For consistency, please make this:

	static const struct spi_device_id s2500_ids[] = {
		{ .name = "s2500" },
		{ }
	};
	MODULE_DEVICE_TABLE(spi, s2500_ids);

(i.e. use a named initializer, a space between { and } and no empty line
before MODULE_DEVICE_TABLE()).

Also the driver should probably have a

	MODULE_DEVICE_TABLE(of, s2500_of_match);

Best regards
Uwe

^ permalink raw reply

* Re: [PATCH net-next v2] ipv4: igmp: remove multicast group from hash table on device destruction
From: Ido Schimmel @ 2026-06-30 16:59 UTC (permalink / raw)
  To: Yuyang Huang
  Cc: Jagielski, Jedrzej, David S. Miller, Cong Wang, David Ahern,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <CADXeF1EyFJ1RaJqJC8UB=rn5Jd5GLU-2WxrvpW_buRg3O3LMYA@mail.gmail.com>

On Tue, Jun 30, 2026 at 04:55:22PM +0900, Yuyang Huang wrote:
> > Hi,
> >
> > why sending this to net-next not to net if that's a bug fix?
> >
> > In the v1 thread it was said
> > >This is a long-standing bug, not a recent regression.
> >
> > so why do not cc stable kernel to get rid of this bug from
> > stable kernels in such case?
> 
> Thanks for the advise, will send this patch to stable kernel.

Please target v3 at net and add a trace, given that you're claiming a
use-after-free. That way we know that the problem is real and not a
false positive from some tool. You can reproduce it by adding enough
delay in inetdev_destroy():

BUG: KASAN: slab-use-after-free in ip_check_mc_rcu+0x2cc/0x500
Read of size 4 at addr ffff88810c571208 by task mausezahn/419

CPU: 2 UID: 0 PID: 419 Comm: mausezahn Not tainted 7.1.0-virtme-g15d4a7c23bf6 #17 PREEMPT(lazy)
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 <IRQ>
 dump_stack_lvl+0x4d/0x70
 print_report+0x153/0x4c2
 kasan_report+0xda/0x110
 ip_check_mc_rcu+0x2cc/0x500
 ip_route_input_rcu.part.0+0x13d/0xbc0
 ip_route_input_noref+0xb6/0x110
 ip_rcv_finish_core+0x41b/0x1d90
 ip_rcv_finish+0xea/0x1b0
 ip_rcv+0xb7/0x1b0
 __netif_receive_skb_one_core+0xfc/0x180
 process_backlog+0x1ea/0x5e0
 __napi_poll+0x97/0x480
 net_rx_action+0x97c/0xfa0
 handle_softirqs+0x18c/0x4f0
 do_softirq+0x42/0x60
 </IRQ>

^ permalink raw reply

* Re: [PATCH net] selftests: net: make busywait timeout clock portable
From: Nirmoy Das @ 2026-06-30 16:52 UTC (permalink / raw)
  To: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Shuah Khan
  Cc: netdev, linux-kselftest, stable
In-Reply-To: <2a1c4eb4-a4ba-4fc7-9bda-6a7a8d0be2f1@redhat.com>


On 30.06.26 13:11, Paolo Abeni wrote:
> On 6/26/26 4:49 PM, Nirmoy Das wrote:
>> loopy_wait() expects millisecond timestamps. However, Ubuntu Resolute
>> can use uutils date, where `date -u +%s%3N` returns seconds plus full
>> nanoseconds instead of a 3-digit millisecond field. This makes
>> busywait expire too early and can make vlan_bridge_binding.sh read a
>> stale operstate.
>>
>> Fixes: 25ae948b4478 ("selftests/net: add lib.sh")
>> Cc: stable@vger.kernel.org # 6.8+
>> Link: https://github.com/uutils/coreutils/issues/11658
>> Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
>> ---
>>   tools/testing/selftests/net/lib.sh | 19 +++++++++++++++++--
>>   1 file changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
>> index b40694573f4c7..fcaec058be6d0 100644
>> --- a/tools/testing/selftests/net/lib.sh
>> +++ b/tools/testing/selftests/net/lib.sh
>> @@ -70,12 +70,27 @@ ksft_exit_status_merge()
>>   		$ksft_xfail $ksft_pass $ksft_skip $ksft_fail
>>   }
>>   
>> +timestamp_ms()
>> +{
>> +	local now=$(date -u +%s:%N)
> shellcheck says:
>
>   ^-^ SC2155 (warning): Declare and assign separately to avoid masking
> return values.

Thanks a lot. Sent a v2 with that fixed.


> /P
>

^ permalink raw reply

* [PATCH net v2] selftests: net: make busywait timeout clock portable
From: Nirmoy Das @ 2026-06-30 16:51 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Shuah Khan
  Cc: netdev, linux-kselftest, stable, Nirmoy Das

loopy_wait() expects millisecond timestamps. However, Ubuntu Resolute
can use uutils date, where `date -u +%s%3N` returns seconds plus full
nanoseconds instead of a 3-digit millisecond field. This makes
busywait expire too early and can make vlan_bridge_binding.sh read a
stale operstate.

Fixes: 25ae948b4478 ("selftests/net: add lib.sh")
Cc: stable@vger.kernel.org # 6.8+
Link: https://github.com/uutils/coreutils/issues/11658
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
---
Changes in v2:
- Declare variables separately from command substitutions and propagate
  timestamp failures, addressing ShellCheck SC2155.
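
Not part of the patch: the arithmetic in timestamp_ms() can be checked
outside the shell. The Python sketch below mirrors the conversion under
the assumption that the input has the "SECONDS:NANOSECONDS" shape
printed by `date -u +%s:%N`; the sample timestamp is made up.

```python
def timestamp_ms(stamp: str) -> int:
    """Convert 'SECONDS:NANOSECONDS' to integer milliseconds."""
    seconds, _, nanos = stamp.partition(":")
    # Fall back to 0 when the nanosecond field is not purely numeric.
    if nanos.isascii() and nanos.isdigit():
        nanos = nanos[:9]
    else:
        nanos = "0"
    return int(seconds) * 1000 + int(nanos) // 1_000_000


stamp = "1700000000:123456789"
print(timestamp_ms(stamp))          # 1700000000123
# Seconds followed by the full nanosecond field, as described above for
# uutils date with +%s%3N: the value is no longer in milliseconds.
print(int(stamp.replace(":", "")))  # 1700000000123456789
```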

 tools/testing/selftests/net/lib.sh | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index b40694573f4c7..d030d45c0e603 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -70,12 +70,33 @@ ksft_exit_status_merge()
 		$ksft_xfail $ksft_pass $ksft_skip $ksft_fail
 }
 
+timestamp_ms()
+{
+	local now
+	local seconds
+	local nanoseconds
+
+	now=$(date -u +%s:%N) || return
+	seconds=${now%:*}
+	nanoseconds=${now#*:}
+
+	if [[ $nanoseconds =~ ^[0-9]+$ ]]; then
+		nanoseconds=${nanoseconds:0:9}
+	else
+		nanoseconds=0
+	fi
+
+	echo $((seconds * 1000 + 10#$nanoseconds / 1000000))
+}
+
 loopy_wait()
 {
 	local sleep_cmd=$1; shift
 	local timeout_ms=$1; shift
+	local start_time
+	local current_time
 
-	local start_time="$(date -u +%s%3N)"
+	start_time=$(timestamp_ms) || return
 	while true
 	do
 		local out
@@ -84,7 +105,7 @@ loopy_wait()
 			return 0
 		fi
 
-		local current_time="$(date -u +%s%3N)"
+		current_time=$(timestamp_ms) || return
 		if ((current_time - start_time > timeout_ms)); then
 			echo -n "$out"
 			return 1

base-commit: e7cffd183c128af12683aba28ba163017ea2b192
-- 
2.43.0

^ permalink raw reply related

* [PATCH net] qede: fix off-by-one in BD ring consumption on build_skb failure
From: Shigeru Yoshida @ 2026-06-30 16:46 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Matvey Kovalev, Shigeru Yoshida, Pavel Zhigulin,
	Jamie Bainbridge
  Cc: netdev, linux-kernel

qede_rx_build_skb() and qede_tpa_rx_build_skb() do not check for a
NULL return from qede_build_skb(). When it returns NULL under memory
pressure, the functions still consume a BD from the ring before
returning NULL. The callers then recycle additional BDs, resulting in
one extra BD being consumed (off-by-one). This desynchronizes the BD
ring, which can corrupt DMA page reference counts and lead to SLUB
freelist corruption.

Commit 4e910dbe3650 ("qede: confirm skb is allocated before using")
added a NULL check inside qede_build_skb() to prevent a NULL pointer
dereference, but did not address the missing NULL checks in the
callers, making this off-by-one reachable.

Fix this by adding NULL checks for the return value of
qede_build_skb() in both qede_rx_build_skb() and
qede_tpa_rx_build_skb(), returning NULL immediately before any BD ring
manipulation.

Fixes: 4e910dbe3650 ("qede: confirm skb is allocated before using")
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
---
 drivers/net/ethernet/qlogic/qede/qede_fp.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c
index 33e18bb69774..c11e0d8f98aa 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_fp.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c
@@ -765,6 +765,9 @@ qede_tpa_rx_build_skb(struct qede_dev *edev,
 	struct sk_buff *skb;

 	skb = qede_build_skb(rxq, bd, len, pad);
+	if (unlikely(!skb))
+		return NULL;
+
 	bd->page_offset += rxq->rx_buf_seg_size;

 	if (bd->page_offset == PAGE_SIZE) {
@@ -812,6 +815,8 @@ qede_rx_build_skb(struct qede_dev *edev,
 	}

 	skb = qede_build_skb(rxq, bd, len, pad);
+	if (unlikely(!skb))
+		return NULL;

 	if (unlikely(qede_realloc_rx_buffer(rxq, bd))) {
 		/* Incr page ref count to reuse on allocation failure so
-- 
2.54.0

^ permalink raw reply related

* Re: [PATCH net-next v3 3/5] net: af_unix: useful handling of LSM denials on SCM_RIGHTS
From: Kuniyuki Iwashima @ 2026-06-30 16:43 UTC (permalink / raw)
  To: Jori Koolstra
  Cc: Christian Brauner, Aleksa Sarai, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-fsdevel,
	linux-kernel
In-Reply-To: <20260629194327.2270798-4-jkoolstra@xs4all.nl>

On Mon, Jun 29, 2026 at 12:42 PM Jori Koolstra <jkoolstra@xs4all.nl> wrote:
>
> Right now if some LSM such as Smack denies an AF_UNIX socket peer to
> receive an SCM_RIGHTS fd, the SCM_RIGHTS fd array will be cut short at
> that point, and MSG_CTRUNC is set on return of recvmsg(). This is
> highly problematic behaviour, because it leaves the receiver
> wondering what happened. As per man page MSG_CTRUNC is supposed to
> indicate that the control buffer was sized too short, but suddenly
> a permission error might result in the exact same flag being set.
> Moreover, the receiver has no chance to determine how many fds got
> originally sent and how many were suppressed.[1]
>
> Add a SO_RIGHTS_NOTRUNC option to UNIX sockets to enable more useful
> handling of LSM denials when receiving SCM_RIGHTS messages: instead of
> truncating the message at the first blocked fd, keep every fd slot
> and store the LSM errno in the blocked slot.
>
> [1]: https://github.com/uapi-group/kernel-features#useful-handling-of-lsm-denials-on-scm_rights
>
> Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
> ---
>  include/net/af_unix.h             |  1 +
>  include/net/scm.h                 | 15 +++++++++++----
>  include/uapi/asm-generic/socket.h |  3 +++
>  net/compat.c                      |  4 ++--
>  net/core/scm.c                    | 16 +++++++++++-----
>  net/unix/af_unix.c                |  9 +++++++++
>  6 files changed, 37 insertions(+), 11 deletions(-)
>
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index 34f53dde65ce..bb1b3dee02e8 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -49,6 +49,7 @@ struct unix_sock {
>         struct scm_stat         scm_stat;
>         int                     inq_len;
>         bool                    recvmsg_inq;
> +       bool                    scm_rights_notrunc;
>  #if IS_ENABLED(CONFIG_AF_UNIX_OOB)
>         struct sk_buff          *oob_skb;
>  #endif
> diff --git a/include/net/scm.h b/include/net/scm.h
> index c52519669349..761cda0803fb 100644
> --- a/include/net/scm.h
> +++ b/include/net/scm.h
> @@ -50,8 +50,8 @@ struct scm_cookie {
>  #endif
>  };
>
> -void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm);
> -void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm);
> +void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm, bool notrunc);
> +void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm, bool notrunc);
>  int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm);
>  void __scm_destroy(struct scm_cookie *scm);
>  struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl);
> @@ -108,11 +108,18 @@ void scm_recv_unix(struct socket *sock, struct msghdr *msg,
>                    struct scm_cookie *scm, int flags);
>
>  static inline int scm_recv_one_fd(struct file *f, int __user *ufd,
> -                                 unsigned int flags)
> +                                 unsigned int flags, bool notrunc)
>  {
> +       bool filtered;
> +       int error;
> +
>         if (!ufd)
>                 return -EFAULT;
> -       return receive_fd(f, ufd, flags);
> +
> +       error = receive_fd_filtered(f, ufd, flags, &filtered);
> +       if (filtered && notrunc)
> +               return put_user(error, ufd);
> +       return error;
>  }
>
>  #endif /* __LINUX_NET_SCM_H */
> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
> index 53b5a8c002b1..c5fb2ee96830 100644
> --- a/include/uapi/asm-generic/socket.h
> +++ b/include/uapi/asm-generic/socket.h
> @@ -150,6 +150,9 @@
>  #define SO_INQ                 84
>  #define SCM_INQ                        SO_INQ
>
> +#define SO_RIGHTS_NOTRUNC      85
> +#define SCM_RIGHTS_NOTRUNC     SO_RIGHTS_NOTRUNC

SCM_RIGHTS_NOTRUNC is not needed as it's not used.


> +
>  #if !defined(__KERNEL__)
>
>  #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
> diff --git a/net/compat.c b/net/compat.c
> index d68cf9c3aad5..6bdf4a2c9077 100644
> --- a/net/compat.c
> +++ b/net/compat.c
> @@ -286,7 +286,7 @@ static int scm_max_fds_compat(struct msghdr *msg)
>         return (msg->msg_controllen - sizeof(struct compat_cmsghdr)) / sizeof(int);
>  }
>
> -void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm)
> +void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm, bool notrunc)
>  {
>         struct compat_cmsghdr __user *cm =
>                 (struct compat_cmsghdr __user *)msg->msg_control_user;
> @@ -296,7 +296,7 @@ void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm)
>         int err = 0, i;
>
>         for (i = 0; i < fdmax; i++) {
> -               err = scm_recv_one_fd(scm->fp->fp[i], cmsg_data + i, o_flags);
> +               err = scm_recv_one_fd(scm->fp->fp[i], cmsg_data + i, o_flags, notrunc);
>                 if (err < 0)
>                         break;
>         }
> diff --git a/net/core/scm.c b/net/core/scm.c
> index a73b1eb30fd2..55bab203281a 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -351,7 +351,7 @@ static int scm_max_fds(struct msghdr *msg)
>         return (msg->msg_controllen - sizeof(struct cmsghdr)) / sizeof(int);
>  }
>
> -void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
> +void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm, bool notrunc)
>  {
>         struct cmsghdr __user *cm =
>                 (__force struct cmsghdr __user *)msg->msg_control_user;
> @@ -365,12 +365,12 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
>                 return;
>
>         if (msg->msg_flags & MSG_CMSG_COMPAT) {
> -               scm_detach_fds_compat(msg, scm);
> +               scm_detach_fds_compat(msg, scm, notrunc);
>                 return;
>         }
>
>         for (i = 0; i < fdmax; i++) {
> -               err = scm_recv_one_fd(scm->fp->fp[i], cmsg_data + i, o_flags);
> +               err = scm_recv_one_fd(scm->fp->fp[i], cmsg_data + i, o_flags, notrunc);
>                 if (err < 0)
>                         break;
>         }
> @@ -542,8 +542,14 @@ void scm_recv_unix(struct socket *sock, struct msghdr *msg,
>         if (!__scm_recv_common(sock->sk, msg, scm, flags))
>                 return;
>
> -       if (scm->fp)
> -               scm_detach_fds(msg, scm);
> +       if (scm->fp) {
> +               struct unix_sock *u;
> +               bool notrunc;
> +
> +               u = unix_sk(sock->sk);
> +               notrunc = READ_ONCE(u->scm_rights_notrunc);

Does this build with CONFIG_UNIX=n ?


> +               scm_detach_fds(msg, scm, notrunc);
> +       }
>
>         if (sock->sk->sk_scm_pidfd)
>                 scm_pidfd_recv(msg, scm);
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index f7a9d55eee8a..83274ce18e06 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -921,6 +921,7 @@ static bool unix_custom_sockopt(int optname)
>  {
>         switch (optname) {
>         case SO_INQ:
> +       case SO_RIGHTS_NOTRUNC:
>                 return true;
>         default:
>                 return false;
> @@ -956,6 +957,14 @@ static int unix_setsockopt(struct socket *sock, int level, int optname,
>
>                 WRITE_ONCE(u->recvmsg_inq, val);
>                 break;
> +
> +       case SO_RIGHTS_NOTRUNC:
> +               if (val > 1 || val < 0)
> +                       return -EINVAL;
> +
> +               WRITE_ONCE(u->scm_rights_notrunc, val);
> +               break;
> +
>         default:
>                 return -ENOPROTOOPT;
>         }
> --
> 2.54.0
>

^ permalink raw reply

* Re: [PATCH net-next] net: neigh: avoid calling neigh_forced_gc on every alloc when table is full
From: Kuniyuki Iwashima @ 2026-06-30 16:36 UTC (permalink / raw)
  To: Vimal Agrawal; +Cc: kuba, edumazet, netdev, vimal.agrawal
In-Reply-To: <CALkUMdTCYQDe3aR8mvSDMjkmYhXZRJf1S6sMRRkaem+_=Yk9WA@mail.gmail.com>

On Tue, Jun 30, 2026 at 5:01 AM Vimal Agrawal <avimalin@gmail.com> wrote:
>
> Hi Kuniyuki,
>
> You are correct that in this specific test case GC does not help since
> all entries are active/reachable. However, this is not the only
> scenario where entries can exceed gc_thresh3.
>
> In a real workload, the table can exceed gc_thresh3 with a mix of
> active and stale entries. In that case GC does help, but should not be
> called on every allocation attempt — once per 50ms is sufficient for
> GC to make progress without causing lock contention.

My mental model is that gc_thresh3 is the hard limit while gc_thresh2
is the soft limit, so if the total number of entries often exceeds gc_thresh3,
it's clearly wrong.

I think you need to set gc_thresh2 to a proper baseline (it sounds like
your current gc_thresh3 is the one) and gc_thresh3 to gc_thresh2+X
where X covers fluctuations.
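
As a sketch of that sizing rule only: the helper names and the headroom
value below are hypothetical, and the 32768 baseline is borrowed from
the 32k-entry test mentioned earlier in the thread.

```python
def suggest_gc_thresholds(baseline: int, headroom: int) -> dict:
    """gc_thresh2 = steady-state entry count, gc_thresh3 = baseline + X."""
    if baseline <= 0 or headroom <= 0:
        raise ValueError("baseline and headroom must be positive")
    return {"gc_thresh2": baseline, "gc_thresh3": baseline + headroom}


def as_sysctl(thresholds: dict, family: str = "ipv4") -> list:
    """Format the values as sysctl.conf-style lines."""
    return [f"net.{family}.neigh.default.{name} = {value}"
            for name, value in thresholds.items()]


for line in as_sysctl(suggest_gc_thresholds(32768, 4096)):
    print(line)
```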


>
> The rate limiting also protects against the case where GC cannot
> reclaim anything. Without it, every allocation attempt above
> gc_thresh3 triggers a full table scan holding tbl->lock, even when GC
> has no work to do.
>
> Thanks,
> Vimal
>
> On Mon, Jun 29, 2026 at 11:35 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
> >
> > On Mon, Jun 29, 2026 at 12:57 AM Vimal Agrawal <avimalin@gmail.com> wrote:
> > >
> > > Hi Kuniyuki,
> > > Thank you for the feedback.
> > > However, the rate limiting issue exists independently of the threshold
> > > values. If entries genuinely exceed gc_thresh3 — regardless of what it
> > > is set to — neigh_forced_gc() is called on every allocation attempt
> > > with no rate limiting. In my workload, most entries are
> > > active/reachable with refcnt > 1, so the GC walk traverses the entire
> > > table without reclaiming anything.
> >
> > This suggests your gc_thresh2/3 do not fit your use case.
> >
> > If GC does not help, there is no point in running it or rate-limiting
> > in the first place.
> >
> >
> > > Increasing gc_thresh3 would make
> > > this worse, not better, as GC now has a larger table to scan on each
> > > call.
> >
> > If you just increase gc_thresh3 slightly, then yes, it won't help.
> >
> >
> > >
> > > Regarding neigh_hash_shift: in my workload, neigh_alloc() returns
> > > ENOBUFS before reaching do_alloc() since GC cannot reclaim any
> > > entries. kzalloc() is never called, so neigh_hash_grow() is not
> > > involved in the latency I observed. The pre-lock time check in
> > > neigh_forced_gc() is a low-cost safeguard that prevents repeated full
> > > table scans regardless of gc_thresh3 value. It does not interfere with
> > > correct GC behaviour — if entries are still above the threshold, GC
> > > runs normally.
> > >
> > >
> > > Hi Jakub,
> > > I tested with different threshold values, filling the table completely
> > > with 32k reachable entries and attempting 1000 additional allocations.
> > > Exported neigh_forced_gc so that it can be profiled
> > >                          no change  10ms   50ms   100ms
> > > max cpu usage %          44%        11.8%  2.56%  1.42%
> > > calls > 100us (of 1000)  101        31     13     7
> > >
> > > At 10ms, max CPU usage is still 11.8% and 31 out of 1000 calls take
> > > more than 100us. Given that 50ms reduces this to 2.56% and 13 calls
> > > respectively, I would prefer 50ms as the threshold. However, I am open
> > > to further discussion on the right value.
> > >
> > > Thanks,
> > > Vimal
> > >
> > >
> > > On Fri, Jun 26, 2026 at 3:17 AM Kuniyuki Iwashima <kuniyu@google.com> wrote:
> > > >
> > > > From: Vimal Agrawal <avimalin@gmail.com>
> > > > Date: Thu, 25 Jun 2026 10:20:20 +0000
> > > > > Once the neighbour table exceeds gc_thresh3, neigh_forced_gc() is called
> > > > > on every allocation attempt with no rate limiting. In workloads with mostly
> > > > > active/reachable entries, the GC walk traverses a large portion of the
> > > > > neighbour table without reclaiming entries, holding tbl->lock for an
> > > > > extended period. This causes severe lock contention and allocation
> > > > > latencies exceeding 16ms under sustained neighbour creation.
> > > > >
> > > > > Add a pre-lock check in neigh_forced_gc() to skip the GC run if one was
> > > > > performed within the last second, avoiding repeated full table scans and
> > > > > lock acquisitions on the hot allocation path.
> > > > >
> > > > > Profiling of neigh_create() shows ~3 orders of magnitude latency
> > > > > improvement with this change.
> > > > >
> > > > > > Link: https://lore.kernel.org/netdev/CALkUMdSCpx_ywYCx_ePLdm6yioO1nQWx7sSM=AEgsq0kywHxTw@mail.gmail.com/
> > > >
> > > > From the thread, these look misconfigured.
> > > >
> > > > ---8<---
> > > > net.ipv6.neigh.default.gc_thresh2 = 32768
> > > > net.ipv6.neigh.default.gc_thresh3 = 32768
> > > > ---8<---
> > > >
> > > > > If gc_thresh3 is large enough, gc_thresh2 will give you 5s
> > > > rate limiting.
> > > >
> > > > If the number of active neigh entries constantly exceeds
> > > > gc_thresh3, it will be the correct gc_thresh2 for you.
> > > >
> > > > Also, I guess you want a new kernel param for the first
> > > > > neigh_hash_alloc(), which is currently fixed at 3, which
> > > > is too small for some hosts.
> > > >
> > > > 50000 entries require neigh_hash_grow() 13 times.
> > > >
> > > > Can you test this on your real workload, starting from
> > > > neigh_hash_shift=16 and appropriate gc_thresh2/3 ?
> > > >
> > > > ---8<---
> > > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> > > > index 1349c0eedb64..a75b3750eec9 100644
> > > > --- a/net/core/neighbour.c
> > > > +++ b/net/core/neighbour.c
> > > > @@ -1817,6 +1817,22 @@ EXPORT_SYMBOL(neigh_parms_release);
> > > >  static struct lock_class_key neigh_table_proxy_queue_class;
> > > >
> > > >  static struct neigh_table __rcu *neigh_tables[NEIGH_NR_TABLES] __read_mostly;
> > > > +static __initdata unsigned long neigh_hash_shift = 3;
> > > > +
> > > > +static int __init neigh_set_hash_shift(char *str)
> > > > +{
> > > > +       ssize_t ret;
> > > > +
> > > > +       if (!str)
> > > > +               return 0;
> > > > +
> > > > +       ret = kstrtoul(str, 0, &neigh_hash_shift);
> > > > +       if (ret)
> > > > +               return 0;
> > > > +
> > > > +       return 1;
> > > > +}
> > > > +__setup("neigh_hash_shift=", neigh_set_hash_shift);
> > > >
> > > >  void neigh_table_init(int index, struct neigh_table *tbl)
> > > >  {
> > > > @@ -1843,7 +1859,7 @@ void neigh_table_init(int index, struct neigh_table *tbl)
> > > >                 panic("cannot create neighbour proc dir entry");
> > > >  #endif
> > > >
> > > > -       RCU_INIT_POINTER(tbl->nht, neigh_hash_alloc(3));
> > > > +       RCU_INIT_POINTER(tbl->nht, neigh_hash_alloc(neigh_hash_shift));
> > > >
> > > >         phsize = (PNEIGH_HASHMASK + 1) * sizeof(struct pneigh_entry *);
> > > >         tbl->phash_buckets = kzalloc(phsize, GFP_KERNEL);
> > > > ---8<---
> > > >
> > > >
> > > >
> > > > > Signed-off-by: Vimal Agrawal <vimal.agrawal@sophos.com>
> > > > > ---
> > > > >  net/core/neighbour.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> > > > > index 1349c0eedb64..078842db3c5f 100644
> > > > > --- a/net/core/neighbour.c
> > > > > +++ b/net/core/neighbour.c
> > > > > @@ -260,6 +260,9 @@ static int neigh_forced_gc(struct neigh_table *tbl)
> > > > >       int shrunk = 0;
> > > > >       int loop = 0;
> > > > >
> > > > > +     if (!time_after(jiffies, READ_ONCE(tbl->last_flush) + HZ))
> > > > > +             return 0;
> > > > > +
> > > > >       NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs);
> > > > >
> > > > >       spin_lock_bh(&tbl->lock);
> > > > > --
> > > > > 2.17.1

^ permalink raw reply
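
The pre-lock time check debated in the thread above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the submitted patch: model_time_after() mirrors the wraparound-safe comparison that the kernel's time_after() performs, while struct gc_limiter and gc_may_run() are hypothetical names, and the interval is expressed in abstract ticks rather than jiffies.

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a is after b" on a free-running tick counter. */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical rate limiter: one GC run per 'interval' ticks at most. */
struct gc_limiter {
	unsigned long last_flush; /* tick of the last GC run */
	unsigned long interval;   /* minimum ticks between runs */
};

/* Returns true if GC may run at tick 'now', and records the run. */
static bool gc_may_run(struct gc_limiter *l, unsigned long now)
{
	if (!model_time_after(now, l->last_flush + l->interval))
		return false; /* too soon: skip the table scan entirely */

	l->last_flush = now; /* assumption: the GC run updates the timestamp */
	return true;
}
```

With interval set to 50 and a tick of 1 ms, this corresponds to the 50 ms setting from the measurements quoted above; allocation attempts arriving inside the window return early without taking any lock.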

* Re: [PATCH net-next v2] ionic: Change list definition method
From: Creeley, Brett @ 2026-06-30 16:26 UTC (permalink / raw)
  To: Lei Zhu, brett.creeley, andrew+netdev, davem, edumazet, kuba,
	pabeni; +Cc: netdev
In-Reply-To: <20260630065457.160081-1-zhulei_szu@163.com>



On 6/29/2026 11:54 PM, Lei Zhu wrote:
> From: Lei Zhu <zhulei@kylinos.cn>
>
> The LIST_HEAD macro can both define a linked list and initialize
> it in one step. To simplify code, we replace the separate operations
> of linked list definition and manual initialization with the LIST_HEAD
> macro.
>
> Signed-off-by: Lei Zhu <zhulei@kylinos.cn>
> ---
> Changes in v2:
>    - Order the variable declaration lines longest to shortest
>
>   drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c b/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c
> index 528114877677..c999754afb5f 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c
> @@ -558,18 +558,15 @@ struct sync_item {
>   void ionic_rx_filter_sync(struct ionic_lif *lif)
>   {
>          struct device *dev = lif->ionic->dev;
> -       struct list_head sync_add_list;
> -       struct list_head sync_del_list;
>          struct sync_item *sync_item;
>          struct ionic_rx_filter *f;
> +       LIST_HEAD(sync_add_list);
> +       LIST_HEAD(sync_del_list);
>          struct hlist_head *head;
>          struct hlist_node *tmp;
>          struct sync_item *spos;
>          unsigned int i;
>
> -       INIT_LIST_HEAD(&sync_add_list);
> -       INIT_LIST_HEAD(&sync_del_list);
> -
LGTM. Thanks for the patch.

Reviewed-by: Brett Creeley <brett.creeley@amd.com>
>          clear_bit(IONIC_LIF_F_FILTER_SYNC_NEEDED, lif->state);
>
>          /* Copy the filters to be added and deleted
> --
> 2.25.1
>


^ permalink raw reply
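
For readers unfamiliar with the two idioms in the ionic patch above, here is a small userspace sketch of why LIST_HEAD() can replace a declaration followed by INIT_LIST_HEAD(). The definitions are written out by hand to mirror the ones in the kernel's list headers so that the block compiles on its own; it is an illustration, not driver code.

```c
#include <assert.h>

/* Minimal doubly linked list head, mirroring the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Declare and initialise an empty list in one step. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

/* Two-step alternative: initialise an already declared list head. */
static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* An empty list is one whose head points back at itself. */
static int list_empty(const struct list_head *head)
{
	return head->next == head;
}
```

Both forms leave the head pointing at itself, so `LIST_HEAD(sync_add_list);` yields the same initial state as declaring `struct list_head sync_add_list;` and then calling `INIT_LIST_HEAD(&sync_add_list);`.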

* [PATCH net v3] octeontx2-pf: check DMAC extraction support before filtering
From: nshettyj @ 2026-06-30 16:26 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: sgoutham, gakula, sbhatta, hkelam, bbhushan2, andrew+netdev,
	davem, edumazet, kuba, pabeni, naveenm, tduszynski, sumang,
	hramamurthy, Nitin Shetty J

From: Suman Ghosh <sumang@marvell.com>

Currently, configuring a VF MAC address via the PF (e.g., 'ip link
set <pf> vf 0 mac <mac>') blindly attempts to install a DMAC-based
hardware filter. However, the hardware parser profile might not
support DMAC extraction.

Check if the hardware parsing profile supports DMAC extraction
before adding the filter. Additionally, emit a warning message
to inform the operator if the MAC filter installation fails due
to missing DMAC extraction support.

Fixes: f0c2982aaf98 ("octeontx2-pf: Add support for SR-IOV management functions")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: Nitin Shetty J <nshettyj@marvell.com>

---
v3:
 - Update config->mac only after hardware programming succeeds in
   otx2_set_vf_mac().
v2:
 - Move the DMAC extraction check from otx2_set_vf_mac() into
   otx2_do_set_vf_mac() which already holds pf->mbox.lock, so all
   mbox operations are under a single lock/unlock pair. All error
   paths now use the existing goto-out pattern, eliminating the
   scattered mutex_unlock() + return calls from v1.
 - Return -EOPNOTSUPP instead of 0 when DMAC extraction is not
   supported, so the caller gets an explicit error rather than a
   silent success.
---
 .../ethernet/marvell/octeontx2/nic/otx2_pf.c  | 44 ++++++++++++++++---
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
index b63df5737ff2..697570765957 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
@@ -2517,10 +2517,43 @@ EXPORT_SYMBOL(otx2_config_hwtstamp_set);
 
 static int otx2_do_set_vf_mac(struct otx2_nic *pf, int vf, const u8 *mac)
 {
+	struct npc_get_field_status_req *freq;
+	struct npc_get_field_status_rsp *frsp;
 	struct npc_install_flow_req *req;
 	int err;
 
 	mutex_lock(&pf->mbox.lock);
+
+	/* Skip installing the DMAC filter if the hardware parser profile
+	 * does not support DMAC extraction.
+	 */
+	freq = otx2_mbox_alloc_msg_npc_get_field_status(&pf->mbox);
+	if (!freq) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	freq->field = NPC_DMAC;
+	if (otx2_sync_mbox_msg(&pf->mbox)) {
+		err = -EINVAL;
+		goto out;
+	}
+
+	frsp = (struct npc_get_field_status_rsp *)otx2_mbox_get_rsp
+	       (&pf->mbox.mbox, 0, &freq->hdr);
+	if (IS_ERR(frsp)) {
+		err = PTR_ERR(frsp);
+		goto out;
+	}
+
+	if (!frsp->enable) {
+		netdev_warn(pf->netdev,
+			    "VF %d MAC filter not installed: DMAC extraction not supported by parser profile\n",
+			    vf);
+		err = -EOPNOTSUPP;
+		goto out;
+	}
+
 	req = otx2_mbox_alloc_msg_npc_install_flow(&pf->mbox);
 	if (!req) {
 		err = -ENOMEM;
@@ -2559,13 +2592,12 @@ static int otx2_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
 	if (!is_valid_ether_addr(mac))
 		return -EINVAL;
 
-	config = &pf->vf_configs[vf];
-	ether_addr_copy(config->mac, mac);
-
 	ret = otx2_do_set_vf_mac(pf, vf, mac);
-	if (ret == 0)
-		dev_info(&pdev->dev,
-			 "Load/Reload VF driver\n");
+	if (ret == 0) {
+		config = &pf->vf_configs[vf];
+		ether_addr_copy(config->mac, mac);
+		dev_info(&pdev->dev, "Load/Reload VF driver\n");
+	}
 
 	return ret;
 }
-- 
2.48.1


^ permalink raw reply related
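
The v3 change in the octeontx2 patch above (store the MAC only after hardware programming succeeds) follows a general commit-after-success pattern. A minimal userspace sketch, assuming hypothetical names throughout: struct vf_config, program_hw(), and set_vf_mac() are stand-ins for illustration and do not reproduce the driver's mailbox calls.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical cached per-VF software state. */
struct vf_config {
	unsigned char mac[MAC_LEN];
};

/* Hypothetical stand-in for the hardware step: fails when the parser
 * profile cannot extract the DMAC field.
 */
static int program_hw(int dmac_supported)
{
	return dmac_supported ? 0 : -EOPNOTSUPP;
}

/* Update the cached MAC only once the hardware step has succeeded, so a
 * failure leaves the software state unchanged.
 */
static int set_vf_mac(struct vf_config *cfg, const unsigned char *mac,
		      int dmac_supported)
{
	int ret = program_hw(dmac_supported);

	if (ret == 0)
		memcpy(cfg->mac, mac, MAC_LEN);

	return ret;
}
```

Ordering the copy after the check means a caller that receives -EOPNOTSUPP never observes a cached address that the hardware does not actually filter on.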

* Re: [PATCH net-next v3 5/5] selftest: Add tests for useful handling of LSM denials on SCM_RIGHTS
From: Kuniyuki Iwashima @ 2026-06-30 16:23 UTC (permalink / raw)
  To: Jori Koolstra
  Cc: Jakub Kicinski, Christian Brauner, Aleksa Sarai, David S . Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, netdev, linux-fsdevel,
	linux-kernel
In-Reply-To: <1957659940.3537950.1782830112890@kpc.webmail.kpnmail.nl>

On Tue, Jun 30, 2026 at 7:35 AM Jori Koolstra <jkoolstra@xs4all.nl> wrote:
>
>
> > Op 30-06-2026 16:17 CEST schreef Jakub Kicinski <kuba@kernel.org>:
> >
> >
> > On Mon, 29 Jun 2026 21:43:27 +0200 Jori Koolstra wrote:
> > > The test uses the following Smack labels:
> > >
> > >    "Sender"   - label for the sending process
> > >    "Receiver" - label for the receiving process
> > >    "SecretX"   - labels for the files being passed
> >
> > Not sure this test belongs in net/
> > 99.9% of people running this test do not use Smack.
> > At the very least you need to use XFAIL instead of SKIP
> > we use skip for problems with the env which are fixable,
> > like a command missing.
>
> Ah, right, because you can only use one of these LSMs at a time?
> I mean one of AppArmor, SELinux, Smack, TOMOYO.
>
> I just need some LSM to trigger the reject of security_file_receive()
> and Smack was the easiest to get going. The series is totally agnostic
> to the used LSM. I am fine with moving the tests elsewhere or porting
> them to SELinux if that is really necessary. We could also drop them
> altogether.
>
> What do you propose?

Maybe tools/testing/selftests/lsm ?

^ permalink raw reply
