Netdev List


* Re: [PATCH] octeon_ep_vf: add NULL check for napi_build_skb()
From: Simon Horman @ 2026-04-08 17:02 UTC (permalink / raw)
  To: David Carlier
  Cc: Veerasenareddy Burru, Sathesh Edara, Shinas Rasheed,
	Satananda Burla, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, stable
In-Reply-To: <20260403200732.497307-1-devnexen@gmail.com>

On Fri, Apr 03, 2026 at 09:07:32PM +0100, David Carlier wrote:
> napi_build_skb() can return NULL on allocation failure. In
> __octep_vf_oq_process_rx(), the result is used directly without a NULL
> check in both the single-buffer and multi-fragment paths, leading to a
> NULL pointer dereference.
> 
> Add NULL checks after both napi_build_skb() calls, properly advancing
> descriptors and consuming remaining fragments on failure.
> 
> Fixes: 1cd3b407977c ("octeon_ep_vf: add Tx/Rx processing and interrupt support")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Carlier <devnexen@gmail.com>

Hi David,

I appreciate that this is on the fast path, and thus I expect it
is performance critical. But this patch largely duplicates code
already present in the same function. Would it be possible to
refactor things a bit - e.g. using helpers - to make the change
a bit cleaner while not hurting performance?

If so, I'd suggest splitting patch(es) that refactor the code
from the patch that fixes the bug.

...


* Re: [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers
From: Stanislav Fomichev @ 2026-04-08 17:02 UTC (permalink / raw)
  To: Breno Leitao
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, metze, axboe,
	Stanislav Fomichev, io-uring, bpf, netdev, Linus Torvalds,
	linux-kernel, kernel-team
In-Reply-To: <20260408-getsockopt-v3-0-061bb9cb355d@debian.org>

On 04/08, Breno Leitao wrote:
> Currently, the .getsockopt callback requires __user pointers:
> 
>   int (*getsockopt)(struct socket *sock, int level,
>                     int optname, char __user *optval, int __user *optlen);
> 
> This prevents kernel callers (io_uring, BPF) from using getsockopt on
> levels other than SOL_SOCKET, since they pass kernel pointers.
> 
> Following Linus' suggestion [0], this series introduces sockopt_t, a
> type-safe wrapper around iov_iter, and a getsockopt_iter callback that
> works with both user and kernel buffers. AF_PACKET and CAN raw are
> converted as initial users, with selftests covering the trickiest
> conversion patterns.
> 
> [0] https://lore.kernel.org/all/CAHk-=whmzrO-BMU=uSVXbuoLi-3tJsO=0kHj1BCPBE3F2kVhTA@mail.gmail.com/
> 
> Updates from v2 to v3:
> 
> * Use two iov in sockopt_t instead of a single one:
>   a) .iter_in that is populated by the caller and will be read-only in
>   the protocol's callback.
> 
>   b) .iter_out will be populated by the protocol and it will be sent
>   back to the caller.
> 
>   - This will avoid changing the protocol reset and changing the data
>     source at the callback, making the driver callback implementation
>     and conversion saner.
> 
> * created sockptr_to_sockopt() to convert sockptr to sockopt, making the
>   call to getsockopt_iter straight-forward
> 
> Link: https://lore.kernel.org/all/CAHk-=whmzrO-BMU=uSVXbuoLi-3tJsO=0kHj1BCPBE3F2kVhTA@mail.gmail.com/ [0]
> ---
> Changes in v3:
> - Create Two iov in sockopt_t instead of a single one (Stanislav Fomichev)
> - Implement the sockptr_to_sockopt() helper (Stanislav Fomichev)
> - Link to v2: https://patch.msgid.link/20260401-getsockopt-v2-0-611df6771aff@debian.org
> 
> Changes in v2:
> - Restore optlen even on error path (getsockopt_iter fails)
> - Move af_packet.c and can instead of netlink (given these are the most
>   complicated ones).
> - Link to v1: https://patch.msgid.link/20260130-getsockopt-v1-0-9154fcff6f95@debian.org

LGTM! Not sure what's your plan for the selftest? You wanna keep it
outside or maybe repost v4 with it?

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

I'm also not sure your unconditional 'copy-optlen-back' will work for every
proto, but I think we can put something into sockopt_t to make it avoid
the copy if needed in the future.


* Re: [net-next v9 07/10] net: bnxt: Implement software USO
From: Joe Damato @ 2026-04-08 17:04 UTC (permalink / raw)
  To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, horms, linux-kernel,
	leon
In-Reply-To: <adWR1OiPlQCJ8idj@devvm20253.cco0.facebook.com>

On Tue, Apr 07, 2026 at 04:23:00PM -0700, Joe Damato wrote:
> On Tue, Apr 07, 2026 at 03:03:03PM -0700, Joe Damato wrote:
> 
> [...]
> 
> >  v9:
> >    - Added inline slot check to prevent possible overwriting of in-flight
> >      headers (suggested by AI).
> 
> [...]
> 
> >  netdev_tx_t bnxt_sw_udp_gso_xmit(struct bnxt *bp,
> >  				 struct bnxt_tx_ring_info *txr,
> >  				 struct netdev_queue *txq,
> >  				 struct sk_buff *skb)
> >  {
> 
> [...]
> 
> > +
> > +	/* BD backpressure alone cannot prevent overwriting in-flight
> > +	 * headers in the inline buffer. Check slot availability directly.
> > +	 */
> > +	slots = txr->tx_inline_prod - txr->tx_inline_cons;
> > +	slots = BNXT_SW_USO_MAX_SEGS - slots;
> > +
> > +	if (unlikely(slots < num_segs)) {
> > +		netif_txq_try_stop(txq, slots, num_segs);
> > +		return NETDEV_TX_BUSY;
> 
> This is the check I added. AI says this is wrong and netdev_queues.h says:
> 
>   * @get_desc must be a formula or a function call, it must always
>   * return up-to-date information when evaluated!
> 
> which I obviously failed to do, so I'm pretty sure I got this wrong.

So, there are two options to fix this that I can think of. I am leaning toward
option 2, but if there are any strong opinions (or other options that I am
missing) please let me know:

  1. Allocate the maximum number of slots per ring and eliminate this check
     entirely. I figured this would be disliked because it potentially wastes
     memory. The driver would need ring_size / 3 slots, and if we assume the
     maximum is 2048 and the slot size is 256b, that works out to 175kb per
     ring. Of course, this only affects NICs with SW USO and the buffer isn't
     allocated for NICs with HW USO.

     This is probably simpler, but costs more memory than the existing design.

  2. Or, keep the smaller buffer that we have now (BNXT_SW_USO_MAX_SEGS (64)
     * 256b = 16kb per ring) and fix the try_stop like this:

+static inline u16 bnxt_inline_avail(struct bnxt_tx_ring_info *txr)
+{
+       return BNXT_SW_USO_MAX_SEGS -
+              (u16)(txr->tx_inline_prod - READ_ONCE(txr->tx_inline_cons));
+}
+

[...]

-       slots = txr->tx_inline_prod - txr->tx_inline_cons;
-       slots = BNXT_SW_USO_MAX_SEGS - slots;
-
-       if (unlikely(slots < num_segs)) {
-               netif_txq_try_stop(txq, slots, num_segs);
+       if (unlikely(bnxt_inline_avail(txr) < num_segs)) {
+               netif_txq_try_stop(txq, bnxt_inline_avail(txr), num_segs);
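
For illustration only, here is a minimal user-space sketch of the free-running
counter arithmetic that option 2 relies on. The names inline_ring, MAX_SEGS and
inline_avail() are hypothetical stand-ins rather than the driver's code, and
the sketch assumes prod and cons are free-running 16-bit counters with prod
never more than MAX_SEGS ahead of cons:

```c
#include <stdint.h>

#define MAX_SEGS 64u	/* stand-in for BNXT_SW_USO_MAX_SEGS (64) */

/* Hypothetical stand-in for the inline-buffer counters of a tx ring. */
struct inline_ring {
	uint16_t prod;	/* advanced by the xmit path */
	uint16_t cons;	/* advanced by the completion path */
};

/* Free slots, recomputed from the counters on every call. The inner cast to
 * uint16_t keeps the subtraction correct after prod has wrapped past 65535
 * while cons has not.
 */
static inline uint16_t inline_avail(const struct inline_ring *r)
{
	return (uint16_t)(MAX_SEGS - (uint16_t)(r->prod - r->cons));
}
```

Because the availability is a function call rather than a cached local, every
evaluation reads the counters again, which is what the netdev_queues.h comment
quoted above asks of @get_desc.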


* Re: [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps
From: Eric Dumazet @ 2026-04-08 17:09 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
	netdev, eric.dumazet
In-Reply-To: <20260408125611.3592751-1-edumazet@google.com>

On Wed, Apr 8, 2026 at 5:56 AM Eric Dumazet <edumazet@google.com> wrote:
>
> This (large) series brings RTNL avoidance to qdisc dumps.
>
> We first add annotations for data-races, so that most dump methods
> can run in parallel with data path.
>
> Then change mq and mqprio to no longer acquire each child
> qdisc spinlock.
>
> Last patch replaces RTNL with RCU for tc_dump_qdisc() and the
> qdisc ops->dump() methods.
>
> Series was too big, RTNL avoidance for class dumps will be done later.
>
> Eric Dumazet (15):
>   net/sched: rename qstats_overlimit_inc() to qstats_cpu_overlimit_inc()
>   net/sched: add qstats_cpu_drop_inc() helper
>   net/sched: add READ_ONCE() in gnet_stats_add_queue[_cpu]
>   net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec()
>   net/sched: annotate data-races around sch->qstats.backlog
>   net/sched: sch_sfb: annotate data-races in sfb_dump_stats()
>   net/sched: sch_red: annotate data-races in red_dump_stats()
>   net/sched: sch_fq_codel: remove data-races from fq_codel_dump_stats()
>   net/sched: sch_pie: annotate data-races in pie_dump_stats()
>   net/sched: sch_fq_pie: annotate data-races in fq_pie_dump_stats()
>   net_sched: sch_hhf: annotate data-races in hhf_dump_stats()
>   net/sched: sch_choke: annotate data-races in choke_dump_stats()
>   net/sched: sch_cake: annotate data-races in cake_dump_stats()
>   net/sched: mq: no longer acquire qdisc spinlocks in dump operations
>   net/sched: convert tc_dump_qdisc() to RCU

Do not spend too much time on this V1, I will send a V2 tomorrow
addressing some issues.

pw-bot: cr


* Re: [PATCH net-next 0/5] net/sched: netem: cleanups and improvements
From: Simon Horman @ 2026-04-08 17:10 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20260403225324.476787-1-stephen@networkplumber.org>

On Fri, Apr 03, 2026 at 03:52:05PM -0700, Stephen Hemminger wrote:
> This series modernizes the netem qdisc with several cleanups and
> two functional improvements. It is independent of the bug fixes
> that are in process on the net branch.
> 
> The first three patches are housekeeping: replacing pr_info() calls
> with proper netlink extack error reporting, removing unused struct
> members that only existed to declare enum constants, and dropping
> a version string that was never updated.
> 
> The fourth patch adds per-impairment extended statistics
> (delayed, dropped, corrupted, duplicated, reordered, ecn_marked)
> reported via TCA_STATS_APP, following the pattern established by
> RED and FQ_CODEL. A companion iproute2 patch for display will
> follow separately.
> 
> The fifth patch improves the corruption path to handle
> multi-segment skbs using skb_header_pointer()/skb_store_bits(),
> replacing the previous code that only flipped bits in the linear
> header region.

For the series:

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [PATCH net-next] l2tp: Drop large packets with UDP encap
From: Alice Mikityanska @ 2026-04-08 17:11 UTC (permalink / raw)
  To: Simon Horman
  Cc: Alice Mikityanska, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, James Chapman, netdev, syzbot+ci3edea60a44225dec
In-Reply-To: <20260408164825.GH469338@kernel.org>

On Wed, 8 Apr 2026 at 19:48, Simon Horman <horms@kernel.org> wrote:
>
> On Fri, Apr 03, 2026 at 08:49:49PM +0300, Alice Mikityanska wrote:
> > From: Alice Mikityanska <alice@isovalent.com>
> >
> > syzbot reported a WARN on my patch series [1]. The actual issue is an
> > overflow of 16-bit UDP length field, and it exists in the upstream code.
> > My series added a debug WARN with an overflow check that exposed the
> > issue, that's why syzbot tripped on my patches, rather than on upstream
> > code.
> >
> > syzbot's repro:
> >
> > # {"procs":1,"slowdown":1,"sandbox":"","sandbox_arg":0,"close_fds":false,"callcomments":true}
> > r0 = socket$pppl2tp(0x18, 0x1, 0x1)
> > r1 = socket$inet6_udp(0xa, 0x2, 0x0)
> > connect$inet6(r1, &(0x7f00000000c0)={0xa, 0x0, 0x0, @loopback, 0xfffffffc}, 0x1c)
> > connect$pppl2tp(r0, &(0x7f0000000240)=@pppol2tpin6={0x18, 0x1, {0x0, r1, 0x4, 0x0, 0x0, 0x0, {0xa, 0x4e22, 0xffff, @ipv4={'\x00', '\xff\xff', @empty}}}}, 0x32)
> > writev(r0, &(0x7f0000000080)=[{&(0x7f0000000000)="ee", 0x34000}], 0x1)
> >
> > It basically sends an oversized (0x34000 bytes) PPPoL2TP packet with UDP
> > encapsulation, and l2tp_xmit_core doesn't check for overflows when it
> > assigns the UDP length field. The value gets trimmed to 16 bits.
> >
> > Add an overflow check that drops oversized packets and avoids sending
> > packets with trimmed UDP length to the wire.
> >
> > syzbot's stack trace (with my patch applied):
> >
> > len >= 65536u
> > WARNING: ./include/linux/udp.h:38 at udp_set_len_short include/linux/udp.h:38 [inline], CPU#1: syz.0.17/5957
> > WARNING: ./include/linux/udp.h:38 at l2tp_xmit_core net/l2tp/l2tp_core.c:1293 [inline], CPU#1: syz.0.17/5957
> > WARNING: ./include/linux/udp.h:38 at l2tp_xmit_skb+0x1204/0x18d0 net/l2tp/l2tp_core.c:1327, CPU#1: syz.0.17/5957
> > Modules linked in:
> > CPU: 1 UID: 0 PID: 5957 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > RIP: 0010:udp_set_len_short include/linux/udp.h:38 [inline]
> > RIP: 0010:l2tp_xmit_core net/l2tp/l2tp_core.c:1293 [inline]
> > RIP: 0010:l2tp_xmit_skb+0x1204/0x18d0 net/l2tp/l2tp_core.c:1327
> > Code: 0f 0b 90 e9 21 f9 ff ff e8 e9 05 ec f6 90 0f 0b 90 e9 8d f9 ff ff e8 db 05 ec f6 90 0f 0b 90 e9 cc f9 ff ff e8 cd 05 ec f6 90 <0f> 0b 90 e9 de fa ff ff 44 89 f1 80 e1 07 80 c1 03 38 c1 0f 8c 4f
> > RSP: 0018:ffffc90003d67878 EFLAGS: 00010293
> > RAX: ffffffff8ad985e3 RBX: ffff8881a6400090 RCX: ffff8881697f0000
> > RDX: 0000000000000000 RSI: 0000000000034010 RDI: 000000000000ffff
> > RBP: dffffc0000000000 R08: 0000000000000003 R09: 0000000000000004
> > R10: dffffc0000000000 R11: fffff520007acf00 R12: ffff8881baf20900
> > R13: 0000000000034010 R14: ffff8881a640008e R15: ffff8881760f7000
> > FS:  000055557e81f500(0000) GS:ffff8882a9467000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000200000033000 CR3: 00000001612f4000 CR4: 00000000000006f0
> > Call Trace:
> >  <TASK>
> >  pppol2tp_sendmsg+0x40a/0x5f0 net/l2tp/l2tp_ppp.c:302
> >  sock_sendmsg_nosec net/socket.c:727 [inline]
> >  __sock_sendmsg net/socket.c:742 [inline]
> >  sock_write_iter+0x503/0x550 net/socket.c:1195
> >  do_iter_readv_writev+0x619/0x8c0 fs/read_write.c:-1
> >  vfs_writev+0x33c/0x990 fs/read_write.c:1059
> >  do_writev+0x154/0x2e0 fs/read_write.c:1105
> >  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> >  do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
> >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > RIP: 0033:0x7f636479c629
> > Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
> > RSP: 002b:00007ffffd4241c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
> > RAX: ffffffffffffffda RBX: 00007f6364a15fa0 RCX: 00007f636479c629
> > RDX: 0000000000000001 RSI: 0000200000000080 RDI: 0000000000000003
> > RBP: 00007f6364832b39 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > R13: 00007f6364a15fac R14: 00007f6364a15fa0 R15: 00007f6364a15fa0
> >  </TASK>
> >
> > [1]: https://lore.kernel.org/all/20260226201600.222044-1-alice.kernel@fastmail.im/
> >
> > Reported-by: syzbot+ci3edea60a44225dec@syzkaller.appspotmail.com
> > Closes: https://lore.kernel.org/netdev/69a1dfba.050a0220.3a55be.0026.GAE@google.com/
>
> Hi Alice,
>
> A Fixes tag needs to go here.
> And if it's fixing code present in net - that is, the bug can manifest
> there - then it should be targeted at net rather than net-next.

Thanks for the review! I submitted to net-next, because I wanted to
piggy-back my net-next series on top of this fix without making a
merge conflict, and the bug didn't look that critical to go to net
(sometimes I received feedback that my bugfixes should have been
submitted to -next). I can resubmit to net, if it's something that
deserves backporting, or the maintainers can apply it to net instead.
For the Fixes tag, I can take the closest commit:

Fixes: 0d76751fad77 ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support")

It's old enough (2010) to cover all supported LTS kernels. Or I can go
as deep as:

Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core")

where the traces of this code initially appeared, but the directory
structure is entirely different, and I haven't tested a kernel that
old.

> > Signed-off-by: Alice Mikityanska <alice@isovalent.com>
> > ---
> >  net/l2tp/l2tp_core.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> > index c89ae52764b8..157fc23ce4e1 100644
> > --- a/net/l2tp/l2tp_core.c
> > +++ b/net/l2tp/l2tp_core.c
> > @@ -1290,6 +1290,11 @@ static int l2tp_xmit_core(struct l2tp_session *session, struct sk_buff *skb, uns
> >               uh->source = inet->inet_sport;
> >               uh->dest = inet->inet_dport;
> >               udp_len = uhlen + session->hdr_len + data_len;
> > +             if (udp_len > U16_MAX) {
> > +                     kfree_skb(skb);
> > +                     ret = NET_XMIT_DROP;
> > +                     goto out_unlock;
> > +             }
>
> As a fix, this looks like the right approach.
> But I do think this code could benefit from some goto labels
> to handle unwinding error cases.

Definitely agree about goto; I noticed it too, but I just followed the
existing pattern to avoid overloading a bugfix with refactoring. What
would you prefer: a follow-up in net-next with a cleanup, or
integrating this cleanup into this patch?

> >               uh->len = htons(udp_len);
> >
> >               /* Calculate UDP checksum if configured to do so */
> > --
> > 2.53.0
> >


* Re: [Intel-wired-lan] [PATCH net v2] ice: fix ice_init_link() error return preventing probe
From: Paul Menzel @ 2026-04-08 17:20 UTC (permalink / raw)
  To: Aleksandr Loktionov
  Cc: intel-wired-lan, anthony.l.nguyen, netdev, Paul Greenwalt,
	Simon Horman
In-Reply-To: <20260408141105.2781683-1-aleksandr.loktionov@intel.com>

Dear Aleksandr, dear Paul,


Thank you for the patch.

On 08.04.26 at 16:11, Aleksandr Loktionov wrote:
> From: Paul Greenwalt <paul.greenwalt@intel.com>
> 
> ice_init_link() can return an error status from ice_update_link_info()
> or ice_init_phy_user_cfg(), causing probe to fail.
> 
> An incorrect NVM update procedure can result in link/PHY errors, and
> the recommended resolution is to update the NVM using the correct
> procedure. If the driver fails probe due to link errors, the user
> cannot update the NVM to recover. The link/PHY errors logged are
> non-fatal: they are already annotated as 'not a fatal error if this
> fails'.
> 
> Since none of the errors inside ice_init_link() should prevent probe
> from completing, convert it to void and remove the error check in the
> caller. All failures are already logged; callers have no meaningful
> recovery path for link init errors.

Do you have a way to force the error path?

> Fixes: 5b246e533d01 ("ice: split probe into smaller functions")
> Cc: stable@vger.kernel.org
> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Reviewed-by: Simon Horman <horms@kernel.org>
> ---
> v1 -> v2:
>   - Rename the now-unused goto label err_init_link to err_deinit_pf_sw
>     to better describe the cleanup it performs (Simon Horman).
> 
>   drivers/net/ethernet/intel/ice/ice_main.c | 16 +++++-----------
>   1 file changed, 5 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index cf116bb..a6b0c09 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -4856,16 +4856,14 @@ static void ice_init_wakeup(struct ice_pf *pf)
>   	device_set_wakeup_enable(ice_pf_to_dev(pf), false);
>   }
>   
> -static int ice_init_link(struct ice_pf *pf)
> +static void ice_init_link(struct ice_pf *pf)
>   {
>   	struct device *dev = ice_pf_to_dev(pf);
>   	int err;
>   
>   	err = ice_init_link_events(pf->hw.port_info);
> -	if (err) {
> +	if (err)
>   		dev_err(dev, "ice_init_link_events failed: %d\n", err);
> -		return err;
> -	}
>   
>   	/* not a fatal error if this fails */
>   	err = ice_init_nvm_phy_type(pf->hw.port_info);
> @@ -4899,8 +4897,6 @@ static int ice_init_link(struct ice_pf *pf)
>   	} else {
>   		set_bit(ICE_FLAG_NO_MEDIA, pf->flags);
>   	}
> -
> -	return err;
>   }
>   
>   static int ice_init_pf_sw(struct ice_pf *pf)
> @@ -5043,11 +5039,9 @@ static int ice_init(struct ice_pf *pf)
>   
>   	ice_init_wakeup(pf);
>   
> -	err = ice_init_link(pf);
> -	if (err)
> -		goto err_init_link;
> +	ice_init_link(pf);
>   
>   	err = ice_send_version(pf);
>   	if (err)
> -		goto err_init_link;
> +		goto err_deinit_pf_sw;
>   
> @@ -5069,7 +5063,7 @@ static int ice_init(struct ice_pf *pf)
>   	return 0;
>   
> -err_init_link:
> +err_deinit_pf_sw:

The renaming of the label could be mentioned in the commit message.

>   	ice_deinit_pf_sw(pf);
>   err_init_pf_sw:
>   	ice_dealloc_vsis(pf);
>   unroll_pf_init:


Kind regards,

Paul


* Re: [PATCH net] can: raw: fix ro->uniq use-after-free in raw_rcv()
From: Sam P @ 2026-04-08 17:22 UTC (permalink / raw)
  To: Oliver Hartkopp, netdev; +Cc: mkl, linux-kernel, linux-can
In-Reply-To: <c67d6642-8078-4144-8b21-f0e882ecd61a@hartkopp.net>

On 08/04/2026 17:28, Oliver Hartkopp wrote:
> Hello Sam,
> 
> many thanks for your investigation and for the provided fix.
> Excellent work!
> 
> Btw. you also suggested a different solution with synchronize_rcu():
> 
> diff --git a/net/can/raw.c b/net/can/raw.c
> index eee244ffc31e..5bb9a84f2471 100644
> --- a/net/can/raw.c
> +++ b/net/can/raw.c
> @@ -431,6 +431,13 @@ static int raw_release(struct socket *sock)
>       if (ro->count > 1)
>           kfree(ro->filter);
> 
> +    /*
> +     * Wait for any in-flight raw_rcv() calls to finish before freeing
> +     * ro->uniq.  can_rx_unregister() scheduled deletion via call_rcu(),
> +     * but RCU readers (raw_rcv in softirq) may still be active.
> +     */
> +    synchronize_rcu();
> +
>       ro->ifindex = 0;
>       ro->bound = 0;
>       ro->dev = NULL;
> 
> 
> Can you tell why you preferred the destructor solution now?

Thank you :) I preferred the destructor solution as it seemed to match the socket lifetime model better and I wasn't sure if the blocking sync in the raw_release() was too heavy-handed for this specific issue, given raw_release() already holds rtnl_lock() and lock_sock(sk). That said, I'm happy to defer to your experience if the sync fix is better suited, I have tested both of them.

> And if I see it correctly the UAF problem might also show up with the
> kfree(ro->filter) statement we can see at the beginning of the above patch.
> 
> So both free_percpu(ro->uniq) and kfree(ro->filter) should be handled after synchronize_rcu() has completed, right?

ro->filter isn't accessed in the racy raw_rcv() path as far as I can tell, and I don't *think* there are other racy paths, but it wouldn't hurt to handle it just in case. I think this would be simple with the synchronize_rcu() patch, as you mentioned, but I'm not sure about the destructor.

Kind Regards,
Sam


* [PATCH] net: check qdisc_pkt_len_segs_init() return value on ingress
From: David Carlier @ 2026-04-08 17:23 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, horms
  Cc: sdf, kuniyu, skhawaja, liuhangbin, krikku, netdev, linux-kernel,
	David Carlier

Commit 7fb4c1967011 ("net: pull headers in qdisc_pkt_len_segs_init()")
changed qdisc_pkt_len_segs_init() to return an skb drop reason when
it detects malicious GSO packets. The egress path in __dev_queue_xmit()
checks this return value and drops bad packets, but the ingress path in
sch_handle_ingress() ignores it.

This means malformed GSO packets entering via TC ingress are not dropped
and could be redirected to another interface or cause incorrect qdisc
accounting.

Check the return value and drop the packet when a bad GSO is detected.

Fixes: 7fb4c1967011 ("net: pull headers in qdisc_pkt_len_segs_init()")
Signed-off-by: David Carlier <devnexen@gmail.com>
---
 net/core/dev.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 5a31f9d2128c..2b5f508fc479 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4459,7 +4459,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 		   struct net_device *orig_dev, bool *another)
 {
 	struct bpf_mprog_entry *entry = rcu_dereference_bh(skb->dev->tcx_ingress);
-	enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_INGRESS;
+	enum skb_drop_reason drop_reason;
 	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
 	int sch_ret;
 
@@ -4472,7 +4472,15 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 		*pt_prev = NULL;
 	}
 
-	qdisc_pkt_len_segs_init(skb);
+	drop_reason = qdisc_pkt_len_segs_init(skb);
+	if (unlikely(drop_reason)) {
+		kfree_skb_reason(skb, drop_reason);
+		*ret = NET_RX_DROP;
+		bpf_net_ctx_clear(bpf_net_ctx);
+		return NULL;
+	}
+
+	drop_reason = SKB_DROP_REASON_TC_INGRESS;
 	tcx_set_ingress(skb, true);
 
 	if (static_branch_unlikely(&tcx_needed_key)) {
-- 
2.53.0



* [PATCH net 0/2] net: hamradio: fix missing input validation in bpqether and scc
From: Mashiro Chen @ 2026-04-08 17:23 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, jreuter, linux-hams,
	linux-kernel, Mashiro Chen

Two fixes for missing input validation in the hamradio drivers:

- bpqether: bpq_rcv() computes frame length as data[0] + data[1]*256 - 5,
  which can underflow when the length fields encode a value less than 5.
  The resulting negative value is subsequently used as an unsigned length,
  leading to out-of-bounds access.

- scc: the SIOCSCCSMEM ioctl accepts a bufsize of 0 without validation.
  When a receive interrupt fires, dev_alloc_skb(0) allocates an skb with
  an empty data area, and the subsequent skb_put_u8() calls write into
  the adjacent skb_shared_info, corrupting heap memory.

Both fixes are minimal, adding only a bounds check before the dangerous
operation.

Mashiro Chen (2):
  net: hamradio: bpqether: validate frame length in bpq_rcv()
  net: hamradio: scc: validate bufsize in SIOCSCCSMEM ioctl

 drivers/net/hamradio/bpqether.c | 3 +++
 drivers/net/hamradio/scc.c      | 2 ++
 2 files changed, 5 insertions(+)

-- 
2.53.0



* [PATCH net 1/2] net: hamradio: bpqether: validate frame length in bpq_rcv()
From: Mashiro Chen @ 2026-04-08 17:23 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, jreuter, linux-hams,
	linux-kernel, Mashiro Chen, stable
In-Reply-To: <20260408172358.281186-1-mashiro.chen@mailbox.org>

The BPQ length field is decoded as:

  len = skb->data[0] + skb->data[1] * 256 - 5;

If the sender sets bytes [0..1] to values whose combined value is
less than 5, len becomes negative.  Passing a negative int to
skb_trim() silently converts to a huge unsigned value, causing the
function to be a no-op.  The frame is then passed up to AX.25 with
its original (untrimmed) payload, delivering garbage beyond the
declared frame boundary.

Additionally, a negative len corrupts the 64-bit rx_bytes counter
through implicit sign-extension.

Add a bounds check before pulling the length bytes: reject frames
where len is negative or exceeds the remaining skb data.

Cc: stable@vger.kernel.org
Cc: linux-hams@vger.kernel.org
Signed-off-by: Mashiro Chen <mashiro.chen@mailbox.org>
---
 drivers/net/hamradio/bpqether.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c
index 045c5177262eaf..214fd1f819a1bb 100644
--- a/drivers/net/hamradio/bpqether.c
+++ b/drivers/net/hamradio/bpqether.c
@@ -187,6 +187,9 @@ static int bpq_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_ty

 	len = skb->data[0] + skb->data[1] * 256 - 5;

+	if (len < 0 || len > skb->len - 2)
+		goto drop_unlock;
+
 	skb_pull(skb, 2);	/* Remove the length bytes */
 	skb_trim(skb, len);	/* Set the length of the data */

-- 
2.53.0
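
As a rough user-space illustration of the check above (not the driver code):
the helper bpq_decode_len() and its avail parameter are hypothetical, and the
extra "avail < 2" guard is an assumption added so the sketch is safe on its
own. It decodes the two little-endian length bytes the same way and rejects
values that are negative or larger than the data that follows them:

```c
#include <stdbool.h>
#include <stddef.h>

/* Decode the BPQ length field from the first two bytes of 'data'.
 * 'avail' counts all bytes in the buffer, including the 2 length bytes.
 * Returns true and stores the payload length only if it is usable.
 */
static bool bpq_decode_len(const unsigned char *data, size_t avail, int *out_len)
{
	int len;

	if (avail < 2)		/* assumption: not part of the patch */
		return false;

	len = data[0] + data[1] * 256 - 5;

	/* Same two conditions as the patch: negative, or past the data. */
	if (len < 0 || (size_t)len > avail - 2)
		return false;

	*out_len = len;
	return true;
}
```

A frame whose length bytes encode 3 yields len = -2 and is rejected, while an
encoded 10 with five payload bytes after the length field is accepted.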


* [PATCH net 2/2] net: hamradio: scc: validate bufsize in SIOCSCCSMEM ioctl
From: Mashiro Chen @ 2026-04-08 17:23 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, jreuter, linux-hams,
	linux-kernel, Mashiro Chen, stable
In-Reply-To: <20260408172358.281186-1-mashiro.chen@mailbox.org>

The SIOCSCCSMEM ioctl copies a scc_mem_config from user space and
assigns its bufsize field directly to scc->stat.bufsize without any
range validation:

  scc->stat.bufsize = memcfg.bufsize;

If a privileged user (CAP_SYS_RAWIO) sets bufsize to 0, the receive
interrupt handler later calls dev_alloc_skb(0) and immediately writes
a KISS type byte via skb_put_u8() into a zero-capacity socket buffer,
corrupting the adjacent skb_shared_info region.

The scc.c comment already states the buffer must not exceed 4096 bytes,
but this limit is never enforced.  Add a bounds check that rejects values
outside the range [16, 4096], consistent with the documented constraint
and large enough to hold at least one KISS header byte plus useful data.

Cc: stable@vger.kernel.org
Cc: linux-hams@vger.kernel.org
Signed-off-by: Mashiro Chen <mashiro.chen@mailbox.org>
---
 drivers/net/hamradio/scc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/hamradio/scc.c b/drivers/net/hamradio/scc.c
index ae5048efde686a..fd3ff3f4311df2 100644
--- a/drivers/net/hamradio/scc.c
+++ b/drivers/net/hamradio/scc.c
@@ -1909,6 +1909,8 @@ static int scc_net_siocdevprivate(struct net_device *dev,
 			if (!capable(CAP_SYS_RAWIO)) return -EPERM;
 			if (!arg || copy_from_user(&memcfg, arg, sizeof(memcfg)))
 				return -EINVAL;
+			if (memcfg.bufsize < 16 || memcfg.bufsize > 4096)
+				return -EINVAL;
 			scc->stat.bufsize   = memcfg.bufsize;
 			return 0;
 		
-- 
2.53.0



* [PATCH net] net: ax25: fix integer overflow in ax25_rx_fragment()
From: Mashiro Chen @ 2026-04-08 17:25 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jreuter, linux-hams,
	linux-kernel, Mashiro Chen, stable

The ax25_cb fragmentation reassembly accumulator:

  ax25->fraglen += skb->len;

operates on the unsigned short field 'fraglen' declared in ax25_cb:

  unsigned short  paclen, fragno, fraglen;

When fragments accumulate with a combined payload exceeding 65535
bytes, fraglen wraps to near zero.  The subsequent allocation:

  skb = alloc_skb(AX25_MAX_HEADER_LEN + ax25->fraglen, GFP_ATOMIC);

then allocates a tiny buffer.  Every skb_put() call in the copy loop
that follows writes far beyond the allocated headroom, corrupting
the kernel heap.

An attacker on an AX.25 link that supports multi-fragment I-frames
(AX25_SEG_FIRST / AX25_SEG_REM mechanism) can trigger this by
sending enough continuation fragments to wrap the 16-bit counter.
With AX.25 segment numbers limited to 6 bits (max 63 continuation
fragments), a fragment payload of ~1040 bytes per fragment is
sufficient to overflow.

Fix mirrors the identical bug fixed in NET/ROM (nr_in.c): check for
overflow before adding skb->len to fraglen, and abort fragment
reassembly cleanly if the limit would be exceeded.

Cc: stable@vger.kernel.org
Cc: linux-hams@vger.kernel.org
Signed-off-by: Mashiro Chen <mashiro.chen@mailbox.org>
---
 net/ax25/ax25_in.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/ax25/ax25_in.c b/net/ax25/ax25_in.c
index d75b3e9ed93de8..68202c19b19e3f 100644
--- a/net/ax25/ax25_in.c
+++ b/net/ax25/ax25_in.c
@@ -41,6 +41,11 @@ static int ax25_rx_fragment(ax25_cb *ax25, struct sk_buff *skb)
 				/* Enqueue fragment */
 				ax25->fragno = *skb->data & AX25_SEG_REM;
 				skb_pull(skb, 1);	/* skip fragno */
+				if ((unsigned int)ax25->fraglen + skb->len > USHRT_MAX) {
+					skb_queue_purge(&ax25->frag_queue);
+					ax25->fragno = 0;
+					return 1;
+				}
 				ax25->fraglen += skb->len;
 				skb_queue_tail(&ax25->frag_queue, skb);

-- 
2.53.0

^ permalink raw reply related

* [PATCH net] net: rose: reject truncated CLEAR_REQUEST frames in state machines
From: Mashiro Chen @ 2026-04-08 17:25 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, linux-hams, linux-kernel,
	Mashiro Chen, stable

All five ROSE state machines (states 1-5) handle ROSE_CLEAR_REQUEST
by reading the cause and diagnostic bytes directly from skb->data[3]
and skb->data[4] without verifying that the frame is long enough:

  rose_disconnect(sk, ..., skb->data[3], skb->data[4]);

The entry-point check in rose_route_frame() only enforces
ROSE_MIN_LEN (3 bytes), so a remote peer on a ROSE network can
send a syntactically valid but truncated CLEAR_REQUEST (3 or 4
bytes) while a connection is open in any state.  Processing such a
frame causes a one- or two-byte out-of-bounds read past the skb
data, leaking uninitialized heap content as the cause/diagnostic
values returned to user space via getsockopt(ROSE_GETCAUSE).

Add a single length check at the rose_process_rx_frame() dispatch
point, before any state machine is entered, to drop frames that
carry the CLEAR_REQUEST type code but are too short to contain the
required cause and diagnostic fields.

Cc: stable@vger.kernel.org
Cc: linux-hams@vger.kernel.org
Signed-off-by: Mashiro Chen <mashiro.chen@mailbox.org>
---
 net/rose/rose_in.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/rose/rose_in.c b/net/rose/rose_in.c
index 0276b393f0e530..e2680058196273 100644
--- a/net/rose/rose_in.c
+++ b/net/rose/rose_in.c
@@ -271,6 +271,13 @@ int rose_process_rx_frame(struct sock *sk, struct sk_buff *skb)

 	frametype = rose_decode(skb, &ns, &nr, &q, &d, &m);

+	/*
+	 * ROSE_CLEAR_REQUEST carries cause and diagnostic in bytes 3..4.
+	 * Reject a malformed frame that is too short to contain them.
+	 */
+	if (frametype == ROSE_CLEAR_REQUEST && skb->len < 5)
+		return 0;
+
 	switch (rose->state) {
 	case ROSE_STATE_1:
 		queued = rose_state1_machine(sk, skb, frametype);
-- 
2.53.0

^ permalink raw reply related

* Re: [PATCH] octeon_ep_vf: add NULL check for napi_build_skb()
From: David CARLIER @ 2026-04-08 17:35 UTC (permalink / raw)
  To: Simon Horman
  Cc: Veerasenareddy Burru, Sathesh Edara, Shinas Rasheed,
	Satananda Burla, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, stable
In-Reply-To: <20260408170206.GI469338@kernel.org>

Hi Simon,

On Wed, 8 Apr 2026 at 18:02, Simon Horman <horms@kernel.org> wrote:
>
> On Fri, Apr 03, 2026 at 09:07:32PM +0100, David Carlier wrote:
> > napi_build_skb() can return NULL on allocation failure. In
> > __octep_vf_oq_process_rx(), the result is used directly without a NULL
> > check in both the single-buffer and multi-fragment paths, leading to a
> > NULL pointer dereference.
> >
> > Add NULL checks after both napi_build_skb() calls, properly advancing
> > descriptors and consuming remaining fragments on failure.
> >
> > Fixes: 1cd3b407977c ("octeon_ep_vf: add Tx/Rx processing and interrupt support")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: David Carlier <devnexen@gmail.com>
>
> Hi David,
>
> I appreciate that this is on the fast path, and thus I expect it
> is performance critical. But this patch largely duplicates code
> already present in the same function. Would it be possible to
> refactor things a bit - e.g. using helpers - to make the change
> a bit cleaner while not hurting performance?
>
> If so, I'd suggest splitting patch(es) that refactor the code
> from the patch that fixes the bug.
>
> ...

Yes, valid points, I'll submit the v2 tomorrow. Cheers !

^ permalink raw reply

* Re: [PATCH v12 6/6] selftests: net: add TLS hardware offload test
From: Jakub Kicinski @ 2026-04-08 17:44 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Rishikesh Jethwani, netdev, saeedm, tariqt, mbloch, borisp,
	john.fastabend, davem, pabeni, edumazet, leon
In-Reply-To: <adaGQRUUNbXWXNgP@krikkit>

On Wed, 8 Apr 2026 18:45:53 +0200 Sabrina Dubroca wrote:
> @Jakub [top-posting so you don't have to scroll through the rest of my
> comments to find some global questions about this patch]

:)

> tools/testing/selftests/drivers/net/README.rst mentions "Local
> host is the DUT", but this test does rekeys on both sides and sends a
> bit of traffic back and forth. Is that acceptable?

Yes, I think so. Quick scan thru the code doesn't reveal much about
configuration and how the test ensures HW offload is engaged.
But sending traffic is the intended use of the remote host.

The statement is basically saying that if we have 2 hosts the local
one is going to have the kernel to test installed. The remote host
may have an old kernel and no offloads available, it's just a traffic
source/sink.

> Another thought: is there a "standard" for stdout vs stderr, as well as
> verbosity of "test progress"/"debug" type messages ("sent
> keyupdate"/"received keyupdate"/"server listening"/"setup complete"
> etc) for those test programs? Any expectation for a --{debug,verbose}
> option to only display all this stuff on request?
> 
> Output for 1 test:
> -------- 8< --------
> TLS Version: TLS 1.3
> Cipher: AES-GCM-128
> Buffer size: 16384
> Connecting to 192.168.13.1:4433...
> Connected!
> Installing TLS_TX AES-GCM-128 gen 0...
> TLS_TX AES-GCM-128 gen 0 installed
> Installing TLS_RX AES-GCM-128 gen 0...
> TLS_RX AES-GCM-128 gen 0 installed
> TLS setup complete.
> Sending 100 messages of 16384 bytes...
> Sent 16384 bytes (iteration 1)
> Received echo 16384 bytes (ok)
> [...repeated]
> Sent 16384 bytes (iteration 100)
> Received echo 16384 bytes (ok)
> -------- 8< --------
> 
> With some rekeys I get 300L of output on each side.
> 
> 
> If not, I guess we can fall back to what makes the most sense for
> NIPA?

Best practices for these "Python-wrapped" tests are.. evolving.
The current thinking is to avoid the prints in the good case,
unless there's actually something sketchy that we may want
to double check (eg. the test is triggering some condition
probabilistically and we want to see if / how many times it managed 
to trigger it)

IIRC the command failed exceptions already print the command + stdout +
stderr. So having the C binary always print the debug output and exit(1)
if something goes wrong is probably the way to go. Python should then
simply do cmd(..), if the command succeeds there's no output.
If the cmd() raises an exception we'll see the "debug" info.

bkg() should hopefully behave the same, with the caveat that bkg()
with no exit param (exit_wait, ksft_wait) should be avoided, cause
without those we have to kill the process which makes the exit checking
unreliable. I tried to make it a little better in commit d99aa5912
so it's not the end of the world if wait can be used. But good tests
should be designed to wait.

^ permalink raw reply

* Re: [PATCH net v4 15/15] rxrpc: fix reference count leak in rxrpc_server_keyring()
From: Jakub Kicinski @ 2026-04-08 17:52 UTC (permalink / raw)
  To: David Howells
  Cc: Anderson Nascimento, netdev, Marc Dionne, David S. Miller,
	Eric Dumazet, Paolo Abeni, linux-afs, linux-kernel, Luxiao Xu,
	Yifan Wu, Juefei Pu, Yuan Tan, Xin Liu, Ren Wei, Ren Wei,
	Simon Horman, stable
In-Reply-To: <2234187.1775645373@warthog.procyon.org.uk>

On Wed, 08 Apr 2026 11:49:33 +0100 David Howells wrote:
> Regarding patch 15, which provides an alternative fix to patch 8, I previously
> asked you to drop patch 15 - but I'm thinking now it's probably better to keep
> patch 15 and drop patch 8 (and change patch 15 to return -EINVAL).

Should I apply the other 13 patches and let you include whatever 
is appropriate to replace patches 8 and 15 in the next series?

^ permalink raw reply

* Re: [PATCH net v8 1/4] selftests: Migrate nsim-only MACsec tests to Python
From: Sabrina Dubroca @ 2026-04-08 18:00 UTC (permalink / raw)
  To: Cosmin Ratiu
  Cc: netdev, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Stanislav Fomichev,
	David Wei, Shuah Khan, linux-kselftest, Dragos Tatulea
In-Reply-To: <20260408115240.1636047-2-cratiu@nvidia.com>

2026-04-08, 14:52:37 +0300, Cosmin Ratiu wrote:
> Move MACsec offload API and ethtool feature tests from
> tools/testing/selftests/drivers/net/netdevsim/macsec-offload.sh to
> tools/testing/selftests/drivers/net/macsec.py using the NetDrvEnv
> framework so tests can run against both netdevsim (default) and real
> hardware (NETIF=ethX). As some real hardware requires MACsec to use
> encryption, add that to the tests.
> 
> Netdevsim-specific limit checks (max SecY, max RX SC) were moved into
> separate test cases to avoid failures on real hardware.
> 
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
> ---
>  tools/testing/selftests/drivers/net/Makefile  |   1 +
>  tools/testing/selftests/drivers/net/config    |   1 +
>  tools/testing/selftests/drivers/net/macsec.py | 202 ++++++++++++++++++
>  .../selftests/drivers/net/netdevsim/Makefile  |   1 -
>  .../drivers/net/netdevsim/macsec-offload.sh   | 117 ----------
>  5 files changed, 204 insertions(+), 118 deletions(-)
>  create mode 100755 tools/testing/selftests/drivers/net/macsec.py
>  delete mode 100755 tools/testing/selftests/drivers/net/netdevsim/macsec-offload.sh

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>

-- 
Sabrina

^ permalink raw reply

* Re: [net-next v9 07/10] net: bnxt: Implement software USO
From: Jakub Kicinski @ 2026-04-08 18:06 UTC (permalink / raw)
  To: Joe Damato
  Cc: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni, horms, linux-kernel, leon
In-Reply-To: <adaKt/VwrajKraR8@devvm20253.cco0.facebook.com>

On Wed, 8 Apr 2026 10:04:55 -0700 Joe Damato wrote:
> > This is the check I added. AI says this is wrong and netdev_queues.h says:
> > 
> >   * @get_desc must be a formula or a function call, it must always
> >   * return up-to-date information when evaluated!
> > 
> > which I obviously failed to do, so I'm pretty sure I got this wrong.  
> 
> So, there are two options to fix this that I can think of. I am leaning toward
> option 2, but if there are any strong opinions (or other options that I am
> missing) please let me know:
> 
>   1. Allocate the maximum number of slots per ring and eliminate this check
>      entirely. I figured this would be disliked because it potentially wastes
>      memory. The driver would need ring_size / 3 slots, and if we assume the
>      maximum is 2048 and the slot size is 256 bytes, that works out to about
>      175 KB per ring. Of course, this only affects NICs with SW USO and the
>      buffer isn't allocated for NICs with HW USO.
> 
>      This is probably simpler, but costs more memory than the existing design.
> 
>   2. Or, keep the smaller buffer that we have now (BNXT_SW_USO_MAX_SEGS (64)
>      * 256 bytes = 16 KB per ring) and fix the try_stop like this:
> 
> +static inline u16 bnxt_inline_avail(struct bnxt_tx_ring_info *txr)
> +{
> +       return BNXT_SW_USO_MAX_SEGS -
> +              (u16)(txr->tx_inline_prod - READ_ONCE(txr->tx_inline_cons));
> +}
> +
> 
> -       slots = txr->tx_inline_prod - txr->tx_inline_cons;
> -       slots = BNXT_SW_USO_MAX_SEGS - slots;
> -
> -       if (unlikely(slots < num_segs)) {
> -               netif_txq_try_stop(txq, slots, num_segs);
> +       if (unlikely(bnxt_inline_avail(txr) < num_segs)) {
> +               netif_txq_try_stop(txq, bnxt_inline_avail(txr), num_segs);

I think option 2 makes sense. The point (which I think you got) is that
the condition must be evaluated after the memory barrier.

Since the condition is repeated in your latest snippet - you can
probably use netif_txq_maybe_stop() ?

^ permalink raw reply

* Re: [PATCH net] tcp: update window_clamp when SO_RCVBUF is set
From: Jakub Kicinski @ 2026-04-08 18:11 UTC (permalink / raw)
  To: edumazet
  Cc: davem, netdev, pabeni, andrew+netdev, horms, ncardwell, kuniyu,
	willemb, dsahern, quic_subashab, quic_stranche
In-Reply-To: <20260408001438.129165-1-kuba@kernel.org>

On Tue,  7 Apr 2026 17:14:38 -0700 Jakub Kicinski wrote:
> Commit under Fixes moved recomputing the window clamp to
> tcp_measure_rcv_mss() (when scaling_ratio changes).
> I suspect it missed the fact that we don't recompute the clamp
> when rcvbuf is set. Until scaling_ratio changes we are
> stuck with the old window clamp which may be based on
> the small initial buffer. scaling_ratio may never change.
> 
> Inspired by Eric's recent commit d1361840f8c5 ("tcp: fix
> SO_RCVLOWAT and RCVBUF autotuning") plumb the user action
> thru to TCP and have it update the clamp.
> 
> A smaller fix would be to just have tcp_rcvbuf_grow()
> adjust the clamp even if SOCK_RCVBUF_LOCK is set.
> But IIUC this is what we were trying to get away from
> in the first place.

Hi Eric, any thoughts?
I always assume you are displeased if you don't reply within 8 hours :)

I should say that everyone has obviously discouraged the team that ran
into this from using SO_RCVBUF. I'm fascinated by how they decided that
it helps since it clearly doesn't work. AI sure makes it easy for
people to "try things". Sigh.

^ permalink raw reply

* Re: [PATCH net v8 2/4] nsim: Add support for VLAN filters
From: Sabrina Dubroca @ 2026-04-08 18:13 UTC (permalink / raw)
  To: Cosmin Ratiu
  Cc: netdev, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Stanislav Fomichev,
	David Wei, Shuah Khan, linux-kselftest, Dragos Tatulea
In-Reply-To: <20260408115240.1636047-3-cratiu@nvidia.com>

2026-04-08, 14:52:38 +0300, Cosmin Ratiu wrote:
> Add support for storing the list of VLANs in nsim devices, together with
> ops for adding/removing them and a debug file to show them.
> 
> This will be used in upcoming tests.
> 
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
> ---
>  drivers/net/netdevsim/netdev.c    | 65 ++++++++++++++++++++++++++++++-
>  drivers/net/netdevsim/netdevsim.h |  8 ++++
>  2 files changed, 71 insertions(+), 2 deletions(-)

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>

-- 
Sabrina

^ permalink raw reply

* Re: [PATCH net-next v2 3/4] bpf-timestamp: keep track of the skb when wait_for_space occurs
From: Martin KaFai Lau @ 2026-04-08 18:12 UTC (permalink / raw)
  To: Jason Xing, Willem de Bruijn
  Cc: Jakub Sitnicki, davem, edumazet, kuba, pabeni, horms, willemb,
	martin.lau, netdev, bpf, Jason Xing, Yushan Zhou
In-Reply-To: <willemdebruijn.kernel.257654f9a3f23@gmail.com>

On Wed, Apr 08, 2026 at 11:15:09AM -0400, Willem de Bruijn wrote:
> > Avoiding adding a new one makes the whole work extremely hard. I'm
> > wondering since we have hwtstamp in shared info, why not add a
> > software one for timestamping use? Then, we would support more
> > different protocols in more different stages in a finer grain, which
> > is a big coarse picture in my mind.
> 
> I don't understand the need to store more data in the skb for BPF.

Adding a field specific to bpf timestamping is not scalable.
There will always be other bpf use cases that need to store
something in a skb.

There have been discussions about storing metadata for a skb which should
solve the general bpf use cases.

https://msgid.link/20260226-skb-local-storage-v1-0-4ca44f0dd9d1@cloudflare.com/
https://msgid.link/20260110-skb-meta-fixup-skb_metadata_set-calls-v1-0-1047878ed1b0@cloudflare.com/

> 
> With BPF hooks, the bpf program can record the relevant data directly
> in a BPF map.

^ permalink raw reply

* Re: [PATCH net] tcp: update window_clamp when SO_RCVBUF is set
From: Eric Dumazet @ 2026-04-08 18:13 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, pabeni, andrew+netdev, horms, ncardwell, kuniyu,
	willemb, dsahern, quic_subashab, quic_stranche
In-Reply-To: <20260408111107.077659c6@kernel.org>

On Wed, Apr 8, 2026 at 11:11 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue,  7 Apr 2026 17:14:38 -0700 Jakub Kicinski wrote:
> > Commit under Fixes moved recomputing the window clamp to
> > tcp_measure_rcv_mss() (when scaling_ratio changes).
> > I suspect it missed the fact that we don't recompute the clamp
> > when rcvbuf is set. Until scaling_ratio changes we are
> > stuck with the old window clamp which may be based on
> > the small initial buffer. scaling_ratio may never change.
> >
> > Inspired by Eric's recent commit d1361840f8c5 ("tcp: fix
> > SO_RCVLOWAT and RCVBUF autotuning") plumb the user action
> > thru to TCP and have it update the clamp.
> >
> > A smaller fix would be to just have tcp_rcvbuf_grow()
> > adjust the clamp even if SOCK_RCVBUF_LOCK is set.
> > But IIUC this is what we were trying to get away from
> > in the first place.
>
> Hi Eric, any thoughts?
> I always assume you are displeased if you don't reply within 8 hours :)
>

Not at all, I simply missed this patch. Too many emails to triage.

I will take a look asap.

> I should say that everyone has obviously discouraged the team that ran
> into this from using SO_RCVBUF. I'm fascinated by how they decided that
> it helps since it clearly doesn't work. AI sure makes it easy for
> people to "try things". Sigh.

^ permalink raw reply

* Re: [PATCH v10 net-next 1/5] psp: add admin/non-admin version of psp_device_get_locked
From: Wei Wang @ 2026-04-08 18:24 UTC (permalink / raw)
  To: Daniel Zahka
  Cc: netdev, Jakub Kicinski, Willem de Bruijn, David Wei, Andrew Lunn,
	David S . Miller, Eric Dumazet, Simon Horman, Wei Wang
In-Reply-To: <0b4de4de-45d8-4d4f-a15f-87c527b87253@gmail.com>

On Mon, Apr 6, 2026 at 4:11 AM Daniel Zahka <daniel.zahka@gmail.com> wrote:
>
>
> On 4/5/26 1:58 AM, Wei Wang wrote:
> > From: Wei Wang <weibunny@fb.com>
> >
> > Introduce 2 versions of psp_device_get_locked:
> > 1. psp_device_get_locked_admin(): This version is used for operations
> >     that would change the status of the psd, and are currently used for
> >     dev-set nad key-rotation.
>
>
> typo: "and"

Corrected. Thanks.

>
>
> > 2. psp_device_get_locked(): This is the non-admin version, which are
> >     used for broader user issued operations including: dev-get, rx-assoc,
> >     tx-assoc, get-stats.
> >
> > Following commit will be implementing both of the checks.
> >
> > Signed-off-by: Wei Wang <weibunny@fb.com>
>
>
> Reviewed-by: Daniel Zahka <daniel.zahka@gmail.com>
>

^ permalink raw reply

* Re: [PATCH net v8 3/4] selftests: Add MACsec VLAN propagation traffic test
From: Sabrina Dubroca @ 2026-04-08 18:26 UTC (permalink / raw)
  To: Cosmin Ratiu
  Cc: netdev, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Stanislav Fomichev,
	David Wei, Shuah Khan, linux-kselftest, Dragos Tatulea
In-Reply-To: <20260408115240.1636047-4-cratiu@nvidia.com>

2026-04-08, 14:52:39 +0300, Cosmin Ratiu wrote:
> Add VLAN filter propagation tests through offloaded MACsec devices via
> actual traffic.
> 
> The tests create MACsec tunnels with matching SAs on both endpoints,
> stack VLANs on top, and verify connectivity with ping. Covered:
> - Offloaded MACsec with VLAN (filters propagate to HW)
> - Software MACsec with VLAN (no HW filter propagation)
> - Offload on/off toggle and verifying traffic still works
> 
> On netdevsim this makes use of the VLAN filter debugfs file to actually
> validate that filters are applied/removed correctly.
> On real hardware the traffic should validate actual VLAN filter
> propagation.
> 
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
> ---
>  tools/testing/selftests/drivers/net/config    |   1 +
>  .../selftests/drivers/net/lib/py/env.py       |   9 ++
>  tools/testing/selftests/drivers/net/macsec.py | 141 ++++++++++++++++++
>  3 files changed, 151 insertions(+)

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>

-- 
Sabrina

^ permalink raw reply
