Netdev List

* Re: [GIT PULL] wireless-2026-04-08
From: patchwork-bot+netdevbpf @ 2026-04-09  2:00 UTC (permalink / raw)
  To: Johannes Berg; +Cc: netdev, linux-wireless
In-Reply-To: <20260408081802.111623-3-johannes@sipsolutions.net>

Hello:

This pull request was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed,  8 Apr 2026 10:15:25 +0200 you wrote:
> Hi,
> 
> So in a way I'd hoped it wouldn't come to this, but while I
> was out last week a couple of things came in that seemed
> relevant enough to squeeze in now. I guess it wouldn't be
> much of an issue if not, but I figured I'd try anyway :)
> 
> [...]

Here is the summary with links:
  - [GIT,PULL] wireless-2026-04-08
    https://git.kernel.org/netdev/net/c/d65b175cfac6

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net 1/2] batman-adv: reject oversized global TT response buffers
From: patchwork-bot+netdevbpf @ 2026-04-09  2:00 UTC (permalink / raw)
  To: Simon Wunderlich
  Cc: davem, kuba, netdev, b.a.t.m.a.n, caoruide123, stable, yifanwucs,
	tomapufckgml, yuantan098, bird, enjou1224z, n05ec, sven
In-Reply-To: <20260408110255.976389-2-sw@simonwunderlich.de>

Hello:

This series was applied to netdev/net.git (main)
by Simon Wunderlich <sw@simonwunderlich.de>:

On Wed,  8 Apr 2026 13:02:54 +0200 you wrote:
> From: Ruide Cao <caoruide123@gmail.com>
> 
> batadv_tt_prepare_tvlv_global_data() builds the allocation length for a
> global TT response in 16-bit temporaries. When a remote originator
> advertises a large enough global TT, the TT payload length plus the VLAN
> header offset can exceed 65535 and wrap before kmalloc().
> 
> [...]

Here is the summary with links:
  - [net,1/2] batman-adv: reject oversized global TT response buffers
    https://git.kernel.org/netdev/net/c/3a359bf5c61d
  - [net,2/2] batman-adv: hold claim backbone gateways by reference
    https://git.kernel.org/netdev/net/c/82d8701b2c93

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
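
The 16-bit wrap described in the quoted commit message can be reproduced in
user space. The sketch below is not the batman-adv code: tt_len_u16() and
tt_len_checked() are hypothetical helpers that only show how a sum held in a
16-bit temporary wraps past 65535, and how computing in a wider type allows
the oversized case to be rejected before any allocation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: add two lengths and keep the result in a 16-bit
 * temporary. Anything above 65535 wraps modulo 65536.
 */
static inline uint16_t tt_len_u16(uint16_t payload_len, uint16_t vlan_hdr_len)
{
	return (uint16_t)(payload_len + vlan_hdr_len);
}

/*
 * Hypothetical checked variant: compute the sum in size_t and refuse
 * totals that do not fit in 16 bits instead of letting them wrap.
 */
static inline bool tt_len_checked(size_t payload_len, size_t vlan_hdr_len,
				  uint16_t *out)
{
	size_t total = payload_len + vlan_hdr_len;

	if (total > UINT16_MAX)
		return false;

	*out = (uint16_t)total;
	return true;
}
```

With a payload of 65530 bytes and a 10-byte header, the unchecked helper
yields 4, which is the kind of undersized length that would then be passed
to the allocator; the checked variant reports failure instead.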



^ permalink raw reply

* Re: [PATCH 1/8] xfrm: clear trailing padding in build_polexpire()
From: patchwork-bot+netdevbpf @ 2026-04-09  2:00 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: davem, kuba, herbert, netdev
In-Reply-To: <20260408095925.253681-2-steffen.klassert@secunet.com>

Hello:

This series was applied to netdev/net.git (main)
by Steffen Klassert <steffen.klassert@secunet.com>:

On Wed, 8 Apr 2026 11:58:57 +0200 you wrote:
> From: Yasuaki Torimaru <yasuakitorimaru@gmail.com>
> 
> build_expire() clears the trailing padding bytes of struct
> xfrm_user_expire after setting the hard field via memset_after(),
> but the analogous function build_polexpire() does not do this for
> struct xfrm_user_polexpire.
> 
> [...]

Here is the summary with links:
  - [1/8] xfrm: clear trailing padding in build_polexpire()
    https://git.kernel.org/netdev/net/c/71a98248c63c
  - [2/8] xfrm: account XFRMA_IF_ID in aevent size calculation
    https://git.kernel.org/netdev/net/c/7081d46d3231
  - [3/8] xfrm: Wait for RCU readers during policy netns exit
    https://git.kernel.org/netdev/net/c/069daad4f2ae
  - [4/8] xfrm: hold dev ref until after transport_finish NF_HOOK
    https://git.kernel.org/netdev/net/c/1c428b038400
  - [5/8] xfrm: fix refcount leak in xfrm_migrate_policy_find
    https://git.kernel.org/netdev/net/c/83317cce60a0
  - [6/8] xfrm_user: fix info leak in build_mapping()
    https://git.kernel.org/netdev/net/c/1beb76b2053b
  - [7/8] xfrm_user: fix info leak in build_report()
    https://git.kernel.org/netdev/net/c/d10119968d0e
  - [8/8] net: af_key: zero aligned sockaddr tail in PF_KEY exports
    https://git.kernel.org/netdev/net/c/426c355742f0

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH] net: mdio: realtek-rtl9300: use scoped device_for_each_child_node loop
From: patchwork-bot+netdevbpf @ 2026-04-09  1:50 UTC (permalink / raw)
  To: Felix Gu
  Cc: andrew, hkallweit1, linux, davem, edumazet, kuba, pabeni,
	chris.packham, netdev, linux-kernel
In-Reply-To: <20260405-rtl9300-v1-1-08e4499cf944@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sun, 05 Apr 2026 14:51:52 +0800 you wrote:
> Switch to device_for_each_child_node_scoped() to auto-release fwnode
> references on early exit.
> 
> Fixes: 24e31e474769 ("net: mdio: Add RTL9300 MDIO driver")
> Signed-off-by: Felix Gu <ustc.gu@gmail.com>
> ---
>  drivers/net/mdio/mdio-realtek-rtl9300.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> [...]

Here is the summary with links:
  - net: mdio: realtek-rtl9300: use scoped device_for_each_child_node loop
    https://git.kernel.org/netdev/net/c/c09ea768bdb9

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net 1/7] ipvs: fix NULL deref in ip_vs_add_service error path
From: patchwork-bot+netdevbpf @ 2026-04-09  1:50 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netdev, pabeni, davem, edumazet, kuba, netfilter-devel, pablo
In-Reply-To: <20260408163512.30537-2-fw@strlen.de>

Hello:

This series was applied to netdev/net.git (main)
by Florian Westphal <fw@strlen.de>:

On Wed,  8 Apr 2026 18:35:06 +0200 you wrote:
> From: Weiming Shi <bestswngs@gmail.com>
> 
> When ip_vs_bind_scheduler() succeeds in ip_vs_add_service(), the local
> variable sched is set to NULL.  If ip_vs_start_estimator() subsequently
> fails, the out_err cleanup calls ip_vs_unbind_scheduler(svc, sched)
> with sched == NULL.  ip_vs_unbind_scheduler() passes the cur_sched NULL
> check (because svc->scheduler was set by the successful bind) but then
> dereferences the NULL sched parameter at sched->done_service, causing a
> kernel panic at offset 0x30 from NULL.
> 
> [...]

Here is the summary with links:
  - [net,1/7] ipvs: fix NULL deref in ip_vs_add_service error path
    https://git.kernel.org/netdev/net/c/9a91797e61d2
  - [net,2/7] netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator
    https://git.kernel.org/netdev/net/c/1f3083aec883
  - [net,3/7] netfilter: xt_multiport: validate range encoding in checkentry
    https://git.kernel.org/netdev/net/c/ff64c5bfef12
  - [net,4/7] netfilter: ip6t_eui64: reject invalid MAC header for all packets
    https://git.kernel.org/netdev/net/c/fdce0b3590f7
  - [net,5/7] netfilter: nft_ct: fix use-after-free in timeout object destroy
    https://git.kernel.org/netdev/net/c/f8dca15a1b19
  - [net,6/7] netfilter: nfnetlink_queue: make hash table per queue
    https://git.kernel.org/netdev/net/c/936206e3f6ff
  - [net,7/7] selftests: nft_queue.sh: add a parallel stress test
    https://git.kernel.org/netdev/net/c/dde1a6084c5c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net v5 00/21] rxrpc: Miscellaneous fixes
From: patchwork-bot+netdevbpf @ 2026-04-09  1:50 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, marc.dionne, kuba, davem, edumazet, pabeni, linux-afs,
	linux-kernel
In-Reply-To: <20260408121252.2249051-1-dhowells@redhat.com>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed,  8 Apr 2026 13:12:28 +0100 you wrote:
> Here are some fixes for rxrpc:
> 
>  (1) Fix key quota calculation.
> 
>  (2) Fix a memory leak.
> 
>  (3) Fix rxrpc_new_client_call_for_sendmsg() to substitute NULL for an
>      empty key.
> 
> [...]

Here is the summary with links:
  - [net,v5,01/21] rxrpc: Fix key quota calculation for multitoken keys
    https://git.kernel.org/netdev/net/c/bdbfead6d389
  - [net,v5,02/21] rxrpc: Fix key parsing memleak
    https://git.kernel.org/netdev/net/c/b555912b9b21
  - [net,v5,03/21] rxrpc: Fix anonymous key handling
    https://git.kernel.org/netdev/net/c/6a59d84b4fc2
  - [net,v5,04/21] rxrpc: Fix call removal to use RCU safe deletion
    https://git.kernel.org/netdev/net/c/146d4ab94cf1
  - [net,v5,05/21] rxrpc: Fix RxGK token loading to check bounds
    https://git.kernel.org/netdev/net/c/d179a868dd75
  - [net,v5,06/21] rxrpc: Fix use of wrong skb when comparing queued RESP challenge serial
    https://git.kernel.org/netdev/net/c/b33f5741bb18
  - [net,v5,07/21] rxrpc: Fix rack timer warning to report unexpected mode
    https://git.kernel.org/netdev/net/c/65b3ffe0972e
  - [net,v5,08/21] rxrpc: Fix key reference count leak from call->key
    https://git.kernel.org/netdev/net/c/d666540d217e
  - [net,v5,09/21] rxrpc: Fix to request an ack if window is limited
    https://git.kernel.org/netdev/net/c/0cd3e3f3f2ec
  - [net,v5,10/21] rxrpc: Only put the call ref if one was acquired
    https://git.kernel.org/netdev/net/c/6331f1b24a3e
  - [net,v5,11/21] rxrpc: reject undecryptable rxkad response tickets
    https://git.kernel.org/netdev/net/c/fe4447cd9562
  - [net,v5,12/21] rxrpc: fix RESPONSE authenticator parser OOB read
    https://git.kernel.org/netdev/net/c/3e3138007887
  - [net,v5,13/21] rxrpc: fix oversized RESPONSE authenticator length check
    https://git.kernel.org/netdev/net/c/a2567217ade9
  - [net,v5,14/21] rxrpc: fix reference count leak in rxrpc_server_keyring()
    https://git.kernel.org/netdev/net/c/f125846ee79f
  - [net,v5,15/21] rxrpc: Fix key/keyring checks in setsockopt(RXRPC_SECURITY_KEY/KEYRING)
    https://git.kernel.org/netdev/net/c/2afd86ccbb20
  - [net,v5,16/21] rxrpc: Fix missing error checks for rxkad encryption/decryption failure
    https://git.kernel.org/netdev/net/c/f93af41b9f5f
  - [net,v5,17/21] rxrpc: Fix integer overflow in rxgk_verify_response()
    https://git.kernel.org/netdev/net/c/699e52180f42
  - [net,v5,18/21] rxrpc: Fix leak of rxgk context in rxgk_verify_response()
    https://git.kernel.org/netdev/net/c/7e1876caa836
  - [net,v5,19/21] rxrpc: Fix buffer overread in rxgk_do_verify_authenticator()
    https://git.kernel.org/netdev/net/c/f564af387c8c
  - [net,v5,20/21] rxrpc: only handle RESPONSE during service challenge
    https://git.kernel.org/netdev/net/c/c43ffdcfdbb5
  - [net,v5,21/21] rxrpc: proc: size address buffers for %pISpc output
    https://git.kernel.org/netdev/net/c/a44ce6aa2efb

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next v6 00/10] Decouple receive and transmit enablement in team driver
From: Kuniyuki Iwashima @ 2026-04-09  1:48 UTC (permalink / raw)
  To: marcharvey
  Cc: andrew+netdev, davem, edumazet, horms, jiri, kuba, linux-kernel,
	linux-kselftest, netdev, pabeni, shuah
In-Reply-To: <CANkEMgk16j2xzQ85JGQ4OWqeiwiVO5Gy-UfkN3omKkKizpLxiQ@mail.gmail.com>

From: Marc Harvey <marcharvey@google.com>
Date: Wed, 8 Apr 2026 17:10:05 -0700
> On Wed, Apr 8, 2026 at 9:40 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > It pains me to report on non-debug kernels:
> 
> I'm sorry to have pained you. Despite my best efforts to run with the
> exact same environment and conditions as your CI, my teamd can be
> killed with "teamd -k" but yours hangs (both are version 1.32 on
> Fedora with the same kernel config).

Considering the subsequent "kill" works on the dbg instance (thanks
to 2400s timeout), I guess teamd is somehow stuck at SIGTERM handling
removing team devices in teamd_port_remove_all().  (SIGTERM being masked
sounds unlikely)

https://netdev-ctrl.bots.linux.dev/logs/vmksft/bonding-dbg/results/593802/4-teamd-activebackup-sh/stdout
https://netdev-ctrl.bots.linux.dev/logs/vmksft/bonding-dbg/results/593802/4-teamd-activebackup-sh/stderr
---8<---
[  759.819815][T21724] test_team1: Port device eth1 removed
[  759.822323][T21724] test_team1: Port device eth0 removed
[  790.615687][T21728] test_team2: Port device eth1 removed
[  790.617445][T21728] test_team2: Port device eth0 removed
---8<---

Adding -N and letting "ip netns del" release the last netns refcnt
and defer device destruction to cleanup_net() may help.


> For v7, I’ll invoke "teamd -k"
> using the timeout utility, or just increase the test timeout.

+1 for the latter, maybe set timeout=300.

daemon_pid_file_kill_wait(SIGTERM, 30) * 2 = 120s, but just in case.

See these files for howto:

  $ find tools/testing/selftests/net/ -name settings

^ permalink raw reply

* [PATCH] rose: fix OOB read on short CLEAR REQUEST frames.
From: Ashutosh Desai @ 2026-04-09  1:32 UTC (permalink / raw)
  To: netdev
  Cc: linux-hams, davem, edumazet, kuba, pabeni, horms, linux-kernel,
	Ashutosh Desai

rose_process_rx_frame() dispatches to state machines after calling
rose_decode(), but does not verify the frame is long enough before
doing so. All five state machine handlers read skb->data[3] and
skb->data[4] (cause and diagnostic bytes) when handling a
ROSE_CLEAR_REQUEST frame, yet the only upstream length check is
ROSE_MIN_LEN (3 bytes) in rose_route_frame().

A crafted 3-byte ROSE CLEAR REQUEST frame (bytes: GFI/LCI-high,
LCI-low, 0x13) passes the minimum length gate and reaches the state
machines, where skb->data[3] and skb->data[4] are read one and two
bytes past the valid buffer respectively.

Add a check in rose_process_rx_frame() that drops any CLEAR REQUEST
frame shorter than 5 bytes (3-byte header + cause + diagnostic),
covering all five state machines with a single guard.

Signed-off-by: Ashutosh Desai <ashutoshdesai993@gmail.com>
---
 net/rose/rose_in.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/rose/rose_in.c b/net/rose/rose_in.c
index 0276b393f..1ac9a6aee 100644
--- a/net/rose/rose_in.c
+++ b/net/rose/rose_in.c
@@ -271,6 +271,11 @@ int rose_process_rx_frame(struct sock *sk, struct sk_buff *skb)

 	frametype = rose_decode(skb, &ns, &nr, &q, &d, &m);

+	if (frametype == ROSE_CLEAR_REQUEST && skb->len < 5) {
+		kfree_skb(skb);
+		return 0;
+	}
+
 	switch (rose->state) {
 	case ROSE_STATE_1:
 		queued = rose_state1_machine(sk, skb, frametype);
-- 
2.34.1
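
For readers who want to try the length rule outside the kernel, here is a
minimal user-space sketch of the guard described above. It assumes the frame
type sits in the third byte, as in the 3-byte example frame from the
description; clear_request_len_ok() and both macros are hypothetical names,
not kernel symbols.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_MIN_LEN		3	/* GFI/LCI-high, LCI-low, frame type */
#define CLEAR_REQUEST_TYPE	0x13	/* type byte from the description above */
#define CLEAR_REQUEST_MIN_LEN	5	/* header + cause + diagnostic */

/*
 * Hypothetical helper: return true only if every byte a CLEAR REQUEST
 * handler would read (indices 3 and 4) lies inside the buffer.
 */
static inline bool clear_request_len_ok(const uint8_t *frame, size_t len)
{
	if (len < FRAME_MIN_LEN)
		return false;

	if (frame[2] == CLEAR_REQUEST_TYPE && len < CLEAR_REQUEST_MIN_LEN)
		return false;

	return true;
}
```

A 3-byte frame ending in 0x13 is rejected, while the same frame followed by
cause and diagnostic bytes is accepted. Other frame types are only held to
the 3-byte minimum in this sketch.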

^ permalink raw reply related

* [PATCH] ax25: fix OOB read after address header strip in ax25_rcv().
From: Ashutosh Desai @ 2026-04-09  1:22 UTC (permalink / raw)
  To: netdev
  Cc: linux-hams, jreuter, davem, edumazet, kuba, pabeni, horms,
	linux-kernel, Ashutosh Desai

ax25_rcv() calls skb_pull(skb, ax25_addr_size(&dp)) to strip the
address header, then immediately reads skb->data[0] and skb->data[1]
without verifying the buffer still contains at least 2 bytes.

A crafted 15-byte KISS frame (1 KISS byte + 14 address bytes with
EBIT set in the source address, no control/PID bytes) passes
ax25_addr_parse() which only requires len >= 14, and passes the KISS
byte check (low nibble == 0). After skb_pull(1) in ax25_kiss_rcv()
and skb_pull(14) in ax25_rcv(), skb->len is 0 and the subsequent
reads of skb->data[0] (control byte) and skb->data[1] (PID byte)
are out of bounds.

Add a check that at least 2 bytes remain after stripping the address
header, freeing the skb and returning on malformed input.

Signed-off-by: Ashutosh Desai <ashutoshdesai993@gmail.com>
---
 net/ax25/ax25_in.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/ax25/ax25_in.c b/net/ax25/ax25_in.c
index d75b3e9ed..92baac77f 100644
--- a/net/ax25/ax25_in.c
+++ b/net/ax25/ax25_in.c
@@ -217,6 +217,11 @@ static int ax25_rcv(struct sk_buff *skb, struct net_device *dev,
 	 */
 	skb_pull(skb, ax25_addr_size(&dp));

+	if (skb->len < 2) {
+		kfree_skb(skb);
+		return 0;
+	}
+
 	/* For our port addresses ? */
 	if (ax25cmp(&dest, dev_addr) == 0 && dp.lastrepeat + 1 == dp.ndigi)
 		mine = 1;
-- 
2.34.1
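
The pull-then-check pattern from the description can be sketched in plain C.
This is an illustration only: struct buf_view, view_pull() and
read_ctrl_pid() are hypothetical stand-ins for the skb and its helpers, and
the 14-byte address length is taken from the example frame above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the data/len pair of an skb. */
struct buf_view {
	const uint8_t *data;
	size_t len;
};

/* Advance past n bytes, refusing to move beyond the end of the buffer. */
static inline bool view_pull(struct buf_view *v, size_t n)
{
	if (n > v->len)
		return false;

	v->data += n;
	v->len -= n;
	return true;
}

/*
 * Strip the address header, then read the control and PID bytes only if
 * at least two bytes remain. The view is passed by value, so the
 * caller's copy is left untouched.
 */
static inline bool read_ctrl_pid(struct buf_view v, size_t addr_len,
				 uint8_t *ctrl, uint8_t *pid)
{
	if (!view_pull(&v, addr_len))
		return false;

	if (v.len < 2)
		return false;

	*ctrl = v.data[0];
	*pid = v.data[1];
	return true;
}
```

A buffer holding exactly 14 address bytes is rejected, because nothing is
left after the pull; a 16-byte buffer yields its last two bytes as control
and PID.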

^ permalink raw reply related

* Re: [PATCH net-next v3 3/3] gve: implement PTP gettimex64
From: Jordan Rhee @ 2026-04-09  1:08 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Jakub Kicinski, Harshitha Ramamurthy, netdev, joshwash,
	andrew+netdev, davem, edumazet, pabeni, richardcochran, willemb,
	nktgrg, jfraker, ziweixiao, maolson, thostet, jefrogers,
	alok.a.tiwari, yyd, linux-kernel, Naman Gulati
In-Reply-To: <093a5c92-f94c-49d6-96ea-0c76ff18f9e1@intel.com>

On Wed, Apr 8, 2026 at 3:43 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>
> On 4/6/2026 1:41 PM, Jordan Rhee wrote:
> > On Fri, Apr 3, 2026 at 2:18 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
> >>
> >> On 4/3/2026 12:44 PM, Harshitha Ramamurthy wrote:
> >>> From: Jordan Rhee <jordanrhee@google.com>
> >>>
> >>> Enable chrony and phc2sys to synchronize system clock to NIC clock.
> >>>
> >>> The system cycle counters are sampled by the device to minimize the
> >>> uncertainty window. If the system times are sampled in the host, the
> >>> delta between pre and post readings is 100us or more due to AQ command
> >>> latency. The system times returned by the device have a delta of ~1us,
> >>> which enables significantly more accurate clock synchronization.
> >>>
> >>> Reviewed-by: Willem de Bruijn <willemb@google.com>
> >>> Reviewed-by: Kevin Yang <yyd@google.com>
> >>> Reviewed-by: Naman Gulati <namangulati@google.com>
> >>> Signed-off-by: Jordan Rhee <jordanrhee@google.com>
> >>> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
> >>> ---
> >>
> >>> +/*
> >>> + * Convert a raw cycle count (e.g. from get_cycles()) to the system clock
> >>> + * type specified by clockid. The system_time_snapshot must be taken before
> >>> + * the cycle counter is sampled.
> >>> + */
> >>> +static int gve_cycles_to_timespec64(struct gve_priv *priv, clockid_t clockid,
> >>> +                                 struct system_time_snapshot *snap,
> >>> +                                 u64 cycles, struct timespec64 *ts)
> >>> +{
> >>> +     struct gve_cycles_to_clock_callback_ctx ctx = {0};
> >>> +     struct system_device_crosststamp xtstamp;
> >>> +     int err;
> >>> +
> >>> +     ctx.cycles = cycles;
> >>> +     err = get_device_system_crosststamp(gve_cycles_to_clock_fn, &ctx, snap,
> >>> +                                         &xtstamp);
> >>> +     if (err) {
> >>> +             dev_err_ratelimited(&priv->pdev->dev,
> >>> +                                 "get_device_system_crosststamp() failed to convert %lld cycles to system time: %d\n",
> >>> +                                 cycles,
> >>> +                                 err);
> >>> +             return err;
> >>> +     }
> >>> +
> >>
> >> This looks a lot like a cross timestamp (i.e. something like PCIe PTM)
> >> Why not just implement the .crosstimestamp and PTP_SYS_OFF_PRECISE? Does
> >> that not work properly? Or is this not really a cross timestamp despite
> >> use of the get_device_system_crosststamp handler? :D
> >
> > .crosstimestamp is for devices that support simultaneous NIC and
> > system timestamps. Devices that don't support simultaneous timestamps
> > have to take a system time sandwich by calling
> > ptp_read_system_prets()/ptp_read_system_postts() on either side of the
> > NIC timestamp. Upper layers (e.g. chrony) use the sandwich delta in
> > nontrivial ways when estimating the system clock / NIC clock offset.
> > This is information that must be preserved, and it would be incorrect
> > to implement .crosstimestamp by returning the midpoint of the
> > sandwich, as tempting as that implementation might be.
> >
>
> True.
>
> > Gvnic does not support simultaneous NIC and system timestamps, so it
> > must use the sandwich technique. Since the NIC timestamp is obtained
> > using a firmware (hypervisor) call, the uncertainty window would be
> > too large if it were taken inside the VM. Gvnic takes the sandwich in
> > the hypervisor and returns the raw TSC values to the VM.
> > get_device_system_crosststamp() is used to convert the TSCs to system
> > times, which I believe is the only correct way to do this conversion.
> > Jordan
> >
>
> Hmm. The function says:
>
> "Synchronously capture system/device timestamp". That is what confuses
> me. Your implementation uses gve_cycles_to_clock_fn() which just sets
> some values in the system_counterval struct and exits. It doesn't
> "capture a system/device timestamp" tuple.
>
> This does feel a bit weird. No other caller appears to exist outside of
> the cross timestamp implementations.
>
> It sounds like what you want is a function that takes a cycles count
> value and does the conversion from TSC to the appropriate clock, along
> with all of the interpolation etc. What you've done is sort of a kludge
> around get_device_system_crosststamp() to force it to do that for you
> without actually using it as intended.
>
> I'd argue it would be better to have a cycles_to_ktime() or something
> which takes the TSC cycles value and the appropriate clock and does the
> exact same flow as get_device_system_crosststamp() for converting the
> cycles into proper ktime values without the mess of the callback
> function etc.

Yes, that's exactly how I'm using get_device_system_crosststamp().
There is no cycles_to_ktime() function, and adding one would make this
patch more difficult to backport to older kernels.

There is precedent for using get_device_system_crosststamp() this way
in the virtio_rtc driver. In viortc_ptp_getcrosststamp(), the
timestamp is sampled before calling get_device_system_crosststamp(),
and the callback simply populates the return values from the context.
I believe the snapshot parameter was added to support this use case.

>
> I guess in principle what you've implemented is "correct" and
> functional, but it definitely feels a bit weird to use the API in this
> way. It smells like a neat hack instead of a proper interface for this
> purpose.
>
> That said, I won't object strongly if the maintainers are fine with
> using it for this purpose.
>
> Thanks,
> Jake

^ permalink raw reply

* Re: linux-next: manual merge of the net-next tree with the netfilter tree
From: Andrea Mayer @ 2026-04-09  1:04 UTC (permalink / raw)
  To: Mark Brown, Matthieu Baerts
  Cc: David Miller, Jakub Kicinski, Paolo Abeni, Networking,
	Justin Iurman, Linux Kernel Mailing List, Linux Next Mailing List,
	Andrea Mayer
In-Reply-To: <d9d4e328-6668-4c24-bace-77a0f93bbed6@sirena.org.uk>

On Wed, 8 Apr 2026 18:00:50 +0100
Mark Brown <broonie@kernel.org> wrote:

> On Wed, Apr 08, 2026 at 06:43:36PM +0200, Matthieu Baerts wrote:
> > On 08/04/2026 17:08, Mark Brown wrote:
> 
> > > This also needs a fixup for a new jump to the error handling paths that
> > > was added in seg6_build_state().
> 
> > I also had this other conflict there, and I did this when resolving it
> > in MPTCP tree:
> 
> >  +	if (tb[SEG6_IPTUNNEL_SRC]) {
> >  +		slwt->tunsrc = nla_get_in6_addr(tb[SEG6_IPTUNNEL_SRC]);
> >  +
> >  +		if (ipv6_addr_any(&slwt->tunsrc) ||
> >  +		    ipv6_addr_is_multicast(&slwt->tunsrc) ||
> >  +		    ipv6_addr_loopback(&slwt->tunsrc)) {
> >  +			NL_SET_ERR_MSG(extack, "invalid tunsrc address");
> >  +			err = -EINVAL;
> > - 			goto free_dst_cache;
> > ++			goto err_destroy_output;
> >  +		}
> >  +	}
> >  +
> 
> Yes, that's the additional fixup I mentioned above - it didn't conflict
> for me (well, the exit path did).

Thanks Mark and Matthieu for taking care of this.

I went through both commits and the rerere.
The resolution looks correct from the seg6 side. Build-tested.

Happy to help if anything else comes up.

Cheers,
Andrea

^ permalink raw reply

* Re: [PATCH v2] net: dsa: microchip: implement KSZ87xx Module 3 low-loss cable errata
From: Marek Vasut @ 2026-04-09  1:04 UTC (permalink / raw)
  To: Fidelio Lawson, Woojung Huh, UNGLinuxDriver, Andrew Lunn,
	Vladimir Oltean, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Marek Vasut, Maxime Chevallier
  Cc: netdev, devicetree, linux-kernel, Fidelio Lawson
In-Reply-To: <20260408-ksz87xx_errata_low_loss_connections-v2-1-9cfe38691713@exotec.com>

On 4/8/26 1:57 PM, Fidelio Lawson wrote:
> Implement the "Module 3: Equalizer fix for short cables" erratum from
> Microchip document DS80000687C for KSZ87xx switches.
> 
> The issue affects short or low-loss cable links (e.g. CAT5e/CAT6),
> where the PHY receiver equalizer may amplify high-amplitude signals
> excessively, resulting in internal distortion and link establishment
> failures.
> 
> KSZ87xx devices require a workaround for the Module 3 low-loss cable
> condition, controlled through the switch TABLE_LINK_MD_V indirect
> registers.
> 
> The affected registers are part of the switch address space and are not
> directly accessible from the PHY driver. To keep the PHY-facing API
> clean and avoid leaking switch-specific details, model this errata
> control as vendor-specific Clause 22 PHY registers.
> 
> Two vendor-defined bits are introduced in PHY_REG_LOW_LOSS_CTRL,
> and ksz8_r_phy() / ksz8_w_phy() translate accesses to these bits
> into the appropriate indirect TABLE_LINK_MD_V accesses.
> 
> The control register defines the following modes:
>    bits [1:0]:
>      00 = workaround disabled
>      01 = workaround 1 (DSP EQ training adjustment, LinkMD reg 0x3c)
>      10 = workaround 2 (receiver LPF bandwidth, LinkMD reg 0x4c)
> 
> Workaround 1: Adjusts the DSP EQ training behavior via LinkMD register
> 0x3C. Widens and optimizes the DSP EQ compensation range,
> and is expected to solve most short/low-loss cable issues.
> 
> Workaround 2: for the cases where Workaround 1 is not sufficient.
> This one adjusts the receiver low-pass filter bandwidth, effectively
> reducing the high-frequency component of the received signal.
> 
> The register is accessible through standard PHY read/write operations
> (e.g. phytool), without requiring any switch-specific userspace
> interface. This allows robust link establishment on short or
> low-loss cabling without requiring DTS properties and without
> constraining hardware design choices.
> 
> The erratum affects the shared PHY analog front-end and therefore
> applies globally to the switch.
> 
> Signed-off-by: Fidelio Lawson <fidelio.lawson@exotec.com>
> ---
> Hello,
> 
> This patch implements the “Module 3: Equalizer fix for short cables” erratum
> described in Microchip document DS80000687C for KSZ87xx switches.
> 
> According to the erratum, the embedded PHY receiver in KSZ87xx switches is
> tuned by default for long, high-loss Ethernet cables. When operating with
> short or low-loss cables (for example CAT5e or CAT6), the PHY equalizer may
> over-amplify the incoming signal, leading to internal distortion and link
> establishment failures.
> 
> Microchip provides two workarounds, each requiring a write to a different
> indirect PHY register access mechanism.
> 
> The workaround requires programming internal PHY/DSP registers located in the
> LinkMD table, accessed through the KSZ8 indirect register mechanism. Since these
> registers belong to the switch address space and are not directly accessible
> from a standalone PHY driver, the erratum control is modeled as a vendor-specific
> Clause 22 PHY register, virtualized by the KSZ8 DSA driver.
> 
> Reads and writes to this register are intercepted by ksz8_r_phy() /
> ksz8_w_phy() and translated into the required TABLE_LINK_MD_V indirect accesses.
> The erratum affects the shared PHY analog front-end and therefore applies
> globally to the switch.
> 
> The register defines three modes:
>    - 0x0: workaround disabled
>    - 0x1: workaround 1 (DSP EQ training adjustment)
>    - 0x2: workaround 2 (receiver low-pass filter bandwidth reduction)
> 
> The register can be read and written from userspace via standard Clause 22 PHY
> accesses (for example using phytool) on DSA user ports.
> 
> This series is based on Linux v7.0-rc1.
> ---
> Changes in v2:
> - Dropped the device tree approach based on review feedback
> - Modeled the errata control as a vendor-specific Clause 22 PHY register
> - Added KSZ87xx-specific guards and replaced magic values with named macros
> - Rebased on Linux v7.0-rc1
> - Link to v1: https://patch.msgid.link/20260326-ksz87xx_errata_low_loss_connections-v1-0-79a698f43626@exotec.com
> ---
>   drivers/net/dsa/microchip/ksz8.c       | 33 +++++++++++++++++++++++++++++++++
>   drivers/net/dsa/microchip/ksz8_reg.h   | 20 +++++++++++++++++++-
>   drivers/net/dsa/microchip/ksz_common.h |  3 +++
>   3 files changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/microchip/ksz8.c b/drivers/net/dsa/microchip/ksz8.c
> index c354abdafc1b..d11da6e9ff54 100644
> --- a/drivers/net/dsa/microchip/ksz8.c
> +++ b/drivers/net/dsa/microchip/ksz8.c
> @@ -1058,6 +1058,11 @@ int ksz8_r_phy(struct ksz_device *dev, u16 phy, u16 reg, u16 *val)
>   		if (ret)
>   			return ret;
>   
> +		break;
> +	case PHY_REG_KSZ87XX_LOW_LOSS:
> +		if (!ksz_is_ksz87xx(dev))
> +			return -EOPNOTSUPP;
> +		data = dev->low_loss_wa_mode;
>   		break;
>   	default:
>   		processed = false;
> @@ -1271,6 +1276,34 @@ int ksz8_w_phy(struct ksz_device *dev, u16 phy, u16 reg, u16 val)
>   		if (ret)
>   			return ret;
>   		break;
> +	case PHY_REG_KSZ87XX_LOW_LOSS:
> +		if (!ksz_is_ksz87xx(dev))
> +			return -EOPNOTSUPP;
> +
> +		switch (val & PHY_KSZ87XX_LOW_LOSS_MASK) {
> +		case PHY_LOW_LOSS_ERRATA_DISABLED:
> +			ret = ksz8_ind_write8(dev, TABLE_LINK_MD, KSZ87XX_REG_EQ_TRAIN,
> +					      KSZ87XX_EQ_TRAIN_DEFAULT);
> +			if (!ret)
> +				ret = ksz8_ind_write8(dev, TABLE_LINK_MD,
> +						      KSZ87XX_REG_PHY_LPF,
> +						      KSZ87XX_PHY_LPF_DEFAULT);
> +			break;
> +		case KSZ87XX_LOW_LOSS_WA_EQ:
> +			ret = ksz8_ind_write8(dev, TABLE_LINK_MD, KSZ87XX_REG_EQ_TRAIN,
> +					      KSZ87XX_EQ_TRAIN_LOW_LOSS);
> +			break;
> +		case KSZ87XX_LOW_LOSS_WA_LPF:
> +			ret = ksz8_ind_write8(dev, TABLE_LINK_MD, KSZ87XX_REG_PHY_LPF,
> +					      KSZ87XX_PHY_LPF_62MHZ);

Please adjust this and make the low pass filter bandwidth actually 
configurable according to the values supported by the hardware, see this 
article:

https://microchip.my.site.com/s/article/Solution-for-Using-CAT-5E-or-CAT-6-Short-Cable-with-a-Link-Issue-for-the-KSZ8795-Family

The indirect register (0x4C) is an 8-bit register. The bits [7:6] are 
described in the table below.

Low pass filter bandwidth
00 = 90MHz
01 = 62MHz
10 = 55MHz
11 = 44MHz

...
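
As a rough illustration of the table above, the two bandwidth bits could be
handled along these lines. This is only a sketch: the macro, enum and
function names are made up for the example and do not exist in the driver,
and the lower six bits are simply preserved because their meaning is not
given in this excerpt.

```c
#include <stdint.h>

/* Hypothetical names for bits [7:6] of indirect register 0x4C. */
#define LPF_BW_SHIFT	6
#define LPF_BW_MASK	(0x3u << LPF_BW_SHIFT)

enum lpf_bw {
	LPF_BW_90MHZ = 0,	/* 00 */
	LPF_BW_62MHZ = 1,	/* 01 */
	LPF_BW_55MHZ = 2,	/* 10 */
	LPF_BW_44MHZ = 3,	/* 11 */
};

/* Replace the bandwidth field and leave bits [5:0] unchanged. */
static inline uint8_t lpf_set_bw(uint8_t reg, enum lpf_bw bw)
{
	unsigned int field = ((unsigned int)bw << LPF_BW_SHIFT) & LPF_BW_MASK;

	return (uint8_t)((reg & ~LPF_BW_MASK) | field);
}

/* Decode the bandwidth field of a register value into MHz. */
static inline unsigned int lpf_bw_mhz(uint8_t reg)
{
	static const unsigned int mhz[4] = { 90, 62, 55, 44 };

	return mhz[(reg & LPF_BW_MASK) >> LPF_BW_SHIFT];
}
```

For example, setting LPF_BW_62MHZ on a register value of 0x00 gives 0x40,
and decoding 0xC0 gives 44.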

^ permalink raw reply

* Re: [PATCH net-next v3 00/10] enic: SR-IOV V2 admin channel and MBOX protocol
From: Jakub Kicinski @ 2026-04-09  1:02 UTC (permalink / raw)
  To: Satish Kharat via B4 Relay
  Cc: satishkh, Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	netdev, linux-kernel,
	20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9, Breno Leitao
In-Reply-To: <20260408-enic-sriov-v2-admin-channel-v2-v3-0-1d4999a03cec@cisco.com>

On Wed, 08 Apr 2026 09:36:21 -0700 Satish Kharat via B4 Relay wrote:
> This series adds the admin channel infrastructure and mailbox (MBOX)
> protocol needed for V2 SR-IOV support in the enic driver.
> 
> The V2 SR-IOV design uses a direct PF-VF communication channel built on
> dedicated WQ/RQ/CQ hardware resources and an MSI-X interrupt.

read this please:
https://www.kernel.org/doc/html/next/process/maintainer-netdev.html
-- 
pv-bot: 24h

^ permalink raw reply

* Re: [PATCH iwl-next 1/2] i40e: implement basic per-queue stats
From: Jakub Kicinski @ 2026-04-09  0:45 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: intel-wired-lan, Tony Nguyen, Przemek Kitszel, Andrew Lunn,
	David S. Miller, Eric Dumazet, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, netdev
In-Reply-To: <0815f1eb4b60faa653ea703e420395b724d05216.1775648513.git.pabeni@redhat.com>

On Wed,  8 Apr 2026 13:43:45 +0200 Paolo Abeni wrote:
> +static void i40e_get_queue_stats_tx(struct net_device *dev, int idx,
> +				    struct netdev_queue_stats_tx *tx)
> +{
> +	struct i40e_netdev_priv *np = netdev_priv(dev);
> +	struct i40e_vsi *vsi = np->vsi;
> +	struct i40e_ring *tx_ring;
> +
> +	rcu_read_lock();
> +	tx_ring = READ_ONCE(vsi->tx_rings[idx]);
> +	if (!tx_ring)
> +		goto out;
> +
> +	i40e_zero_tx_ring_stats(tx);
> +	i40e_add_tx_ring_stats(tx_ring, tx);
> +
> +	if (i40e_enabled_xdp_vsi(vsi)) {
> +		tx_ring = READ_ONCE(vsi->xdp_rings[idx]);
> +		if (tx_ring)
> +			i40e_add_tx_ring_stats(tx_ring, tx);

If XDP Tx happens on dedicated queues it should be added to base,
not to the stats of the "stack" queue. This is in anticipation of
XDP being its own queue type one day, we'll then isolate those
out of base. Ripping them out of TX could cause regressions.

> +	}
> +
> +out:
> +	rcu_read_unlock();
> +}
> +
> +static void i40e_get_base_stats(struct net_device *dev,
> +				struct netdev_queue_stats_rx *rx,
> +				struct netdev_queue_stats_tx *tx)
> +{
> +	struct i40e_netdev_priv *np = netdev_priv(dev);
> +	struct i40e_vsi *vsi = np->vsi;
> +
> +	tx->bytes = vsi->tx_bytes;
> +	tx->packets = vsi->tx_packets;
> +	tx->wake = vsi->tx_restart_base;
> +	tx->stop = vsi->tx_stopped_base;
> +	tx->hw_drops = vsi->tx_busy_base;
> +
> +	rx->bytes = vsi->rx_bytes;
> +	rx->packets = vsi->rx_packets;
> +}

^ permalink raw reply

* Re: [PATCH net-next v2 5/5] ethtool: strset: check nla_len overflow
From: Jakub Kicinski @ 2026-04-09  0:39 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Hangbin Liu, Donald Hunter, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Andrew Lunn, netdev, linux-kernel
In-Reply-To: <adaFjwkOrPoBgzoc@devvm17672.vll0.facebook.com>

On Wed, 8 Apr 2026 09:43:35 -0700 Stanislav Fomichev wrote:
> On 04/08, Hangbin Liu wrote:
> > The netlink attribute length field nla_len is a __u16, which can only
> > represent values up to 65535 bytes. NICs with a large number of
> > statistics strings (e.g. mlx5_core with thousands of ETH_SS_STATS
> > entries) can produce a ETHTOOL_A_STRINGSET_STRINGS nest that exceeds
> > this limit.
> > 
> > When nla_nest_end() writes the actual nest size back to nla_len, the
> > value is silently truncated. This results in a corrupted netlink message
> > being sent to userspace: the parser reads a wrong (truncated) attribute
> > length and misaligns all subsequent attribute boundaries, causing decode
> > errors.
> > 
> > Fix this by using the new helper nla_nest_end_safe and error out if
> > the size exceeds U16_MAX.  
> 
> Not sure that's the user supposed to do? Does it mean there is no way
> to retrieve ETHTOOL_A_STRINGSET_STRINGS for those devices with too
> many strings?

Not via Netlink, they can still read them via the ioctl?
Since the legacy stats themselves can't be fetched over Netlink 
I'm not sure we should lose sleep over reading the stats strings 
via Netlink.
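
For readers following along, the truncation described in the quoted
commit message can be modelled in user space. This is only a sketch
with hypothetical function names; it is not the kernel helper and makes
no claim about its signature. It shows a 16-bit length field silently
losing the high bits, next to a checked variant that errors out instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of writing a nest size into a 16-bit length field without a
 * check: anything above 65535 wraps, e.g. 70000 is stored as 4464.
 */
static uint16_t store_len_truncating(size_t nest_len)
{
	return (uint16_t)nest_len;
}

/* Checked variant: returns 0 and writes *out when the length fits,
 * or -1 (leaving *out untouched) when it exceeds UINT16_MAX.
 */
static int store_len_checked(size_t nest_len, uint16_t *out)
{
	if (nest_len > UINT16_MAX)
		return -1;
	*out = (uint16_t)nest_len;
	return 0;
}
```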

^ permalink raw reply

* Re: [PATCH net-next v2 1/5] tools: ynl: move ethtool.py to selftest
From: Jakub Kicinski @ 2026-04-09  0:37 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Hangbin Liu, Donald Hunter, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Andrew Lunn, netdev, linux-kernel
In-Reply-To: <adaFLI2b5eb2mZHx@devvm17672.vll0.facebook.com>

On Wed, 8 Apr 2026 09:42:33 -0700 Stanislav Fomichev wrote:
> > @@ -8,7 +8,7 @@ KSELFTEST_KTAP_HELPERS="$(dirname "$(realpath "$0")")/../../../testing/selftests
> >  source "$KSELFTEST_KTAP_HELPERS"
> >  
> >  # Default ynl-ethtool path for direct execution, can be overridden by make install
> > -ynl_ethtool="../pyynl/ethtool.py"
> > +ynl_ethtool="./ethtool.py"
> >  
> >  readonly NSIM_ID="1337"
> >  readonly NSIM_DEV_NAME="nsim${NSIM_ID}"  
> 
> Do we need to add some expects/asserts to the script to really make it into
> a test? Right now it just prints things, so it's not really a test.

This file is full of asserts? It's a bash script that runs ethtool.py 
and checks the output. Which one of us is missing the point ? :)

^ permalink raw reply

* Re: [PATCH net-next v6 00/10] Decouple receive and transmit enablement in team driver
From: Jakub Kicinski @ 2026-04-09  0:31 UTC (permalink / raw)
  To: Marc Harvey
  Cc: Jiri Pirko, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Shuah Khan, Simon Horman, netdev, linux-kernel,
	linux-kselftest
In-Reply-To: <CANkEMgk16j2xzQ85JGQ4OWqeiwiVO5Gy-UfkN3omKkKizpLxiQ@mail.gmail.com>

On Wed, 8 Apr 2026 17:10:05 -0700 Marc Harvey wrote:
> On Wed, Apr 8, 2026 at 9:40 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > It pains me to report on non-debug kernels:  
> 
> I'm sorry to have pained you.

To be clear it's a compassionate pain on your behalf, I don't care :)

> Despite my best efforts to run with the exact same environment and
> conditions as your CI, my teamd can be killed with "teamd -k" but
> yours hangs (both are version 1.32 on Fedora with the same kernel
> config). For v7, I’ll invoke "teamd -k" using the timeout utility, or
> just increase the test timeout.


^ permalink raw reply

* Re: [RFC net-next 2/4] selftests: drv-net: tso: add helpers for double tunneling GSO
From: Jakub Kicinski @ 2026-04-09  0:27 UTC (permalink / raw)
  To: Xu Du
  Cc: davem, edumazet, pabeni, horms, shuah, netdev, linux-kselftest,
	linux-kernel
In-Reply-To: <CAA92KxmXOpH=CUqEdVV2948XhU1oFh8PBYLUt=ktSsK3sOh40g@mail.gmail.com>

On Wed, 8 Apr 2026 10:04:09 +0800 Xu Du wrote:
> On Tue, Apr 7, 2026 at 11:08 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue,  7 Apr 2026 10:45:09 +0800 Xu Du wrote:  
> > > As the YNL Python module cannot be invoked across different devices or
> > > environments directly in its current form, the helper abstracts the
> > > YNL CLI calls to ensure proper configuration of the tunneling device
> > > features.  
> >
> > Can you explain more? Why can't you use class RtnlFamily?
> 
> I want to test the gro-hint parameter functionality of the GENEVE tunnel,
> so I intend to use YNL for the testing. I am conducting the test between
> two machines using SSH type. I want to add the gro-hint parameter on
> both the local and remote nodes; however, I am unable to invoke class
> RtnlFamily on the remote node via SSH.

Oh. But that's not really what you're doing:

+def ynlcli(family, args, json=None, ns=None, host=None):
+    if (KSFT_DIR / "kselftest-list.txt").exists():
+        cli = KSFT_DIR / "net/lib/ynl/pyynl/cli.py"
+        spec = KSFT_DIR / f"net/lib/specs/{family}.yaml"
+    else:
+        cli = KSRC / "tools/net/ynl/pyynl/cli.py"
+        spec = KSRC / f"Documentation/netlink/specs/{family}.yaml"
+    if not cli.exists():
+        raise FileNotFoundError(f"cli not found at {cli}")
+    args = f"--spec {spec} --no-schema {args}"
+    return tool(cli.as_posix(), args, json=json, ns=ns, host=host, shell=True)

You're not deploying anything to the remote system.
Are you assuming that the remote system magically has the same
filesystem layout?

You can use the ynl CLI but it has to be whatever version is on 
the remote system. Just call ynl --family rt-link, don't dig
around for the spec paths etc.

^ permalink raw reply

* Re: [PATCH net-next v6 00/10] Decouple receive and transmit enablement in team driver
From: Marc Harvey @ 2026-04-09  0:10 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Shuah Khan, Simon Horman, netdev, linux-kernel,
	linux-kselftest
In-Reply-To: <20260408094015.3b8359c1@kernel.org>

On Wed, Apr 8, 2026 at 9:40 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> It pains me to report on non-debug kernels:

I'm sorry to have pained you. Despite my best efforts to run with the
exact same environment and conditions as your CI, my teamd can be
killed with "teamd -k" but yours hangs (both are version 1.32 on
Fedora with the same kernel config). For v7, I’ll invoke "teamd -k"
using the timeout utility, or just increase the test timeout.

^ permalink raw reply

* [PATCH] MAINTAINERS: Remove Salil Mehta as HiSilicon HNS3/HNS Ethernet maintainer
From: Salil Mehta @ 2026-04-09  0:04 UTC (permalink / raw)
  To: davem, netdev, kuba; +Cc: salil.mehta, shenjian15, shaojijie, Salil Mehta

From: Salil Mehta <salil.mehta@opnsrc.net>

Closing this chapter and a long wonderful journey with my team, I sign off one
last time with my Huawei email address. Remove my maintainer entry for the
HiSilicon HNS and HNS3 10G/100G Ethernet drivers, and add a CREDITS entry for
my co-authorship and maintenance contributions to these drivers.

Link: https://lore.kernel.org/netdev/259cd032-2ccb-452b-8524-75bc7162e138@huawei.com/
Cc: Jian Shen <shenjian15@huawei.com>
Cc: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 CREDITS     | 10 ++++++++++
 MAINTAINERS |  2 --
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/CREDITS b/CREDITS
index 9091bac3d2da..a03b00452a1e 100644
--- a/CREDITS
+++ b/CREDITS
@@ -3592,6 +3592,16 @@ E: wsalamon@tislabs.com
 E: wsalamon@nai.com
 D: portions of the Linux Security Module (LSM) framework and security modules
 
+N: Salil Mehta
+E: salil.mehta@opnsrc.net
+D: Co-authored Huawei/HiSilicon Kunpeng 920 SoC HNS3 PF and VF 100G
+D: Ethernet driver
+D: Co-authored Huawei/HiSilicon Kunpeng 916 SoC HNS 10G Ethernet
+D: driver enhancements
+D: Maintained Huawei/HiSilicon HNS and HNS3 10G/100G Ethernet drivers
+D: for Kunpeng 916 family, 920 family of SoCs
+S: Cambridge, Cambridgeshire, United Kingdom
+
 N: Robert Sanders
 E: gt8134b@prism.gatech.edu
 D: Dosemu
diff --git a/MAINTAINERS b/MAINTAINERS
index 9d1e6d3acbac..97d0bc3108de 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11530,7 +11530,6 @@ F:	drivers/bus/hisi_lpc.c
 
 HISILICON NETWORK SUBSYSTEM 3 DRIVER (HNS3)
 M:	Jian Shen <shenjian15@huawei.com>
-M:	Salil Mehta <salil.mehta@huawei.com>
 M:	Jijie Shao <shaojijie@huawei.com>
 L:	netdev@vger.kernel.org
 S:	Maintained
@@ -11545,7 +11544,6 @@ F:	drivers/net/ethernet/hisilicon/hibmcge/
 
 HISILICON NETWORK SUBSYSTEM DRIVER
 M:	Jian Shen <shenjian15@huawei.com>
-M:	Salil Mehta <salil.mehta@huawei.com>
 L:	netdev@vger.kernel.org
 S:	Maintained
 W:	http://www.hisilicon.com
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH net v3 1/2] seg6: separate dst_cache for input and output paths in seg6 lwtunnel
From: Jakub Kicinski @ 2026-04-09  0:08 UTC (permalink / raw)
  To: Andrea Mayer
  Cc: netdev, davem, edumazet, pabeni, horms, dsahern, david.lebrun,
	stefano.salsano, paolo.lungaroni, nicolas.dichtel, justin.iurman,
	linux-kernel, shuah, linux-kselftest
In-Reply-To: <20260409012708.78040268c05e3285742157ae@uniroma2.it>

On Thu, 9 Apr 2026 01:27:08 +0200 Andrea Mayer wrote:
> It does seem orthogonal to the dst_cache split and worth investigating.
> I'll take a look.

I fat-fingered the review send command; to be clear, the series we're
commenting on has already been applied.

^ permalink raw reply

* Re: [PATCH net-next v2 3/4] bpf-timestamp: keep track of the skb when wait_for_space occurs
From: Jason Xing @ 2026-04-08 23:52 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Willem de Bruijn, Jakub Sitnicki, davem, edumazet, kuba, pabeni,
	horms, willemb, martin.lau, netdev, bpf, Jason Xing, Yushan Zhou
In-Reply-To: <20264818422.Ya8u.martin.lau@linux.dev>

On Thu, Apr 9, 2026 at 2:13 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On Wed, Apr 08, 2026 at 11:15:09AM -0400, Willem de Bruijn wrote:
> > > Avoiding adding a new one makes the whole work extremely hard. I'm
> > > wondering since we have hwtstamp in shared info, why not add a
> > > software one for timestamping use? Then, we would support more
> > > different protocols in more different stages in a finer grain, which
> > > is a big coarse picture in my mind.
> >
> > I don't understand the need to store more data in the skb for BPF.

My take is that it's not for BPF only. BPF is an approach that achieves
network observability safely. The ultimate goal is to serve the
network area.

>
> Adding a field specific to bpf timestamping is not scalable.
> There will always be other bpf use cases that need to store
> something in a skb.
>
> There have been discussions about storing metadata for a skb which should
> solve the general bpf use cases.
>
> https://msgid.link/20260226-skb-local-storage-v1-0-4ca44f0dd9d1@cloudflare.com/
> https://msgid.link/20260110-skb-meta-fixup-skb_metadata_set-calls-v1-0-1047878ed1b0@cloudflare.com/

I've been following this feature continuously since Jakub gave a talk at
netdev 0x19, where we briefly discussed the overall plan. Yes, in
theory, it works. But how it would hold up in practice, I have no idea.

In terms of the latest patchset, it doesn't meet my requirement:
1. in the stages below TCP stack, we're unable to read the timestamp
recorded in the orig skb.
2. it only serves the low rate flow, which is definitely not what I
expect. Please see the original verification from the cover letter:

"Rounding up to ~500 nsec per packet:

- at 100k pps, that's 5% of the 10 usec per-packet budget, but
- at 1 Mpps, that's already 50% of the budget, which is not acceptable.

While definitely not suitable for high-pps flows, the naive skb local
storage implementation is arguably acceptable at low rates, for example
when you need to attach metadata only to the first packet of a TCP/QUIC
connection or sample packets at very low rates for tracing."
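
The percentages quoted above follow from the per-packet time budget
being 1 second divided by the packet rate. A small sketch of that
arithmetic, with a hypothetical helper name:

```c
#include <stdint.h>

/* Percent of the per-packet time budget consumed by a fixed per-packet
 * overhead. The budget is 1e9 ns / pps, so the share is
 * overhead_ns * pps / 1e9, multiplied by 100 to get a percentage.
 */
static uint64_t budget_percent(uint64_t overhead_ns, uint64_t pps)
{
	return overhead_ns * pps / 10000000ULL;
}
```

With 500 ns of overhead this gives 5 at 100k pps (10 usec budget) and
50 at 1 Mpps (1 usec budget), matching the figures in the cover letter.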

As I replied to Willem, the look-up process is also very
time-consuming. It can be a temporary tool but not an always-on plan in
production.

Thanks,
Jason

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH iwl-net] ice: update PCS latency settings for E825 10G/25Gb modes
From: Mekala, SunithaX D @ 2026-04-08 23:45 UTC (permalink / raw)
  To: Nitka, Grzegorz, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org, Loktionov, Aleksandr, Nguyen, Anthony L,
	Fodor, Zoltan, Keller, Jacob E
In-Reply-To: <20260217225956.1593920-1-grzegorz.nitka@intel.com>

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Grzegorz Nitka
> Sent: Tuesday, February 17, 2026 3:00 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Loktionov, Aleksandr <aleksandr.loktionov@intel.com>; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Fodor, Zoltan <zoltan.fodor@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>
> Subject: [Intel-wired-lan] [PATCH iwl-net] ice: update PCS latency settings for E825 10G/25Gb modes
>
> Update MAC Rx/Tx offset registers settings (PHY_MAC_[RX|TX]_OFFSET
> registers) with the data obtained with the latest research. It applies
> to PCS latency settings for the following speeds/modes:
> * 10Gb NO-FEC
>         - TX latency changed from 71.25 ns to 73 ns
>         - RX latency changed from -25.6 ns to -28 ns
> * 25Gb NO-FEC
>	- TX latency changed from 28.17 ns to 33 ns
>         - RX latency changed from -12.45 ns to -12 ns
> * 25Gb RS-FEC
>         - TX latency changed from 64.5 ns to 69 ns
>         - RX latency changed from -3.6 ns to -3 ns
>
> The original data came from simulation and pre-production hardware.
> The new data measures the actual delays and as such is more accurate.
>
> Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
> Co-developed-by: Zoltan Fodor <zoltan.fodor@intel.com>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Zoltan Fodor <zoltan.fodor@intel.com>
> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_ptp_consts.h | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)

Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)

^ permalink raw reply

* [PATCH net] net: ethernet: mtk_eth_soc: initialize PPE per-tag-layer MTU registers
From: Daniel Golle @ 2026-04-08 23:33 UTC (permalink / raw)
  To: Felix Fietkau, Lorenzo Bianconi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Matthias Brugger,
	AngeloGioacchino Del Regno, Pablo Neira Ayuso, netdev,
	linux-kernel, linux-arm-kernel, linux-mediatek
  Cc: Chad Monroe, Elad Yifee, João Duarte, John Crispin

The PPE enforces output frame size limits via per-tag-layer VLAN_MTU
registers that the driver never initializes. The hardware defaults do
not account for PPPoE overhead, causing the PPE to punt encapsulated
frames back to the CPU instead of forwarding them.

Initialize the registers at PPE start and on MTU changes using the
maximum GMAC MTU. This is a conservative approximation -- the actual
per-PPE requirement depends on egress path, but using the global
maximum ensures the limits are never too small.

Fixes: ba37b7caf1ed2 ("net: ethernet: mtk_eth_soc: add support for initializing the PPE")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 22 ++++++++++++++-
 drivers/net/ethernet/mediatek/mtk_ppe.c     | 30 +++++++++++++++++++++
 drivers/net/ethernet/mediatek/mtk_ppe.h     |  1 +
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 1767463475de4..06880fa86f0ea 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -3922,12 +3922,23 @@ static int mtk_device_event(struct notifier_block *n, unsigned long event, void
 	return NOTIFY_DONE;
 }
 
+static int mtk_max_gmac_mtu(struct mtk_eth *eth)
+{
+	int i, max_mtu = ETH_DATA_LEN;
+
+	for (i = 0; i < ARRAY_SIZE(eth->netdev); i++)
+		if (eth->netdev[i] && eth->netdev[i]->mtu > max_mtu)
+			max_mtu = eth->netdev[i]->mtu;
+
+	return max_mtu;
+}
+
 static int mtk_open(struct net_device *dev)
 {
 	struct mtk_mac *mac = netdev_priv(dev);
 	struct mtk_eth *eth = mac->hw;
 	struct mtk_mac *target_mac;
-	int i, err, ppe_num;
+	int i, err, ppe_num, mtu;
 
 	ppe_num = eth->soc->ppe_num;
 
@@ -3974,6 +3985,10 @@ static int mtk_open(struct net_device *dev)
 			mtk_gdm_config(eth, target_mac->id, gdm_config);
 		}
 
+		mtu = mtk_max_gmac_mtu(eth);
+		for (i = 0; i < ARRAY_SIZE(eth->ppe); i++)
+			mtk_ppe_update_mtu(eth->ppe[i], mtu);
+
 		napi_enable(&eth->tx_napi);
 		napi_enable(&eth->rx_napi[0].napi);
 		mtk_tx_irq_enable(eth, MTK_TX_DONE_INT);
@@ -4788,6 +4803,7 @@ static int mtk_change_mtu(struct net_device *dev, int new_mtu)
 	int length = new_mtu + MTK_RX_ETH_HLEN;
 	struct mtk_mac *mac = netdev_priv(dev);
 	struct mtk_eth *eth = mac->hw;
+	int max_mtu, i;
 
 	if (rcu_access_pointer(eth->prog) &&
 	    length > MTK_PP_MAX_BUF_SIZE) {
@@ -4798,6 +4814,10 @@ static int mtk_change_mtu(struct net_device *dev, int new_mtu)
 	mtk_set_mcr_max_rx(mac, length);
 	WRITE_ONCE(dev->mtu, new_mtu);
 
+	max_mtu = mtk_max_gmac_mtu(eth);
+	for (i = 0; i < ARRAY_SIZE(eth->ppe); i++)
+		mtk_ppe_update_mtu(eth->ppe[i], max_mtu);
+
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mediatek/mtk_ppe.c b/drivers/net/ethernet/mediatek/mtk_ppe.c
index 79c11e179d894..9e829a2e4354e 100644
--- a/drivers/net/ethernet/mediatek/mtk_ppe.c
+++ b/drivers/net/ethernet/mediatek/mtk_ppe.c
@@ -973,6 +973,36 @@ static void mtk_ppe_init_foe_table(struct mtk_ppe *ppe)
 	}
 }
 
+void mtk_ppe_update_mtu(struct mtk_ppe *ppe, int mtu)
+{
+	int base;
+	u32 val;
+
+	if (!ppe)
+		return;
+
+	/* The PPE checks output frame size against per-tag-layer MTU limits,
+	 * treating PPPoE and DSA tags just like 802.1Q VLAN tags. The Linux
+	 * device MTU already accounts for PPPoE (PPPOE_SES_HLEN) and DSA tag
+	 * overhead, but 802.1Q VLAN tags are handled transparently without
+	 * being reflected by the lower device MTU being increased by 4.
+	 * Use the maximum MTU across all GMAC interfaces so that PPE output
+	 * frame limits are sufficiently high regardless of which port a flow
+	 * egresses through.
+	 */
+	base = ETH_HLEN + mtu;
+
+	val = FIELD_PREP(MTK_PPE_VLAN_MTU0_NONE, base) |
+	      FIELD_PREP(MTK_PPE_VLAN_MTU0_1TAG, base + VLAN_HLEN);
+	ppe_w32(ppe, MTK_PPE_VLAN_MTU0, val);
+
+	val = FIELD_PREP(MTK_PPE_VLAN_MTU1_2TAG,
+			 base + 2 * VLAN_HLEN) |
+	      FIELD_PREP(MTK_PPE_VLAN_MTU1_3TAG,
+			 base + 3 * VLAN_HLEN);
+	ppe_w32(ppe, MTK_PPE_VLAN_MTU1, val);
+}
+
 void mtk_ppe_start(struct mtk_ppe *ppe)
 {
 	u32 val;
diff --git a/drivers/net/ethernet/mediatek/mtk_ppe.h b/drivers/net/ethernet/mediatek/mtk_ppe.h
index 223f709e2704f..ba85e39a155bf 100644
--- a/drivers/net/ethernet/mediatek/mtk_ppe.h
+++ b/drivers/net/ethernet/mediatek/mtk_ppe.h
@@ -346,6 +346,7 @@ struct mtk_ppe {
 struct mtk_ppe *mtk_ppe_init(struct mtk_eth *eth, void __iomem *base, int index);
 
 void mtk_ppe_deinit(struct mtk_eth *eth);
+void mtk_ppe_update_mtu(struct mtk_ppe *ppe, int mtu);
 void mtk_ppe_start(struct mtk_ppe *ppe);
 int mtk_ppe_stop(struct mtk_ppe *ppe);
 int mtk_ppe_prepare_reset(struct mtk_ppe *ppe);
-- 
2.53.0

^ permalink raw reply related

* Re: [PATCH net v3 1/2] seg6: separate dst_cache for input and output paths in seg6 lwtunnel
From: Andrea Mayer @ 2026-04-08 23:27 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, edumazet, pabeni, horms, dsahern, david.lebrun,
	stefano.salsano, paolo.lungaroni, nicolas.dichtel, justin.iurman,
	linux-kernel, shuah, linux-kselftest, Andrea Mayer
In-Reply-To: <20260408032039.314762-1-kuba@kernel.org>

On Tue,  7 Apr 2026 20:20:38 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> This is an AI-generated review of your patch. The human sending this
> email says "Seems orthogonal but also legit"
> ---
> seg6: separate dst_cache for input and output paths in seg6 lwtunnel
> 
> This commit splits the shared seg6 lwtunnel dst_cache into separate input
> and output caches to prevent blind reuse across different routing contexts.
> 
> > diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
> > index 3e1b9991131a2..d6a0f7df90807 100644
> > --- a/net/ipv6/seg6_iptunnel.c
> > +++ b/net/ipv6/seg6_iptunnel.c
> 
> [ ... ]
> 
> > @@ -504,7 +505,7 @@ static int seg6_input_core(struct net *net, struct sock *sk,
> >  		/* cache only if we don't create a dst reference loop */
> >  		if (!dst->error && lwtst != dst->lwtstate) {
> >  			local_bh_disable();
> > -			dst_cache_set_ip6(&slwt->cache, dst,
> > +			dst_cache_set_ip6(&slwt->cache_input, dst,
> >  					  &ipv6_hdr(skb)->saddr);
> >  			local_bh_enable();
> >  		}
> 
> This code appears to cache the dst entry resolved just prior by calling
> ip6_route_input(skb).
> 
> Does ip6_route_input() assign a NOREF dst to the skb?
> 
> If so, this dst is protected only by RCU. If the underlying route is
> concurrently deleted, its refcount can drop to zero. When
> dst_cache_set_ip6() is called, it attempts to use dst_hold(dst).
> 
> If the refcount is already zero, dst_hold() will warn and skip incrementing
> the refcount, but dst_cache_set_ip6() will still store the pointer.
> 
> Could this cause the cache to hold a dangling pointer, leading to a
> use-after-free once the RCU grace period expires?
> 
> Should this path use skb_dst_force(skb) or dst_hold_safe(dst) to safely
> upgrade the dst to a refcounted reference before it is cached?
> -- 
> pw-bot: cr

Thanks Jakub.

It does seem orthogonal to the dst_cache split and worth investigating.
I'll take a look.

Andrea
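
The concern raised in the quoted review is the general "take a reference
only if the object is still live" pattern. Below is a user-space analogy
using C11 atomics, not kernel code; struct obj and obj_hold_safe() are
hypothetical names chosen for illustration. The point is that a caller
holding only a weak (RCU-style) pointer should store it in a cache only
when the conditional hold succeeds.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for a cached route entry. */
struct obj {
	atomic_int refcnt;
};

/* Take a reference only if the count is still non-zero. Returns false
 * when the object is already on its way to being freed, in which case
 * the caller must not keep the pointer.
 */
static bool obj_hold_safe(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}
```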

^ permalink raw reply
