Netdev List
* [PATCH v3] net/mlx5: Fix OOB access and stack information leak in PTP event handling
From: Prathamesh Deshpande @ 2026-04-10  1:53 UTC
  To: Carolina Jubran, Saeed Mahameed, Leon Romanovsky
  Cc: Richard Cochran, Tariq Toukan, Mark Bloch, netdev, linux-rdma,
	linux-kernel, Prathamesh Deshpande

In mlx5_pps_event(), several critical issues were identified during
review by Sashiko:

1. The 'pin' index from the hardware event was used without bounds
   checking to index 'pin_config' and 'pps_info->start', leading to
   potential out-of-bounds memory access.
2. 'ptp_event' was not zero-initialized. Because it contains a union,
   assigning only the timestamp leaves the 'ts_raw' field holding
   uninitialized stack memory, which can leak kernel data or
   corrupt time sync logic in hardpps().
3. A NULL 'pin_config' could be dereferenced if initialization failed.
4. 'clock->ptp' could be NULL if ptp_clock_register() failed.

Fix these by zero-initializing the event struct, adding a bounds
check against n_pins, and adding appropriate NULL guards.

Fixes: 7c39afb394c7 ("net/mlx5: PTP code migration to driver core section")
Suggested-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com>
---
v3:
- Fix union corruption by using a local timestamp variable [Sashiko].
- Validate pin index against n_pins with WARN_ON_ONCE [Carolina].
- Remove redundant pin < 0 check and cleanup TODO comment.
v2:
- Zero-initialize ptp_event to prevent stack information leak [Sashiko].
- Add bounds check for hardware pin index to prevent OOB access [Sashiko].
- Add NULL guard for pin_config to handle initialization failures [Sashiko].
- Add NULL check for clock->ptp as originally intended.

 .../net/ethernet/mellanox/mlx5/core/lib/clock.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
index bd4e042077af..674dd048a6b8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
@@ -1164,16 +1164,22 @@ static int mlx5_pps_event(struct notifier_block *nb,
 							       pps_nb);
 	struct mlx5_core_dev *mdev = clock_state->mdev;
 	struct mlx5_clock *clock = mdev->clock;
-	struct ptp_clock_event ptp_event;
+	struct ptp_clock_event ptp_event = {};
 	struct mlx5_eqe *eqe = data;
 	int pin = eqe->data.pps.pin;
 	unsigned long flags;
 	u64 ns;
 
+	if (!clock->ptp_info.pin_config)
+		return NOTIFY_OK;
+
+	if (WARN_ON_ONCE(pin >= clock->ptp_info.n_pins))
+		return NOTIFY_OK;
+
 	switch (clock->ptp_info.pin_config[pin].func) {
 	case PTP_PF_EXTTS:
 		ptp_event.index = pin;
-		ptp_event.timestamp = mlx5_real_time_mode(mdev) ?
+		ns = mlx5_real_time_mode(mdev) ?
 			mlx5_real_time_cyc2time(clock,
 						be64_to_cpu(eqe->data.pps.time_stamp)) :
 			mlx5_timecounter_cyc2time(clock,
@@ -1181,12 +1187,13 @@ static int mlx5_pps_event(struct notifier_block *nb,
 		if (clock->pps_info.enabled) {
 			ptp_event.type = PTP_CLOCK_PPSUSR;
 			ptp_event.pps_times.ts_real =
-					ns_to_timespec64(ptp_event.timestamp);
+					ns_to_timespec64(ns);
 		} else {
 			ptp_event.type = PTP_CLOCK_EXTTS;
+			ptp_event.timestamp = ns;
 		}
-		/* TODOL clock->ptp can be NULL if ptp_clock_register fails */
-		ptp_clock_event(clock->ptp, &ptp_event);
+		if (clock->ptp)
+			ptp_clock_event(clock->ptp, &ptp_event);
 		break;
 	case PTP_PF_PEROUT:
 		if (clock->shared) {
-- 
2.43.0

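The two central fixes described above, rejecting an out-of-range pin
before indexing and computing the timestamp into a local before
choosing which union member to fill, can be modeled in userspace C.
This is a minimal sketch, not the driver code: struct event,
fill_event(), N_PINS and the EV_* constants are hypothetical stand-ins
for struct ptp_clock_event, mlx5_pps_event(), n_pins and the
PTP_CLOCK_* types.

```c
#include <stdint.h>
#include <string.h>

#define N_PINS 8	/* hypothetical stand-in for clock->ptp_info.n_pins */

enum { EV_EXTTS, EV_PPSUSR };

struct ts {
	int64_t sec;
	long nsec;
};

/* Simplified model of an event whose payload is a union. */
struct event {
	int type;
	int index;
	union {
		uint64_t timestamp;		/* used for EV_EXTTS */
		struct {
			struct ts ts_raw;	/* must not carry stack garbage */
			struct ts ts_real;
		} pps_times;			/* used for EV_PPSUSR */
	};
};

/* Returns 0 on success, -1 if the pin index is out of range. */
int fill_event(struct event *ev, unsigned int pin, uint64_t ns,
	       int pps_enabled)
{
	/* Bounds check before the index is used for anything. */
	if (pin >= N_PINS)
		return -1;

	/* Zero the whole event so no union member keeps stale bytes. */
	memset(ev, 0, sizeof(*ev));
	ev->index = (int)pin;

	/* 'ns' is a local, so exactly one union member is written. */
	if (pps_enabled) {
		ev->type = EV_PPSUSR;
		ev->pps_times.ts_real.sec = (int64_t)(ns / 1000000000ULL);
		ev->pps_times.ts_real.nsec = (long)(ns % 1000000000ULL);
	} else {
		ev->type = EV_EXTTS;
		ev->timestamp = ns;
	}
	return 0;
}
```

Writing the timestamp through the union first and then reading it back
to fill ts_real, as the pre-patch code did, makes the two members
alias; keeping the value in a local avoids that.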


* Re: [PATCH v2] net/mlx5: Fix OOB access and stack information leak in
From: Prathamesh Deshpande @ 2026-04-10  2:00 UTC
  To: cjubran
  Cc: leon, linux-kernel, linux-rdma, mbloch, netdev,
	prathameshdeshpande7, richardcochran, saeedm, tariqt
In-Reply-To: <3a238d0c-4ec1-432d-995a-19d7db3e310e@nvidia.com>

On Thu, Apr 9, 2026 at 17:16 +0300, Carolina Jubran wrote:
> pin is defined as u8 in struct mlx5_eqe_pps, so pin < 0 is dead code.
> 
> As for the upper bound: in order to receive a PPS event on a pin, the
> user must first configure it via mlx5_ptp_enable, which already
> validates the index (rq->extts.index >= clock->ptp_info.n_pins returns
> -EINVAL). And since the mtpps register only defines capabilities for 8
> pins, n_pins cannot exceed MAX_PIN_NUM.
> 
> Maybe wrap it with WARN_ON_ONCE instead of silently returning, so if 
> future hardware adds support for more pins we would notice rather than 
> silently dropping events.

Hi Carolina,

Thanks for the feedback. I've removed the redundant pin < 0 check and 
implemented the WARN_ON_ONCE for the upper bound as suggested.

I just submitted a v3 as a fresh thread with these changes and a fix 
for the union corruption bug.

Thanks,
Prathamesh



* Re: [PATCH net-next v11 00/14] netkit: Support for io_uring zero-copy and AF_XDP
From: patchwork-bot+netdevbpf @ 2026-04-10  2:00 UTC
  To: Daniel Borkmann
  Cc: netdev, bpf, kuba, davem, razor, pabeni, willemb, sdf,
	john.fastabend, martin.lau, jordan, maciej.fijalkowski,
	magnus.karlsson, dw, toke, yangzhenze, wangdongdong.6
In-Reply-To: <20260402231031.447597-1-daniel@iogearbox.net>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri,  3 Apr 2026 01:10:17 +0200 you wrote:
> Containers use virtual netdevs to route traffic from a physical netdev
> in the host namespace. They do not have access to the physical netdev
> in the host and thus can't use memory providers or AF_XDP that require
> reconfiguring/restarting queues in the physical netdev.
> 
> This patchset adds the concept of queue leasing to virtual netdevs that
> allow containers to use memory providers and AF_XDP at native speed.
> Leased queues are bound to a real queue in a physical netdev and act
> as a proxy.
> 
> [...]

Here is the summary with links:
  - [net-next,v11,01/14] net: Add queue-create operation
    https://git.kernel.org/netdev/net-next/c/7789c6bb76ac
  - [net-next,v11,02/14] net: Implement netdev_nl_queue_create_doit
    https://git.kernel.org/netdev/net-next/c/d04686d9bc86
  - [net-next,v11,03/14] net: Add lease info to queue-get response
    https://git.kernel.org/netdev/net-next/c/21d58b35e500
  - [net-next,v11,04/14] net, ethtool: Disallow leased real rxqs to be resized
    https://git.kernel.org/netdev/net-next/c/22fdf28f7c03
  - [net-next,v11,05/14] net: Slightly simplify net_mp_{open,close}_rxq
    https://git.kernel.org/netdev/net-next/c/1e91c98bc9a8
  - [net-next,v11,06/14] net: Proxy netif_mp_{open,close}_rxq for leased queues
    https://git.kernel.org/netdev/net-next/c/5602ad61ebee
  - [net-next,v11,07/14] net: Proxy netdev_queue_get_dma_dev for leased queues
    https://git.kernel.org/netdev/net-next/c/222b5566a02d
  - [net-next,v11,08/14] xsk: Extend xsk_rcv_check validation
    https://git.kernel.org/netdev/net-next/c/9368397fb92a
  - [net-next,v11,09/14] xsk: Proxy pool management for leased queues
    https://git.kernel.org/netdev/net-next/c/910f636db958
  - [net-next,v11,10/14] netkit: Add single device mode for netkit
    https://git.kernel.org/netdev/net-next/c/481038960538
  - [net-next,v11,11/14] netkit: Implement rtnl_link_ops->alloc and ndo_queue_create
    https://git.kernel.org/netdev/net-next/c/b789acc0695c
  - [net-next,v11,12/14] netkit: Add netkit notifier to check for unregistering devices
    https://git.kernel.org/netdev/net-next/c/25444470570b
  - [net-next,v11,13/14] netkit: Add xsk support for af_xdp applications
    https://git.kernel.org/netdev/net-next/c/a14fd6474883
  - [net-next,v11,14/14] selftests/net: Add queue leasing tests with netkit
    https://git.kernel.org/netdev/net-next/c/65d657d80684

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* [PATCH net v2] net: fix __this_cpu_add() in preemptible code in dev_xmit_recursion_inc/dec
From: Jiayuan Chen @ 2026-04-10  2:06 UTC
  To: netdev
  Cc: Jiayuan Chen, David S. Miller, David Ahern, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Weiming Shi,
	linux-kernel

dev_xmit_recursion_{inc,dec}() use __this_cpu_{inc,dec}() which requires
the caller to be non-preemptible in order to avoid cpu migration. However,
some callers like SCTP's UDP encapsulation path invoke iptunnel_xmit()
from process context without disabling BH or preemption:

  sctp_inet_connect -> __sctp_connect -> sctp_do_sm ->
  sctp_outq_flush -> sctp_packet_transmit -> sctp_v4_xmit ->
  udp_tunnel_xmit_skb -> iptunnel_xmit -> dev_xmit_recursion_inc

This triggers the following warning on PREEMPT(full) kernels:

  BUG: using __this_cpu_add() in preemptible [00000000]
  caller is dev_xmit_recursion_inc include/linux/netdevice.h:3595 [inline]
  caller is iptunnel_xmit+0x1cd/0xb80 net/ipv4/ip_tunnel_core.c:72
  Tainted: [L]=SOFTLOCKUP
  Call Trace:
   <TASK>
   __dump_stack lib/dump_stack.c:94 [inline]
   dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
   check_preemption_disabled+0xd8/0xe0 lib/smp_processor_id.c:47
   dev_xmit_recursion_inc include/linux/netdevice.h:3595 [inline]
   iptunnel_xmit+0x1cd/0xb80 net/ipv4/ip_tunnel_core.c:72
   sctp_v4_xmit+0x75f/0x1060 net/sctp/protocol.c:1073
   sctp_packet_transmit+0x22ec/0x3060 net/sctp/output.c:653
   sctp_packet_singleton+0x19e/0x370 net/sctp/outqueue.c:783
   sctp_outq_flush_ctrl net/sctp/outqueue.c:914 [inline]
   sctp_outq_flush+0x315/0x3350 net/sctp/outqueue.c:1212
   sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1824 [inline]
   sctp_side_effects net/sctp/sm_sideeffect.c:1204 [inline]
   sctp_do_sm+0xce1/0x5be0 net/sctp/sm_sideeffect.c:1175
   sctp_primitive_ASSOCIATE+0x9c/0xd0 net/sctp/primitive.c:73
   __sctp_connect+0x9fc/0xc70 net/sctp/socket.c:1235
   sctp_connect net/sctp/socket.c:4818 [inline]
   sctp_inet_connect+0x15f/0x220 net/sctp/socket.c:4833
   __sys_connect_file+0x141/0x1a0 net/socket.c:2089
   __sys_connect+0x141/0x170 net/socket.c:2108
   __do_sys_connect net/socket.c:2114 [inline]
   __se_sys_connect net/socket.c:2111 [inline]
   __x64_sys_connect+0x72/0xb0 net/socket.c:2111
   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
   do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

All other callers of dev_xmit_recursion_{inc,dec}() are fine: those in
net/core/dev.c and net/core/filter.c run under local_bh_disable(), and
lwtunnel_input() asserts in_softirq() context. Currently only
iptunnel_xmit() and ip6tunnel_xmit() can be reached from process
context via the SCTP UDP encapsulation path.

Fix this by adding guard(migrate)() at the top of iptunnel_xmit() and
ip6tunnel_xmit() to ensure dev_recursion_level(), dev_xmit_recursion_inc()
and dev_xmit_recursion_dec() all run on the same CPU.

Fixes: 6f1a9140ecda ("net: add xmit recursion limit to tunnel xmit functions")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
v1->v2: https://lore.kernel.org/netdev/20260409035344.214279-1-jiayuan.chen@linux.dev/
 - Move guard(migrate)() to iptunnel_xmit()/ip6tunnel_xmit() instead
   of dev_xmit_recursion_{inc,dec}(), so that dev_recursion_level() is
   also covered under the same migration protection.
 - Revert changes to include/linux/netdevice.h.
---
 include/net/ip6_tunnel.h  | 2 ++
 net/ipv4/ip_tunnel_core.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
index 359b595f1df9..3f877164233c 100644
--- a/include/net/ip6_tunnel.h
+++ b/include/net/ip6_tunnel.h
@@ -156,6 +156,8 @@ static inline void ip6tunnel_xmit(struct sock *sk, struct sk_buff *skb,
 {
 	int pkt_len, err;
 
+	guard(migrate)();
+
 	if (unlikely(dev_recursion_level() > IP_TUNNEL_RECURSION_LIMIT)) {
 		if (dev) {
 			net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 5683c328990f..808b8eaf7fad 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -58,6 +58,8 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 	struct iphdr *iph;
 	int err;
 
+	guard(migrate)();
+
 	if (unlikely(dev_recursion_level() > IP_TUNNEL_RECURSION_LIMIT)) {
 		if (dev) {
 			net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
-- 
2.43.0

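guard() in the kernel is built on the compiler's cleanup attribute, so
the protection is dropped on every exit path of the function, early
returns included. The sketch below imitates that in userspace with
GCC/Clang. It is only an illustration under stated assumptions:
migrate_depth, fake_migrate_disable(), fake_migrate_enable(),
GUARD_MIGRATE(), tunnel_xmit() and RECURSION_LIMIT are hypothetical
names, and a plain counter stands in for the real
migrate_disable()/migrate_enable() pair.

```c
/* Hypothetical stand-in for the migrate-disable nesting depth. */
int migrate_depth;
/* Depth observed inside the guarded region, kept for inspection. */
int depth_seen_in_xmit;
/* Stand-in for the per-CPU xmit recursion counter. */
int recursion_level;

#define RECURSION_LIMIT 4

void fake_migrate_disable(void)
{
	migrate_depth++;
}

/* Cleanup handlers receive a pointer to the guarded variable. */
void fake_migrate_enable(int *token)
{
	(void)token;
	migrate_depth--;
}

/* Scope guard: disable now, re-enable when the enclosing scope ends. */
#define GUARD_MIGRATE() \
	int guard_token __attribute__((cleanup(fake_migrate_enable), unused)) = \
		(fake_migrate_disable(), 0)

/* Returns -1 if the recursion limit is exceeded, 0 otherwise. */
int tunnel_xmit(void)
{
	GUARD_MIGRATE();

	depth_seen_in_xmit = migrate_depth;

	/* The level check and the inc/dec all sit under the same guard. */
	if (recursion_level > RECURSION_LIMIT)
		return -1;	/* the cleanup handler still runs here */

	recursion_level++;
	/* ... the actual transmit would happen here ... */
	recursion_level--;
	return 0;
}
```

Placing the guard at the top of the function, rather than inside the
inc/dec helpers, is what lets the recursion-level read and both counter
updates observe the same CPU.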


* Re: [PATCH net-next v2 5/5] ethtool: strset: check nla_len overflow
From: Jakub Kicinski @ 2026-04-10  2:14 UTC
  To: Andrew Lunn
  Cc: Stanislav Fomichev, Hangbin Liu, Donald Hunter, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, netdev, linux-kernel
In-Reply-To: <bb2e7087-aa36-4556-8778-b65d11354779@lunn.ch>

On Thu, 9 Apr 2026 18:12:36 +0200 Andrew Lunn wrote:
> > I guess... Should we update ethtool.yaml doc to tell the users to prefer
> > ioctl over netlink for strset-get and mention this new EMSGSIZE?  
> 
> No. The ioctl is deprecated. It can still be used for drivers which
> need it, but netlink is the preferred method.

Off the top of my head I think string sets are used in Netlink in
bitsets. These are the first class citizen string sets, maintained 
in the core.

I'm not sure if there was any motivation for exposing other string
sets (like driver priv flags, legacy stats etc) via Netlink or it 
was just a "why not cover the entire enum if it just works" thing.

There is no known real user space which runs into the Netlink + legacy
strings issue. Hangbin was doing exhaustive testing of the ethtool
YNL.

LMK if this makes sense or I'm missing a concern. If we need to make
larger string sets work we'll definitely have to revisit. But I'd like
to have that helper from patch 4 in tree sooner rather than later,
so I'd lean towards merging this series.


* Re: [RFC net-next 2/4] selftests: drv-net: tso: add helpers for double tunneling GSO
From: Jakub Kicinski @ 2026-04-10  2:23 UTC
  To: Xu Du
  Cc: davem, edumazet, pabeni, horms, shuah, netdev, linux-kselftest,
	linux-kernel
In-Reply-To: <CAA92Kxmka9=GEvNwxOy7pSEuudqsaF3WGJttyQpYEkTyyYhgLg@mail.gmail.com>

On Thu, 9 Apr 2026 15:35:21 +0800 Xu Du wrote:
> > > I want to test the gro-hint parameter functionality of the GENEVE tunnel,
> > > so I intend to use YNL for the testing. I am conducting the test between
> > > two machines using SSH type. I want to add the gro-hint parameter on
> > > both the local and remote nodes; however, I am unable to invoke class
> > > RtnlFamily on the remote node via SSH.  
> >
> > Oh. But that's not really what you're doing:
> >
> > +def ynlcli(family, args, json=None, ns=None, host=None):
> > +    if (KSFT_DIR / "kselftest-list.txt").exists():
> > +        cli = KSFT_DIR / "net/lib/ynl/pyynl/cli.py"
> > +        spec = KSFT_DIR / f"net/lib/specs/{family}.yaml"
> > +    else:
> > +        cli = KSRC / "tools/net/ynl/pyynl/cli.py"
> > +        spec = KSRC / f"Documentation/netlink/specs/{family}.yaml"
> > +    if not cli.exists():
> > +        raise FileNotFoundError(f"cli not found at {cli}")
> > +    args = f"--spec {spec} --no-schema {args}"
> > +    return tool(cli.as_posix(), args, json=json, ns=ns, host=host, shell=True)
> >
> > You're not deploying anything to the remote system.
> > Are you assuming that the remote system magically has the same
> > filesystem layout?
> >
> > You can use the ynl CLI but it has to be whatever version is on
> > the remote system. Just call ynl --family rt-link, don't dig
> > around for the spec paths etc.
> >  
> 
> In fact, I have tested this from two different locations. The first is in
> tools/testing/selftests/drivers/net/hw/ using python3 tso.py,
> which utilizes the specs located in Documentation/netlink/specs/.
> The second follows the testing methodology described in the
> README.rst of tools/testing/selftests/drivers/net/, which uses the specs
> in net/lib/specs/. Based on this, I conclude that different processes utilize
> different spec locations.
> I also referred to the implementation in net/lib/py/ynl.py, which employs
> a similar handling logic. Both using the source code repository and
> installing the package can meet the requirements for remote testing.

On the local system you have Python bindings.
On the remote system you can't assume KSFT_DIR or KSRC exist.

You can use ynl CLI on the remote system, like we use ip, tc etc.
But then just use --family rt-link, don't try to find the filesystem
path to the spec.

Is this clear now? Am I misunderstanding your misunderstanding?


* [PATCH net 1/1] net: stmmac: Update default_an_inband before passing value to phylink_config
From: KhaiWenTan @ 2026-04-10  2:07 UTC
  To: andrew+netdev, davem, edumazet, kuba, pabeni, mcoquelin.stm32,
	alexandre.torgue, rmk+kernel, maxime.chevallier, ovidiu.panait.rb,
	vladimir.oltean
  Cc: netdev, linux-stm32, linux-arm-kernel, linux-kernel,
	yoong.siang.song, hong.aun.looi, khai.wen.tan, KhaiWenTan

get_interfaces() will update both the plat->phy_interfaces and
mdio_bus_data->default_an_inband based on reading a SERDES register.

Therefore, call priv->plat->get_interfaces() before copying
mdio_bus_data->default_an_inband into config->default_an_inband, so
that default_an_inband holds the correct value during PHY setup.

Fixes: ca732e990fc8 ("net: stmmac: add get_interfaces() platform method")
Signed-off-by: KhaiWenTan <khai.wen.tan@linux.intel.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 13d3cac056be..c92054648a7e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1345,10 +1345,6 @@ static int stmmac_phylink_setup(struct stmmac_priv *priv)
 	priv->tx_lpi_clk_stop = priv->plat->flags &
 				STMMAC_FLAG_EN_TX_LPI_CLOCKGATING;
 
-	mdio_bus_data = priv->plat->mdio_bus_data;
-	if (mdio_bus_data)
-		config->default_an_inband = mdio_bus_data->default_an_inband;
-
 	/* Get the PHY interface modes (at the PHY end of the link) that
 	 * are supported by the platform.
 	 */
@@ -1356,6 +1352,10 @@ static int stmmac_phylink_setup(struct stmmac_priv *priv)
 		priv->plat->get_interfaces(priv, priv->plat->bsp_priv,
 					   config->supported_interfaces);
 
+	mdio_bus_data = priv->plat->mdio_bus_data;
+	if (mdio_bus_data)
+		config->default_an_inband = mdio_bus_data->default_an_inband;
+
 	/* Set the platform/firmware specified interface mode if the
 	 * supported interfaces have not already been provided using
 	 * phy_interface as a last resort.
-- 
2.43.0

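The ordering problem can be reproduced outside the driver. The sketch
below is a simplified model under stated assumptions: the struct
layouts are reduced to the one field that matters, and
serdes_get_interfaces(), phylink_setup() and phylink_setup_old() are
hypothetical names standing in for a platform's get_interfaces() hook
and for stmmac_phylink_setup() after and before this patch.

```c
#include <stdbool.h>

struct mdio_bus_data {
	bool default_an_inband;
};

struct phylink_cfg {
	bool default_an_inband;
};

struct plat_data {
	struct mdio_bus_data *mdio_bus_data;
	/* May update mdio_bus_data->default_an_inband as a side effect. */
	void (*get_interfaces)(struct plat_data *plat);
};

/* Hypothetical hook: pretend the SERDES register selects in-band AN. */
void serdes_get_interfaces(struct plat_data *plat)
{
	if (plat->mdio_bus_data)
		plat->mdio_bus_data->default_an_inband = true;
}

/* Patched ordering: run the hook first, then copy the updated value. */
void phylink_setup(struct plat_data *plat, struct phylink_cfg *config)
{
	if (plat->get_interfaces)
		plat->get_interfaces(plat);

	if (plat->mdio_bus_data)
		config->default_an_inband =
			plat->mdio_bus_data->default_an_inband;
}

/* Old ordering, for comparison: the copy happens before the hook runs. */
void phylink_setup_old(struct plat_data *plat, struct phylink_cfg *config)
{
	if (plat->mdio_bus_data)
		config->default_an_inband =
			plat->mdio_bus_data->default_an_inband;

	if (plat->get_interfaces)
		plat->get_interfaces(plat);
}
```

With the old ordering the config keeps the stale value even though the
platform data was updated a moment later.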


* Re: [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves
From: Jakub Kicinski @ 2026-04-10  2:38 UTC
  To: Louis Scalbert
  Cc: Jonas Gorski, netdev, andrew+netdev, jv, edumazet, pabeni, fbl,
	andy, shemminger, maheshb
In-Reply-To: <CAJ5u_OdU_okwjuEdm5zUBibW0PJsTng5ooZtXRX=wjhHpANdoQ@mail.gmail.com>

On Thu, 9 Apr 2026 13:49:06 +0200 Louis Scalbert wrote:
> > Signalling link up too early can cause issues for some protocols that
> > may change behavior in the absence of PDUs from a link partner.  
> 
> I agree with your point. I have observed issues with
> keepalived VRRP when it is configured on top of a bonding interface.
> 
> When the bond reports carrier as up while no slave is actually able to
> receive traffic (due to the partner not being ready, as indicated by the
> absence of LACP negotiation), the VRRP process interprets the interface
> as operational. At the same time, the absence of received VRRP
> advertisements is interpreted as if it were the only router on the
> segment. As a result, it transitions to the MASTER state.
> 
> In reality, another VRRP router may already be MASTER and actively
> sending advertisements, but those packets are not received due to the
> bonding state. This leads to a split-brain condition with multiple
> masters on the network.
> 
> Such a situation breaks the assumptions of
> VRRP, where a single MASTER is expected to handle traffic,
> and can result in traffic inconsistency or loss when upper-layer
> processes rely on this behavior.

It's been like this for what, 15 years?
We have to draw the line between fix and improvement somewhere.
In Linux we generally draw the line at regressions+crashes/security
bugs. If a use case never worked correctly it's not getting fixed.
It's getting enabled.

That said, if Jay wants it as a fix I'm not going to argue.


* Re: [PATCH RFC v2] r8169: implement SFP support
From: Andrew Lunn @ 2026-04-10  2:43 UTC
  To: Fabio Baltieri
  Cc: Heiner Kallweit, nic_swsd, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20260410005331.2045-1-fabio.baltieri@gmail.com>

On Fri, Apr 10, 2026 at 01:53:31AM +0100, Fabio Baltieri wrote:
> Implement support for reading the identification and diagnostic
> information on SFP modules for rtl8127atf devices.
> 
> This uses the sfp module, implements a GPIO device for presence
> detection and loss of signal and i2c communication using the designware
> module.

I would probably break this up into smaller patches, GPIO, I2C, and
the swnode.

It might be you need to Cc: the GPIO Maintainers, the I2C Maintainers
for those patches.

> +static int r8169_gpio_get(struct gpio_chip *chip, unsigned int offset)
> +{
> +	struct rtl8169_private *tp = gpiochip_get_data(chip);
> +	int val;
> +
> +	val = r8168_mac_ocp_read(tp, 0xdc30);
> +
> +	return !!(val & BIT(offset));
> +}
> +
> +static int r8169_gpio_init(struct rtl8169_private *tp)
> +{
> +	struct gpio_chip *gc;
> +	struct pci_dev *pdev = tp->pci_dev;
> +	struct device *dev;
> +	int ret;
> +
> +	dev = &pdev->dev;
> +
> +	gc = devm_kzalloc(dev, sizeof(*gc), GFP_KERNEL);
> +	if (!gc)
> +		return -ENOMEM;
> +
> +	gc->label = devm_kasprintf(dev, GFP_KERNEL, "r8169_gpio-%x",
> +				   pci_dev_id(pdev));
> +	if (!gc->label)
> +		return -ENOMEM;
> +
> +	gc->base = -1;
> +	gc->ngpio = 16;
> +	gc->owner = THIS_MODULE;
> +	gc->parent = dev;
> +	gc->fwnode = software_node_fwnode(tp->nodes.group[SWNODE_GPIO]);
> +	gc->get = r8169_gpio_get;

So there is no set? The SFP cage has a transmit enable which is
generally connected to a GPIO output. You can use it to turn off the
laser, which phylink will do when the interface is admin down.

Can you trace the lines from the SFP cage back to the chip? At least
see if it connects back?

Are registers 0xdc30 +/- 4 used for anything? Maybe there are 16 GPI
and 16 GPO? Although that sounds like a lot of pins. Or it could be
there is a direction register, and an output register.

This looks quite good otherwise.

     Andrew


* Re: [PATCH net v3] net: rose: defer rose_neigh cleanup to workqueue to fix UAF
From: Jakub Kicinski @ 2026-04-10  2:49 UTC
  To: Mashiro Chen
  Cc: netdev, linux-hams, davem, edumazet, pabeni, horms, linux-kernel,
	syzbot+abd2b69348e2d9b107a1
In-Reply-To: <20260406170125.175258-1-mashiro.chen@mailbox.org>

On Tue,  7 Apr 2026 01:01:25 +0800 Mashiro Chen wrote:
> rose_neigh_put() frees the rose_neigh object when the reference count
> reaches zero, but does not stop the t0timer and ftimer beforehand.
> If a timer has been scheduled and fires after the object is freed,
> the callback will access already-freed memory, leading to a
> use-after-free.

What if ROSE is built as a module and gets unloaded?

Please don't post the next version until next week, we're drowning in
these AI generated patches.
-- 
pw-bot: cr


* Re: [PATCH net v2] net: netrom: fix lock order inversion in nr_add_node, nr_del_node and nr_dec_obs
From: Jakub Kicinski @ 2026-04-10  2:54 UTC
  To: Mashiro Chen
  Cc: netdev, linux-hams, davem, edumazet, pabeni, horms, linux-kernel,
	syzbot+6eb7834837cf6a8db75b
In-Reply-To: <20260406114904.89088-1-mashiro.chen@mailbox.org>

On Mon,  6 Apr 2026 19:49:04 +0800 Mashiro Chen wrote:
> nr_del_node() and nr_dec_obs() acquire nr_node_list_lock first, then
> call nr_remove_neigh() which internally acquires nr_neigh_list_lock.
> nr_add_node() acquires node_lock first, then calls nr_remove_neigh()
> which acquires nr_neigh_list_lock.

Can we please merge nr_node_list_lock and nr_neigh_list_lock
into one instead?

Let's try to simplify this code as much as possible.
It's a maintenance nightmare and has fewer users than syzbot reports
(I'm not joking).


* Re: [PATCH net-next v3 00/12] net: airoha: Support multiple net_devices connected to the same GDM port
From: Jakub Kicinski @ 2026-04-10  2:56 UTC
  To: Lorenzo Bianconi
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Christian Marangi,
	Benjamin Larsson, linux-arm-kernel, linux-mediatek, netdev,
	devicetree, Xuegang Lu
In-Reply-To: <20260406-airoha-eth-multi-serdes-v3-0-ab6ea49d59ff@kernel.org>

On Mon, 06 Apr 2026 12:34:05 +0200 Lorenzo Bianconi wrote:
> EN7581 or AN7583 SoCs support connecting multiple external SerDes (e.g.
> Ethernet or USB SerDes) to GDM3 or GDM4 ports via a hw arbiter that
> manages the traffic in a TDM manner. As a result multiple net_devices can
> connect to the same GDM{3,4} port and there is a theoretical "1:n"
> relation between GDM ports and net_devices.

Still waiting for the device tree review. I'm going to blindly send out
the Sashiko review, please comment if any of it makes sense?


* Re: [PATCH net-next v3 01/12] dt-bindings: net: airoha: Add EN7581 ethernet-ports properties
From: Jakub Kicinski @ 2026-04-10  2:57 UTC
  To: lorenzo
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni, robh,
	krzk+dt, conor+dt, ansuelsmth, benjamin.larsson, linux-arm-kernel,
	linux-mediatek, netdev, devicetree
In-Reply-To: <20260406-airoha-eth-multi-serdes-v3-1-ab6ea49d59ff@kernel.org>

This is an AI-generated review of your patch. The human sending this
email says "For your consideration."
---
dt-bindings: net: airoha: Add EN7581 ethernet-ports properties

This commit introduces the ethernet-port property to model a net_device
connected via an external arbiter to the GDM3 or GDM4 ports on EN7581
and AN7583 SoCs.

> diff --git a/Documentation/devicetree/bindings/net/airoha,en7581-eth.yaml b/Documentation/devicetree/bindings/net/airoha,en7581-eth.yaml
> index fbe2ddcdd909c..ebbd433e9c9fb 100644
> --- a/Documentation/devicetree/bindings/net/airoha,en7581-eth.yaml
> +++ b/Documentation/devicetree/bindings/net/airoha,en7581-eth.yaml
> @@ -130,6 +130,30 @@ patternProperties:
>          maximum: 4
>          description: GMAC port identifier
>  
> +      '#address-cells':
> +        const: 1
> +      '#size-cells':
> +        const: 0
> +
> +    patternProperties:
> +      "^ethernet-port@[0-5]$":

The commit message states that GDM1 and GDM2 do not support connection
with the external arbiter. However, since this pattern property appears to
be placed inside the generic ^ethernet@[1-4]$ block, does this allow a
device tree to incorrectly configure ethernet-port subnodes on ethernet@1
or ethernet@2 and still pass schema validation?

Could this be restricted to GDM3 and GDM4, perhaps by splitting the
patternProperties or using an if/then block based on the reg property?

> +        type: object
> +        unevaluatedProperties: false
> +        $ref: ethernet-controller.yaml#

Does referencing ethernet-controller.yaml cause a validation conflict here?

The ethernet-controller.yaml schema enforces a strict nodename pattern
of ^ethernet(@.*)?$. Since these new nodes use the -port suffix and are
named ethernet-port@X, will they unconditionally fail the node name
validation enforced by the referenced schema during dt_binding_check?

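The mismatch between the two patterns quoted above can be checked
mechanically. The sketch below uses POSIX extended regular expressions,
which treat these two particular patterns the same way the schema's
regex dialect does; nodename_matches() is a hypothetical helper, and
whether dt_binding_check really applies the referenced node name
constraint to these child nodes remains the open question of the
review.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if 'name' matches the POSIX extended regex 'pattern',
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
int nodename_matches(const char *pattern, const char *name)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;

	rc = regexec(&re, name, 0, NULL, 0);
	regfree(&re);

	return rc == 0;
}
```

Called with "^ethernet(@.*)?$", the helper accepts "ethernet@1" but
rejects "ethernet-port@0", because the only thing the pattern allows
after "ethernet" is an optional "@..." suffix.
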
[ ... ]
-- 
pw-bot: cr


* Re: [PATCH net-next v3 02/12] net: airoha: Set PPE cpu port for GDM2 if loopback is enabled
From: Jakub Kicinski @ 2026-04-10  2:57 UTC
  To: lorenzo
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni, robh,
	krzk+dt, conor+dt, ansuelsmth, benjamin.larsson, linux-arm-kernel,
	linux-mediatek, netdev, devicetree
In-Reply-To: <20260406-airoha-eth-multi-serdes-v3-2-ab6ea49d59ff@kernel.org>

This is an AI-generated review of your patch. The human sending this
email says "For your consideration."
---
net: airoha: Move airoha_qdma pointer in airoha_gdm_dev struct

This commit moves the airoha_qdma pointer from the airoha_gdm_port struct
to the airoha_gdm_dev struct. This ensures the QDMA block used depends on
the specific net_device WAN/LAN configuration.

[ ... ]

> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c

[ ... ]

> @@ -1497,9 +1498,10 @@ static void airoha_qdma_stop_napi(struct airoha_qdma *qdma)
>  	}
>  }
>  
> -static void airoha_update_hw_stats(struct airoha_gdm_port *port)
> +static void airoha_update_hw_stats(struct airoha_gdm_dev *dev)
>  {
> -	struct airoha_eth *eth = port->qdma->eth;
> +	struct airoha_gdm_port *port = dev->port;
> +	struct airoha_eth *eth = dev->eth;
>  	u32 val, i = 0;
>  
>  	spin_lock(&port->stats.lock);

This isn't a bug introduced by this patch, but does this spin_lock()
carry a deadlock risk?

Because ndo_get_stats64() can be called from atomic context (e.g., from a
softirq), if a process context caller holds port->stats.lock and is
interrupted by a softirq on the same CPU that also triggers dev_get_stats(),
the softirq could spin forever waiting for the lock.

Should this be upgraded to spin_lock_bh() to ensure local softirqs are
disabled during the critical section?

[ ... ]

> --- a/drivers/net/ethernet/airoha/airoha_eth.h
> +++ b/drivers/net/ethernet/airoha/airoha_eth.h
> @@ -534,12 +534,12 @@ struct airoha_qdma {
>  
>  struct airoha_gdm_dev {
>  	struct airoha_gdm_port *port;
> +	struct airoha_qdma *qdma;
>  	struct net_device *dev;
>  	struct airoha_eth *eth;
>  };
>  
>  struct airoha_gdm_port {
> -	struct airoha_qdma *qdma;
>  	struct airoha_gdm_dev *dev;
>  	int id;

This isn't a bug introduced by this patch, but since the qdma pointer
is being moved to better align with the hardware structure, should 
qos_sq_bmap, cpu_tx_packets, and fwd_tx_packets be moved to 
struct airoha_qdma as well?

Because QDMA1 is shared by all WAN ports (GDM 2, 3, 4) and qos_sq_bmap
is currently tracked per-port instead of per-QDMA, each WAN port 
independently allocates QDMA1 channels starting from index 0. Does this 
cause different ports to overwrite each other's hardware QoS configurations 
in the shared QDMA1 block?

Furthermore, cpu_tx_packets and fwd_tx_packets cache per-channel ETS
hardware counters but are single u64 scalar variables in the port structure.
When querying stats for multiple channels, do these scalar caches get
overwritten with values from completely different channels, resulting in
incorrect delta calculations?

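The concern about scalar caches comes down to simple arithmetic. The
sketch below assumes the delta is computed as "current hardware counter
minus cached value"; shared_cache, channel_cache, delta_shared(),
delta_per_channel() and NUM_CHANNELS are hypothetical names, and the
code does not claim to reproduce what the driver actually does.

```c
#include <stdint.h>

#define NUM_CHANNELS 2

/* Variant A: one cached value shared by every channel. */
uint64_t shared_cache;

uint64_t delta_shared(uint64_t hw_counter)
{
	uint64_t delta = hw_counter - shared_cache;

	shared_cache = hw_counter;
	return delta;
}

/* Variant B: one cached value per channel. */
uint64_t channel_cache[NUM_CHANNELS];

/* The caller must pass ch < NUM_CHANNELS. */
uint64_t delta_per_channel(unsigned int ch, uint64_t hw_counter)
{
	uint64_t delta = hw_counter - channel_cache[ch];

	channel_cache[ch] = hw_counter;
	return delta;
}
```

Reading channel 0 at 100, channel 1 at 10, then channel 0 at 120 yields
deltas of 100, 10 and 20 with the per-channel cache. With the shared
cache the second delta is computed as 10 - 100 and wraps around as an
unsigned value, and the third comes out as 110 instead of 20.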

* Re: [PATCH net-next v3 04/12] net: airoha: Rely on net_device pointer in HTB callbacks
From: Jakub Kicinski @ 2026-04-10  2:57 UTC
  To: lorenzo
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni, robh,
	krzk+dt, conor+dt, ansuelsmth, benjamin.larsson, linux-arm-kernel,
	linux-mediatek, netdev, devicetree
In-Reply-To: <20260406-airoha-eth-multi-serdes-v3-4-ab6ea49d59ff@kernel.org>

This is an AI-generated review of your patch. The human sending this
email says "For your consideration."
---
net: airoha: Introduce WAN device flag

This commit introduces a WAN flag to specify whether a device is used to
transmit or receive WAN or LAN traffic. It enables specifying multiple LAN
devices but limits the configuration to a single WAN device.

> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 5b0cd37b155e4..9988011dca53f 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -1796,36 +1796,77 @@ static int airoha_set_gdm2_loopback(struct airoha_gdm_dev *dev)

[ ... ]

> +static int airoha_dev_init(struct net_device *netdev)
> +{
> +	struct airoha_gdm_dev *dev = netdev_priv(netdev);
> +	struct airoha_gdm_port *port = dev->port;
>  
>  	switch (port->id) {
>  	case AIROHA_GDM3_IDX:
> -	case AIROHA_GDM4_IDX:
> -		/* If GDM2 is active we can't enable loopback */
> -		if (!eth->ports[1]) {
> -			int err;
> +	case AIROHA_GDM4_IDX: {
> +		struct airoha_eth *eth = dev->eth;
>  
> -			err = airoha_set_gdm2_loopback(dev);
> -			if (err)
> -				return err;
> -		}
> +		if (eth->ports[1] || airoha_get_wan_gdm_dev(eth))
> +			break;

If a board's Device Tree enables both GDM3 and GDM4, and GDM2 is not present,
does this sequence permanently assign the WAN role to GDM3 since it is
initialized first? Is there a Device Tree property or fallback configuration
mechanism to allow users to specify GDM4 as the WAN port instead?

> +		fallthrough;
> +	}
> +	case AIROHA_GDM2_IDX:
> +		/* GDM2 is always used as wan */
> +		dev->flags |= PRIV_FLAG_WAN;
>  		break;
>  	default:
>  		break;
>  	}
>  
> -	for (i = 0; i < eth->soc->num_ppe; i++)
> -		airoha_ppe_set_cpu_port(dev, i, airoha_get_fe_port(dev));
> +	airoha_dev_set_qdma(dev);
> +	airoha_set_macaddr(dev, netdev->dev_addr);

Since the commit message notes it is possible to specify multiple LAN
devices, does unconditionally calling airoha_set_macaddr() for each
device overwrite the single shared hardware LAN MAC register?

If so, could this cause only the last initialized LAN device to have its MAC
correctly programmed into the hardware, leading to dropped unicast
packets for the other LAN interfaces?

> +
> +	if (!airoha_is_lan_gdm_dev(dev) &&
> +	    (port->id == AIROHA_GDM3_IDX || port->id == AIROHA_GDM4_IDX)) {
> +		int err;

[ ... ]

^ permalink raw reply

* Re: [PATCH net-next v3 09/12] net: airoha: Support multiple net_devices for a single FE GDM port
From: Jakub Kicinski @ 2026-04-10  2:57 UTC (permalink / raw)
  To: lorenzo
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni, robh,
	krzk+dt, conor+dt, ansuelsmth, benjamin.larsson, linux-arm-kernel,
	linux-mediatek, netdev, devicetree
In-Reply-To: <20260406-airoha-eth-multi-serdes-v3-9-ab6ea49d59ff@kernel.org>

This is an AI-generated review of your patch. The human sending this
email says "For your consideration."
---
net: airoha: Support multiple net_devices for a single FE GDM port

This commit introduces support for mapping multiple net_device interfaces
to a single Frame Engine (FE) GDM port via an external hardware arbiter.
It updates the rx/tx paths, port allocation, and registration logic to
handle the new net_device array.

> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
[ ... ]
> @@ -615,9 +617,8 @@ static int airoha_qdma_rx_process(struct airoha_queue *q, int budget)
>  		u32 hash, reason, msg1 = le32_to_cpu(desc->msg1);
>  		struct page *page = virt_to_head_page(e->buf);
>  		u32 desc_ctrl = le32_to_cpu(desc->ctrl);
> -		struct airoha_gdm_port *port;
> -		struct net_device *netdev;
> -		int data_len, len, p;
> +		struct airoha_gdm_dev *dev;
> +		int data_len, len;
>  
>  		if (!(desc_ctrl & QDMA_DESC_DONE_MASK))
>  			break;

While looking at this code, I noticed an issue that predates this patch.
Can the lack of a memory barrier after checking the descriptor ownership bit
cause stale reads?

The driver checks QDMA_DESC_DONE_MASK to verify that the hardware has finished
writing the descriptor. However, it reads desc->msg1 before this check and
without a dma_rmb():

		u32 hash, reason, msg1 = le32_to_cpu(desc->msg1);
		struct page *page = virt_to_head_page(e->buf);
		u32 desc_ctrl = le32_to_cpu(desc->ctrl);

		if (!(desc_ctrl & QDMA_DESC_DONE_MASK))
			break;

Since weakly ordered architectures can reorder memory reads, could the CPU
prefetch msg1 before the DMA engine writes the new descriptor values, leading
to routing the packet to the wrong gdm_dev?
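
For illustration, the ordering being asked for can be modeled in userspace with C11 atomics. This is only a sketch: the acquire load stands in for the role dma_rmb() would play in the driver, and the descriptor layout and all model_* names are made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_DESC_DONE (1u << 31) /* stand-in for QDMA_DESC_DONE_MASK */

/* Simplified descriptor; the real one has more fields. */
struct model_desc {
	_Atomic uint32_t ctrl; /* ownership word, written last by the device */
	uint32_t msg1;         /* metadata that is only valid once DONE is set */
};

/* Device side: fill in msg1, then publish DONE with release ordering. */
void model_device_complete(struct model_desc *d, uint32_t msg1)
{
	d->msg1 = msg1;
	atomic_store_explicit(&d->ctrl, MODEL_DESC_DONE, memory_order_release);
}

/* CPU side: check DONE first and read msg1 only afterwards. */
bool model_cpu_poll(struct model_desc *d, uint32_t *msg1)
{
	assert(d && msg1);

	uint32_t ctrl = atomic_load_explicit(&d->ctrl, memory_order_acquire);

	if (!(ctrl & MODEL_DESC_DONE))
		return false;

	/* Ordered after the ctrl load, so this cannot observe a stale msg1. */
	*msg1 = d->msg1;
	return true;
}
```

The point of the sketch is the order of operations on the CPU side: ownership check, barrier, then the payload read, rather than loading msg1 in the declaration block ahead of the check.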

[ ... ]
> @@ -1764,9 +1765,7 @@ static int airoha_set_gdm2_loopback(struct airoha_gdm_dev *dev)
>  	airoha_fe_clear(eth, REG_FE_VIP_PORT_EN, BIT(AIROHA_GDM2_IDX));
>  	airoha_fe_clear(eth, REG_FE_IFC_PORT_EN, BIT(AIROHA_GDM2_IDX));
>  
> -	/* XXX: handle XSI_USB_PORT and XSI_PCE1_PORT */
> -	nbq = port->id == AIROHA_GDM3_IDX && airoha_is_7581(eth) ? 4 : 0;
> -	src_port = eth->soc->ops.get_src_port_id(port, nbq);
> +	src_port = eth->soc->ops.get_src_port_id(port, dev->nbq);
>  	if (src_port < 0)
>  		return src_port;

Can configuring the loopback for one device break connectivity for other
devices sharing the same port?

When multiple net_devices (like PCIE0 and PCIE1 on GDM3) open and call
airoha_set_gdm2_loopback(), they modify global hardware registers such as
REG_FE_WAN_PORT.

Since WAN0_MASK is unconditionally updated to track the current device's
src_port:

	airoha_fe_rmw(eth, REG_FE_WAN_PORT,
		      WAN1_EN_MASK | WAN1_MASK | WAN0_MASK,
		      FIELD_PREP(WAN0_MASK, src_port));

won't the last configured device clobber the hardware state for the others?
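
A small sketch of the read-modify-write pattern shows why only the last writer's value survives. The field positions below are hypothetical placeholders (the real masks are defined in the driver headers), and model_rmw() only imitates what airoha_fe_rmw() is assumed to do: clear the mask, then OR in the value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout, for illustration only. */
#define MODEL_WAN0_SHIFT 0
#define MODEL_WAN0_MASK  (0xfu << MODEL_WAN0_SHIFT)
#define MODEL_WAN1_MASK  (0xfu << 8)
#define MODEL_WAN1_EN    (1u << 16)

/* Read-modify-write: clear every bit in 'mask', then OR in 'val'. */
uint32_t model_rmw(uint32_t reg, uint32_t mask, uint32_t val)
{
	assert((val & ~mask) == 0);
	return (reg & ~mask) | val;
}

/* What each device's loopback setup does to the one shared register. */
uint32_t model_set_wan0(uint32_t reg, uint32_t src_port)
{
	return model_rmw(reg,
			 MODEL_WAN1_EN | MODEL_WAN1_MASK | MODEL_WAN0_MASK,
			 (src_port << MODEL_WAN0_SHIFT) & MODEL_WAN0_MASK);
}
```

Calling model_set_wan0() for a first device and then for a second leaves only the second src_port in the WAN0 field, which is the clobbering the question is about.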

[ ... ]
> @@ -1987,7 +1986,8 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
>  	}
>  
>  	fport = airoha_get_fe_port(dev);
> -	msg1 = FIELD_PREP(QDMA_ETH_TXMSG_FPORT_MASK, fport) |
> +	msg1 = FIELD_PREP(QDMA_ETH_TXMSG_NBOQ_MASK, dev->nbq) |
> +	       FIELD_PREP(QDMA_ETH_TXMSG_FPORT_MASK, fport) |
>  	       FIELD_PREP(QDMA_ETH_TXMSG_METER_MASK, 0x7f);
>  
>  	q = &qdma->q_tx[qid];

Does sharing the hardware QDMA TX queues among multiple net_device instances
introduce a permanent transmit stall?

In airoha_dev_xmit(), if a shared hardware queue is full, the driver stops the
netdev_queue of the specific netdev making the call:

	if (q->queued + nr_frags >= q->ndesc) {
		/* not enough space in the queue */
		netif_tx_stop_queue(txq);
		spin_unlock_bh(&q->lock);
		return NETDEV_TX_BUSY;
	}

However, the TX completion handler (airoha_tx_irq_process()) only wakes the
queue of the netdev attached to the completed packet. If an interface's TX
queue is stopped but it has no pending packets in the hardware queue (because
another interface filled it), will it ever get woken up?
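
The scenario can be reproduced with a toy model of one ring shared by several netdevs. Everything here is a simplified assumption (one descriptor per packet, a boolean per netdev instead of a netdev_queue), and the "wake all" policy is just one possible way out, not something the patch proposes.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_DEVS 4 /* hypothetical number of netdevs sharing a ring */

struct model_ring {
	int ndesc;                    /* ring capacity */
	int queued;                   /* descriptors currently in flight */
	bool stopped[MODEL_MAX_DEVS]; /* per-netdev "TX queue stopped" flag */
};

/* Transmit one packet; on a full ring, stop the caller's queue and fail. */
bool model_xmit(struct model_ring *r, int dev)
{
	assert(dev >= 0 && dev < MODEL_MAX_DEVS);

	if (r->queued >= r->ndesc) {
		r->stopped[dev] = true;
		return false;
	}
	r->queued++;
	return true;
}

/* Complete one packet owned by 'owner'; 'wake_all' selects the wake policy. */
void model_complete(struct model_ring *r, int owner, bool wake_all)
{
	assert(r->queued > 0);
	r->queued--;

	if (wake_all) {
		for (int i = 0; i < MODEL_MAX_DEVS; i++)
			r->stopped[i] = false;
	} else {
		r->stopped[owner] = false;
	}
}
```

If dev 0 fills a two-entry ring and dev 1 then hits the full condition, completing both of dev 0's packets with wake_all == false leaves dev 1 stopped with nothing in flight to wake it later. With wake_all == true the first completion already restarts it.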

[ ... ]
> @@ -2955,11 +2958,25 @@ static int airoha_alloc_gdm_device(struct airoha_eth *eth,
>  			 netdev->dev_addr);
>  	}
>  
> +	/* Allowed nbq for EN7581 on GDM3 port are 4 and 5 for PCIE0
> +	 * and PCIE1 respectively.
> +	 */
> +	index = nbq;
> +	if (index && airoha_is_7581(eth) && port->id == AIROHA_GDM3_IDX)
> +		index -= 4;
> +
> +	if (index >= ARRAY_SIZE(port->devs) || port->devs[index]) {
> +		dev_err(eth->dev, "invalid nbq id: %d\n", nbq);
> +		return -EINVAL;
> +	}
> +
> +	netdev->dev.of_node = of_node_get(np);
>  	dev = netdev_priv(netdev);

Does this code leak the of_node reference?

The node reference is explicitly incremented via of_node_get(np). The networking
core does not automatically drop explicit of_node references on free_netdev().
Without a matching of_node_put() in the error paths or the driver's teardown
function (airoha_remove()), will this reference permanently leak when the device
is destroyed?

[ ... ]
> @@ -3003,7 +3021,43 @@ static int airoha_alloc_gdm_port(struct airoha_eth *eth,
>  	if (err)
>  		return err;
>  
> -	return airoha_alloc_gdm_device(eth, port, np);
> +	for_each_child_of_node(np, node) {
> +		/* Multiple external serdes connected to the FE GDM port via an
> +		 * external arbiter.
> +		 */
> +		const __be32 *nbq_ptr;
> +		int nbq;
> +
> +		if (!of_device_is_compatible(node, "airoha,eth-port"))
> +			continue;
> +
> +		d++;
> +		if (!of_device_is_available(node))
> +			continue;
> +
> +		nbq_ptr = of_get_property(node, "reg", NULL);
> +		if (!nbq_ptr) {
> +			dev_err(eth->dev, "missing nbq id\n");
> +			of_node_put(node);
> +			return -EINVAL;
> +		}
> +
> +		/* Verify the provided nbq parameter is valid */
> +		nbq = be32_to_cpup(nbq_ptr);
> +		err = eth->soc->ops.get_src_port_id(port, nbq);
> +		if (err < 0) {
> +			of_node_put(node);
> +			return err;
> +		}
> +
> +		err = airoha_alloc_gdm_device(eth, port, nbq, node);
> +		if (err) {
> +			of_node_put(node);
> +			return err;
> +		}
> +	}
> +
> +	return !d ? airoha_alloc_gdm_device(eth, port, 0, np) : 0;
>  }

Does passing 0 for nbq in the fallback path break older device trees on the
EN7581 SoC?

Later, when the interface is brought up, airoha_set_gdm2_loopback() calls
airoha_en7581_get_src_port_id(), which returns -EINVAL for GDM3 if nbq is
not 4 or 5:

	switch (port->id) {
	case AIROHA_GDM3_IDX:
		/* 7581 SoC supports PCIe serdes on GDM3 port */
		if (nbq == 4)
			return HSGMII_LAN_7581_PCIE0_SRCPORT;
		if (nbq == 5)
			return HSGMII_LAN_7581_PCIE1_SRCPORT;
		break;

If nbq is hardcoded to 0 for legacy DTs, won't this cause interface
initialization to fail unconditionally for GDM3?
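
The failure mode can be checked with a reduced model of the quoted switch. Only the GDM3 branch shown above is modeled, the numeric constants are placeholders, and model_legacy_nbq() is merely one conceivable fallback that mirrors the "? 4 : 0" expression this series removes from airoha_set_gdm2_loopback(); it is not taken from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Placeholder values; the real constants live in the driver headers. */
enum { MODEL_GDM3_IDX = 3 };
enum { MODEL_PCIE0_SRCPORT = 100, MODEL_PCIE1_SRCPORT = 101 };

/* Reduced version of the EN7581 lookup: GDM3 accepts only nbq 4 or 5. */
int model_get_src_port_id(int port_id, int nbq)
{
	assert(nbq >= 0);

	switch (port_id) {
	case MODEL_GDM3_IDX:
		if (nbq == 4)
			return MODEL_PCIE0_SRCPORT;
		if (nbq == 5)
			return MODEL_PCIE1_SRCPORT;
		break;
	default:
		break;
	}

	return -EINVAL;
}

/* Hypothetical default for device trees without child nodes. */
int model_legacy_nbq(int port_id)
{
	return port_id == MODEL_GDM3_IDX ? 4 : 0;
}
```

In this model, model_get_src_port_id(MODEL_GDM3_IDX, 0) returns -EINVAL, whereas passing model_legacy_nbq(MODEL_GDM3_IDX) resolves to the PCIE0 source port as the pre-series code did.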

^ permalink raw reply

* Re: [PATCH net-next v3 10/12] net: airoha: Do not stop GDM port if it is shared
From: Jakub Kicinski @ 2026-04-10  2:57 UTC (permalink / raw)
  To: lorenzo
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni, robh,
	krzk+dt, conor+dt, ansuelsmth, benjamin.larsson, linux-arm-kernel,
	linux-mediatek, netdev, devicetree, Xuegang Lu
In-Reply-To: <20260406-airoha-eth-multi-serdes-v3-10-ab6ea49d59ff@kernel.org>

This is an AI-generated review of your patch. The human sending this
email says "For your consideration."
---
net: airoha: Do not stop GDM port if it is shared

Theoretically, in the current codebase, two independent net_devices can
be connected to the same GDM port so we need to check the GDM port is not
used by any other running net_device before setting the forward
configuration to FE_PSE_PORT_DROP.

Tested-by: Xuegang Lu <xuegang.lu@airoha.com>

> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index fd27ba13029bd..5b0cd37b155e4 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c

[ ... ]

> @@ -1688,18 +1693,18 @@ static int airoha_dev_stop(struct net_device *netdev)
>  	struct airoha_gdm_dev *dev = netdev_priv(netdev);
>  	struct airoha_gdm_port *port = dev->port;
>  	struct airoha_qdma *qdma = dev->qdma;
> -	int i, err;
> +	int i;
>  
>  	netif_tx_disable(netdev);
> -	err = airoha_set_vip_for_gdm_port(dev, false);
> -	if (err)
> -		return err;
> -
>  	for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++)
>  		netdev_tx_reset_subqueue(netdev, i);
>  
> -	airoha_set_gdm_port_fwd_cfg(qdma->eth, REG_GDM_FWD_CFG(port->id),
> -				    FE_PSE_PORT_DROP);
> +	if (atomic_dec_and_test(&port->users)) {
> +		airoha_set_vip_for_gdm_port(dev, false);
> +		airoha_set_gdm_port_fwd_cfg(qdma->eth,
> +					    REG_GDM_FWD_CFG(port->id),
> +					    FE_PSE_PORT_DROP);
> +	}
>  
>  	if (atomic_dec_and_test(&qdma->users)) {
>  		airoha_qdma_clear(qdma, REG_QDMA_GLOBAL_CFG,
> 				  GLOBAL_CFG_TX_DMA_EN_MASK |
> 				  GLOBAL_CFG_RX_DMA_EN_MASK);
> 
> 		for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++) {
> 			if (!qdma->q_tx[i].ndesc)
> 				continue;
> 
> 			airoha_qdma_cleanup_tx_queue(&qdma->q_tx[i]);
> 		}
> 	}

This problem wasn't introduced by this patch, but does the QDMA teardown
path have a DMA corruption vulnerability here?

After clearing GLOBAL_CFG_TX_DMA_EN_MASK, the driver does not poll
GLOBAL_CFG_TX_DMA_BUSY_MASK. It proceeds to unmap memory while the hardware
DMA engine may still be processing in-flight descriptors.

Furthermore, airoha_qdma_cleanup_tx_queue() unmaps the DMA addresses and
frees the SKBs, but it does not appear to clear the physical hardware
descriptors or reset the hardware ring pointers (REG_TX_CPU_IDX /
REG_TX_DMA_IDX).

When the device is later reopened in airoha_dev_open(),
GLOBAL_CFG_TX_DMA_EN_MASK is set again without re-initializing the rings.
Will the hardware DMA engine resume fetching from its previous index,
processing stale descriptors whose backing memory was already unmapped
and freed?
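
As a rough sketch of the "disable, then wait for idle" step, the following userspace model polls a simulated busy bit a bounded number of times before declaring the queue safe to clean up. The register layout and the model_* helpers are invented for the example; in the kernel a helper such as read_poll_timeout() would typically take the place of the open-coded loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define MODEL_TX_DMA_EN   (1u << 0)
#define MODEL_TX_DMA_BUSY (1u << 1)

struct model_qdma {
	uint32_t global_cfg;
	int busy_reads_left; /* simulated: hardware stays busy for N more reads */
};

/* Simulated register read: the busy bit drops once the countdown expires. */
uint32_t model_read_cfg(struct model_qdma *q)
{
	if (q->busy_reads_left > 0) {
		q->busy_reads_left--;
		q->global_cfg |= MODEL_TX_DMA_BUSY;
	} else {
		q->global_cfg &= ~MODEL_TX_DMA_BUSY;
	}
	return q->global_cfg;
}

/* Disable TX DMA, then poll for idle. True means it is safe to unmap. */
bool model_stop_tx_dma(struct model_qdma *q, int max_polls)
{
	assert(max_polls > 0);

	q->global_cfg &= ~MODEL_TX_DMA_EN;

	for (int i = 0; i < max_polls; i++) {
		if (!(model_read_cfg(q) & MODEL_TX_DMA_BUSY))
			return true;
	}
	return false;
}
```

A caller would only go on to unmap buffers and free SKBs when model_stop_tx_dma() returns true, and would treat false as an error instead of tearing down underneath a busy engine.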

[ ... ]

> @@ -1851,10 +1856,22 @@ static void airoha_dev_get_stats64(struct net_device *netdev,
>  static int airoha_dev_change_mtu(struct net_device *netdev, int mtu)
>  {
>  	struct airoha_gdm_dev *dev = netdev_priv(netdev);
> +	u32 cur_len, len = ETH_HLEN + mtu + ETH_FCS_LEN;
>  	struct airoha_gdm_port *port = dev->port;
> -	u32 len = ETH_HLEN + mtu + ETH_FCS_LEN;
>  	struct airoha_eth *eth = dev->eth;
>  
> +	cur_len = FIELD_GET(GDM_LONG_LEN_MASK,
> +			    airoha_fe_rr(eth, REG_GDM_LEN_CFG(port->id)));
> +	if (len < cur_len) {
> +		u8 port_refcnt = atomic_read(&port->users);
> +
> +		/* We can decrease the device MTU just if the GDM port is
> +		 * not shared or if the other device is not running.
> +		 */
> +		if (port_refcnt > 1 || (port_refcnt && !netif_running(netdev)))
> +			return -EBUSY;
> +	}
> +

Does this MTU decrease validation create a state trap on shared ports?

If both eth0 and eth1 (sharing a port) are up, and a user increases eth0 MTU
to 9000, the hardware MTU updates to 9000. If the user then tries to lower
eth0 MTU back to 1500, len < cur_len evaluates to true. Because eth1 is also
up (port_refcnt > 1), this returns -EBUSY.

This would prevent either interface from decreasing its MTU without fully
bringing down the sibling interface.

Should the software netdev->mtu be decoupled from the hardware validation,
allowing valid MTU changes in software while dynamically programming the
hardware MTU to the maximum of all currently up interfaces on the shared
port?
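
A minimal sketch of that alternative, assuming the port-level length is simply recomputed from whatever devices are currently up. The struct and function are hypothetical; 'overhead' stands for ETH_HLEN + ETH_FCS_LEN (14 + 4 bytes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced view of a net_device: just what the calculation needs. */
struct model_netdev {
	int mtu;
	bool running;
};

/*
 * Frame length the shared GDM port has to accept: the largest
 * mtu + overhead among running devices, or 0 if none is running.
 */
int model_port_len(const struct model_netdev *devs, size_t n, int overhead)
{
	int max_len = 0;

	assert(devs || n == 0);

	for (size_t i = 0; i < n; i++) {
		int len;

		if (!devs[i].running)
			continue;

		len = devs[i].mtu + overhead;
		if (len > max_len)
			max_len = len;
	}

	return max_len;
}
```

With this approach a change_mtu request would always update the software MTU and then reprogram the hardware with model_port_len(), so lowering eth0 from 9000 back to 1500 succeeds and the port length drops only as far as the sibling's MTU requires.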

^ permalink raw reply

* [PATCH net v2] net: ethernet: mtk_eth_soc: initialize PPE per-tag-layer MTU registers
From: Daniel Golle @ 2026-04-10  2:57 UTC (permalink / raw)
  To: Felix Fietkau, Lorenzo Bianconi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Matthias Brugger,
	AngeloGioacchino Del Regno, Pablo Neira Ayuso, netdev,
	linux-kernel, linux-arm-kernel, linux-mediatek

The PPE enforces output frame size limits via per-tag-layer VLAN_MTU
registers that the driver never initializes. The hardware defaults do
not account for PPPoE overhead, causing the PPE to punt encapsulated
frames back to the CPU instead of forwarding them.

Initialize the registers at PPE start and on MTU changes using the
maximum GMAC MTU. This is a conservative approximation -- the actual
per-PPE requirement depends on egress path, but using the global
maximum ensures the limits are never too small.

Fixes: ba37b7caf1ed2 ("net: ethernet: mtk_eth_soc: add support for initializing the PPE")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
v2: rebase on top of current net/main
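
For reference, the per-tag-layer limits programmed by mtk_ppe_update_mtu() follow a simple formula, sketched below outside the kernel. The helper name is hypothetical; the two constants match the values of ETH_HLEN and VLAN_HLEN.

```c
#include <assert.h>

#define MODEL_ETH_HLEN  14 /* Ethernet header length in bytes */
#define MODEL_VLAN_HLEN 4  /* one 802.1Q tag in bytes */

/* Frame-size limit for a given number of tag layers (0 to 3). */
int model_ppe_limit(int mtu, int tags)
{
	assert(tags >= 0 && tags <= 3);
	return MODEL_ETH_HLEN + mtu + tags * MODEL_VLAN_HLEN;
}
```

For an MTU of 1500 this gives 1514, 1518, 1522 and 1526 bytes for zero to three tag layers respectively.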

 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 22 ++++++++++++++-
 drivers/net/ethernet/mediatek/mtk_ppe.c     | 30 +++++++++++++++++++++
 drivers/net/ethernet/mediatek/mtk_ppe.h     |  1 +
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index ddc321a02fdae..796f79088f366 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -3566,12 +3566,23 @@ static int mtk_device_event(struct notifier_block *n, unsigned long event, void
 	return NOTIFY_DONE;
 }
 
+static int mtk_max_gmac_mtu(struct mtk_eth *eth)
+{
+	int i, max_mtu = ETH_DATA_LEN;
+
+	for (i = 0; i < ARRAY_SIZE(eth->netdev); i++)
+		if (eth->netdev[i] && eth->netdev[i]->mtu > max_mtu)
+			max_mtu = eth->netdev[i]->mtu;
+
+	return max_mtu;
+}
+
 static int mtk_open(struct net_device *dev)
 {
 	struct mtk_mac *mac = netdev_priv(dev);
 	struct mtk_eth *eth = mac->hw;
 	struct mtk_mac *target_mac;
-	int i, err, ppe_num;
+	int i, err, ppe_num, mtu;
 
 	ppe_num = eth->soc->ppe_num;
 
@@ -3618,6 +3629,10 @@ static int mtk_open(struct net_device *dev)
 			mtk_gdm_config(eth, target_mac->id, gdm_config);
 		}
 
+		mtu = mtk_max_gmac_mtu(eth);
+		for (i = 0; i < ARRAY_SIZE(eth->ppe); i++)
+			mtk_ppe_update_mtu(eth->ppe[i], mtu);
+
 		napi_enable(&eth->tx_napi);
 		napi_enable(&eth->rx_napi);
 		mtk_tx_irq_enable(eth, MTK_TX_DONE_INT);
@@ -4311,6 +4326,7 @@ static int mtk_change_mtu(struct net_device *dev, int new_mtu)
 	int length = new_mtu + MTK_RX_ETH_HLEN;
 	struct mtk_mac *mac = netdev_priv(dev);
 	struct mtk_eth *eth = mac->hw;
+	int max_mtu, i;
 
 	if (rcu_access_pointer(eth->prog) &&
 	    length > MTK_PP_MAX_BUF_SIZE) {
@@ -4321,6 +4337,10 @@ static int mtk_change_mtu(struct net_device *dev, int new_mtu)
 	mtk_set_mcr_max_rx(mac, length);
 	WRITE_ONCE(dev->mtu, new_mtu);
 
+	max_mtu = mtk_max_gmac_mtu(eth);
+	for (i = 0; i < ARRAY_SIZE(eth->ppe); i++)
+		mtk_ppe_update_mtu(eth->ppe[i], max_mtu);
+
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mediatek/mtk_ppe.c b/drivers/net/ethernet/mediatek/mtk_ppe.c
index 75f7728fc7962..18279e2a7022e 100644
--- a/drivers/net/ethernet/mediatek/mtk_ppe.c
+++ b/drivers/net/ethernet/mediatek/mtk_ppe.c
@@ -973,6 +973,36 @@ static void mtk_ppe_init_foe_table(struct mtk_ppe *ppe)
 	}
 }
 
+void mtk_ppe_update_mtu(struct mtk_ppe *ppe, int mtu)
+{
+	int base;
+	u32 val;
+
+	if (!ppe)
+		return;
+
+	/* The PPE checks output frame size against per-tag-layer MTU limits,
+	 * treating PPPoE and DSA tags just like 802.1Q VLAN tags. The Linux
+	 * device MTU already accounts for PPPoE (PPPOE_SES_HLEN) and DSA tag
+	 * overhead, but 802.1Q VLAN tags are handled transparently without
+	 * being reflected by the lower device MTU being increased by 4.
+	 * Use the maximum MTU across all GMAC interfaces so that PPE output
+	 * frame limits are sufficiently high regardless of which port a flow
+	 * egresses through.
+	 */
+	base = ETH_HLEN + mtu;
+
+	val = FIELD_PREP(MTK_PPE_VLAN_MTU0_NONE, base) |
+	      FIELD_PREP(MTK_PPE_VLAN_MTU0_1TAG, base + VLAN_HLEN);
+	ppe_w32(ppe, MTK_PPE_VLAN_MTU0, val);
+
+	val = FIELD_PREP(MTK_PPE_VLAN_MTU1_2TAG,
+			 base + 2 * VLAN_HLEN) |
+	      FIELD_PREP(MTK_PPE_VLAN_MTU1_3TAG,
+			 base + 3 * VLAN_HLEN);
+	ppe_w32(ppe, MTK_PPE_VLAN_MTU1, val);
+}
+
 void mtk_ppe_start(struct mtk_ppe *ppe)
 {
 	u32 val;
diff --git a/drivers/net/ethernet/mediatek/mtk_ppe.h b/drivers/net/ethernet/mediatek/mtk_ppe.h
index 223f709e2704f..ba85e39a155bf 100644
--- a/drivers/net/ethernet/mediatek/mtk_ppe.h
+++ b/drivers/net/ethernet/mediatek/mtk_ppe.h
@@ -346,6 +346,7 @@ struct mtk_ppe {
 struct mtk_ppe *mtk_ppe_init(struct mtk_eth *eth, void __iomem *base, int index);
 
 void mtk_ppe_deinit(struct mtk_eth *eth);
+void mtk_ppe_update_mtu(struct mtk_ppe *ppe, int mtu);
 void mtk_ppe_start(struct mtk_ppe *ppe);
 int mtk_ppe_stop(struct mtk_ppe *ppe);
 int mtk_ppe_prepare_reset(struct mtk_ppe *ppe);
-- 
2.53.0

^ permalink raw reply related

* Re: [PATCH net-next v3 00/12] net: airoha: Support multiple net_devices connected to the same GDM port
From: Jakub Kicinski @ 2026-04-10  2:59 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Christian Marangi,
	Benjamin Larsson, linux-arm-kernel, linux-mediatek, netdev,
	devicetree, Xuegang Lu
In-Reply-To: <20260406-airoha-eth-multi-serdes-v3-0-ab6ea49d59ff@kernel.org>

On Mon, 06 Apr 2026 12:34:05 +0200 Lorenzo Bianconi wrote:
> EN7581 or AN7583 SoCs support connecting multiple external SerDes (e.g.
> Ethernet or USB SerDes) to GDM3 or GDM4 ports via a hw arbiter that
> manages the traffic in a TDM manner. As a result multiple net_devices can
> connect to the same GDM{3,4} port and there is a theoretical "1:n"
> relation between GDM ports and net_devices.

Looks like this driver uses page pool.
If you're sharing the same page pool across multiple netdevs
it must not be linked to a netdev.

^ permalink raw reply

* Re: [PATCH net v8 0/4] macsec: Add support for VLAN filtering in offload mode
From: patchwork-bot+netdevbpf @ 2026-04-10  3:10 UTC (permalink / raw)
  To: Cosmin Ratiu
  Cc: netdev, sd, andrew+netdev, davem, edumazet, kuba, pabeni, horms,
	sdf, dw, shuah, linux-kselftest, dtatulea
In-Reply-To: <20260408115240.1636047-1-cratiu@nvidia.com>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 8 Apr 2026 14:52:36 +0300 you wrote:
> This short series adds support for VLANs in MACsec devices when offload
> mode is enabled. This allows VLAN netdevs on top of MACsec netdevs to
> function, which accidentally used to be the case in the past, but was
> broken. This series adds back proper support.
> 
> As part of this, the existing nsim-only MACsec offload tests were
> translated to Python so they can run against real HW and new
> traffic-based tests were added for VLAN filter propagation, since
> there's currently no uAPI to check VLAN filters.
> 
> [...]

Here is the summary with links:
  - [net,v8,1/4] selftests: Migrate nsim-only MACsec tests to Python
    https://git.kernel.org/netdev/net/c/e1ab601bb230
  - [net,v8,2/4] nsim: Add support for VLAN filters
    https://git.kernel.org/netdev/net/c/c89f194b6b8e
  - [net,v8,3/4] selftests: Add MACsec VLAN propagation traffic test
    https://git.kernel.org/netdev/net/c/26555673bc78
  - [net,v8,4/4] macsec: Support VLAN-filtering lower devices
    https://git.kernel.org/netdev/net/c/a363b1c8be87

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [net-next,PATCH v6 1/3] dt-bindings: net: realtek,rtl82xx: Keep property list sorted
From: patchwork-bot+netdevbpf @ 2026-04-10  3:10 UTC (permalink / raw)
  To: Marek Vasut
  Cc: netdev, robh, davem, olek2, andrew, conor+dt, edumazet,
	f.fainelli, hkallweit1, ivan.galkin, kuba, krzk+dt, michael,
	pabeni, linux, vladimir.oltean, devicetree
In-Reply-To: <20260405233008.148974-1-marek.vasut@mailbox.org>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon,  6 Apr 2026 01:29:56 +0200 you wrote:
> Sort the documented properties alphabetically, no functional change.
> 
> Acked-by: Rob Herring (Arm) <robh@kernel.org>
> Signed-off-by: Marek Vasut <marek.vasut@mailbox.org>
> ---
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Aleksander Jan Bajkowski <olek2@wp.pl>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Conor Dooley <conor+dt@kernel.org>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Cc: Ivan Galkin <ivan.galkin@axis.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Krzysztof Kozlowski <krzk+dt@kernel.org>
> Cc: Michael Klein <michael@fossekall.de>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Rob Herring <robh@kernel.org>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
> Cc: devicetree@vger.kernel.org
> Cc: netdev@vger.kernel.org
> 
> [...]

Here is the summary with links:
  - [net-next,v6,1/3] dt-bindings: net: realtek,rtl82xx: Keep property list sorted
    https://git.kernel.org/netdev/net-next/c/4de7a8acd18e
  - [net-next,v6,2/3] dt-bindings: net: realtek,rtl82xx: Document realtek,*-ssc-enable property
    https://git.kernel.org/netdev/net-next/c/bfb859a5cb49
  - [net-next,v6,3/3] net: phy: realtek: Add property to enable SSC
    https://git.kernel.org/netdev/net-next/c/84c5a3f00084

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next] ppp: consolidate refcount decrements
From: patchwork-bot+netdevbpf @ 2026-04-10  3:10 UTC (permalink / raw)
  To: Qingfang Deng
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, bigeasy, kees,
	kuniyu, linux-ppp, netdev, linux-kernel
In-Reply-To: <20260407094058.257246-1-qingfang.deng@linux.dev>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue,  7 Apr 2026 17:40:56 +0800 you wrote:
> ppp_destroy_{channel,interface} are always called after
> refcount_dec_and_test().
> 
> To reduce boilerplate code, consolidate the decrements by moving them
> into the two functions. To reflect this change in semantics, rename the
> functions to ppp_release_*.
> 
> [...]

Here is the summary with links:
  - [net-next] ppp: consolidate refcount decrements
    https://git.kernel.org/netdev/net-next/c/5ecbebc9483c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next v2] ipv6: move IFA_F_PERMANENT percpu allocation in process scope
From: patchwork-bot+netdevbpf @ 2026-04-10  3:10 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, davem, dsahern, edumazet, kuba, horms
In-Reply-To: <46a7a030727e236af2dc7752994cd4f04f4a91d2.1775658924.git.pabeni@redhat.com>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed,  8 Apr 2026 16:36:49 +0200 you wrote:
> Observed at boot time:
> 
>  CPU: 43 UID: 0 PID: 3595 Comm: (t-daemon) Not tainted 6.12.0 #1
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x4e/0x70
>   pcpu_alloc_noprof.cold+0x1f/0x4b
>   fib_nh_common_init+0x4c/0x110
>   fib6_nh_init+0x387/0x740
>   ip6_route_info_create+0x46d/0x640
>   addrconf_f6i_alloc+0x13b/0x180
>   addrconf_permanent_addr+0xd0/0x220
>   addrconf_notify+0x93/0x540
>   notifier_call_chain+0x5a/0xd0
>   __dev_notify_flags+0x5c/0xf0
>   dev_change_flags+0x54/0x70
>   do_setlink+0x36c/0xce0
>   rtnl_setlink+0x11f/0x1d0
>   rtnetlink_rcv_msg+0x142/0x3f0
>   netlink_rcv_skb+0x50/0x100
>   netlink_unicast+0x242/0x390
>   netlink_sendmsg+0x21b/0x470
>   __sys_sendto+0x1dc/0x1f0
>   __x64_sys_sendto+0x24/0x30
>   do_syscall_64+0x7d/0x160
>   entry_SYSCALL_64_after_hwframe+0x76/0x7e
>  RIP: 0033:0x7f5c3852f127
>  Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 80 3d 85 ef 0c 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 71 c3 55 48 83 ec 30 44 89 4c 24 2c 4c 89 44
>  RSP: 002b:00007ffe86caf4c8 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
>  RAX: ffffffffffffffda RBX: 0000556c5cd93210 RCX: 00007f5c3852f127
>  RDX: 0000000000000020 RSI: 0000556c5cd938b0 RDI: 0000000000000003
>  RBP: 00007ffe86caf5a0 R08: 00007ffe86caf4e0 R09: 0000000000000080
>  R10: 0000000000000000 R11: 0000000000000202 R12: 0000556c5cd932d0
>  R13: 00000000021d05d1 R14: 00000000021d05d1 R15: 0000000000000001
> 
> [...]

Here is the summary with links:
  - [net-next,v2] ipv6: move IFA_F_PERMANENT percpu allocation in process scope
    https://git.kernel.org/netdev/net-next/c/8e6405f8218b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH v4 net-next] net: use get_random_u{16,32,64}() where appropriate
From: patchwork-bot+netdevbpf @ 2026-04-10  3:10 UTC (permalink / raw)
  To: David CARLIER
  Cc: kuba, davem, edumazet, pabeni, andrew+netdev, horms, idryomov,
	johannes, matttbe, martineau, geliang, aconole, i.maximets,
	marcelo.leitner, lucien.xin, jmaloy, netdev, linux-wireless,
	mptcp, dev, linux-sctp, tipc-discussion, linux-kernel
In-Reply-To: <20260407150758.5889-1-devnexen@gmail.com>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue,  7 Apr 2026 16:07:58 +0100 you wrote:
> Use the typed random integer helpers instead of
> get_random_bytes() when filling a single integer variable.
> The helpers return the value directly, require no pointer
> or size argument, and better express intent.
> 
> Skipped sites writing into __be16 (netdevsim) and __le64
> (ceph) fields where a direct assignment would trigger
> sparse endianness warnings.
> 
> [...]

Here is the summary with links:
  - [v4,net-next] net: use get_random_u{16,32,64}() where appropriate
    https://git.kernel.org/netdev/net-next/c/9addea5d44b6

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next v4 3/3] gve: implement PTP gettimex64
From: Jakub Kicinski @ 2026-04-10  3:16 UTC (permalink / raw)
  To: Harshitha Ramamurthy
  Cc: netdev, joshwash, andrew+netdev, davem, edumazet, pabeni,
	richardcochran, willemb, nktgrg, jfraker, ziweixiao, maolson,
	thostet, jordanrhee, jefrogers, alok.a.tiwari, yyd,
	jacob.e.keller, linux-kernel, Naman Gulati
In-Reply-To: <20260406234002.3610542-4-hramamurthy@google.com>

On Mon,  6 Apr 2026 23:40:02 +0000 Harshitha Ramamurthy wrote:
> From: Jordan Rhee <jordanrhee@google.com>
> 
> Enable chrony and phc2sys to synchronize system clock to NIC clock.
> 
> The system cycle counters are sampled by the device to minimize the
> uncertainty window. If the system times are sampled in the host, the
> delta between pre and post readings is 100us or more due to AQ command
> latency. The system times returned by the device have a delta of ~1us,
> which enables significantly more accurate clock synchronization.

Interesting. I'd like this looked over by David Woodhouse and tglx.
Please repost after the merge window or send them an RFC.

> +static int gve_ptp_read_timestamp(struct gve_ptp *ptp, cycles_t *pre_cycles,
> +				  cycles_t *post_cycles,
> +				  struct system_time_snapshot *snap)
> +{
> +	unsigned long delay_us = 1000;
> +	int retry_count = 0;
> +	int err;
> +
> +	lockdep_assert_held(&ptp->nic_ts_read_lock);
> +
> +	do {

This can't be a for () loop with 5 iterations?
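
Something along these lines, as a userspace sketch with the AdminQ command and fsleep() replaced by simulated stand-ins. The model_* names are hypothetical, and the cycle sampling and snapshot logic are left out to keep the loop shape visible.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_MAX_TRIES 5

/* Simulated AdminQ state, used in place of the real device. */
struct model_adminq {
	int eagain_left;        /* command returns -EAGAIN this many times */
	int calls;              /* number of commands issued */
	unsigned long slept_us; /* total simulated sleep time */
};

/* Stand-in for gve_adminq_report_nic_ts(). */
int model_cmd(struct model_adminq *aq)
{
	aq->calls++;
	if (aq->eagain_left > 0) {
		aq->eagain_left--;
		return -EAGAIN;
	}
	return 0;
}

/* Stand-in for fsleep(): only records the requested delay. */
void model_sleep(struct model_adminq *aq, unsigned long us)
{
	aq->slept_us += us;
}

/* Retry on -EAGAIN with exponential backoff, at most MODEL_MAX_TRIES times. */
int model_read_timestamp(struct model_adminq *aq)
{
	unsigned long delay_us = 1000;

	assert(aq);

	for (int i = 0; i < MODEL_MAX_TRIES; i++) {
		int err = model_cmd(aq);

		if (err != -EAGAIN)
			return err;

		model_sleep(aq, delay_us);
		delay_us *= 2; /* exponential backoff */
	}

	return -ETIMEDOUT;
}
```

The behaviour is the same as the do/while version, including the sleep after the final failed attempt; the for loop just makes the iteration bound visible at the top and removes the separate retry_count bookkeeping.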

> +		if (snap)
> +			ktime_get_snapshot(snap);
> +
> +		*pre_cycles = get_cycles();
> +		err = gve_adminq_report_nic_ts(ptp->priv,
> +					       ptp->nic_ts_report_bus);
> +
> +		/* Prevent get_cycles() from being speculatively executed
> +		 * before the AdminQ command
> +		 */
> +		rmb();
> +		*post_cycles = get_cycles();
> +		if (likely(err != -EAGAIN))
> +			return err;
> +
> +		fsleep(delay_us);
> +
> +		/* Exponential backoff */
> +		delay_us *= 2;
> +		retry_count++;
> +	} while (retry_count < 5);
> +
> +	return -ETIMEDOUT;
> +}
> +
>  /* Read the nic timestamp from hardware via the admin queue. */
> -static int gve_clock_nic_ts_read(struct gve_ptp *ptp, u64 *nic_raw)
> +static int gve_clock_nic_ts_read(struct gve_ptp *ptp, u64 *nic_raw,
> +				 struct gve_sysclock_sample *sysclock)
>  {
> +	cycles_t host_pre_cycles, host_post_cycles;
> +	struct gve_nic_ts_report *ts_report;
>  	int err;
>  
>  	mutex_lock(&ptp->nic_ts_read_lock);
> -	err = gve_adminq_report_nic_ts(ptp->priv, ptp->nic_ts_report_bus);
> -	if (err)
> +	err = gve_ptp_read_timestamp(ptp, &host_pre_cycles, &host_post_cycles,
> +				     sysclock ? &sysclock->snapshot : NULL);
> +	if (err) {
> +		dev_err_ratelimited(&ptp->priv->pdev->dev,
> +				    "AdminQ timestamp read failed: %d\n", err);
>  		goto out;
> +	}
>  
> -	*nic_raw = be64_to_cpu(ptp->nic_ts_report->nic_timestamp);
> +	ts_report = ptp->nic_ts_report;
> +	*nic_raw = be64_to_cpu(ts_report->nic_timestamp);
> +
> +	if (sysclock) {
> +		sysclock->nic_pre_cycles = be64_to_cpu(ts_report->pre_cycles);
> +		sysclock->nic_post_cycles = be64_to_cpu(ts_report->post_cycles);
> +		sysclock->host_pre_cycles = host_pre_cycles;
> +		sysclock->host_post_cycles = host_post_cycles;
> +	}
>  
>  out:
>  	mutex_unlock(&ptp->nic_ts_read_lock);
>  	return err;
>  }
>  
> +struct gve_cycles_to_clock_callback_ctx {
> +	u64 cycles;
> +};
> +
> +static int gve_cycles_to_clock_fn(ktime_t *device_time,
> +				  struct system_counterval_t *system_counterval,
> +				  void *ctx)

Does this do anything GVE specific??

> +{
> +	struct gve_cycles_to_clock_callback_ctx *context = ctx;
> +
> +	*device_time = 0;
> +
> +	system_counterval->cycles = context->cycles;
> +	system_counterval->use_nsecs = false;
> +
> +	if (IS_ENABLED(CONFIG_X86))
> +		system_counterval->cs_id = CSID_X86_TSC;
> +	else if (IS_ENABLED(CONFIG_ARM64))
> +		system_counterval->cs_id = CSID_ARM_ARCH_COUNTER;
> +	else
> +		return -EOPNOTSUPP;
> +
> +	return 0;
> +}
-- 
pw-bot: cr

^ permalink raw reply

