* [PATCH v4 3/3] arm64: dts: rockchip: describe PCIe RTL8125 Ethernet on Radxa ROCK 5 family
From: Ricardo Pardini via B4 Relay @ 2026-06-17 12:58 UTC
To: Heiner Kallweit, nic_swsd, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner
Cc: Sebastian Reichel, netdev, devicetree, linux-kernel,
linux-arm-kernel, linux-rockchip, Ricardo Pardini
In-Reply-To: <20260617-rk3588-dts-rtl-eth-describe-dt-alias-v4-0-2bd38922d129@pardini.net>
From: Ricardo Pardini <ricardo@pardini.net>
The Radxa ROCK 5B / 5B+ / 5T all carry on-board Realtek RTL8125 NICs.
Describe the fixed function nodes and attach ethernet0/ethernet1
aliases, so that U-Boot's fdt_fixup_ethernet() can inject mac-address
properties from its ethaddr/eth1addr env, for stable MACs across
boots that both U-Boot and the kernel agree on.
The RTL8125 on pcie2x1l2 is shared by all three variants. The ROCK 5T
additionally describes pcie2x1l1 with its second RTL8125.
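As an illustration of how the reg and bus-range values in these nodes
relate, here is a minimal userspace sketch in C. It assumes the standard
Open Firmware PCI encoding of the first reg cell (bus in bits 23:16,
device in bits 15:11, function in bits 10:8). The struct and function
names are hypothetical and are not taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the fields packed into the first reg cell. */
struct pci_bdf {
	uint8_t bus;
	uint8_t dev;
	uint8_t fn;
};

/* Split the first reg cell into bus (23:16), device (15:11), function (10:8). */
static struct pci_bdf pci_decode_phys_hi(uint32_t phys_hi)
{
	struct pci_bdf bdf = {
		.bus = (phys_hi >> 16) & 0xff,
		.dev = (phys_hi >> 11) & 0x1f,
		.fn  = (phys_hi >> 8) & 0x07,
	};

	return bdf;
}

/* Check that a bus number lies inside a bridge's bus-range (inclusive). */
static int bus_in_range(uint8_t bus, uint8_t first, uint8_t last)
{
	return bus >= first && bus <= last;
}
```

With the values used in this patch, 0x400000 decodes to bus 0x40 (the
root port) and 0x410000 to bus 0x41, device 0, function 0, which lies
inside the bridge's bus-range of 0x41 to 0x4f.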
Signed-off-by: Ricardo Pardini <ricardo@pardini.net>
---
.../arm64/boot/dts/rockchip/rk3588-rock-5b-5bp-5t.dtsi | 15 +++++++++++++++
arch/arm64/boot/dts/rockchip/rk3588-rock-5t.dts | 18 ++++++++++++++++++
2 files changed, 33 insertions(+)
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b-5bp-5t.dtsi b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b-5bp-5t.dtsi
index bf4a1d2e55ca3..b53dfe6848cce 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b-5bp-5t.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b-5bp-5t.dtsi
@@ -10,6 +10,7 @@
/ {
aliases {
+ ethernet0 = &rtl_eth0;
mmc0 = &sdhci;
mmc1 = &sdmmc;
mmc2 = &sdio;
@@ -482,6 +483,20 @@ &pcie2x1l2 {
reset-gpios = <&gpio3 RK_PB0 GPIO_ACTIVE_HIGH>;
vpcie3v3-supply = <&vcc3v3_pcie2x1l2>;
status = "okay";
+
+ pcie@0,0 {
+ reg = <0x400000 0 0 0 0>;
+ #address-cells = <3>;
+ #size-cells = <2>;
+ ranges;
+ device_type = "pci";
+ bus-range = <0x41 0x4f>;
+
+ rtl_eth0: ethernet@0,0 {
+ compatible = "pci10ec,8125";
+ reg = <0x410000 0 0 0 0>;
+ };
+ };
};
&pcie30phy {
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5t.dts b/arch/arm64/boot/dts/rockchip/rk3588-rock-5t.dts
index 425036146b6d9..b1a3e4b2165f9 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5t.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5t.dts
@@ -8,6 +8,10 @@ / {
model = "Radxa ROCK 5T";
compatible = "radxa,rock-5t", "rockchip,rk3588";
+ aliases {
+ ethernet1 = &rtl_eth1;
+ };
+
analog-sound {
compatible = "audio-graph-card";
label = "rk3588-es8316";
@@ -76,6 +80,20 @@ &pcie2x1l1 {
reset-gpios = <&gpio4 RK_PA2 GPIO_ACTIVE_HIGH>;
vpcie3v3-supply = <&vcc3v3_pcie2x1l1>;
status = "okay";
+
+ pcie@0,0 {
+ reg = <0x300000 0 0 0 0>;
+ #address-cells = <3>;
+ #size-cells = <2>;
+ ranges;
+ device_type = "pci";
+ bus-range = <0x31 0x3f>;
+
+ rtl_eth1: ethernet@0,0 {
+ compatible = "pci10ec,8125";
+ reg = <0x310000 0 0 0 0>;
+ };
+ };
};
&pcie30phy {
--
2.54.0
* [PATCH v4 2/3] arm64: dts: rockchip: describe PCIe RTL8125 Ethernet on NanoPC-T6
From: Ricardo Pardini via B4 Relay @ 2026-06-17 12:58 UTC
To: Heiner Kallweit, nic_swsd, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner
Cc: Sebastian Reichel, netdev, devicetree, linux-kernel,
linux-arm-kernel, linux-rockchip, Ricardo Pardini
In-Reply-To: <20260617-rk3588-dts-rtl-eth-describe-dt-alias-v4-0-2bd38922d129@pardini.net>
From: Ricardo Pardini <ricardo@pardini.net>
The FriendlyElec NanoPC-T6 carries two on-board Realtek RTL8125 NICs
behind pcie2x1l0 and pcie2x1l2.
Describe the fixed function nodes and attach ethernet0/ethernet1
aliases, so that U-Boot's fdt_fixup_ethernet() can inject mac-address
properties from its ethaddr/eth1addr env. The on-NIC EEPROMs on this
board are not pre-programmed with a unique MAC, so this gives a
stable MAC across boots that both U-Boot and the kernel agree on.
Signed-off-by: Ricardo Pardini <ricardo@pardini.net>
---
arch/arm64/boot/dts/rockchip/rk3588-nanopc-t6.dtsi | 30 ++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-nanopc-t6.dtsi b/arch/arm64/boot/dts/rockchip/rk3588-nanopc-t6.dtsi
index 84b6b53f016ab..0c11033f9d8e4 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-nanopc-t6.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588-nanopc-t6.dtsi
@@ -20,6 +20,8 @@ / {
compatible = "friendlyarm,nanopc-t6", "rockchip,rk3588";
aliases {
+ ethernet0 = &rtl_eth0;
+ ethernet1 = &rtl_eth1;
mmc0 = &sdhci;
mmc1 = &sdmmc;
};
@@ -635,6 +637,20 @@ &pcie2x1l0 {
pinctrl-names = "default";
pinctrl-0 = <&pcie2_0_rst>;
status = "okay";
+
+ pcie@0,0 {
+ reg = <0x200000 0 0 0 0>;
+ #address-cells = <3>;
+ #size-cells = <2>;
+ ranges;
+ device_type = "pci";
+ bus-range = <0x21 0x2f>;
+
+ rtl_eth0: ethernet@0,0 {
+ compatible = "pci10ec,8125";
+ reg = <0x210000 0 0 0 0>;
+ };
+ };
};
&pcie2x1l1 {
@@ -651,6 +667,20 @@ &pcie2x1l2 {
pinctrl-names = "default";
pinctrl-0 = <&pcie2_2_rst>;
status = "okay";
+
+ pcie@0,0 {
+ reg = <0x400000 0 0 0 0>;
+ #address-cells = <3>;
+ #size-cells = <2>;
+ ranges;
+ device_type = "pci";
+ bus-range = <0x41 0x4f>;
+
+ rtl_eth1: ethernet@0,0 {
+ compatible = "pci10ec,8125";
+ reg = <0x410000 0 0 0 0>;
+ };
+ };
};
&pcie30phy {
--
2.54.0
* [PATCH v4 1/3] dt-bindings: net: add Realtek RTL8125 PCIe Ethernet
From: Ricardo Pardini via B4 Relay @ 2026-06-17 12:58 UTC
To: Heiner Kallweit, nic_swsd, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner
Cc: Sebastian Reichel, netdev, devicetree, linux-kernel,
linux-arm-kernel, linux-rockchip, Ricardo Pardini
In-Reply-To: <20260617-rk3588-dts-rtl-eth-describe-dt-alias-v4-0-2bd38922d129@pardini.net>
From: Ricardo Pardini <ricardo@pardini.net>
Add a binding for fixed/soldered Realtek RTL8125 PCIe Ethernet
controllers.
The "pciVVVV,DDDD" compatibles are the Open Firmware PCI Bus Binding
spelling, auto-derived from PCI-SIG vendor/device IDs, but they still
need a binding when used in a board DT. This is analogous to the
"usbVVVV,PPPP" compatibles documented in their own bindings (e.g.
microchip,lan95xx). With a binding in place, board DTs that attach
properties (fixed MAC, nvmem cell, ...) to these PCI function nodes can
be validated.
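To make the "auto-derived" spelling concrete, here is a small userspace
sketch in C that builds such a compatible string from a vendor/device
ID pair. It assumes the lowercase-hex form without leading zeros that
the compatible in this series uses; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a "pciVVVV,DDDD" compatible from PCI vendor and device IDs.
 * Returns the snprintf() result: the length the full string needs.
 */
static int pci_compatible(char *buf, size_t len,
			  uint16_t vendor, uint16_t device)
{
	return snprintf(buf, len, "pci%x,%x",
			(unsigned int)vendor, (unsigned int)device);
}
```

For Realtek's vendor ID 0x10ec and the RTL8125 device ID 0x8125 this
yields "pci10ec,8125", the compatible used by the binding.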
Suggested-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Ricardo Pardini <ricardo@pardini.net>
---
.../devicetree/bindings/net/realtek,rtl8125.yaml | 43 ++++++++++++++++++++++
MAINTAINERS | 1 +
2 files changed, 44 insertions(+)
diff --git a/Documentation/devicetree/bindings/net/realtek,rtl8125.yaml b/Documentation/devicetree/bindings/net/realtek,rtl8125.yaml
new file mode 100644
index 0000000000000..eee13fbc1e6a6
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/realtek,rtl8125.yaml
@@ -0,0 +1,43 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/realtek,rtl8125.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Realtek RTL8125 2.5 Gigabit PCIe Ethernet Controller
+
+maintainers:
+ - Heiner Kallweit <hkallweit1@gmail.com>
+
+description:
+ The Realtek RTL8125 is a 2.5GBASE-T Ethernet controller with a PCIe host
+ interface.
+
+allOf:
+ - $ref: ethernet-controller.yaml#
+
+properties:
+ compatible:
+ const: pci10ec,8125
+
+ reg:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ pcie {
+ #address-cells = <3>;
+ #size-cells = <2>;
+
+ ethernet@0,0 {
+ compatible = "pci10ec,8125";
+ reg = <0x10000 0 0 0 0>;
+ local-mac-address = [00 00 00 00 00 00];
+ };
+ };
diff --git a/MAINTAINERS b/MAINTAINERS
index c8d4b913f26c1..e5fbd82946aec 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -134,6 +134,7 @@ M: Heiner Kallweit <hkallweit1@gmail.com>
M: nic_swsd@realtek.com
L: netdev@vger.kernel.org
S: Maintained
+F: Documentation/devicetree/bindings/net/realtek,rtl8125.yaml
F: drivers/net/ethernet/realtek/r8169*
8250/16?50 (AND CLONE UARTS) SERIAL DRIVER
--
2.54.0
* [PATCH v4 0/3] describe RTL8125 PCIe NICs on Rockchip boards (and add DT binding)
From: Ricardo Pardini via B4 Relay @ 2026-06-17 12:58 UTC
To: Heiner Kallweit, nic_swsd, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner
Cc: Sebastian Reichel, netdev, devicetree, linux-kernel,
linux-arm-kernel, linux-rockchip, Ricardo Pardini
Several Rockchip rk35xx boards carry on-board Realtek RTL8125 2.5GbE
NICs whose PCI function nodes are not described in the DT. Describing
them allows for stable ethernetN aliases (matching the GMAC alias
convention on these boards) and lets U-Boot's fdt_fixup_ethernet()
inject mac-address properties from its ethaddr/ethNaddr env, so MACs
stay stable across boots and match between U-Boot and the kernel.
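The alias-to-environment mapping this relies on can be sketched as
follows. This is a simplified userspace model in C of my understanding
of the naming convention (ethernet0 maps to ethaddr, ethernetN to
ethNaddr), not U-Boot's actual code; the function name is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Map an "ethernetN" alias name to the environment variable assumed to
 * hold its MAC address. Returns 0 on success, -1 if the name is not an
 * ethernetN alias.
 */
static int eth_alias_to_env(const char *alias, char *env, size_t len)
{
	static const char prefix[] = "ethernet";
	const size_t plen = sizeof(prefix) - 1;
	unsigned long idx;
	char *end;

	if (strncmp(alias, prefix, plen) != 0)
		return -1;
	if (!isdigit((unsigned char)alias[plen]))
		return -1;

	idx = strtoul(alias + plen, &end, 10);
	if (*end != '\0')
		return -1;

	if (idx == 0)
		snprintf(env, len, "ethaddr");
	else
		snprintf(env, len, "eth%luaddr", idx);

	return 0;
}
```

Under this model the ethernet1 alias added for the second NIC picks up
its address from eth1addr, while aliases such as mmc0 are ignored.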
Patch 1 adds a DT binding for Realtek RTL8125 family PCIe Ethernet
controllers.
Patch 2 describes the on-board RTL8125 function nodes on the
FriendlyElec NanoPC-T6 (and variants).
Patch 3 describes the on-board RTL8125 function nodes on the Radxa
ROCK 5B / 5B+ / 5T family, based on lspci output provided by
helpful Armbian folks.
---
Changes in v4:
- binding: simplify the binding YAML, per Sashiko's and Krzysztof's
reviews.
- binding: describe only the RTL8125 and rename the binding to match,
per Heiner's review.
- dt: fix the bus-range according to Sashiko's review.
- Link to v3: https://patch.msgid.link/20260605-rk3588-dts-rtl-eth-describe-dt-alias-v3-0-8a8857b39daf@pardini.net
Changes in v3:
- new patch: add a DT binding for Realtek r8169 family PCIe Ethernet
controllers, per Sebastian Reichel's review (the "pciVVVV,DDDD" OF
spelling still needs a binding when used in a board DT).
- new patch for the ROCK 5 series; include a brief rationale in each patch.
- retitle the series, since it now covers a few boards and a binding
rather than just DeviceTree changes for the NanoPC-T6.
- drop the v2 "rename vcc3v3_pcie2x1l0 regulator" patch from this
series; it will be sent separately as it is not relevant to this.
- Link to v2: https://patch.msgid.link/20260529-rk3588-dts-rtl-eth-describe-dt-alias-v2-0-49700248143f@pardini.net
Changes in v2:
- fix: pcie2x1l0, not pcie2x1l1; indirectly caught by Sashiko's review [1]
- while-at-it: rename regulator vcc3v3_pcie2x1l0 to l1
- Link to v1: https://patch.msgid.link/20260525-rk3588-dts-rtl-eth-describe-dt-alias-v1-1-a6fcda563ac7@pardini.net
[1] https://sashiko.dev/#/patchset/20260525-rk3588-dts-rtl-eth-describe-dt-alias-v1-1-a6fcda563ac7%40pardini.net
To: Heiner Kallweit <hkallweit1@gmail.com>
To: nic_swsd@realtek.com
To: Andrew Lunn <andrew+netdev@lunn.ch>
To: "David S. Miller" <davem@davemloft.net>
To: Eric Dumazet <edumazet@google.com>
To: Jakub Kicinski <kuba@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>
To: Rob Herring <robh@kernel.org>
To: Krzysztof Kozlowski <krzk+dt@kernel.org>
To: Conor Dooley <conor+dt@kernel.org>
To: Heiko Stuebner <heiko@sntech.de>
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Cc: netdev@vger.kernel.org
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-rockchip@lists.infradead.org
Signed-off-by: Ricardo Pardini <ricardo@pardini.net>
---
Ricardo Pardini (3):
dt-bindings: net: add Realtek RTL8125 PCIe Ethernet
arm64: dts: rockchip: describe PCIe RTL8125 Ethernet on NanoPC-T6
arm64: dts: rockchip: describe PCIe RTL8125 Ethernet on Radxa ROCK 5 family
.../devicetree/bindings/net/realtek,rtl8125.yaml | 43 ++++++++++++++++++++++
MAINTAINERS | 1 +
arch/arm64/boot/dts/rockchip/rk3588-nanopc-t6.dtsi | 30 +++++++++++++++
.../boot/dts/rockchip/rk3588-rock-5b-5bp-5t.dtsi | 15 ++++++++
arch/arm64/boot/dts/rockchip/rk3588-rock-5t.dts | 18 +++++++++
5 files changed, 107 insertions(+)
---
base-commit: 8cd9520d35a6c38db6567e97dd93b1f11f185dc6
change-id: 20260524-rk3588-dts-rtl-eth-describe-dt-alias-c1ed187b7c50
Best regards,
--
Ricardo Pardini <ricardo@pardini.net>
* Re: [PATCH v2] net: mvneta: free/request IRQ across suspend/resume
From: Maxime Chevallier @ 2026-06-17 12:49 UTC
To: Yun Zhou, marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba,
pabeni, bigeasy, clrkwllms, rostedt
Cc: netdev, linux-kernel, linux-rt-devel
In-Reply-To: <20260617092028.1722407-1-yun.zhou@windriver.com>
Hi,
On 6/17/26 11:20, Yun Zhou wrote:
> On PREEMPT_RT, the mvneta IRQ handler is force-threaded. Under high
> network traffic, the IRQ can enter suspend with desc->depth == 1
> (masked by the oneshot mechanism between handler invocations).
>
> During suspend, the kernel increments depth to 2 and masks the
> interrupt at the MPIC level (clearing the SRC_CTL CPU routing bit,
> due to IRQCHIP_MASK_ON_SUSPEND). On resume, depth is decremented
> back to 1, but since it does not reach 0, the unmask is never
> called. The MPIC CPU routing remains cleared, permanently disabling
> interrupt delivery.
>
> Fix by freeing the IRQ in suspend and re-requesting it in resume.
> This ensures a clean IRQ state (depth=0, proper hardware routing)
> on every resume cycle, regardless of the pre-suspend depth. This
> follows the approach used by other drivers (e.g. igb).
This description makes it sound like it's not really a mvneta problem,
but rather a broader effect from preempt-rt / irq management / suspend
interactions.
Is this the expected way to deal with that?
Maxime
* Re: [PATCH v2] [net] net: airoha: Clean up RX queues in airoha_dev_stop
From: Lorenzo Bianconi @ 2026-06-17 12:48 UTC
To: Wayen Yan
Cc: netdev, horms, pabeni, kuba, edumazet, andrew+netdev,
angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
linux-mediatek
In-Reply-To: <178170026659.2238511.17652659042899875248@gmail.com>
> Thanks Simon for forwarding the AI review. I've reviewed all three
> concerns:
>
> #1 (NAPI race) and #2 (RX refill) are valid. #2 is the decisive
> issue: airoha_dev_open() has no RX ring refill, so draining the
> queues in stop would cause RX stall on next open. This aligns with
> Lorenzo's earlier feedback — RX queues don't need cleanup in
> dev_stop(). I'll drop this patch.
>
> #3 (q->skb leak) is a pre-existing issue, not introduced by this
> patch. It exists even in the module unload path
> (airoha_qdma_cleanup()). @Lorenzo — do you think this warrants a
> fix? A one-liner in airoha_qdma_cleanup_rx_queue() would cover both
> paths. Or is this too unlikely to matter in practice?
I will soon post a patch that runs airoha_qdma_cleanup_tx_queue() only
in airoha_hw_cleanup(), so I think we should just drop this patch.
Regards,
Lorenzo
* Re: [PATCH v2] [net] net: airoha: Clean up RX queues in airoha_dev_stop
From: Wayen Yan @ 2026-06-17 12:44 UTC
To: netdev
Cc: lorenzo, horms, pabeni, kuba, edumazet, andrew+netdev,
angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
linux-mediatek
In-Reply-To: <178161160256.2165161.14322392784449633554@gmail.com>
Thanks Simon for forwarding the AI review. I've reviewed all three
concerns:
#1 (NAPI race) and #2 (RX refill) are valid. #2 is the decisive
issue: airoha_dev_open() has no RX ring refill, so draining the
queues in stop would cause RX stall on next open. This aligns with
Lorenzo's earlier feedback — RX queues don't need cleanup in
dev_stop(). I'll drop this patch.
#3 (q->skb leak) is a pre-existing issue, not introduced by this
patch. It exists even in the module unload path
(airoha_qdma_cleanup()). @Lorenzo — do you think this warrants a
fix? A one-liner in airoha_qdma_cleanup_rx_queue() would cover both
paths. Or is this too unlikely to matter in practice?
* [PATCH net v2] amt: don't read the IP source address from a reallocated skb header
From: Michael Bommarito @ 2026-06-17 12:34 UTC
To: Taehee Yoo, David S . Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet
Cc: Andrew Lunn, netdev, linux-kernel
amt_update_handler() caches iph = ip_hdr(skb) and then calls
pskb_may_pull(). pskb_may_pull() can reallocate the skb head: the new
head is allocated and the old one is freed. The cached iph is not
refreshed, so the following tunnel lookup reads iph->saddr from the
freed head. On an AMT relay this lookup runs for every incoming
membership update, before the update's nonce and response MAC are
validated.
The sibling handlers amt_multicast_data_handler() and
amt_membership_query_handler() re-read ip_hdr() after the pull and are
not affected; only amt_update_handler() keeps the pre-pull pointer.
Snapshot the source address before the pulls and match against the
snapshot.
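The pattern can be modelled in userspace C. The sketch below is a toy
stand-in, not kernel code: toy_buf, toy_hdr and toy_pull_realloc() are
hypothetical names, and the "pull" always reallocates so the effect is
deterministic. It shows the fixed ordering, where the field is copied
out before the head can move.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: one heap buffer holding a header. */
struct toy_buf {
	unsigned char *head;
	size_t len;
};

/* Hypothetical stand-in for the IP header; only the field we need. */
struct toy_hdr {
	uint32_t saddr;
};

/* Model a pull that reallocates: copy to a new head, free the old one. */
static int toy_pull_realloc(struct toy_buf *b)
{
	unsigned char *new_head = malloc(b->len);

	if (!new_head)
		return -1;
	memcpy(new_head, b->head, b->len);
	free(b->head);
	b->head = new_head;
	return 0;
}

/* Snapshot the field before the pull, then compare the snapshot only. */
static int toy_match_saddr(struct toy_buf *b, uint32_t want)
{
	struct toy_hdr hdr;
	uint32_t saddr;

	memcpy(&hdr, b->head, sizeof(hdr));
	saddr = hdr.saddr;	/* copied while the old head is still live */

	if (toy_pull_realloc(b))
		return 0;

	/* Pointers into the old head dangle here; saddr is a plain copy. */
	return saddr == want;
}
```

Because the comparison uses a value rather than a pointer into the
buffer, it stays correct no matter how often the head is reallocated.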
The stale read was confirmed by instrumentation rather than a sanitizer:
after the head is reallocated the comparison reads from the freed old
head. KASAN does not flag it because the skb head is released through
the page-fragment free path, which is not poisoned on free.
Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface")
Acked-by: Taehee Yoo <ap420073@gmail.com>
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
v2: per Taehee Yoo's review
(https://lore.kernel.org/all/CAMArcTWCg4x1bxrzr+XHc_FqbzJELCMu+tE=x8Jhewgr-_A3Rw@mail.gmail.com/):
- retag the subject as [PATCH net] (this is a bug fix);
- drop Cc: stable -- the Fixes tag is enough for the stable backport
process to pick it up;
- carry Taehee Yoo's Acked-by.
No code change from v1.
v1: https://lore.kernel.org/all/20260614155539.3106537-1-michael.bommarito@gmail.com/
Confirmed on x86_64 by instrumenting the comparison: with the update
packet built so the first pskb_may_pull() reallocates the head (it pulls
bytes out of a page fragment with no tailroom), the read runs against
the freed old head -- the head pointer moves and the old page's refcount
is 0. Neither generic KASAN nor arm64 HW-tag KASAN reports it: page-
fragment frees are not synchronously poisoned, and under MTE the freed
page keeps a tag matching the stale pointer, so this class of stale-
header read escapes the usual fuzzing oracles. On a live relay the freed
head is also exposed to reuse by later skb allocations.
amtdbg: cmp reads iph=...e000 (skb->head=...384380) stale_head=1 ref=0
A KUnit test covering the re-read can follow separately.
drivers/net/amt.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/amt.c b/drivers/net/amt.c
index f2f3139..af6e28d 100644
--- a/drivers/net/amt.c
+++ b/drivers/net/amt.c
@@ -2455,8 +2455,10 @@ static bool amt_update_handler(struct amt_dev *amt, struct sk_buff *skb)
struct ethhdr *eth;
struct iphdr *iph;
int len, hdr_size;
+ __be32 saddr;
iph = ip_hdr(skb);
+ saddr = iph->saddr;
hdr_size = sizeof(*amtmu) + sizeof(struct udphdr);
if (!pskb_may_pull(skb, hdr_size))
@@ -2472,7 +2474,7 @@ static bool amt_update_handler(struct amt_dev *amt, struct sk_buff *skb)
skb_reset_network_header(skb);
list_for_each_entry_rcu(tunnel, &amt->tunnel_list, list) {
- if (tunnel->ip4 == iph->saddr) {
+ if (tunnel->ip4 == saddr) {
if ((amtmu->nonce == tunnel->nonce &&
amtmu->response_mac == tunnel->mac)) {
mod_delayed_work(amt_wq, &tunnel->gc_wq,
base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
--
2.53.0
* [PATCH net-next v3] net: dsa: Fix skb ownership in taggers
From: Linus Walleij @ 2026-06-17 12:30 UTC
To: Andrew Lunn, Vladimir Oltean, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Florian Fainelli,
Jonas Gorski, Hauke Mehrtens, Kurt Kanzenbach, Woojung Huh,
UNGLinuxDriver, Chester A. Unal, Daniel Golle, Matthias Brugger,
AngeloGioacchino Del Regno, Wei Fang, Clark Wang,
Clément Léger, George McCollister, David Yang
Cc: netdev, Sashiko AI Review, Linus Walleij
The tag_8021q.c tagger calls vlan_insert_tag() in dsa_8021q_xmit().
vlan_insert_tag() will consume the skb with kfree_skb() on failure
and return NULL.
When ->xmit() returns NULL to dsa_user_xmit() as an error indication,
dsa_user_xmit() frees the same skb again, leading to a double free.
The idea of dsa_user_xmit() and dsa_switch_rcv() dropping the skb
they held before the call to ->xmit() and ->rcv() is conceptually
wrong: the pattern elsewhere in the networking code is that consumers
drop their skbs on failure.
Modify the ->xmit() and ->rcv() call sites to not drop the SKB if
the taggers return NULL from any of these calls. Move those drops into
the taggers so every callback error path that retains ownership consumes
the skb before returning NULL.
Keep the existing helper ownership rules: VLAN insertion helpers already
free on failure (this is the case in tag_8021q.c), while deferred
transmit paths either transfer the skb reference to worker context or
hold a worker reference with skb_get() and drop the caller's reference.
For SJA1105 meta RX, transfer the buffered stampable skb under the meta
lock and return NULL while the skb is waiting for its meta frame: the
skb is not dropped in this case.
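The ownership rule described above can be modelled in userspace C. This
is a toy sketch, not kernel code: toy_skb and the toy_* functions are
hypothetical, with a reference count and a free counter standing in for
the real skb lifetime. The callback consumes the buffer before
returning NULL, and the caller never frees on NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted buffer; *freed counts real frees. */
struct toy_skb {
	int users;
	int *freed;
};

static struct toy_skb *toy_alloc(int *freed)
{
	struct toy_skb *skb = malloc(sizeof(*skb));

	if (!skb)
		abort();
	skb->users = 1;
	skb->freed = freed;
	return skb;
}

/* Drop one reference; release the buffer when the last one goes. */
static void toy_kfree(struct toy_skb *skb)
{
	if (--skb->users == 0) {
		(*skb->freed)++;
		free(skb);
	}
}

/* Tagger-style callback: on failure it consumes skb and returns NULL. */
static struct toy_skb *toy_tagger_xmit(struct toy_skb *skb, int fail)
{
	if (fail) {
		toy_kfree(skb);
		return NULL;
	}
	return skb;
}

/* Caller: must not free on NULL, the callback already did. */
static int toy_user_xmit(struct toy_skb *skb, int fail)
{
	struct toy_skb *nskb = toy_tagger_xmit(skb, fail);

	if (!nskb)
		return 0;	/* dropped and already consumed */

	toy_kfree(nskb);	/* stands in for handing off the frame */
	return 1;
}
```

Each buffer is released exactly once on both the success and the
failure path. Adding a second toy_kfree() in the caller's NULL branch
would reproduce the double free this patch removes.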
NOTICE: Backporting patches to taggers (e.g. for stable kernels) after
this point cannot be mechanical or they will introduce double
kfree_skb().
Reported-by: Sashiko AI Review <sashiko-bot@kernel.org>
Closes: https://lore.kernel.org/r/20260610153952.1685895-1-kuba@kernel.org/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Assisted-by: Codex:gpt-5-5
Acked-by: David Yang <mmyangfl@gmail.com> # yt921x
Acked-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Reviewed-by: Wei Fang <wei.fang@nxp.com> # netc
Signed-off-by: Linus Walleij <linusw@kernel.org>
---
Changes in v3:
- Simplify __skb_put_padto(skb, ETH_ZLEN, false) and
skb_put_padto(skb, ETH_ZLEN) to eth_skb_pad().
- Pick up Wei's review tag.
- Link to v2: https://patch.msgid.link/20260616-dsa-fix-free-skb-v2-1-9dbda6a19e97@kernel.org
Changes in v2:
- In some instances __skb_pad() and __skb_put_padto() followed by a
kfree_skb() could be simplified to just call skb_pad() and
skb_put_padto() which will free the skb on failure.
- Use a label and goto for the kfree_skb(); return NULL; in
the netc_rcv() callback in tag_netc.c as requested.
- Collect ACKs.
- Retag for net-next.
- Link to v1: https://patch.msgid.link/20260616-dsa-fix-free-skb-v1-1-fd30b35dcf66@kernel.org
---
net/dsa/tag.c | 4 +---
net/dsa/tag_ar9331.c | 10 ++++++++--
net/dsa/tag_brcm.c | 39 ++++++++++++++++++++++++---------------
net/dsa/tag_dsa.c | 15 ++++++++++++---
net/dsa/tag_gswip.c | 8 ++++++--
net/dsa/tag_hellcreek.c | 9 +++++++--
net/dsa/tag_ksz.c | 44 +++++++++++++++++++++++++++++++-------------
net/dsa/tag_lan9303.c | 2 ++
net/dsa/tag_mtk.c | 8 ++++++--
net/dsa/tag_mxl-gsw1xx.c | 3 +++
net/dsa/tag_mxl862xx.c | 3 +++
net/dsa/tag_netc.c | 18 ++++++++++--------
net/dsa/tag_ocelot.c | 4 +++-
net/dsa/tag_ocelot_8021q.c | 20 +++++++++++++-------
net/dsa/tag_qca.c | 14 +++++++++++---
net/dsa/tag_rtl4_a.c | 8 ++++++--
net/dsa/tag_rtl8_4.c | 24 ++++++++++++++++++------
net/dsa/tag_rzn1_a5psw.c | 8 ++++++--
net/dsa/tag_sja1105.c | 42 +++++++++++++++++++++++++++---------------
net/dsa/tag_trailer.c | 16 ++++++++++++----
net/dsa/tag_vsc73xx_8021q.c | 1 +
net/dsa/tag_xrs700x.c | 12 +++++++++---
net/dsa/tag_yt921x.c | 7 ++++++-
net/dsa/user.c | 7 +++----
24 files changed, 228 insertions(+), 98 deletions(-)
diff --git a/net/dsa/tag.c b/net/dsa/tag.c
index 79ad105902d9..cfc8f5a0cbd9 100644
--- a/net/dsa/tag.c
+++ b/net/dsa/tag.c
@@ -84,10 +84,8 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
nskb = cpu_dp->rcv(skb, dev);
}
- if (!nskb) {
- kfree_skb(skb);
+ if (!nskb)
return 0;
- }
skb = nskb;
skb_push(skb, ETH_HLEN);
diff --git a/net/dsa/tag_ar9331.c b/net/dsa/tag_ar9331.c
index cbb588ca73aa..2e2388143b02 100644
--- a/net/dsa/tag_ar9331.c
+++ b/net/dsa/tag_ar9331.c
@@ -51,8 +51,10 @@ static struct sk_buff *ar9331_tag_rcv(struct sk_buff *skb,
u8 ver, port;
u16 hdr;
- if (unlikely(!pskb_may_pull(skb, AR9331_HDR_LEN)))
+ if (unlikely(!pskb_may_pull(skb, AR9331_HDR_LEN))) {
+ kfree_skb(skb);
return NULL;
+ }
hdr = le16_to_cpu(*(__le16 *)skb_mac_header(skb));
@@ -60,12 +62,14 @@ static struct sk_buff *ar9331_tag_rcv(struct sk_buff *skb,
if (unlikely(ver != AR9331_HDR_VERSION)) {
netdev_warn_once(ndev, "%s:%i wrong header version 0x%2x\n",
__func__, __LINE__, hdr);
+ kfree_skb(skb);
return NULL;
}
if (unlikely(hdr & AR9331_HDR_FROM_CPU)) {
netdev_warn_once(ndev, "%s:%i packet should not be from cpu 0x%2x\n",
__func__, __LINE__, hdr);
+ kfree_skb(skb);
return NULL;
}
@@ -75,8 +79,10 @@ static struct sk_buff *ar9331_tag_rcv(struct sk_buff *skb,
port = FIELD_GET(AR9331_HDR_PORT_NUM_MASK, hdr);
skb->dev = dsa_conduit_find_user(ndev, 0, port);
- if (!skb->dev)
+ if (!skb->dev) {
+ kfree_skb(skb);
return NULL;
+ }
return skb;
}
diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index cf9420439054..411e3b57d16a 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -102,9 +102,9 @@ static struct sk_buff *brcm_tag_xmit_ll(struct sk_buff *skb,
* (including FCS and tag) because the length verification is done after
* the Broadcom tag is stripped off the ingress packet.
*
- * Let dsa_user_xmit() free the SKB
+ * Free the SKB on error.
*/
- if (__skb_put_padto(skb, ETH_ZLEN + BRCM_TAG_LEN, false))
+ if (skb_put_padto(skb, ETH_ZLEN + BRCM_TAG_LEN))
return NULL;
skb_push(skb, BRCM_TAG_LEN);
@@ -151,27 +151,35 @@ static struct sk_buff *brcm_tag_rcv_ll(struct sk_buff *skb,
int source_port;
u8 *brcm_tag;
- if (unlikely(!pskb_may_pull(skb, BRCM_TAG_LEN)))
+ if (unlikely(!pskb_may_pull(skb, BRCM_TAG_LEN))) {
+ kfree_skb(skb);
return NULL;
+ }
brcm_tag = skb->data - offset;
/* The opcode should never be different than 0b000 */
- if (unlikely((brcm_tag[0] >> BRCM_OPCODE_SHIFT) & BRCM_OPCODE_MASK))
+ if (unlikely((brcm_tag[0] >> BRCM_OPCODE_SHIFT) & BRCM_OPCODE_MASK)) {
+ kfree_skb(skb);
return NULL;
+ }
/* We should never see a reserved reason code without knowing how to
* handle it
*/
- if (unlikely(brcm_tag[2] & BRCM_EG_RC_RSVD))
+ if (unlikely(brcm_tag[2] & BRCM_EG_RC_RSVD)) {
+ kfree_skb(skb);
return NULL;
+ }
/* Locate which port this is coming from */
source_port = brcm_tag[3] & BRCM_EG_PID_MASK;
skb->dev = dsa_conduit_find_user(dev, 0, source_port);
- if (!skb->dev)
+ if (!skb->dev) {
+ kfree_skb(skb);
return NULL;
+ }
/* Remove Broadcom tag and update checksum */
skb_pull_rcsum(skb, BRCM_TAG_LEN);
@@ -228,8 +236,10 @@ static struct sk_buff *brcm_leg_tag_rcv(struct sk_buff *skb,
__be16 *proto;
u8 *brcm_tag;
- if (unlikely(!pskb_may_pull(skb, BRCM_LEG_TAG_LEN + VLAN_HLEN)))
+ if (unlikely(!pskb_may_pull(skb, BRCM_LEG_TAG_LEN + VLAN_HLEN))) {
+ kfree_skb(skb);
return NULL;
+ }
brcm_tag = dsa_etype_header_pos_rx(skb);
proto = (__be16 *)(brcm_tag + BRCM_LEG_TAG_LEN);
@@ -237,8 +247,10 @@ static struct sk_buff *brcm_leg_tag_rcv(struct sk_buff *skb,
source_port = brcm_tag[5] & BRCM_LEG_PORT_ID;
skb->dev = dsa_conduit_find_user(dev, 0, source_port);
- if (!skb->dev)
+ if (!skb->dev) {
+ kfree_skb(skb);
return NULL;
+ }
/* The internal switch in BCM63XX SoCs always tags on egress on the CPU
* port. We use VID 0 internally for untagged traffic, so strip the tag
@@ -273,10 +285,8 @@ static struct sk_buff *brcm_leg_tag_xmit(struct sk_buff *skb,
* need to make sure that packets are at least 70 bytes
* (including FCS and tag) because the length verification is done after
* the Broadcom tag is stripped off the ingress packet.
- *
- * Let dsa_user_xmit() free the SKB
*/
- if (__skb_put_padto(skb, ETH_ZLEN + BRCM_LEG_TAG_LEN, false))
+ if (skb_put_padto(skb, ETH_ZLEN + BRCM_LEG_TAG_LEN))
return NULL;
skb_push(skb, BRCM_LEG_TAG_LEN);
@@ -325,10 +335,8 @@ static struct sk_buff *brcm_leg_fcs_tag_xmit(struct sk_buff *skb,
* need to make sure that packets are at least 70 bytes (including FCS
* and tag) because the length verification is done after the Broadcom
* tag is stripped off the ingress packet.
- *
- * Let dsa_user_xmit() free the SKB.
*/
- if (__skb_put_padto(skb, ETH_ZLEN + BRCM_LEG_TAG_LEN, false))
+ if (skb_put_padto(skb, ETH_ZLEN + BRCM_LEG_TAG_LEN))
return NULL;
fcs_len = skb->len;
@@ -351,8 +359,9 @@ static struct sk_buff *brcm_leg_fcs_tag_xmit(struct sk_buff *skb,
brcm_tag[5] = dp->index & BRCM_LEG_PORT_ID;
/* Original FCS value */
- if (__skb_pad(skb, ETH_FCS_LEN, false))
+ if (skb_pad(skb, ETH_FCS_LEN))
return NULL;
+
skb_put_data(skb, &fcs_val, ETH_FCS_LEN);
return skb;
diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
index 2a2c4fb61a65..d5ffee35fbb5 100644
--- a/net/dsa/tag_dsa.c
+++ b/net/dsa/tag_dsa.c
@@ -224,6 +224,7 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
/* Remote management is not implemented yet,
* drop.
*/
+ kfree_skb(skb);
return NULL;
case DSA_CODE_ARP_MIRROR:
case DSA_CODE_POLICY_MIRROR:
@@ -244,12 +245,14 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
/* Reserved code, this could be anything. Drop
* seems like the safest option.
*/
+ kfree_skb(skb);
return NULL;
}
break;
default:
+ kfree_skb(skb);
return NULL;
}
@@ -271,8 +274,10 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
source_port);
}
- if (!skb->dev)
+ if (!skb->dev) {
+ kfree_skb(skb);
return NULL;
+ }
/* When using LAG offload, skb->dev is not a DSA user interface,
* so we cannot call dsa_default_offload_fwd_mark and we need to
@@ -335,8 +340,10 @@ static struct sk_buff *dsa_xmit(struct sk_buff *skb, struct net_device *dev)
static struct sk_buff *dsa_rcv(struct sk_buff *skb, struct net_device *dev)
{
- if (unlikely(!pskb_may_pull(skb, DSA_HLEN)))
+ if (unlikely(!pskb_may_pull(skb, DSA_HLEN))) {
+ kfree_skb(skb);
return NULL;
+ }
return dsa_rcv_ll(skb, dev, 0);
}
@@ -375,8 +382,10 @@ static struct sk_buff *edsa_xmit(struct sk_buff *skb, struct net_device *dev)
static struct sk_buff *edsa_rcv(struct sk_buff *skb, struct net_device *dev)
{
- if (unlikely(!pskb_may_pull(skb, EDSA_HLEN)))
+ if (unlikely(!pskb_may_pull(skb, EDSA_HLEN))) {
+ kfree_skb(skb);
return NULL;
+ }
skb_pull_rcsum(skb, EDSA_HLEN - DSA_HLEN);
diff --git a/net/dsa/tag_gswip.c b/net/dsa/tag_gswip.c
index 5fa436121087..5c407d448c9f 100644
--- a/net/dsa/tag_gswip.c
+++ b/net/dsa/tag_gswip.c
@@ -80,16 +80,20 @@ static struct sk_buff *gswip_tag_rcv(struct sk_buff *skb,
int port;
u8 *gswip_tag;
- if (unlikely(!pskb_may_pull(skb, GSWIP_RX_HEADER_LEN)))
+ if (unlikely(!pskb_may_pull(skb, GSWIP_RX_HEADER_LEN))) {
+ kfree_skb(skb);
return NULL;
+ }
gswip_tag = skb->data - ETH_HLEN;
/* Get source port information */
port = (gswip_tag[7] & GSWIP_RX_SPPID_MASK) >> GSWIP_RX_SPPID_SHIFT;
skb->dev = dsa_conduit_find_user(dev, 0, port);
- if (!skb->dev)
+ if (!skb->dev) {
+ kfree_skb(skb);
return NULL;
+ }
/* remove GSWIP tag */
skb_pull_rcsum(skb, GSWIP_RX_HEADER_LEN);
diff --git a/net/dsa/tag_hellcreek.c b/net/dsa/tag_hellcreek.c
index 544ab15685a2..dd9f328f3182 100644
--- a/net/dsa/tag_hellcreek.c
+++ b/net/dsa/tag_hellcreek.c
@@ -27,8 +27,10 @@ static struct sk_buff *hellcreek_xmit(struct sk_buff *skb,
* checksums after the switch strips the tag.
*/
if (skb->ip_summed == CHECKSUM_PARTIAL &&
- skb_checksum_help(skb))
+ skb_checksum_help(skb)) {
+ kfree_skb(skb);
return NULL;
+ }
/* Tag encoding */
tag = skb_put(skb, HELLCREEK_TAG_LEN);
@@ -47,11 +49,14 @@ static struct sk_buff *hellcreek_rcv(struct sk_buff *skb,
skb->dev = dsa_conduit_find_user(dev, 0, port);
if (!skb->dev) {
netdev_warn_once(dev, "Failed to get source port: %d\n", port);
+ kfree_skb(skb);
return NULL;
}
- if (pskb_trim_rcsum(skb, skb->len - HELLCREEK_TAG_LEN))
+ if (pskb_trim_rcsum(skb, skb->len - HELLCREEK_TAG_LEN)) {
+ kfree_skb(skb);
return NULL;
+ }
dsa_default_offload_fwd_mark(skb);
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index d2475c3bbb7d..67fa89f102e0 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -88,11 +88,15 @@ static struct sk_buff *ksz_common_rcv(struct sk_buff *skb,
unsigned int port, unsigned int len)
{
skb->dev = dsa_conduit_find_user(dev, 0, port);
- if (!skb->dev)
+ if (!skb->dev) {
+ kfree_skb(skb);
return NULL;
+ }
- if (pskb_trim_rcsum(skb, skb->len - len))
+ if (pskb_trim_rcsum(skb, skb->len - len)) {
+ kfree_skb(skb);
return NULL;
+ }
dsa_default_offload_fwd_mark(skb);
@@ -123,8 +127,10 @@ static struct sk_buff *ksz8795_xmit(struct sk_buff *skb, struct net_device *dev)
struct ethhdr *hdr;
u8 *tag;
- if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))
+ if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb)) {
+ kfree_skb(skb);
return NULL;
+ }
/* Tag encoding */
tag = skb_put(skb, KSZ_INGRESS_TAG_LEN);
@@ -141,8 +147,10 @@ static struct sk_buff *ksz8795_rcv(struct sk_buff *skb, struct net_device *dev)
{
u8 *tag;
- if (skb_linearize(skb))
+ if (skb_linearize(skb)) {
+ kfree_skb(skb);
return NULL;
+ }
tag = skb_tail_pointer(skb) - KSZ_EGRESS_TAG_LEN;
@@ -255,22 +263,24 @@ static struct sk_buff *ksz_defer_xmit(struct dsa_port *dp, struct sk_buff *skb)
xmit_work_fn = tagger_data->xmit_work_fn;
xmit_worker = priv->xmit_worker;
- if (!xmit_work_fn || !xmit_worker)
+ if (!xmit_work_fn || !xmit_worker) {
+ kfree_skb(skb);
return NULL;
+ }
xmit_work = kzalloc_obj(*xmit_work, GFP_ATOMIC);
- if (!xmit_work)
+ if (!xmit_work) {
+ kfree_skb(skb);
return NULL;
+ }
kthread_init_work(&xmit_work->work, xmit_work_fn);
- /* Increase refcount so the kfree_skb in dsa_user_xmit
- * won't really free the packet.
- */
xmit_work->dp = dp;
xmit_work->skb = skb_get(skb);
kthread_queue_work(xmit_worker, &xmit_work->work);
+ kfree_skb(skb);
return NULL;
}
@@ -284,8 +294,10 @@ static struct sk_buff *ksz9477_xmit(struct sk_buff *skb,
__be16 *tag;
u16 val;
- if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))
+ if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb)) {
+ kfree_skb(skb);
return NULL;
+ }
/* Tag encoding */
ksz_xmit_timestamp(dp, skb);
@@ -310,8 +322,10 @@ static struct sk_buff *ksz9477_rcv(struct sk_buff *skb, struct net_device *dev)
unsigned int port;
u8 *tag;
- if (skb_linearize(skb))
+ if (skb_linearize(skb)) {
+ kfree_skb(skb);
return NULL;
+ }
/* Tag decoding */
tag = skb_tail_pointer(skb) - KSZ_EGRESS_TAG_LEN;
@@ -352,8 +366,10 @@ static struct sk_buff *ksz9893_xmit(struct sk_buff *skb,
struct ethhdr *hdr;
u8 *tag;
- if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))
+ if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb)) {
+ kfree_skb(skb);
return NULL;
+ }
/* Tag encoding */
ksz_xmit_timestamp(dp, skb);
@@ -418,8 +434,10 @@ static struct sk_buff *lan937x_xmit(struct sk_buff *skb,
__be16 *tag;
u16 val;
- if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))
+ if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb)) {
+ kfree_skb(skb);
return NULL;
+ }
ksz_xmit_timestamp(dp, skb);
diff --git a/net/dsa/tag_lan9303.c b/net/dsa/tag_lan9303.c
index 258e5d7dc5ef..d1194696499a 100644
--- a/net/dsa/tag_lan9303.c
+++ b/net/dsa/tag_lan9303.c
@@ -85,6 +85,7 @@ static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev)
if (unlikely(!pskb_may_pull(skb, LAN9303_TAG_LEN))) {
dev_warn_ratelimited(&dev->dev,
"Dropping packet, cannot pull\n");
+ kfree_skb(skb);
return NULL;
}
@@ -102,6 +103,7 @@ static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev)
skb->dev = dsa_conduit_find_user(dev, 0, source_port);
if (!skb->dev) {
dev_warn_ratelimited(&dev->dev, "Dropping packet due to invalid source port\n");
+ kfree_skb(skb);
return NULL;
}
diff --git a/net/dsa/tag_mtk.c b/net/dsa/tag_mtk.c
index dea3eecaf093..c7dc7731675e 100644
--- a/net/dsa/tag_mtk.c
+++ b/net/dsa/tag_mtk.c
@@ -72,8 +72,10 @@ static struct sk_buff *mtk_tag_rcv(struct sk_buff *skb, struct net_device *dev)
int port;
__be16 *phdr;
- if (unlikely(!pskb_may_pull(skb, MTK_HDR_LEN)))
+ if (unlikely(!pskb_may_pull(skb, MTK_HDR_LEN))) {
+ kfree_skb(skb);
return NULL;
+ }
phdr = dsa_etype_header_pos_rx(skb);
hdr = ntohs(*phdr);
@@ -87,8 +89,10 @@ static struct sk_buff *mtk_tag_rcv(struct sk_buff *skb, struct net_device *dev)
port = (hdr & MTK_HDR_RECV_SOURCE_PORT_MASK);
skb->dev = dsa_conduit_find_user(dev, 0, port);
- if (!skb->dev)
+ if (!skb->dev) {
+ kfree_skb(skb);
return NULL;
+ }
dsa_default_offload_fwd_mark(skb);
diff --git a/net/dsa/tag_mxl-gsw1xx.c b/net/dsa/tag_mxl-gsw1xx.c
index 60f7c445e656..4b1b6ef94196 100644
--- a/net/dsa/tag_mxl-gsw1xx.c
+++ b/net/dsa/tag_mxl-gsw1xx.c
@@ -73,6 +73,7 @@ static struct sk_buff *gsw1xx_tag_rcv(struct sk_buff *skb,
if (unlikely(!pskb_may_pull(skb, GSW1XX_HEADER_LEN))) {
dev_warn_ratelimited(&dev->dev, "Dropping packet, cannot pull SKB\n");
+ kfree_skb(skb);
return NULL;
}
@@ -81,6 +82,7 @@ static struct sk_buff *gsw1xx_tag_rcv(struct sk_buff *skb,
if (unlikely(ntohs(gsw1xx_tag[0]) != ETH_P_MXLGSW)) {
dev_warn_ratelimited(&dev->dev, "Dropping packet due to invalid special tag\n");
dev_warn_ratelimited(&dev->dev, "Tag: %8ph\n", gsw1xx_tag);
+ kfree_skb(skb);
return NULL;
}
@@ -90,6 +92,7 @@ static struct sk_buff *gsw1xx_tag_rcv(struct sk_buff *skb,
if (!skb->dev) {
dev_warn_ratelimited(&dev->dev, "Dropping packet due to invalid source port\n");
dev_warn_ratelimited(&dev->dev, "Tag: %8ph\n", gsw1xx_tag);
+ kfree_skb(skb);
return NULL;
}
diff --git a/net/dsa/tag_mxl862xx.c b/net/dsa/tag_mxl862xx.c
index 8daefeb8d49d..87b80ddf0946 100644
--- a/net/dsa/tag_mxl862xx.c
+++ b/net/dsa/tag_mxl862xx.c
@@ -64,6 +64,7 @@ static struct sk_buff *mxl862_tag_rcv(struct sk_buff *skb,
if (unlikely(!pskb_may_pull(skb, MXL862_HEADER_LEN))) {
dev_warn_ratelimited(&dev->dev, "Cannot pull SKB, packet dropped\n");
+ kfree_skb(skb);
return NULL;
}
@@ -73,6 +74,7 @@ static struct sk_buff *mxl862_tag_rcv(struct sk_buff *skb,
dev_warn_ratelimited(&dev->dev,
"Invalid special tag marker, packet dropped, tag: %8ph\n",
mxl862_tag);
+ kfree_skb(skb);
return NULL;
}
@@ -83,6 +85,7 @@ static struct sk_buff *mxl862_tag_rcv(struct sk_buff *skb,
dev_warn_ratelimited(&dev->dev,
"Invalid source port, packet dropped, tag: %8ph\n",
mxl862_tag);
+ kfree_skb(skb);
return NULL;
}
diff --git a/net/dsa/tag_netc.c b/net/dsa/tag_netc.c
index ccedfe3a80b6..df72a61796ad 100644
--- a/net/dsa/tag_netc.c
+++ b/net/dsa/tag_netc.c
@@ -131,14 +131,13 @@ static struct sk_buff *netc_rcv(struct sk_buff *skb,
int type, subtype;
if (unlikely(!pskb_may_pull(skb, NETC_TAG_MAX_LEN)))
- return NULL;
+ goto err_free_skb;
tag_cmn = dsa_etype_header_pos_rx(skb);
if (ntohs(tag_cmn->tpid) != ETH_P_NXP_NETC) {
dev_warn_ratelimited(&ndev->dev, "Unknown TPID 0x%04x\n",
ntohs(tag_cmn->tpid));
-
- return NULL;
+ goto err_free_skb;
}
if (tag_cmn->qos & NETC_TAG_QV)
@@ -149,14 +148,13 @@ static struct sk_buff *netc_rcv(struct sk_buff *skb,
if (!sw_id) {
dev_warn_ratelimited(&ndev->dev,
"VEPA switch ID is not supported yet\n");
-
- return NULL;
+ goto err_free_skb;
}
port = FIELD_GET(NETC_TAG_PORT, tag_cmn->switch_port);
skb->dev = dsa_conduit_find_user(ndev, sw_id, port);
if (!skb->dev)
- return NULL;
+ goto err_free_skb;
type = FIELD_GET(NETC_TAG_TYPE, tag_cmn->type);
subtype = FIELD_GET(NETC_TAG_SUBTYPE, tag_cmn->type);
@@ -165,11 +163,11 @@ static struct sk_buff *netc_rcv(struct sk_buff *skb,
} else if (type == NETC_TAG_TO_HOST) {
/* Currently only subtype0 supported */
if (subtype != NETC_TAG_TH_SUBTYPE0)
- return NULL;
+ goto err_free_skb;
} else {
dev_warn_ratelimited(&ndev->dev,
"Unexpected tag type %d\n", type);
- return NULL;
+ goto err_free_skb;
}
/* Remove Switch tag from the frame */
@@ -178,6 +176,10 @@ static struct sk_buff *netc_rcv(struct sk_buff *skb,
dsa_strip_etype_header(skb, tag_len);
return skb;
+
+err_free_skb:
+ kfree_skb(skb);
+ return NULL;
}
static void netc_flow_dissect(const struct sk_buff *skb, __be16 *proto,
diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c
index 3405def79c2d..d208c7322cd6 100644
--- a/net/dsa/tag_ocelot.c
+++ b/net/dsa/tag_ocelot.c
@@ -107,14 +107,16 @@ static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
ocelot_xfh_get_rew_val(extraction, &rew_val);
skb->dev = dsa_conduit_find_user(netdev, 0, src_port);
- if (!skb->dev)
+ if (!skb->dev) {
/* The switch will reflect back some frames sent through
* sockets opened on the bare DSA conduit. These will come back
* with src_port equal to the index of the CPU port, for which
* there is no user registered. So don't print any error
* message here (ignore and drop those frames).
*/
+ kfree_skb(skb);
return NULL;
+ }
dsa_default_offload_fwd_mark(skb);
skb->priority = qos_class;
diff --git a/net/dsa/tag_ocelot_8021q.c b/net/dsa/tag_ocelot_8021q.c
index e89d9254e90a..f50f1cd83f16 100644
--- a/net/dsa/tag_ocelot_8021q.c
+++ b/net/dsa/tag_ocelot_8021q.c
@@ -33,30 +33,34 @@ static struct sk_buff *ocelot_defer_xmit(struct dsa_port *dp,
xmit_work_fn = data->xmit_work_fn;
xmit_worker = priv->xmit_worker;
- if (!xmit_work_fn || !xmit_worker)
+ if (!xmit_work_fn || !xmit_worker) {
+ kfree_skb(skb);
return NULL;
+ }
/* PTP over IP packets need UDP checksumming. We may have inherited
* NETIF_F_HW_CSUM from the DSA conduit, but these packets are not sent
* through the DSA conduit, so calculate the checksum here.
*/
- if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))
+ if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb)) {
+ kfree_skb(skb);
return NULL;
+ }
xmit_work = kzalloc_obj(*xmit_work, GFP_ATOMIC);
- if (!xmit_work)
+ if (!xmit_work) {
+ kfree_skb(skb);
return NULL;
+ }
/* Calls felix_port_deferred_xmit in felix.c */
kthread_init_work(&xmit_work->work, xmit_work_fn);
- /* Increase refcount so the kfree_skb in dsa_user_xmit
- * won't really free the packet.
- */
xmit_work->dp = dp;
xmit_work->skb = skb_get(skb);
kthread_queue_work(xmit_worker, &xmit_work->work);
+ kfree_skb(skb);
return NULL;
}
@@ -84,8 +88,10 @@ static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
dsa_8021q_rcv(skb, &src_port, &switch_id, NULL, NULL);
skb->dev = dsa_conduit_find_user(netdev, switch_id, src_port);
- if (!skb->dev)
+ if (!skb->dev) {
+ kfree_skb(skb);
return NULL;
+ }
dsa_default_offload_fwd_mark(skb);
diff --git a/net/dsa/tag_qca.c b/net/dsa/tag_qca.c
index 9e3b429e8b36..510792fbfa92 100644
--- a/net/dsa/tag_qca.c
+++ b/net/dsa/tag_qca.c
@@ -46,16 +46,20 @@ static struct sk_buff *qca_tag_rcv(struct sk_buff *skb, struct net_device *dev)
tagger_data = ds->tagger_data;
- if (unlikely(!pskb_may_pull(skb, QCA_HDR_LEN)))
+ if (unlikely(!pskb_may_pull(skb, QCA_HDR_LEN))) {
+ kfree_skb(skb);
return NULL;
+ }
phdr = dsa_etype_header_pos_rx(skb);
hdr = ntohs(*phdr);
/* Make sure the version is correct */
ver = FIELD_GET(QCA_HDR_RECV_VERSION, hdr);
- if (unlikely(ver != QCA_HDR_VERSION))
+ if (unlikely(ver != QCA_HDR_VERSION)) {
+ kfree_skb(skb);
return NULL;
+ }
/* Get pk type */
pk_type = FIELD_GET(QCA_HDR_RECV_TYPE, hdr);
@@ -64,6 +68,7 @@ static struct sk_buff *qca_tag_rcv(struct sk_buff *skb, struct net_device *dev)
if (pk_type == QCA_HDR_RECV_TYPE_RW_REG_ACK) {
if (likely(tagger_data->rw_reg_ack_handler))
tagger_data->rw_reg_ack_handler(ds, skb);
+ kfree_skb(skb);
return NULL;
}
@@ -71,6 +76,7 @@ static struct sk_buff *qca_tag_rcv(struct sk_buff *skb, struct net_device *dev)
if (pk_type == QCA_HDR_RECV_TYPE_MIB) {
if (likely(tagger_data->mib_autocast_handler))
tagger_data->mib_autocast_handler(ds, skb);
+ kfree_skb(skb);
return NULL;
}
@@ -78,8 +84,10 @@ static struct sk_buff *qca_tag_rcv(struct sk_buff *skb, struct net_device *dev)
port = FIELD_GET(QCA_HDR_RECV_SOURCE_PORT, hdr);
skb->dev = dsa_conduit_find_user(dev, 0, port);
- if (!skb->dev)
+ if (!skb->dev) {
+ kfree_skb(skb);
return NULL;
+ }
/* Remove QCA tag and recalculate checksum */
skb_pull_rcsum(skb, QCA_HDR_LEN);
diff --git a/net/dsa/tag_rtl4_a.c b/net/dsa/tag_rtl4_a.c
index 3cc63eacfa03..590ea3b921c9 100644
--- a/net/dsa/tag_rtl4_a.c
+++ b/net/dsa/tag_rtl4_a.c
@@ -41,7 +41,7 @@ static struct sk_buff *rtl4a_tag_xmit(struct sk_buff *skb,
u16 out;
/* Pad out to at least 60 bytes */
- if (unlikely(__skb_put_padto(skb, ETH_ZLEN, false)))
+ if (unlikely(eth_skb_pad(skb)))
return NULL;
netdev_dbg(dev, "add realtek tag to package to port %d\n",
@@ -75,8 +75,10 @@ static struct sk_buff *rtl4a_tag_rcv(struct sk_buff *skb,
u8 prot;
u8 port;
- if (unlikely(!pskb_may_pull(skb, RTL4_A_HDR_LEN)))
+ if (unlikely(!pskb_may_pull(skb, RTL4_A_HDR_LEN))) {
+ kfree_skb(skb);
return NULL;
+ }
tag = dsa_etype_header_pos_rx(skb);
p = (__be16 *)tag;
@@ -92,6 +94,7 @@ static struct sk_buff *rtl4a_tag_rcv(struct sk_buff *skb,
prot = (protport >> RTL4_A_PROTOCOL_SHIFT) & 0x0f;
if (prot != RTL4_A_PROTOCOL_RTL8366RB) {
netdev_err(dev, "unknown realtek protocol 0x%01x\n", prot);
+ kfree_skb(skb);
return NULL;
}
port = protport & 0xff;
@@ -99,6 +102,7 @@ static struct sk_buff *rtl4a_tag_rcv(struct sk_buff *skb,
skb->dev = dsa_conduit_find_user(dev, 0, port);
if (!skb->dev) {
netdev_dbg(dev, "could not find user for port %d\n", port);
+ kfree_skb(skb);
return NULL;
}
diff --git a/net/dsa/tag_rtl8_4.c b/net/dsa/tag_rtl8_4.c
index 852c6b88079a..4da3beebef75 100644
--- a/net/dsa/tag_rtl8_4.c
+++ b/net/dsa/tag_rtl8_4.c
@@ -143,8 +143,10 @@ static struct sk_buff *rtl8_4t_tag_xmit(struct sk_buff *skb,
/* Calculate the checksum here if not done yet as trailing tags will
* break either software or hardware based checksum
*/
- if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))
+ if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb)) {
+ kfree_skb(skb);
return NULL;
+ }
rtl8_4_write_tag(skb, dev, skb_put(skb, RTL8_4_TAG_LEN));
@@ -201,11 +203,15 @@ static int rtl8_4_read_tag(struct sk_buff *skb, struct net_device *dev,
static struct sk_buff *rtl8_4_tag_rcv(struct sk_buff *skb,
struct net_device *dev)
{
- if (unlikely(!pskb_may_pull(skb, RTL8_4_TAG_LEN)))
+ if (unlikely(!pskb_may_pull(skb, RTL8_4_TAG_LEN))) {
+ kfree_skb(skb);
return NULL;
+ }
- if (unlikely(rtl8_4_read_tag(skb, dev, dsa_etype_header_pos_rx(skb))))
+ if (unlikely(rtl8_4_read_tag(skb, dev, dsa_etype_header_pos_rx(skb)))) {
+ kfree_skb(skb);
return NULL;
+ }
/* Remove tag and recalculate checksum */
skb_pull_rcsum(skb, RTL8_4_TAG_LEN);
@@ -218,14 +224,20 @@ static struct sk_buff *rtl8_4_tag_rcv(struct sk_buff *skb,
static struct sk_buff *rtl8_4t_tag_rcv(struct sk_buff *skb,
struct net_device *dev)
{
- if (skb_linearize(skb))
+ if (skb_linearize(skb)) {
+ kfree_skb(skb);
return NULL;
+ }
- if (unlikely(rtl8_4_read_tag(skb, dev, skb_tail_pointer(skb) - RTL8_4_TAG_LEN)))
+ if (unlikely(rtl8_4_read_tag(skb, dev, skb_tail_pointer(skb) - RTL8_4_TAG_LEN))) {
+ kfree_skb(skb);
return NULL;
+ }
- if (pskb_trim_rcsum(skb, skb->len - RTL8_4_TAG_LEN))
+ if (pskb_trim_rcsum(skb, skb->len - RTL8_4_TAG_LEN)) {
+ kfree_skb(skb);
return NULL;
+ }
return skb;
}
diff --git a/net/dsa/tag_rzn1_a5psw.c b/net/dsa/tag_rzn1_a5psw.c
index 10994b3470f6..734910156dc3 100644
--- a/net/dsa/tag_rzn1_a5psw.c
+++ b/net/dsa/tag_rzn1_a5psw.c
@@ -48,7 +48,7 @@ static struct sk_buff *a5psw_tag_xmit(struct sk_buff *skb, struct net_device *de
* least 60 bytes otherwise they will be discarded when they enter the
* switch port logic.
*/
- if (__skb_put_padto(skb, ETH_ZLEN, false))
+ if (eth_skb_pad(skb))
return NULL;
/* provide 'A5PSW_TAG_LEN' bytes additional space */
@@ -77,6 +77,7 @@ static struct sk_buff *a5psw_tag_rcv(struct sk_buff *skb,
if (unlikely(!pskb_may_pull(skb, A5PSW_TAG_LEN))) {
dev_warn_ratelimited(&dev->dev,
"Dropping packet, cannot pull\n");
+ kfree_skb(skb);
return NULL;
}
@@ -84,14 +85,17 @@ static struct sk_buff *a5psw_tag_rcv(struct sk_buff *skb,
if (tag->ctrl_tag != htons(ETH_P_DSA_A5PSW)) {
dev_warn_ratelimited(&dev->dev, "Dropping packet due to invalid TAG marker\n");
+ kfree_skb(skb);
return NULL;
}
port = FIELD_GET(A5PSW_CTRL_DATA_PORT, ntohs(tag->ctrl_data));
skb->dev = dsa_conduit_find_user(dev, 0, port);
- if (!skb->dev)
+ if (!skb->dev) {
+ kfree_skb(skb);
return NULL;
+ }
skb_pull_rcsum(skb, A5PSW_TAG_LEN);
dsa_strip_etype_header(skb, A5PSW_TAG_LEN);
diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
index de6d4ce8668b..bfe1f746f55b 100644
--- a/net/dsa/tag_sja1105.c
+++ b/net/dsa/tag_sja1105.c
@@ -149,19 +149,20 @@ static struct sk_buff *sja1105_defer_xmit(struct dsa_port *dp,
xmit_work_fn = tagger_data->xmit_work_fn;
xmit_worker = priv->xmit_worker;
- if (!xmit_work_fn || !xmit_worker)
+ if (!xmit_work_fn || !xmit_worker) {
+ kfree_skb(skb);
return NULL;
+ }
xmit_work = kzalloc_obj(*xmit_work, GFP_ATOMIC);
- if (!xmit_work)
+ if (!xmit_work) {
+ kfree_skb(skb);
return NULL;
+ }
kthread_init_work(&xmit_work->work, xmit_work_fn);
- /* Increase refcount so the kfree_skb in dsa_user_xmit
- * won't really free the packet.
- */
xmit_work->dp = dp;
- xmit_work->skb = skb_get(skb);
+ xmit_work->skb = skb;
kthread_queue_work(xmit_worker, &xmit_work->work);
@@ -401,10 +402,7 @@ static struct sk_buff
kfree_skb(priv->stampable_skb);
}
- /* Hold a reference to avoid dsa_switch_rcv
- * from freeing the skb.
- */
- priv->stampable_skb = skb_get(skb);
+ priv->stampable_skb = skb;
spin_unlock(&priv->meta_lock);
/* Tell DSA we got nothing */
@@ -436,6 +434,7 @@ static struct sk_buff
dev_err_ratelimited(ds->dev,
"Unexpected meta frame\n");
spin_unlock(&priv->meta_lock);
+ kfree_skb(skb);
return NULL;
}
@@ -443,6 +442,7 @@ static struct sk_buff
dev_err_ratelimited(ds->dev,
"Meta frame on wrong port\n");
spin_unlock(&priv->meta_lock);
+ kfree_skb(skb);
return NULL;
}
@@ -501,18 +501,21 @@ static struct sk_buff *sja1105_rcv(struct sk_buff *skb,
/* Normal data plane traffic and link-local frames are tagged with
* a tag_8021q VLAN which we have to strip
*/
- if (sja1105_skb_has_tag_8021q(skb))
+ if (sja1105_skb_has_tag_8021q(skb)) {
dsa_8021q_rcv(skb, &source_port, &switch_id, &vbid, &vid);
- else if (source_port == -1 && switch_id == -1)
+ } else if (source_port == -1 && switch_id == -1) {
/* Packets with no source information have no chance of
* getting accepted, drop them straight away.
*/
+ kfree_skb(skb);
return NULL;
+ }
skb->dev = dsa_tag_8021q_find_user(netdev, source_port, switch_id,
vid, vbid);
if (!skb->dev) {
netdev_warn(netdev, "Couldn't decode source port\n");
+ kfree_skb(skb);
return NULL;
}
@@ -539,12 +542,15 @@ static struct sk_buff *sja1110_rcv_meta(struct sk_buff *skb, u16 rx_header)
if (!ds) {
net_err_ratelimited("%s: cannot find switch id %d\n",
conduit->name, switch_id);
+ kfree_skb(skb);
return NULL;
}
tagger_data = sja1105_tagger_data(ds);
- if (!tagger_data->meta_tstamp_handler)
+ if (!tagger_data->meta_tstamp_handler) {
+ kfree_skb(skb);
return NULL;
+ }
for (i = 0; i <= n_ts; i++) {
u8 ts_id, source_port, dir;
@@ -562,6 +568,7 @@ static struct sk_buff *sja1110_rcv_meta(struct sk_buff *skb, u16 rx_header)
}
/* Discard the meta frame, we've consumed the timestamps it contained */
+ kfree_skb(skb);
return NULL;
}
@@ -572,8 +579,10 @@ static struct sk_buff *sja1110_rcv_inband_control_extension(struct sk_buff *skb,
{
u16 rx_header;
- if (unlikely(!pskb_may_pull(skb, SJA1110_HEADER_LEN)))
+ if (unlikely(!pskb_may_pull(skb, SJA1110_HEADER_LEN))) {
+ kfree_skb(skb);
return NULL;
+ }
/* skb->data points to skb_mac_header(skb) + ETH_HLEN, which is exactly
* what we need because the caller has checked the EtherType (which is
@@ -609,8 +618,10 @@ static struct sk_buff *sja1110_rcv_inband_control_extension(struct sk_buff *skb,
* padding and trailer we need to account for the fact that
* skb->data points to skb_mac_header(skb) + ETH_HLEN.
*/
- if (pskb_trim_rcsum(skb, start_of_padding - ETH_HLEN))
+ if (pskb_trim_rcsum(skb, start_of_padding - ETH_HLEN)) {
+ kfree_skb(skb);
return NULL;
+ }
/* Trap-to-host frame, no timestamp trailer */
} else {
*source_port = SJA1110_RX_HEADER_SRC_PORT(rx_header);
@@ -653,6 +664,7 @@ static struct sk_buff *sja1110_rcv(struct sk_buff *skb,
if (!skb->dev) {
netdev_warn(netdev, "Couldn't decode source port\n");
+ kfree_skb(skb);
return NULL;
}
diff --git a/net/dsa/tag_trailer.c b/net/dsa/tag_trailer.c
index 4dce24cfe6a7..49c802c10ca6 100644
--- a/net/dsa/tag_trailer.c
+++ b/net/dsa/tag_trailer.c
@@ -30,22 +30,30 @@ static struct sk_buff *trailer_rcv(struct sk_buff *skb, struct net_device *dev)
u8 *trailer;
int source_port;
- if (skb_linearize(skb))
+ if (skb_linearize(skb)) {
+ kfree_skb(skb);
return NULL;
+ }
trailer = skb_tail_pointer(skb) - 4;
if (trailer[0] != 0x80 || (trailer[1] & 0xf8) != 0x00 ||
- (trailer[2] & 0xef) != 0x00 || trailer[3] != 0x00)
+ (trailer[2] & 0xef) != 0x00 || trailer[3] != 0x00) {
+ kfree_skb(skb);
return NULL;
+ }
source_port = trailer[1] & 7;
skb->dev = dsa_conduit_find_user(dev, 0, source_port);
- if (!skb->dev)
+ if (!skb->dev) {
+ kfree_skb(skb);
return NULL;
+ }
- if (pskb_trim_rcsum(skb, skb->len - 4))
+ if (pskb_trim_rcsum(skb, skb->len - 4)) {
+ kfree_skb(skb);
return NULL;
+ }
return skb;
}
diff --git a/net/dsa/tag_vsc73xx_8021q.c b/net/dsa/tag_vsc73xx_8021q.c
index af121a9aff7f..f4736a1a7a0f 100644
--- a/net/dsa/tag_vsc73xx_8021q.c
+++ b/net/dsa/tag_vsc73xx_8021q.c
@@ -44,6 +44,7 @@ vsc73xx_rcv(struct sk_buff *skb, struct net_device *netdev)
if (!skb->dev) {
dev_warn_ratelimited(&netdev->dev,
"Couldn't decode source port\n");
+ kfree_skb(skb);
return NULL;
}
diff --git a/net/dsa/tag_xrs700x.c b/net/dsa/tag_xrs700x.c
index a05219f702c6..bb268020ee86 100644
--- a/net/dsa/tag_xrs700x.c
+++ b/net/dsa/tag_xrs700x.c
@@ -30,15 +30,21 @@ static struct sk_buff *xrs700x_rcv(struct sk_buff *skb, struct net_device *dev)
source_port = ffs((int)trailer[0]) - 1;
- if (source_port < 0)
+ if (source_port < 0) {
+ kfree_skb(skb);
return NULL;
+ }
skb->dev = dsa_conduit_find_user(dev, 0, source_port);
- if (!skb->dev)
+ if (!skb->dev) {
+ kfree_skb(skb);
return NULL;
+ }
- if (pskb_trim_rcsum(skb, skb->len - 1))
+ if (pskb_trim_rcsum(skb, skb->len - 1)) {
+ kfree_skb(skb);
return NULL;
+ }
/* Frame is forwarded by hardware, don't forward in software. */
dsa_default_offload_fwd_mark(skb);
diff --git a/net/dsa/tag_yt921x.c b/net/dsa/tag_yt921x.c
index f3ced99b1c85..294784ab6694 100644
--- a/net/dsa/tag_yt921x.c
+++ b/net/dsa/tag_yt921x.c
@@ -87,8 +87,10 @@ yt921x_tag_rcv(struct sk_buff *skb, struct net_device *netdev)
__be16 *tag;
u16 rx;
- if (unlikely(!pskb_may_pull(skb, YT921X_TAG_LEN)))
+ if (unlikely(!pskb_may_pull(skb, YT921X_TAG_LEN))) {
+ kfree_skb(skb);
return NULL;
+ }
tag = dsa_etype_header_pos_rx(skb);
@@ -96,6 +98,7 @@ yt921x_tag_rcv(struct sk_buff *skb, struct net_device *netdev)
dev_warn_ratelimited(&netdev->dev,
"Unexpected EtherType 0x%04x\n",
ntohs(tag[0]));
+ kfree_skb(skb);
return NULL;
}
@@ -104,6 +107,7 @@ yt921x_tag_rcv(struct sk_buff *skb, struct net_device *netdev)
if (unlikely((rx & YT921X_TAG_PORT_EN) == 0)) {
dev_warn_ratelimited(&netdev->dev,
"Unexpected rx tag 0x%04x\n", rx);
+ kfree_skb(skb);
return NULL;
}
@@ -112,6 +116,7 @@ yt921x_tag_rcv(struct sk_buff *skb, struct net_device *netdev)
if (unlikely(!skb->dev)) {
dev_warn_ratelimited(&netdev->dev,
"Couldn't decode source port %u\n", port);
+ kfree_skb(skb);
return NULL;
}
diff --git a/net/dsa/user.c b/net/dsa/user.c
index 8704c1a3a5b7..072fa76972cc 100644
--- a/net/dsa/user.c
+++ b/net/dsa/user.c
@@ -935,13 +935,12 @@ static netdev_tx_t dsa_user_xmit(struct sk_buff *skb, struct net_device *dev)
eth_skb_pad(skb);
/* Transmit function may have to reallocate the original SKB,
- * in which case it must have freed it. Only free it here on error.
+ * in which case it must have freed it. Taggers will drop the
+ * passed skb on error.
*/
nskb = p->xmit(skb, dev);
- if (!nskb) {
- kfree_skb(skb);
+ if (!nskb)
return NETDEV_TX_OK;
- }
return dsa_enqueue_skb(nskb, dev);
}
---
base-commit: f34c6b3a3c3d98f34918e1d2ea846a5acccac6d1
change-id: 20260616-dsa-fix-free-skb-bb028ce90802
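
For readers following the deferred-xmit hunks above: the ksz and ocelot-8021q taggers now take a reference with skb_get() and then drop the caller's reference with kfree_skb(), while sja1105 simply hands the pointer to the work item. The userspace sketch below suggests that both shapes leave the same reference count behind. It is only a model: mock_skb, mock_work and every mock_* or defer_* helper are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff; only the refcount is modelled. */
struct mock_skb {
	int users;
};

/* Hypothetical stand-in for the deferred xmit work item. */
struct mock_work {
	struct mock_skb *skb;
};

/* Models skb_get(): take one extra reference. */
static struct mock_skb *mock_skb_get(struct mock_skb *skb)
{
	skb->users++;
	return skb;
}

/* Models kfree_skb(): drop one reference. */
static void mock_kfree_skb(struct mock_skb *skb)
{
	assert(skb->users > 0);
	skb->users--;
}

/* Shape used in the ksz and ocelot-8021q hunks: get, queue, then free. */
static void defer_get_then_free(struct mock_work *w, struct mock_skb *skb)
{
	w->skb = mock_skb_get(skb);
	mock_kfree_skb(skb);
}

/* Shape used in the sja1105 hunk: pass the caller's reference along. */
static void defer_hand_over(struct mock_work *w, struct mock_skb *skb)
{
	w->skb = skb;
}

/* The worker releases the reference it was given once it is done. */
static void mock_worker_done(struct mock_work *w)
{
	mock_kfree_skb(w->skb);
	w->skb = NULL;
}
```

In this model the work item ends up owning exactly one reference either way, so the two shapes differ in style rather than in outcome.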
Best regards,
--
Linus Walleij <linusw@kernel.org>
* Re: [PATCH net-next v2] net: dsa: Fix skb ownership in taggers
From: Linus Walleij @ 2026-06-17 12:23 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Florian Fainelli, Jonas Gorski,
Hauke Mehrtens, Kurt Kanzenbach, Woojung Huh, UNGLinuxDriver,
Chester A. Unal, Daniel Golle, Matthias Brugger,
AngeloGioacchino Del Regno, Wei Fang, Clark Wang,
Clément Léger, George McCollister, David Yang, netdev,
Sashiko AI Review
In-Reply-To: <20260616213728.xhja2net2vbjmgzb@skbuf>
On Tue, Jun 16, 2026 at 11:37 PM Vladimir Oltean <olteanv@gmail.com> wrote:
> From my perspective, the tradeoff between pros and cons is not so well
> explained. Consider the following not mentioned in the commit message:
>
> - Changing the kfree_skb() convention, without any mechanical obstacle
> preventing the backporting of patches that are written assuming one
> convention down to trees expecting the other (obstacle like a failure
> to compile, for example, which would warn people of their otherwise
> silent incompatibility), is an avoidable experience (at best) from a
> maintenance perspective.
I added a terse blurb like this:
NOTICE: Backporting patches to taggers (e.g. for stable kernels) after
this point cannot be mechanical or they will introduce double
kfree_skb().
However, usually we do not consider obstacles for backporting things to
stable to be a major concern, c.f. Documentation/process/stable-api-nonsense.rst
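
To make the hazard concrete, here is a minimal userspace sketch of the two conventions. It is an illustration under assumptions, not kernel code: mock_skb and all mock_* functions are hypothetical stand-ins for struct sk_buff, kfree_skb(), a tagger ->xmit() and dsa_user_xmit().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff; only tracks references. */
struct mock_skb {
	int users;      /* references still held */
	int over_free;  /* releases attempted with no reference left */
};

/* Models kfree_skb(): drop one reference, or record a double free. */
static void mock_kfree_skb(struct mock_skb *skb)
{
	if (skb->users > 0)
		skb->users--;
	else
		skb->over_free++;
}

typedef struct mock_skb *(*mock_xmit_fn)(struct mock_skb *skb, int fail);

/* Old convention: on error the tagger returns NULL and leaves the skb. */
static struct mock_skb *mock_xmit_old(struct mock_skb *skb, int fail)
{
	return fail ? NULL : skb;
}

/* New convention: on error the tagger drops the skb itself. */
static struct mock_skb *mock_xmit_new(struct mock_skb *skb, int fail)
{
	if (fail) {
		mock_kfree_skb(skb);
		return NULL;
	}
	return skb;
}

/* Caller written for the old convention: frees the skb on NULL. */
static void mock_caller_old(struct mock_skb *skb, mock_xmit_fn xmit, int fail)
{
	struct mock_skb *nskb = xmit(skb, fail);

	if (!nskb) {
		mock_kfree_skb(skb);
		return;
	}
	mock_kfree_skb(nskb); /* stands in for the frame being sent */
}

/* Caller written for the new convention: trusts the tagger on NULL. */
static void mock_caller_new(struct mock_skb *skb, mock_xmit_fn xmit, int fail)
{
	struct mock_skb *nskb = xmit(skb, fail);

	if (!nskb)
		return;
	mock_kfree_skb(nskb); /* stands in for the frame being sent */
}
```

Pairing a new-style tagger with an old-style caller records a double release in this model, and the opposite pairing leaves a reference behind. Both mismatches compile cleanly, which is the silent incompatibility being discussed.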
> - Has anyone proven that a real problem exists? Because dsa_user_xmit()
> -> skb_ensure_writable_head_tail() has run successfully at this stage,
> so we know that dev->needed_headroom bytes are available for writing.
> Because DSA uses VLAN as a tag, dsa_user_setup_tagger() will increase
> dev->needed_headroom by VLAN_HLEN for the tag_8021q protocols, so
> vlan_insert_tag() should not fail. I've looked at this function and it
> seems not to be coded up to fail for any other reason.
I guess what you're saying is that vlan_insert_tag() will never fail in
->xmit()?
If we decide to keep things as they are, we can probably patch the
Sashiko ruleset and tell the AI to explicitly shut up about any
double-free risk when calling vlan_insert_tag() in any DSA tagger's
->xmit() callback, but that seems a bit hard to maintain.
> Otherwise, sure, it seems cleaner this way, but the way I see it, it
> risks introducing more issues than it fixes. If maintainers feel
> different about this please go ahead, but given the fact that I don't
> really have a lot of time to do proper review during this period, I'm
> more on the pragmatic side on this one.
I am a bit in love with my own patch but I'm not married to it.
I'll let Jakub & al decide on its fate, if your stance is neutral.
If things explode we'll revert it...
Yours,
Linus Walleij
* Re: [PATCH] net: airoha: Clean up RX queues in airoha_dev_stop
From: Simon Horman @ 2026-06-17 12:22 UTC (permalink / raw)
To: Wayen Yan
Cc: netdev, lorenzo, pabeni, kuba, edumazet, andrew+netdev,
angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
linux-mediatek
In-Reply-To: <178160746585.2156302.190868309474762875@gmail.com>
On Tue, Jun 16, 2026 at 06:50:48PM +0800, Wayen Yan wrote:
> When the last port is stopped, airoha_dev_stop() clears TX queues
> but neglects to clean up RX queues. This can lead to:
> - RX ring buffer descriptors remaining valid after device close
> - Potential DMA synchronization issues on device reopen
> - Risk of use-after-free if pages are freed while DMA is still active
>
> Add cleanup loop for RX queues to mirror the TX queue cleanup,
> ensuring symmetric resource management.
>
> Fixes: 20bf7d07c956 ("net: airoha: add QDMA support for Airoha EN7581 Ethernet")
> Signed-off-by: Wayen Yan <win847@gmail.com>
Hi Wayen Yan,
There is an AI-generated review of this patch-set available on both
https://sashiko.dev and https://netdev-ai.bots.linux.dev/sashiko/
I asked AI to summarise these concerns, and it came up with the
following. I would appreciate it if you could look over this feedback.
1. NAPI Synchronization:
While the TX path is managed by the netdev layer during stop, the RX path
relies on the NAPI subsystem. Since NAPI remains active during
`airoha_dev_stop()`, the new cleanup loop could race with the poller
(`airoha_qdma_rx_process`). It would be safer to call `napi_disable()`
before draining the queues to ensure exclusive access to the descriptors.
2. RX Queue Refill:
Unlike TX, the RX hardware requires descriptors to be pre-allocated and
posted by the driver to receive data. Because this patch empties the rings,
`airoha_dev_open()` needs a corresponding update to refill them (e.g.,
via `airoha_qdma_fill_rx_queue()`). Without this, the interface will
encounter an empty ring on restart, leading to an RX stall.
3. SKB Accumulation:
The cleanup should also account for the `q->skb` pointer used for
fragmented packets. If a partial packet is sitting in the queue when the
interface is stopped, freeing it and resetting the pointer to NULL will
prevent a memory leak and ensure the next session starts with a clean
state.
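
The three points above amount to an ordering constraint between stop and open. The userspace sketch below models that ordering only. It is not the airoha driver: mock_rx_queue, MOCK_RX_RING_SIZE and the mock_* functions are hypothetical, and the napi_enabled flag merely stands in for napi_disable()/napi_enable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MOCK_RX_RING_SIZE 8 /* hypothetical ring depth */

/* Hypothetical stand-in for one RX queue. */
struct mock_rx_queue {
	bool napi_enabled;  /* may the poller touch the ring? */
	int posted;         /* descriptors currently posted to hardware */
	void *partial_skb;  /* partially assembled fragmented packet, if any */
};

/* Point 2: the driver must post buffers before RX can make progress. */
static void mock_rx_fill(struct mock_rx_queue *q)
{
	q->posted = MOCK_RX_RING_SIZE;
}

static void mock_dev_open(struct mock_rx_queue *q)
{
	mock_rx_fill(q);        /* refill the ring that stop emptied */
	q->napi_enabled = true; /* only then let the poller run */
}

static void mock_dev_stop(struct mock_rx_queue *q)
{
	/* Point 1: quiesce the poller before touching the descriptors. */
	q->napi_enabled = false;

	/* Drain the ring; nothing can race with us now. */
	q->posted = 0;

	/* Point 3: drop any half-assembled packet and reset the pointer. */
	free(q->partial_skb);
	q->partial_skb = NULL;
}
```

A stop followed by an open in this model leaves a full ring, no stale partial packet, and polling enabled only once the ring is ready.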
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Petr Mladek @ 2026-06-17 12:13 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jakub Kicinski, Sebastian Andrzej Siewior, John Ogness,
Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev,
David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Breno Leitao, Clark Williams, Steven Rostedt, linux-rt-devel,
linux-kernel, stable, Frederic Weisbecker, Ingo Molnar,
Vincent Guittot, Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <20260617111958.GL49951@noisy.programming.kicks-ass.net>
On Wed 2026-06-17 13:19:58, Peter Zijlstra wrote:
> On Wed, Jun 17, 2026 at 12:37:30PM +0200, Petr Mladek wrote:
> > On Tue 2026-06-16 14:17:19, Jakub Kicinski wrote:
> > > On Tue, 16 Jun 2026 19:02:57 +0200 Peter Zijlstra wrote:
> > > > > So this is not an issue since commit 7eab73b18630e ("netconsole: convert
> > > > > to NBCON console infrastructure"). Because from here now on writes are
> > > > > deferred to the nbcon thread. So this is purely about -stable in this case.
> > > >
> > > > Hmm, I thought netconsole had some reserved skbs and could do writes
> > > > 'atomic' like? That said, it was 2.6 era the last time I looked at
> > > > netconsole.
> > >
> > > Yes, that part is fine. The problem is that netconsole tries
> > > to reap Tx completions if the Tx queue is full. We can't call
> > > skb destructor in irq context so we put the completed skbs on
> > > a queue and try to arm softirq to get to them later.
> > > Arming softirq causes a ksoftirq wake up.
> > >
> > > We already skip the completion polling if we detect getting called
> > > from the same networking driver. It's best effort, anyway.
> > > Networking-side fix would be to toss another OR condition into
> > > the skip. But we don't have one that'd work cleanly :S
> >
> > Alternative solution might be to offload the ksoftirq wake up
> > to an irq_work. It might make this part safe for the
> > console->write_atomic() call.
> >
> > Well, my understanding is that there are more problems.
> > AFAIK, some drivers do not use IRQ-safe locking, see
> > https://lore.kernel.org/all/oth5t27z6acp7qxut7u45ekyil7djirg2ny3bnsvnzeqasavxb@nhwdxahvcosh/
>
> But anything using locking is not ->write_atomic() and should be driven
> from a kthread, no?
Right. I am not sure where my head was this morning.
Best Regards,
Petr
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: John Ogness @ 2026-06-17 12:12 UTC (permalink / raw)
To: Petr Mladek, Peter Zijlstra
Cc: Sebastian Andrzej Siewior, Jakub Kicinski, Sergey Senozhatsky,
Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Breno Leitao,
Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot,
Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <ajKMH_LmiZhjNlOW@pathway.suse.cz>
On 2026-06-17, Petr Mladek <pmladek@suse.com> wrote:
> On Wed 2026-06-17 13:15:04, Peter Zijlstra wrote:
>> Can't we push all the legacy consoles into a single legacy kthread? I
>> mean, converting all consoles is of course awesome, but should we really
>> wait for that?
>
> I am afraid that converting the consoles one by one is the deal with
> Linus. I could imagine moving the last few sinners into the kthread
> when the majority is converted. But we are far from there :-/
Note that the proposed patch is only for older kernels. For mainline it
is moot because netconsole is already converted to nbcon.
John
^ permalink raw reply
* Re: [PATCH net] net: rnpgbe: fix mailbox endianness handling
From: Andrew Lunn @ 2026-06-17 12:09 UTC (permalink / raw)
To: Yibo Dong
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, vadim.fedorenko,
netdev, linux-kernel, yaojun
In-Reply-To: <0C93ADB24CB93157+20260617114614.GA273003@nic-Precision-5820-Tower>
> My understanding is as follows:
> The firmware structures are defined with __le16 / __le32 for wire format,
> but the original code cast these struct pointers to u32 * before passing
> them to the mailbox read/write routines:
> - Send path: (u32 *)&req -> msg buffer -> writel()
> - Receive path: readl() -> msg buffer -> (u32 *)&reply
> Sparse only sees pure u32 = u32 assignments here, so no type mismatch is
> reported.
Can the code be changed so that it does not need the cast? Casts are
bad, as you have just shown. This is something I try to push back on,
it makes you think about types and avoid issues like this.
Andrew
^ permalink raw reply
* [PATCH iwl v3] ice: retry reading NVM if admin queue returns EBUSY
From: Robert Malz @ 2026-06-17 12:07 UTC (permalink / raw)
To: anthony.l.nguyen, przemyslaw.kitszel; +Cc: intel-wired-lan, netdev
When the admin queue command to read NVM returns EBUSY, the driver
currently treats it as a fatal error and aborts the entire read
operation. This can cause spurious NVM read failures during periods of
high firmware activity.
Add retry logic to ice_read_flat_nvm() that handles EBUSY responses
from the admin queue. When an EBUSY error is encountered, release the
NVM resource lock, wait for ICE_SQ_SEND_DELAY_TIME_MS, re-acquire it,
and retry the failed read. The retry is attempted up to
ICE_SQ_SEND_MAX_EXECUTE times before giving up.
Code was extracted from OOT ice driver 1.15.4 release. Additional
change was made to reset last_cmd in case of retry to make sure that
all commands are retried properly.
Fixes: e94509906d6b ("ice: create function to read a section of the NVM and Shadow RAM")
Signed-off-by: Robert Malz <robert.malz@canonical.com>
---
Changes in v2:
- change ICE_AQ_RC_EBUSY -> LIBIE_AQ_RC_EBUSY
Changes in v3:
- resending to comply with the netdev 24-hour rule. No code changes since v2.
drivers/net/ethernet/intel/ice/ice_nvm.c | 25 +++++++++++++++++++-----
1 file changed, 20 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
index 7e187a804dfa..b3120605d66f 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.c
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
@@ -67,6 +67,7 @@ ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
{
u32 inlen = *length;
u32 bytes_read = 0;
+ int retry_cnt = 0;
bool last_cmd;
int status;
@@ -96,11 +97,25 @@ ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
offset, read_size,
data + bytes_read, last_cmd,
read_shadow_ram, NULL);
- if (status)
- break;
-
- bytes_read += read_size;
- offset += read_size;
+ if (status) {
+ if (hw->adminq.sq_last_status != LIBIE_AQ_RC_EBUSY ||
+ retry_cnt > ICE_SQ_SEND_MAX_EXECUTE)
+ break;
+ ice_debug(hw, ICE_DBG_NVM,
+ "NVM read EBUSY error, retry %d\n",
+ retry_cnt + 1);
+ last_cmd = false;
+ ice_release_nvm(hw);
+ msleep(ICE_SQ_SEND_DELAY_TIME_MS);
+ status = ice_acquire_nvm(hw, ICE_RES_READ);
+ if (status)
+ break;
+ retry_cnt++;
+ } else {
+ bytes_read += read_size;
+ offset += read_size;
+ retry_cnt = 0;
+ }
} while (!last_cmd);
*length = bytes_read;
--
2.34.1
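
The retry flow described in the commit message can be sketched in plain
userspace C. This is only an illustration under stated assumptions: every
name below (demo_read_all, demo_fake_read, DEMO_MAX_RETRIES, the status
enum) is hypothetical and not part of the ice driver, and the lock release,
sleep and re-acquire steps are reduced to a comment. The sketch stops after
exactly DEMO_MAX_RETRIES consecutive busy results; that bound is a choice
made for the sketch, not a statement about the patch.

```c
#include <stddef.h>

#define DEMO_MAX_RETRIES 3	/* hypothetical cap on consecutive retries */

enum demo_status { DEMO_OK = 0, DEMO_EBUSY = 1, DEMO_EIO = 2 };

/* Hypothetical per-chunk reader supplied by the caller. */
typedef enum demo_status (*demo_read_fn)(void *ctx, unsigned int chunk);

/* Read nchunks chunks, retrying a chunk only when the reader reports busy. */
static enum demo_status demo_read_all(demo_read_fn read_chunk, void *ctx,
				      unsigned int nchunks,
				      unsigned int *chunks_read)
{
	enum demo_status status = DEMO_OK;
	unsigned int chunk = 0;
	int retry_cnt = 0;

	while (chunk < nchunks) {
		status = read_chunk(ctx, chunk);
		if (status == DEMO_OK) {
			chunk++;
			retry_cnt = 0;	/* the budget is per failing chunk */
			continue;
		}
		/* Only "busy" is retried, and only a bounded number of times. */
		if (status != DEMO_EBUSY || retry_cnt >= DEMO_MAX_RETRIES)
			break;
		/* A real driver would drop the lock, sleep and re-acquire here. */
		retry_cnt++;
	}

	*chunks_read = chunk;
	return status;
}

/* Fake reader for experiments: reports busy busy_left times, then succeeds. */
struct demo_fake {
	int busy_left;
	int calls;
};

static enum demo_status demo_fake_read(void *ctx, unsigned int chunk)
{
	struct demo_fake *f = ctx;

	(void)chunk;
	f->calls++;
	if (f->busy_left > 0) {
		f->busy_left--;
		return DEMO_EBUSY;
	}
	return DEMO_OK;
}
```

Resetting the counter after every successful chunk keeps the retry budget
local to the chunk that failed, which mirrors the retry_cnt = 0 in the else
branch of the diff above.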
^ permalink raw reply related
* Re: [PATCH net-next 1/3] net: busy-poll: introduce sk_tx_busy_loop()
From: Menglong Dong @ 2026-06-17 12:00 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: Menglong Dong, Jakub Kicinski, jasowang, mst, xuanzhuo, eperezma,
andrew+netdev, davem, edumazet, pabeni, magnus.karlsson, sdf,
horms, ast, daniel, hawk, john.fastabend, bjorn, kerneljasonxing,
netdev, virtualization, linux-kernel, bpf
In-Reply-To: <ajJrckiXEUztBQDz@boxer>
On Wed, Jun 17, 2026 at 5:40 PM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Sun, Jun 14, 2026 at 06:12:46PM +0800, Menglong Dong wrote:
> > On 2026/6/14 02:21, Jakub Kicinski wrote:
> > > On Thu, 11 Jun 2026 15:12:40 +0800 menglong8.dong@gmail.com wrote:
[...]
> >
> > I'm not sure if it is a good idea to introduce the sk_tx_busy_loop().
> > Maybe we can modify the driver instead by using the same NAPI
> > for both data sending and receiving, just like others do. The
> > advantage of introduce sk_tx_busy_loop() is that we can split the
> > data sending and receiving, which maybe more efficient.
>
> Would be good if you back your changes by any performance numbers. I
> believe that drivers do tx processing via rx napi as before AF_XDP it was
> only about cleaning up writebacks, AF_XDP added more weight via actual tx
> descriptors submission.
>
> Maybe you can vibe-code virtio-net to work only with rx napi and see what
> are the results.
Hi, Maciej. I have not done such performance testing yet. Testing this on
virtio-net is a good and interesting idea, and I'll do it. If there are no
obvious performance differences, I'll modify virtio-net to send data via
the rx napi instead.
>
> Side note/question - Do you have a tx-only use case for AF_XDP ? I am
> planning (for a long time actually) to implement asymmetric AF_XDP
> sockets. Currently for ZC scenarios xsk socket occupies both rx and tx
> queues even when you do rx or tx only.
I think this is an interesting idea, and it will be helpful in some cases.
I'm improving the performance of MySQL with AF_XDP. For this case,
tx-only is not suitable, as both reading and writing data are needed.
But for other cases, such as Redis, the workload is mostly reads, and
there I think it's a good idea to use such "tx-only" ZC AF_XDP.
In my case, I don't want AF_XDP to occupy the whole NIC or a whole queue,
so that other users can use the NIC too. However, ZC AF_XDP adds a little
overhead to the skb rx path, as there is an extra data copy.
If such "tx-only" ZC is supported, AF_XDP performance is still
good in the read-mostly case, and it doesn't add overhead for
others either.
I haven't used AF_XDP for such a "read-mostly" case yet, so I'm not
sure if I'm right ;)
Thanks!
Menglong Dong
>
> >
> > >
> > > Third, this series does not apply.
> >
> > Ah, I'll rebase this series if a V2 is acceptable.
> >
> > Thanks!
> > Menglong Dong
> >
> > >
> > >
> >
> >
> >
> >
^ permalink raw reply
* RE: [PATCH net-next v5 1/4] dpll: add DPLL_PIN_TYPE_INT_NCO pin type
From: Kubalewski, Arkadiusz @ 2026-06-17 11:59 UTC (permalink / raw)
To: Vecera, Ivan, Jiri Pirko, Vadim Fedorenko, Jakub Kicinski
Cc: netdev@vger.kernel.org, Jiri Pirko, David S. Miller,
Donald Hunter, Eric Dumazet, Jakub Kicinski, Schmidt, Michal,
Paolo Abeni, Vaananen, Pasi, Oros, Petr, Prathosh Satish,
Simon Horman, Vadim Fedorenko, linux-kernel@vger.kernel.org
In-Reply-To: <ca33b9b8-aafa-40f0-9943-a6b6736af4e4@redhat.com>
>From: Ivan Vecera <ivecera@redhat.com>
>Sent: Monday, June 15, 2026 2:00 PM
>
>On 6/11/26 2:09 PM, Jiri Pirko wrote:
>> Wed, Jun 10, 2026 at 05:45:46PM +0200, ivecera@redhat.com wrote:
>>> On 6/10/26 3:04 PM, Kubalewski, Arkadiusz wrote:
>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>> Sent: Tuesday, June 9, 2026 4:59 PM
>>>>>
>>>>> On 6/9/26 4:00 PM, Kubalewski, Arkadiusz wrote:
>>>>>>> From: Jiri Pirko <jiri@resnulli.us>
>>>>>>> Sent: Tuesday, June 9, 2026 10:51 AM
>>>>>>>
>>>>>>> Mon, Jun 08, 2026 at 07:03:46PM +0200,
>>>>>>> arkadiusz.kubalewski@intel.com
>>>>>>> wrote:
>>>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>>>> Sent: Monday, June 8, 2026 5:48 PM
>>>>>>>>>
>>>>>>>>> On 6/8/26 4:43 PM, Kubalewski, Arkadiusz wrote:
>>>>>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>>>>>> Sent: Sunday, May 31, 2026 9:44 PM ...
>>>>>>>>>>> -
>>>>>>>>>>> name: gnss
>>>>>>>>>>> doc: GNSS recovered clock
>>>>>>>>>>> + -
>>>>>>>>>>> + name: int-nco
>>>>>>>>>>> + doc: |
>>>>>>>>>>> + Device internal numerically controlled oscillator.
>>>>>>>>>>> + When connected as a DPLL input, the DPLL enters NCO
>>>>>>>>>>> mode
>>>>>>>>>>> + where the output frequency is adjusted by the host
>>>>>>>>>>> via
>>>>>>>>>>> + the PTP clock interface.
>>>>>>>>>>
>>>>>>>>>> Hi Ivan!
>>>>>>>>>>
>>>>>>>>>> How would you control this in case of automatic mode dpll?
>>>>>>>>>> Automatic mode DPLL shall be controlled on HW level, such pin
>>>>>>>>>> breaks that rule and requires some driver magic to show it is
>>>>>>>>>> higher priority than the rest of the pins?
>>>>>>>>>
>>>>>>>>> The NCO pin can be connected only in manual mode. In other words
>>>>>>>>> a
>>>>>>>>> DPLL in automatic mode cannot select NCO pin (switch to NCO mode)
>>>>>>>>> by
>>>>>>>>> its own.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Being picky on DPLL_MODE for enabling feature is not something we
>>>>>>>> can allow if it is not related to HW limitation, is it?
>>>>>>>> Could you please elaborate why it is not possible for AUTOMATIC
>>>>>>>> mode?
>>>>>>>
>>>>>>> In automatic mode, the pin selection logic is defined upon prio. I
>>>>>>> can imagine that if NCO pin has the highest prio of the available
>>>>>>> ones, it gets picked. I would be aligned 100% with automatic mode
>>>>>>> behaviour.
>>>>>>> Is there a real usecase for it?
>>>>>>>
>>>>>>> [..]
>>>>>>
>>>>>> This is not true. AUTOMATIC mode is HW solution, SW driver ONLY
>>>>>> configures priorities on the inputs, not manages the active inputs.
>>>>>> This breaks that behavior, the SW driver would have to manually
>>>>>> override the AUTOMATIC mode to be fed from such NCO pin as it doesn't
>>>>>> exist on its priority list, HW cannot pick or use it.
>>>>>
>>>>> Correct, AUTO mode is hardware feature and it should not be emulated
>>>>> by a
>>>>> driver. If the hardware does not support it then the switching
>>>>> between
>>>>> input references should be done by userspace (by monitoring ffo,
>>>>> phase_offset, operstate).
>>>>>
>>>>
>>>> Yes, exactly, so for AUTOMATIC mode HW it will not be possible to
>>>> create
>>>> such pin, which means that NCO pin would serve only a MANUAL mode
>>>> implementation.
>>>> Basically this is something we shall not allow to happen. DPLL API
>>>> should be designed to cover the case where AUTO mode is able to
>>>> implement
>>>> all features consistently.
>>>
>>> If you don't like the proposal from Jiri (NCO switch driven by NCO pin
>>> priority -> highest==enter_nco else leave_nco) then it could be
>>> possible
>>> to handle the switching by allowing the state 'connected' in AUTO mode
>>> for the NCO pin type. Then the implementation will be the same for both
>>> selection modes.
>>>
>>> Only difference would be that a user does not need to switch the device
>>> from the AUTO to MANUAL mode.
>>>
>>>>>> The real use case is that any DPLL can switch the mode to this one
>>>>>> instead of implementing MANUAL mode just to use the feature with a
>>>>>> 'virtual' pin.
>>>>>
>>>>> I don't expect this... but it is up to a driver. I don't plan such
>>>>> functionality in zl3073x as the NCO pin does not expose prio_get()
>>>>> and
>>>>> prio_set() callbacks - so it is clear that this pin cannot be part of
>>>>> the
>>>>> automatic selection.
>>>>>
>>>>> Ivan
>>>>
>>>> There is a difference between particular HW and API capabilities, with
>>>> the
>>>> proposed API we would disallow the possibility of such implementation
>>>> for
>>>> existing HW variants.
>>>>
>>>> DPLL NCO MODE would allow that but as pointed here by Ivan and by Jiri
>>>> in
>>>> the other email it would also require the extra implementation for
>>>> some
>>>> configuration - device level phase/ffo handling.
>>>>
>>>> To summarize it all, I don't have such simple solution for it.
>>>>
>>>> First thing that comes to my mind is to combine both approaches.
>>>> Make it possible for AUTOMATIC mode to also set "CONNECTED" state
>>>> on certain kind of "OVERRIDE" pins, where it could be determined by
>>>> the type of PIN and embed that logic into the DPLL subsystem.
>>>
>>> The possible states for particular pins are now handled at a driver
>>> level
>>> so the driver decides if the requested state is correct or not. So it
>>> could be easy to implement this.
>>>
>>> For auto mode allowed states:
>>> - input references: selectable / disconnected
>>> - nco pin: connected / disconnected
>>>
>>>> Basically, if driver registers such NCO pin it would be always
>>>> selected
>>>> manually, and in such case all the other pins are going to
>>>> disconnected
>>>> state while DPLL mode is also a "OVERRIDE" or something like it.
>>>
>>> I would leave this decision on the driver level... Imagine the
>>> potential
>>> HW that would allow to switch NCO mode if there is no valid input
>>> reference.
>>>
>>> Example:
>>>
>>> REF0 (prio 0) -> +------+ -> OUT0
>>> REF1 (prio 1) -> | DPLL | -> ...
>>> NCO (prio 2) -> +------+ -> OUTn
>>>
>>> Such HW would prefer REF0 or REF1 and lock to one of them if they are
>>> qualified. But if they are NOT, then it switches to NCO mode.
Now you said "NCO mode" yourself ... I agree that it would be a mode in
that case: instead of running on the regular/built-in XO, the dpll would run
on the NCO, the user could select it, and this would be an addition to the
regular behavior.
I also agree that the pin approach might be better/easier to use. Assuming
the frequency offset applies to all the outputs a given dpll drives, it makes
more sense to have it configurable on the input side.
>>>
>>> In this situation the relevant driver would allow to configure priority
>>> and state 'selectable' for this NCO pin.
>>>
>>>> Perhaps the pin type could include OVERRIDE in it's name to make it
>>>> less
>>>> confusing and needs some extra documentation.
>>>>
>>>> Thoughts?
>>> I think _INT_ is ok. In the case of TYPE_INT_OSCILLATOR it is also
>>> obvious that it is not a standard input reference.
>>>
>>> Jiri, Vadim, Arek, thoughts?
>>
>> I agree with you, the driver should have the flexibility to implement
>> this according to his/hw's needs/capabilities. If it implements prio
>> selection in AUTO mode, let it have it. If it implements manual NCO pin
>> selection in AUTO mode using connected/disconnected override, let it
>> have it.
I don't know of any 'current' HW that is capable of using AUTO mode as a
part of HW-based priority source selection while also using such an NCO input.
But as already explained above, this is a special mode of the regular XO,
which allows configuring the DPLL's output frequency offset.
>>
>> Moreover, I actually like the "override" capability for pins in AUTO
>> mode in general. It may be handy for other usecases as well.
>>
>Arek? Vadim?
>
>Thanks,
>Ivan
Agreed, an 'override' capability of a pin would be the way to go for this
and other similar future cases.
I believe a single approach would be best. If AUTO mode needs a capability
to switch from regular behavior to 'OVERRIDE', and 'OVERRIDE' is the only
pin capability that allows such behavior in AUTO mode, then a similar
approach should be used in MANUAL mode, so that userspace knows such a pin
can always be set to "CONNECTED", and the userspace implementation enables
it consistently whether the dpll is in AUTO or MANUAL mode.
Thank you!
Arkadiusz
^ permalink raw reply
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Petr Mladek @ 2026-06-17 11:59 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Sebastian Andrzej Siewior, Jakub Kicinski, John Ogness,
Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev,
David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Breno Leitao, Clark Williams, Steven Rostedt, linux-rt-devel,
linux-kernel, stable, Frederic Weisbecker, Ingo Molnar,
Vincent Guittot, Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <20260617111504.GK49951@noisy.programming.kicks-ass.net>
On Wed 2026-06-17 13:15:04, Peter Zijlstra wrote:
> On Wed, Jun 17, 2026 at 12:12:07PM +0200, Petr Mladek wrote:
> > On Tue 2026-06-16 17:31:22, Sebastian Andrzej Siewior wrote:
> > > On 2026-06-16 08:11:28 [-0700], Jakub Kicinski wrote:
> > > > >
> > > > > Adding sched and printk folks for opinions while eyeballing
> > > > > WARN_ON_DEFERRED().
> > > >
> > > > Thanks a lot for looking into this! To be clear - the printk_deferred /
> > > > WARN_DEFERRED would be just for stable? Or there's still some
> > > > sensitivity even with nbcon?
> > >
> > > We already have printk_deferred(). WARN_DEFERRED() would be new. I
> > > *think* this is not limited netpoll/ netconsole but all console drivers
> > > not using CON_NBCON if the printk (via WARN) occurs with the rq held.
> > > I don't remember all the details but printk_deferred() was introduced to
> > > circumvent this until printk is fixed.
> >
> > Just to make it clear. The problem with the legacy consoles is that
> > they are called under console_lock() which is a semaphore. And it
> > calls wake_up_process() in console_unlock() when there is another
> > waiter on the lock.
> >
> > > Once we get rid of those legacy drivers and NBCON is the default we can
> > > get rid of printk_deferred() :)
> >
> > Yup.
>
> Can't we push all the legacy consoles into a single legacy kthread? I
> mean, converting all consoles is of course awesome, but should we really
> wait for that?
I am afraid that converting the consoles one by one is the deal with
Linus. I could imagine moving the last few sinners into the kthread
when the majority is converted. But we are far from there :-/
Best Regards,
Petr
^ permalink raw reply
* Re: [PATCH] net: airoha: Fix MODULE_LICENSE to match SPDX GPL-2.0-only identifier
From: Leon Romanovsky @ 2026-06-17 11:58 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Wayen Yan, netdev, lorenzo, horms, pabeni, kuba, edumazet,
andrew+netdev, angelogioacchino.delregno, matthias.bgg,
linux-arm-kernel, linux-mediatek
In-Reply-To: <178156440888.329386.11011872053824456703.git-patchwork-notify@kernel.org>
On Mon, Jun 15, 2026 at 11:00:08PM +0000, patchwork-bot+netdevbpf@kernel.org wrote:
> Hello:
>
> This patch was applied to netdev/net-next.git (main)
> by Jakub Kicinski <kuba@kernel.org>:
>
> On Sun, 14 Jun 2026 07:52:39 +0800 you wrote:
> > Both airoha_eth.c and airoha_npu.c declare SPDX-License-Identifier:
> > GPL-2.0-only but use MODULE_LICENSE("GPL"), which the kernel module
> > loader interprets as GPL-2.0+ (any GPL version). This mismatch causes
> > license compliance tools (FOSSology, ScanCode, etc.) to misidentify
> > the effective license as more permissive than intended.
> >
> > Replace MODULE_LICENSE("GPL") with MODULE_LICENSE("GPL v2") to
> > align with the GPL-2.0-only SPDX identifier. Per include/linux/module.h,
> > "GPL v2" maps to GPL-2.0-only, matching the source files' declared
> > license.
> >
> > [...]
>
> Here is the summary with links:
> - net: airoha: Fix MODULE_LICENSE to match SPDX GPL-2.0-only identifier
> https://git.kernel.org/netdev/net-next/c/b0d62ed16424
Jakub,
This patch doesn't fix anything. License rules are pretty clear.
Documentation/process/license-rules.rst
    "GPL"       Module is licensed under GPL version 2. This
                does not express any distinction between
                GPL-2.0-only or GPL-2.0-or-later. The exact
                license information can only be determined
                via the license information in the
                corresponding source files.

    "GPL v2"    Same as "GPL". It exists for historic
                reasons.
>
> You are awesome, thank you!
> --
> Deet-doot-dot, I am a bot.
> https://korg.docs.kernel.org/patchwork/pwbot.html
>
>
>
^ permalink raw reply
* Re: [PATCH net] selftests: vlan_bridge_binding: Fix flaky operational state check
From: Petr Machata @ 2026-06-17 11:46 UTC (permalink / raw)
To: Ido Schimmel; +Cc: netdev, davem, kuba, pabeni, edumazet, petrm, horms, razor
In-Reply-To: <20260617104323.1069457-1-idosch@nvidia.com>
Ido Schimmel <idosch@nvidia.com> writes:
> check_operstate() busy waits for up to one second for the operational
> state to change to the expected state. This is not enough since carrier
> loss events can be delayed by the kernel for up to one second (see
> __linkwatch_run_queue()), leading to sporadic failures.
>
> Fix by increasing the busy wait period to two seconds.
>
> Fixes: dca12e9ab760 ("selftests: net: Add a VLAN bridge binding selftest")
> Reported-by: Jakub Kicinski <kuba@kernel.org>
> Closes: https://lore.kernel.org/netdev/20260616092733.3a31be4d@kernel.org/
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
^ permalink raw reply
* Re: [PATCH net] net: rnpgbe: fix mailbox endianness handling
From: Yibo Dong @ 2026-06-17 11:46 UTC (permalink / raw)
To: Andrew Lunn
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, vadim.fedorenko,
netdev, linux-kernel, yaojun
In-Reply-To: <bf5366c5-88ae-418c-9c5d-b2249a7f43fc@lunn.ch>
On Wed, Jun 17, 2026 at 11:40:42AM +0200, Andrew Lunn wrote:
Hi Andrew:
> On Wed, Jun 17, 2026 at 04:35:31PM +0800, Dong Yibo wrote:
> > Mailbox data is exchanged through 32-bit MMIO accesses but the
> > mailbox payload is defined using little-endian FW structures with
> > __le16 and __le32 fields.
>
> Given you are using __le16 and __le32, why did sparse not find these
> issues? It would be good to understand this, because if sparse missed
> this, what else has sparse missed which is also broken?
>
> Andrew
>
My understanding is as follows:
The firmware structures are defined with __le16 / __le32 for wire format,
but the original code cast these struct pointers to u32 * before passing
them to the mailbox read/write routines:
- Send path: (u32 *)&req -> msg buffer -> writel()
- Receive path: readl() -> msg buffer -> (u32 *)&reply
Sparse only sees pure u32 = u32 assignments here, so no type mismatch is
reported. In fact, readl()/writel() operate on 'native CPU-ordered u32
values', not little-endian values.
The __le annotations correctly describe the firmware wire format, but
the original mailbox transport using plain u32 * buffers erased all endian
type information at the MMIO boundary, hiding this mismatch from sparse.
I have also checked the rest of the rnpgbe driver: all __le types are
confined strictly to mailbox firmware structures, and this fix covers all
MMIO <-> structure data transfer paths. Comparisons between two __le fields
(e.g., reply->opcode != req->opcode) are safe, as both values share the
same byte order.
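
To illustrate the point about casts, here is a minimal userspace sketch of
a send path that never casts the wire structure to u32 *. It assumes a
hypothetical two-word mailbox (demo_mbox) and a stand-in demo_writel() that,
like writel(), takes a CPU-order value; none of these names come from the
rnpgbe driver. Each register word is assembled from the little-endian wire
bytes explicitly, which in kernel code would roughly correspond to an
explicit little-endian to CPU conversion (such as le32_to_cpu()) before
writel().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MBOX_WORDS 2

/* Stand-in for an MMIO mailbox window (plain memory in this sketch). */
static uint32_t demo_mbox[DEMO_MBOX_WORDS];

/* Stand-in for writel(): the value argument is in CPU byte order. */
static void demo_writel(uint32_t val, size_t idx)
{
	demo_mbox[idx] = val;
}

/* Assemble a CPU-order u32 from 4 little-endian bytes on any host. */
static uint32_t demo_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Send a little-endian wire image word by word, with no pointer cast. */
static void demo_send(const uint8_t *wire, size_t len)
{
	assert(len % 4 == 0 && len / 4 <= DEMO_MBOX_WORDS);

	for (size_t i = 0; i < len / 4; i++)
		demo_writel(demo_get_le32(wire + 4 * i), i);
}
```

Because the conversion is spelled out at the boundary, the byte order stays
visible in the code (and, with __le32 annotations in the kernel, to sparse)
instead of being hidden behind a (u32 *) cast.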
Thanks for your feedback!
^ permalink raw reply
* Re: [PATCH net] dpaa2-switch: fix VLAN upper check not rejecting bridge join
From: Simon Horman @ 2026-06-17 11:45 UTC (permalink / raw)
To: ioana.ciornei
Cc: Simon Horman, andrew+netdev, davem, edumazet, kuba, pabeni,
netdev, f.fainelli, vladimir.oltean, linux-kernel
In-Reply-To: <20260616105430.3725910-1-ioana.ciornei@nxp.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
dpaa2-switch: fix VLAN upper check not rejecting bridge join
The blamed commit refactored the prechangeupper event handling but
failed to actually return an error in case
dpaa2_switch_prevent_bridging_with_8021q_upper() detected a 802.1q upper
on a port which tries to join a bridge. Fix this by returning err
instead of 0.
> diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> index 52c1cb9cb7e02..46ae81c2fa013 100644
> --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> @@ -2177,7 +2177,7 @@ dpaa2_switch_prechangeupper_sanity_checks(struct net_device *netdev,
> if (err) {
> NL_SET_ERR_MSG_MOD(extack,
> "Cannot join a bridge while VLAN uppers are present");
> - return 0;
> + return err;
[Severity: High]
If a user adds an 802.1q VLAN upper device to a port after it has already
successfully joined a bridge, won't this check trigger when the port leaves
the bridge?
In drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c,
dpaa2_switch_prechangeupper_sanity_checks() is called for both bridge joins
and unlinks from dpaa2_switch_port_prechangeupper():
err = dpaa2_switch_prechangeupper_sanity_checks(netdev, upper_dev, extack);
if (err)
return err;
if (!info->linking)
dpaa2_switch_port_pre_bridge_leave(netdev);
Since returning an error aborts the notifier chain, an unlink notification
would fail this check, preventing dpaa2_switch_port_pre_bridge_leave() from
running. Does this lead to state inconsistency and skipped switchdev
unoffloads?
Could this check be guarded so it only runs when info->linking is true?
> }
>
> netdev_for_each_lower_dev(upper_dev, other_dev, iter) {
^ permalink raw reply
* Re: [PATCH net-next] docs: exclude driver and netdevsim bugs
From: Leon Romanovsky @ 2026-06-17 11:40 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, johannes,
corbet, skhan, workflows, linux-doc
In-Reply-To: <20260615091909.78ad2b03@kernel.org>
On Mon, Jun 15, 2026 at 09:19:09AM -0700, Jakub Kicinski wrote:
> On Mon, 15 Jun 2026 12:14:36 +0300 Leon Romanovsky wrote:
> > > +Unless explicitly excluded all bug fixes should be targeting the ``net``
> > > +tree and contain an appropriate Fixes tag.
> > > +
> > > +Obvious exclusions:
> > > +
> > > + - fixes for bugs which only exist in ``net-next`` should target ``net-next``
> > > + (please still include the Fixes tag in the commit message)
> > > + - bugs which cannot be reached, e.g. in code paths not executed given
> > > + current in-tree callers
> > > + - fixes for compiler warnings and typos
> >
> > If you decide to resubmit this patch, could you please remove "fixes for
> > compiler warnings" from the exclusion list?
> >
> > It is quite frustrating to receive a compiler warning originating from a
> > different subsystem after the merge window, knowing it will not be
> > addressed until the next merge window (around eight weeks later).
>
> Agreed, FWIW, but not planning to resubmit.
> I think people misunderstood that I'm __documenting what I already do__
> rather than trying to have a discussion :/
I'm pretty sure that people aren't aware of it.
Thanks
^ permalink raw reply
* Re: [PATCH net 4/4] net: ti: icssg: Fix XSK zero copy TX during application wakeup
From: Meghana Malladi @ 2026-06-17 11:31 UTC (permalink / raw)
To: Jakub Kicinski
Cc: diogo.ivo, haokexin, vadim.fedorenko, devnexen, horms,
jacob.e.keller, sdf, john.fastabend, hawk, daniel, ast, pabeni,
edumazet, davem, andrew+netdev, bpf, linux-kernel, netdev,
linux-arm-kernel, srk, Vignesh Raghavendra, Roger Quadros,
danishanwar
In-Reply-To: <20260616081954.0d12aa13@kernel.org>
On 6/16/26 20:49, Jakub Kicinski wrote:
> On Tue, 16 Jun 2026 16:41:00 +0530 Meghana Malladi wrote:
>> On 6/16/26 04:51, Jakub Kicinski wrote:
>>> On Fri, 12 Jun 2026 00:27:44 +0530 Meghana Malladi wrote:
>>>> @@ -169,9 +169,6 @@ static int emac_xsk_xmit_zc(struct prueth_emac *emac,
>>>>
>>>> num_tx++;
>>>> }
>>>> -
>>>> - xsk_tx_release(tx_chn->xsk_pool);
>>>> - return num_tx;
>>>
>>> Why are you deleting this?
>>>
>>
>> xsk_sendmsg() also calls this without an rcu-lock when transmitting the
>> packets if the xmit was successful, so I was assuming it is not required
>> and I removed this.
>
> I think you still need it. Besides, seems like a separate cleanup.
>
Okay, I will add it back then.
>>>> void prueth_xmit_free(struct prueth_tx_chn *tx_chn,
>>>> @@ -279,9 +276,6 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
>>>> num_tx++;
>>>> }
>>>>
>>>> - if (!num_tx)
>>>> - return 0;
>>>
>>> Does something prevent us from running all this code if budget is 0?
>>> If budget is 0 we can complete normal Tx with skbs but we must
>>> not touch any AF-XDP related state.
>>
>> Can you elaborate more, I couldn't interpret your comment here
>
> netpoll may call napi from any context, including from IRQ.
> It uses budget of 0 to indicate that it's trying to only reap tx
> completions, without doing any Rx or XDP work. XDPs can't be called
> from IRQ context.
>
Ah I wasn't aware of this, I will add a check to ensure AF_XDP runs only
when budget > 0 then.
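
For reference, the check I have in mind looks roughly like the userspace
sketch below. The structure and function names (demo_chan, demo_tx_poll)
are made up for illustration and are not the icssg code; the sketch only
shows that skb Tx completions are still reaped when budget is 0, while all
XSK state is left untouched in that case.

```c
#include <stdbool.h>

/* Hypothetical per-channel state, for illustration only. */
struct demo_chan {
	int pending_skb_completions;	/* completed skb Tx descriptors */
	int pending_xsk_frames;		/* XSK frames waiting for Tx work */
	bool xsk_bound;			/* an XSK pool is attached */
	int skb_done;
	int xsk_done;
};

/* Returns the number of completions/frames handled in this poll. */
static int demo_tx_poll(struct demo_chan *ch, int budget)
{
	int done = ch->pending_skb_completions;
	int n;

	/* Normal skb completions may be reaped even with budget == 0. */
	ch->skb_done += done;
	ch->pending_skb_completions = 0;

	/* budget == 0 means "reap Tx only": do not touch any XSK state. */
	if (budget <= 0 || !ch->xsk_bound)
		return done;

	/* XSK work is bounded by the budget. */
	n = ch->pending_xsk_frames < budget ? ch->pending_xsk_frames : budget;
	ch->pending_xsk_frames -= n;
	ch->xsk_done += n;

	return done + n;
}
```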
>>>> netif_txq = netdev_get_tx_queue(ndev, chn);
>>>> netdev_tx_completed_queue(netif_txq, num_tx, total_bytes);
>>>>
>>>> @@ -306,7 +300,9 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
>>>>
>>>> netif_txq = netdev_get_tx_queue(ndev, chn);
>>>> txq_trans_cond_update(netif_txq);
>>>
>>> This looks misplaced, now we will hit it even if we didn't complete
>>> or submit any Tx.
>>
>> This code needs to be hit for packet transmission in zero copy mode.
>> emac_xsk_xmit_zc() submits the packets to the DMA in NAPI context,
>> when application wakes up the driver and triggers NAPI. Once DMA
>> transfer is done, irq gets triggered NAPI gets called which will handle
>> the tx packet completion + submit next Tx batch packets to the DMA.
>>
>> if (tx_chn->xsk_pool) -> check ensure this hits and runs for zero copy
>> only. Also above check (!num_tx) returns early during the application
>> wakeup (where budget is zero), hence it is removed.
>
> I'm commenting on txq_trans_cond_update(), you're calling it
> effectively on every NAPI call when XSK is bound, whether
> Tx is making progress or not.
Ok got it, but I wonder if it will hurt in any way to call this even when
there are no Tx completions.
Nonetheless, I will move this inside the xsk_frames_done check.
^ permalink raw reply
* Re: [PATCH net-next 0/2] appletalk: move the protocol out of tree
From: Carsten Strotmann @ 2026-06-17 11:15 UTC (permalink / raw)
To: Jakub Kicinski, Carsten Strotmann
Cc: John Paul Adrian Glaubitz, davem, netdev, edumazet, pabeni,
andrew+netdev, horms, geert, chleroy, npiggin, mpe, maddy,
linux-mips, linux-m68k, linuxppc-dev
In-Reply-To: <20260616084901.3319d82e@kernel.org>
Hi Jakub,
On Tuesday 16 June 2026 05:49:01 PM (+02:00), Jakub Kicinski wrote:
> > the solution, as Adrian pointed out, is to leave these features in
> > the Linux kernel but have them disabled by default.
>
> I think y'all need to internalize that "just leave it in" means work.
> _Someone_ has to handle the reports and patches. And since nobody is
> doing that the code is going to GitHub, where it can continue to "just
> be left" or whatever, without racking up CVEs for the Linux kernel
> and leading to maintainer burn out :/
>
That's a good point. The large influx of reports is a problem,
and maintainer burnout is too high a cost.
> > Maybe put a warning message in the kernel config tools that people
> > should only enable these if they know what they are doing.
> >
> > These "retro"-features should not pose any security risk of they are
> > not compiled into a kernel.
>
> Nobody is stopping you from using this code! It's perfectly suitable
> to be an out of tree module. Maybe it'd be harder if someone wanted to
> remove a CPU architecture you want to use, but protocols are perfectly
> fine as loadable modules. You can continue to use the code from:
> https://github.com/linux-netdev/mod-orphan
>
> Presumably you could get Debian to package that and you wouldn't even
> know the sources no longer live in the kernel tree.
>
It seems the current situation is the price of success (of Linux, which is
good).
I guess the way to go would be to move these old drivers to userspace in
order to reduce dependencies on the Linux kernel. But that is not a task
for the Linux maintainers, but for the retro community.
Thanks for your work and the background information
Carsten
--
https://strotmann.de
^ permalink raw reply