Netdev List


* [PATCH net v3 3/4] nfc: llcp: fix TLV parsing OOB in nfc_llcp_recv_snl
From: Lekë Hapçiu @ 2026-04-14 23:35 UTC
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, linux-kernel, stable,
	Lekë Hapçiu

From: Lekë Hapçiu <framemain@outlook.com>

nfc_llcp_recv_snl() has four problems when handling a hostile peer:

 1. nfc_llcp_dsap()/nfc_llcp_ssap() dereference skb->data[0..1] without
    verifying skb->len; a 0- or 1-byte frame leads to an OOB read.
    Additionally tlv_len = skb->len - LLCP_HEADER_SIZE wraps when
    skb->len < 2, causing the following loop to run far past the
    buffer.

 2. The per-iteration loop guard `offset < tlv_len` only proves one
    byte is available, but the body reads tlv[0] and tlv[1].

 3. The peer-supplied `length` field is used to advance `tlv` without
    being checked against the remaining array space.

 4. The SDREQ handler previously only required length >= 1 but reads
    both tid (tlv[2]) and the first byte of service_name (tlv[3], via
    the pr_debug("%.16s") print and the service_name_len = length - 1
    string usage), so length >= 2 is required.

Fix: reject frames smaller than LLCP_HEADER_SIZE up front; add TLV
header and TLV value guards at the top of each iteration; bump the
SDREQ minimum length to 2.
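
The guards described above can be sketched in userspace C roughly as
follows. This is a minimal illustration, not the kernel code: the
function name count_sdreq(), the HDR_SIZE macro, and the value used
for TLV_SDREQ are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define HDR_SIZE  2     /* stands in for LLCP_HEADER_SIZE */
#define TLV_SDREQ 0x08  /* assumed value of LLCP_TLV_SDREQ */

/*
 * Hypothetical helper: count the well-formed SDREQ TLVs in an SNL frame.
 * Returns -1 if the frame is shorter than the LLCP header.
 */
int count_sdreq(const uint8_t *frame, size_t len)
{
	const uint8_t *tlv;
	size_t tlv_len, offset = 0;
	int found = 0;

	if (len < HDR_SIZE)
		return -1;	/* otherwise len - HDR_SIZE would wrap */

	tlv = frame + HDR_SIZE;
	tlv_len = len - HDR_SIZE;

	while (offset < tlv_len) {
		uint8_t type, length;

		if (tlv_len - offset < 2)
			break;	/* TLV header truncated */
		type = tlv[0];
		length = tlv[1];
		if (tlv_len - offset - 2 < length)
			break;	/* TLV value truncated */

		/* SDREQ needs a tid byte plus at least one name byte */
		if (type == TLV_SDREQ && length >= 2)
			found++;

		offset += (size_t)length + 2;
		tlv += (size_t)length + 2;
	}

	return found;
}
```

Truncated input ends the walk instead of being read, which matches the
`break` behaviour of the patch rather than an error return.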

Reachable from any NFC peer within ~4 cm once an LLCP link is up.

Fixes: 7a06f0ee2823 ("NFC: llcp: Service Name Lookup implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Lekë Hapçiu <framemain@outlook.com>
---
 net/nfc/llcp_core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/net/nfc/llcp_core.c b/net/nfc/llcp_core.c
index 366d75663..efe228f96 100644
--- a/net/nfc/llcp_core.c
+++ b/net/nfc/llcp_core.c
@@ -1282,6 +1282,11 @@ static void nfc_llcp_recv_snl(struct nfc_llcp_local *local,
 	size_t sdres_tlvs_len;
 	HLIST_HEAD(nl_sdres_list);
 
+	if (skb->len < LLCP_HEADER_SIZE) {
+		pr_err("Malformed SNL PDU\n");
+		return;
+	}
+
 	dsap = nfc_llcp_dsap(skb);
 	ssap = nfc_llcp_ssap(skb);
 
@@ -1298,11 +1303,17 @@ static void nfc_llcp_recv_snl(struct nfc_llcp_local *local,
 	sdres_tlvs_len = 0;
 
 	while (offset < tlv_len) {
+		if (tlv_len - offset < 2)
+			break;
 		type = tlv[0];
 		length = tlv[1];
+		if (tlv_len - offset - 2 < length)
+			break;
 
 		switch (type) {
 		case LLCP_TLV_SDREQ:
+			if (length < 2)
+				break;
 			tid = tlv[2];
 			service_name = (char *) &tlv[3];
 			service_name_len = length - 1;
-- 
2.51.0



* [PATCH net v3 2/4] nfc: llcp: fix TLV parsing in parse_gb_tlv and parse_connection_tlv
From: Lekë Hapçiu @ 2026-04-14 23:35 UTC
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, linux-kernel, stable,
	Lekë Hapçiu

From: Lekë Hapçiu <framemain@outlook.com>

nfc_llcp_parse_gb_tlv() and nfc_llcp_parse_connection_tlv() walk TLV
arrays whose length and content come from a peer-supplied frame.  The
parsing loop has three weaknesses:

 1. `offset` is declared u8 while `tlv_array_len` is u16.  In
    parse_connection_tlv() the TLV array can reach ~2173 bytes (MIUX
    up to 0x7FF), so 128 zero-length TLVs wrap `offset` back to 0 and
    the loop never terminates while `tlv` advances past the buffer.

 2. The guard `offset < tlv_array_len` only proves one byte is
    available, but the body reads tlv[0] (type) and tlv[1] (length).
    When one byte remains, tlv[1] is out of bounds.

 3. `length` is read from peer data and used to advance `tlv` without
    being checked against the remaining array space.  A crafted length
    walks `tlv` past the buffer; the next iteration reads tlv[0]/tlv[1]
    from adjacent memory.
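
The wrap in item 1 can be reproduced with a small userspace sketch.
The function names below are hypothetical and exist only for this
illustration; each simulated TLV is zero-length, so every step
advances the offset by the two header bytes.

```c
#include <stdint.h>

/*
 * Steps taken by an 8-bit offset over zero-length TLVs before it
 * returns to 0 (capped at max_steps so the demo always terminates).
 */
unsigned int steps_until_wrap_u8(unsigned int max_steps)
{
	uint8_t offset = 0;
	unsigned int steps = 0;

	do {
		offset += 2;	/* type + length bytes, no value */
		steps++;
	} while (offset != 0 && steps < max_steps);

	return steps;
}

/*
 * The same walk with a 16-bit offset and a TLV header guard: the loop
 * ends once fewer than two bytes remain in the array.
 */
unsigned int steps_u16(uint16_t array_len)
{
	uint16_t offset = 0;
	unsigned int steps = 0;

	while (offset < array_len) {
		if (array_len - offset < 2)
			break;	/* no room for another TLV header */
		offset += 2;
		steps++;
	}

	return steps;
}
```

With the 8-bit counter the offset is back at 0 after 128 steps, so a
guard of the form `offset < tlv_array_len` never becomes false for any
array longer than 254 bytes.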

The llcp_tlv8() and llcp_tlv16() accessors additionally read tlv[2]
and tlv[2..3]; a zero-length TLV makes those reads out of bounds.

Fix: promote `offset` to u16; add two per-iteration guards, one for
the TLV header and one for the TLV value; require length >= 1 for all
TLVs before the type dispatch and length >= 2 for the llcp_tlv16()
accessors (MIUX, WKS).  Return -EINVAL on malformed input.
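
As a rough userspace sketch of the checks listed in the fix, assuming
a hypothetical helper parse_remote_miu() that only extracts MIUX (the
name, the TLV_MIUX value, and the 0x7ff mask are assumptions for the
example; the + 128 follows the diff below):

```c
#include <errno.h>
#include <stdint.h>

#define TLV_MIUX 0x02	/* assumed value of LLCP_TLV_MIUX */

/*
 * Hypothetical helper: walk a TLV array and store the remote MIU.
 * Returns 0 on success and -EINVAL on any malformed TLV.
 */
int parse_remote_miu(const uint8_t *tlv_array, uint16_t tlv_array_len,
		     uint16_t *miu)
{
	const uint8_t *tlv = tlv_array;
	uint16_t offset = 0;

	while (offset < tlv_array_len) {
		uint8_t type, length;

		if (tlv_array_len - offset < 2)
			return -EINVAL;	/* header truncated */
		type = tlv[0];
		length = tlv[1];
		if (tlv_array_len - offset - 2 < length)
			return -EINVAL;	/* value truncated */
		if (length < 1)
			return -EINVAL;	/* every TLV carries a value */

		if (type == TLV_MIUX) {
			if (length < 2)
				return -EINVAL;	/* 16-bit read needs 2 bytes */
			/* big-endian 16-bit value, low 11 bits, plus 128 */
			*miu = (uint16_t)((((tlv[2] << 8) | tlv[3]) & 0x7ff) + 128);
		}

		/* cannot overflow: the value guard bounds offset + length + 2 */
		offset += length + 2;
		tlv += length + 2;
	}

	return 0;
}
```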

Reached on ATR_RES (parse_gb_tlv) and on CONNECT/CC PDUs before a
connection is established (parse_connection_tlv).  Both are
triggerable from any NFC peer within ~4 cm, without authentication.

Reported-by: Simon Horman <horms@kernel.org>
Fixes: d646960f7986 ("NFC: Add LLCP sockets")
Cc: stable@vger.kernel.org
Signed-off-by: Lekë Hapçiu <framemain@outlook.com>
---
 net/nfc/llcp_commands.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/nfc/llcp_commands.c b/net/nfc/llcp_commands.c
index 291f26fac..b6dcfb2d1 100644
--- a/net/nfc/llcp_commands.c
+++ b/net/nfc/llcp_commands.c
@@ -193,7 +193,8 @@ int nfc_llcp_parse_gb_tlv(struct nfc_llcp_local *local,
 			  const u8 *tlv_array, u16 tlv_array_len)
 {
 	const u8 *tlv = tlv_array;
-	u8 type, length, offset = 0;
+	u8 type, length;
+	u16 offset = 0;
 
 	pr_debug("TLV array length %d\n", tlv_array_len);
 
@@ -201,8 +202,14 @@ int nfc_llcp_parse_gb_tlv(struct nfc_llcp_local *local,
 		return -ENODEV;
 
 	while (offset < tlv_array_len) {
+		if (tlv_array_len - offset < 2)
+			return -EINVAL;
 		type = tlv[0];
 		length = tlv[1];
+		if (tlv_array_len - offset - 2 < length)
+			return -EINVAL;
+		if (length < 1)
+			return -EINVAL;
 
 		pr_debug("type 0x%x length %d\n", type, length);
 
@@ -211,9 +218,13 @@ int nfc_llcp_parse_gb_tlv(struct nfc_llcp_local *local,
 			local->remote_version = llcp_tlv_version(tlv);
 			break;
 		case LLCP_TLV_MIUX:
+			if (length < 2)
+				return -EINVAL;
 			local->remote_miu = llcp_tlv_miux(tlv) + 128;
 			break;
 		case LLCP_TLV_WKS:
+			if (length < 2)
+				return -EINVAL;
 			local->remote_wks = llcp_tlv_wks(tlv);
 			break;
 		case LLCP_TLV_LTO:
@@ -243,7 +254,8 @@ int nfc_llcp_parse_connection_tlv(struct nfc_llcp_sock *sock,
 				  const u8 *tlv_array, u16 tlv_array_len)
 {
 	const u8 *tlv = tlv_array;
-	u8 type, length, offset = 0;
+	u8 type, length;
+	u16 offset = 0;
 
 	pr_debug("TLV array length %d\n", tlv_array_len);
 
@@ -251,13 +263,21 @@ int nfc_llcp_parse_connection_tlv(struct nfc_llcp_sock *sock,
 		return -ENOTCONN;
 
 	while (offset < tlv_array_len) {
+		if (tlv_array_len - offset < 2)
+			return -EINVAL;
 		type = tlv[0];
 		length = tlv[1];
+		if (tlv_array_len - offset - 2 < length)
+			return -EINVAL;
+		if (length < 1)
+			return -EINVAL;
 
 		pr_debug("type 0x%x length %d\n", type, length);
 
 		switch (type) {
 		case LLCP_TLV_MIUX:
+			if (length < 2)
+				return -EINVAL;
 			sock->remote_miu = llcp_tlv_miux(tlv) + 128;
 			break;
 		case LLCP_TLV_RW:
-- 
2.51.0



* [PATCH net v3 1/4] nfc: nci: fix u8 underflow in nci_store_general_bytes_nfc_dep
From: Lekë Hapçiu @ 2026-04-14 23:35 UTC
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, linux-kernel, stable,
	Lekë Hapçiu

From: Lekë Hapçiu <framemain@outlook.com>

nci_store_general_bytes_nfc_dep() computes the General Bytes length by
subtracting a fixed header offset from the peer-supplied atr_res_len
(POLL) or atr_req_len (LISTEN) field:

    ndev->remote_gb_len = min_t(__u8,
        atr_res_len - NFC_ATR_RES_GT_OFFSET,   /* offset = 15 */
        NFC_ATR_RES_GB_MAXSIZE);

Both length fields are __u8.  When a malicious NFC-DEP peer sends an
ATR_RES/ATR_REQ whose length is smaller than the fixed offset (< 15
or < 14 respectively), the subtraction wraps:

    atr_res_len = 0  ->  (u8)(0 - 15) = 241
    min_t(__u8, 241, NFC_ATR_RES_GB_MAXSIZE=47) = 47

The subsequent memcpy then reads 47 bytes beyond the valid activation
parameter data into ndev->remote_gb[].  This buffer is later fed to
nfc_llcp_parse_gb_tlv() as a TLV array.

Reject the frame with NCI_STATUS_RF_PROTOCOL_ERROR when the length is
below the required offset, and propagate the error out of
nci_rf_intf_activated_ntf_packet() instead of silently accepting the
malformed packet.
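
The arithmetic can be checked with a small userspace sketch. The
helper names are hypothetical; the two constants take the values
quoted in this message, and only the POLL (ATR_RES) case is modelled.

```c
#include <stdint.h>

#define ATR_RES_GT_OFFSET  15	/* NFC_ATR_RES_GT_OFFSET */
#define ATR_RES_GB_MAXSIZE 47	/* NFC_ATR_RES_GB_MAXSIZE */

/* Mirrors the unchecked min_t(__u8, len - offset, max) expression. */
uint8_t gb_len_unchecked(uint8_t atr_res_len)
{
	/* truncation to 8 bits is what makes a short length wrap */
	uint8_t diff = (uint8_t)(atr_res_len - ATR_RES_GT_OFFSET);

	return diff < ATR_RES_GB_MAXSIZE ? diff : ATR_RES_GB_MAXSIZE;
}

/* Checked variant: -1 signals a frame that is too short to accept. */
int gb_len_checked(uint8_t atr_res_len)
{
	if (atr_res_len < ATR_RES_GT_OFFSET)
		return -1;

	return gb_len_unchecked(atr_res_len);
}
```

Every length from 0 to 14 yields 47 in the unchecked form, which is
why the check has to come before the subtraction rather than after it.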

Reachable from any NFC peer within ~4 cm during RF activation, prior
to any pairing.

Fixes: c4fbb6515709 ("NFC: NCI: Add NFC-DEP support to NCI data exchange")
Cc: stable@vger.kernel.org
Signed-off-by: Lekë Hapçiu <framemain@outlook.com>
---
 net/nfc/nci/ntf.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/nfc/nci/ntf.c b/net/nfc/nci/ntf.c
index c96512bb8..eb8c6e5a1 100644
--- a/net/nfc/nci/ntf.c
+++ b/net/nfc/nci/ntf.c
@@ -631,6 +631,9 @@ static int nci_store_general_bytes_nfc_dep(struct nci_dev *ndev,
 	switch (ntf->activation_rf_tech_and_mode) {
 	case NCI_NFC_A_PASSIVE_POLL_MODE:
 	case NCI_NFC_F_PASSIVE_POLL_MODE:
+		if (ntf->activation_params.poll_nfc_dep.atr_res_len <
+		    NFC_ATR_RES_GT_OFFSET)
+			return NCI_STATUS_RF_PROTOCOL_ERROR;
 		ndev->remote_gb_len = min_t(__u8,
 			(ntf->activation_params.poll_nfc_dep.atr_res_len
 						- NFC_ATR_RES_GT_OFFSET),
@@ -643,6 +646,9 @@ static int nci_store_general_bytes_nfc_dep(struct nci_dev *ndev,
 
 	case NCI_NFC_A_PASSIVE_LISTEN_MODE:
 	case NCI_NFC_F_PASSIVE_LISTEN_MODE:
+		if (ntf->activation_params.listen_nfc_dep.atr_req_len <
+		    NFC_ATR_REQ_GT_OFFSET)
+			return NCI_STATUS_RF_PROTOCOL_ERROR;
 		ndev->remote_gb_len = min_t(__u8,
 			(ntf->activation_params.listen_nfc_dep.atr_req_len
 						- NFC_ATR_REQ_GT_OFFSET),
@@ -842,8 +848,10 @@ static int nci_rf_intf_activated_ntf_packet(struct nci_dev *ndev,
 		/* store general bytes to be reported later in dep_link_up */
 		if (ntf.rf_interface == NCI_RF_INTERFACE_NFC_DEP) {
 			err = nci_store_general_bytes_nfc_dep(ndev, &ntf);
-			if (err != NCI_STATUS_OK)
+			if (err != NCI_STATUS_OK) {
 				pr_err("unable to store general bytes\n");
+				return -EINVAL;
+			}
 		}
 
 		/* store ATS to be reported later in nci_activate_target */
-- 
2.51.0



* [PATCH net v3 0/4] nfc: fix multiple parsing vulnerabilities reachable from RF
From: Lekë Hapçiu @ 2026-04-14 23:35 UTC
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, linux-kernel, stable,
	Lekë Hapçiu

From: Lekë Hapçiu <framemain@outlook.com>

This series fixes four RF-reachable parsing vulnerabilities in the NFC
stack.  All four are triggerable from an NFC peer within ~4 cm of the
victim, before any pairing or authentication.

Patch 1 fixes a u8 underflow in nci_store_general_bytes_nfc_dep() where
a short ATR_RES/ATR_REQ causes (atr_res_len - NFC_ATR_RES_GT_OFFSET) to
wrap in u8 arithmetic, producing a bogus remote_gb_len that copies up
to 47 bytes beyond the valid activation parameter data.

Patch 2 hardens nfc_llcp_parse_gb_tlv() and
nfc_llcp_parse_connection_tlv().  The loop guard does not prove that
two header bytes can be read, and the peer-controlled `length` field
is used to advance `tlv` without bounds checking.  An 8-bit `offset`
against a 16-bit `tlv_array_len` compounds the issue in
parse_connection_tlv() where the TLV array can exceed 255 bytes.

Patch 3 fixes nfc_llcp_recv_snl().  The SNL handler accesses skb->data
before verifying skb->len, and its inner TLV loop has the same two
weaknesses as patch 2.  SDREQ handling additionally requires
length >= 2 because both tid (tlv[2]) and the start of service_name
(tlv[3]) are read.

Patch 4 fixes nfc_llcp_recv_dm() which reads skb->data[2] (the DM
reason byte) without checking skb->len >= 3.
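
A minimal sketch of the kind of length check patch 4 describes,
written as userspace C. The helper name dm_reason() and the HDR_SIZE
macro are hypothetical and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define HDR_SIZE 2	/* stands in for LLCP_HEADER_SIZE */

/*
 * Hypothetical helper: return the DM reason byte (frame[2]), or -1
 * if the frame does not contain a byte after the 2-byte header.
 */
int dm_reason(const uint8_t *frame, size_t len)
{
	if (len < HDR_SIZE + 1)
		return -1;

	return frame[2];
}
```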

Changes in v3:
 - Restore the u8 -> u16 `offset` promotion in patch 2.  v2 split this
   into a separate v1 patch and did not re-send it; v3 combines the
   promotion and the bounds checks in a single patch (Paolo Abeni).
 - Return NCI_STATUS_RF_PROTOCOL_ERROR from
   nci_store_general_bytes_nfc_dep() on a short length and return
   -EINVAL from nci_rf_intf_activated_ntf_packet() rather than
   silently accepting the malformed packet (Paolo Abeni).
 - Drop the style-only paren removal in patch 1 (Paolo Abeni).
 - Condense commit message in patch 2 (Paolo Abeni).
 - Consolidate the length >= 1 checks before the switch in patch 2,
   keeping length >= 2 only for the llcp_tlv16() accessors (Paolo Abeni).
 - Tighten SDREQ length check from >=1 to >=2 in patch 3; the handler
   reads both tlv[2] and tlv[3] (Sashiko).
 - Add patch 4 for nfc_llcp_recv_dm().
 - Send as a fresh thread rather than In-Reply-To v2 (Paolo Abeni).

Lekë Hapçiu (4):
  nfc: nci: fix u8 underflow in nci_store_general_bytes_nfc_dep
  nfc: llcp: fix TLV parsing in parse_gb_tlv and parse_connection_tlv
  nfc: llcp: fix TLV parsing OOB in nfc_llcp_recv_snl
  nfc: llcp: fix OOB read of DM reason byte in nfc_llcp_recv_dm

 net/nfc/llcp_commands.c | 24 ++++++++++++++++++++++--
 net/nfc/llcp_core.c     | 16 ++++++++++++++++
 net/nfc/nci/ntf.c       | 10 +++++++++-
 3 files changed, 47 insertions(+), 3 deletions(-)

-- 
2.51.0


* Re: [PATCH net-next v9 02/10] net: phy: phy_link_topology: Track ports in phy_link_topology
From: Andrew Lunn @ 2026-04-14 23:31 UTC
  To: Maxime Chevallier
  Cc: davem, Jakub Kicinski, Eric Dumazet, Paolo Abeni, Russell King,
	Heiner Kallweit, netdev, linux-kernel, thomas.petazzoni,
	Christophe Leroy, Herve Codina, Florian Fainelli, Vladimir Oltean,
	Köry Maincent, Marek Behún, Oleksij Rempel,
	Nicolò Veronese, Simon Horman, mwojtas, Romain Gantois,
	Daniel Golle, Dimitri Fedrau
In-Reply-To: <20260403123755.175742-3-maxime.chevallier@bootlin.com>

On Fri, Apr 03, 2026 at 02:37:46PM +0200, Maxime Chevallier wrote:
> phy_port is aimed at representing the various physical interfaces of a
> net_device. They can be controlled by various components in the link,
> such as the Ethernet PHY, the Ethernet MAC, and SFP module, etc.
> 
> Let's therefore make so we keep track of all the ports connected to a
> netdev in phy_link_topology. The only ports added for now are phy-driven
> ports.
> 
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew


* Re: [PATCH net-next v9 01/10] net: phy: phy_link_topology: Add a helper for opportunistic alloc
From: Andrew Lunn @ 2026-04-14 23:28 UTC
  To: Maxime Chevallier
  Cc: davem, Jakub Kicinski, Eric Dumazet, Paolo Abeni, Russell King,
	Heiner Kallweit, netdev, linux-kernel, thomas.petazzoni,
	Christophe Leroy, Herve Codina, Florian Fainelli, Vladimir Oltean,
	Köry Maincent, Marek Behún, Oleksij Rempel,
	Nicolò Veronese, Simon Horman, mwojtas, Romain Gantois,
	Daniel Golle, Dimitri Fedrau
In-Reply-To: <20260403123755.175742-2-maxime.chevallier@bootlin.com>

On Fri, Apr 03, 2026 at 02:37:45PM +0200, Maxime Chevallier wrote:
> The phy_link_topology structure stores information about the PHY-related
> components connected to a net_device. It is opportunistically allocated,
> when we add the first item to the topology, as this is not relevant for
> all kinds of net_devices.
> 
> In preparation for the addition of phy_port tracking in the topology,
> let's make a dedicated helper for that allocation sequence.
> 
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew


* [RFC PATCH net-next 2/2] selftests: net: add FOU multicast encapsulation resubmit test
From: Anton Danilov @ 2026-04-14 23:28 UTC
  To: netdev
  Cc: willemdebruijn.kernel, davem, dsahern, edumazet, kuba, pabeni,
	horms, shuah, linux-kselftest
In-Reply-To: <ad7MsSJOuUU6EGwS@dau-home-pc>

Add a selftest to verify that FOU-encapsulated packets addressed to a
multicast destination are correctly resubmitted to the inner protocol
handler (GRE) via the UDP multicast delivery path.

The test creates two network namespaces connected by a veth pair. The
receiver namespace has a FOU/GRETAP tunnel with a multicast remote
address (239.0.0.1). The sender crafts GRE-over-UDP packets and sends
them to the multicast address.

The early demux optimization (net.ipv4.ip_early_demux) is disabled on
the receiver to force packets through __udp4_lib_mcast_deliver(),
which is the code path that was previously broken.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Assisted-by: Claude:claude-opus-4-6
---
 tools/testing/selftests/net/Makefile          |   1 +
 .../testing/selftests/net/fou_mcast_encap.sh  | 150 ++++++++++++++++++
 2 files changed, 151 insertions(+)
 create mode 100755 tools/testing/selftests/net/fou_mcast_encap.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index a275ed584026..9b2a573e4af2 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -38,6 +38,7 @@ TEST_PROGS := \
 	fib_rule_tests.sh \
 	fib_tests.sh \
 	fin_ack_lat.sh \
+	fou_mcast_encap.sh \
 	fq_band_pktlimit.sh \
 	gre_gso.sh \
 	gre_ipv6_lladdr.sh \
diff --git a/tools/testing/selftests/net/fou_mcast_encap.sh b/tools/testing/selftests/net/fou_mcast_encap.sh
new file mode 100755
index 000000000000..d4737d674862
--- /dev/null
+++ b/tools/testing/selftests/net/fou_mcast_encap.sh
@@ -0,0 +1,150 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test that UDP encapsulation (FOU) correctly handles packet resubmit
+# when packets are delivered via the multicast UDP delivery path.
+#
+# When a FOU-encapsulated packet arrives with a multicast destination IP,
+# __udp4_lib_mcast_deliver() must resubmit it to the inner protocol
+# handler (e.g., GRE) rather than consuming it. This test verifies that
+# by creating a FOU/GRETAP tunnel with a multicast remote address, sending
+# encapsulated packets, and checking that they are correctly decapsulated.
+#
+# The early demux optimization can mask this issue by routing packets via
+# the unicast path (udp_unicast_rcv_skb), so we disable it to force
+# packets through __udp4_lib_mcast_deliver().
+
+source lib.sh
+
+NSENDER=""
+NRECV=""
+
+cleanup() {
+	cleanup_all_ns
+}
+
+trap cleanup EXIT
+
+setup() {
+	setup_ns NSENDER NRECV
+
+	ip link add veth_s type veth peer name veth_r
+	ip link set veth_s netns "$NSENDER"
+	ip link set veth_r netns "$NRECV"
+
+	ip -n "$NSENDER" addr add 10.0.0.1/24 dev veth_s
+	ip -n "$NSENDER" link set veth_s up
+
+	ip -n "$NRECV" addr add 10.0.0.2/24 dev veth_r
+	ip -n "$NRECV" link set veth_r up
+
+	# Disable early demux to force multicast delivery path
+	ip netns exec "$NRECV" sysctl -wq net.ipv4.ip_early_demux=0
+
+	# Join multicast group on receiver
+	ip -n "$NRECV" addr add 239.0.0.1/32 dev veth_r autojoin
+
+	# Multicast routes
+	ip -n "$NRECV" route add 239.0.0.0/8 dev veth_r
+	ip -n "$NSENDER" route add 239.0.0.0/8 dev veth_s
+
+	# FOU listener
+	ip netns exec "$NRECV" ip fou add port 4797 ipproto 47
+
+	# GRETAP with multicast remote - this triggers __udp4_lib_mcast_deliver
+	ip -n "$NRECV" link add eoudp0 type gretap \
+		remote 239.0.0.1 local 10.0.0.2 \
+		encap fou encap-sport 4797 encap-dport 4797 \
+		key 239.0.0.1
+	ip -n "$NRECV" link set eoudp0 up
+	ip -n "$NRECV" addr add 192.168.99.2/24 dev eoudp0
+}
+
+send_fou_gre_packets() {
+	local count=$1
+
+	ip netns exec "$NSENDER" python3 -c "
+import socket, struct
+
+# GRE header: key flag set, proto=0x6558 (transparent ethernet bridging)
+gre_key = socket.inet_aton('239.0.0.1')
+gre_hdr = struct.pack('!HH', 0x2000, 0x6558) + gre_key
+
+# Inner Ethernet frame
+dst_mac = b'\xff\xff\xff\xff\xff\xff'
+src_mac = b'\x02\x00\x00\x00\x00\x01'
+eth_hdr = dst_mac + src_mac + struct.pack('!H', 0x0800)
+
+# Inner IP: 192.168.99.1 -> 192.168.99.2, ICMP echo
+inner_ip_src = socket.inet_aton('192.168.99.1')
+inner_ip_dst = socket.inet_aton('192.168.99.2')
+
+# ICMP echo request
+icmp_payload = b'TESTFOU!' * 4
+icmp_hdr = struct.pack('!BBHHH', 8, 0, 0, 0x1234, 1) + icmp_payload
+csum = 0
+for i in range(0, len(icmp_hdr), 2):
+    if i + 1 < len(icmp_hdr):
+        csum += (icmp_hdr[i] << 8) + icmp_hdr[i+1]
+    else:
+        csum += icmp_hdr[i] << 8
+while csum >> 16:
+    csum = (csum & 0xffff) + (csum >> 16)
+csum = ~csum & 0xffff
+icmp_hdr = struct.pack('!BBHHH', 8, 0, csum, 0x1234, 1) + icmp_payload
+
+# Inner IP header
+ip_len = 20 + len(icmp_hdr)
+ip_hdr = struct.pack('!BBHHHBBH', 0x45, 0, ip_len, 0x1234, 0, 64, 1, 0)
+ip_hdr += inner_ip_src + inner_ip_dst
+csum = 0
+for i in range(0, 20, 2):
+    csum += (ip_hdr[i] << 8) + ip_hdr[i+1]
+while csum >> 16:
+    csum = (csum & 0xffff) + (csum >> 16)
+csum = ~csum & 0xffff
+ip_hdr = ip_hdr[:10] + struct.pack('!H', csum) + ip_hdr[12:]
+
+payload = gre_hdr + eth_hdr + ip_hdr + icmp_hdr
+
+sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+for _ in range($count):
+    sock.sendto(payload, ('239.0.0.1', 4797))
+sock.close()
+"
+}
+
+get_rx_packets() {
+	ip -n "$NRECV" -s link show eoudp0 | awk '/RX:/{getline; print $2}'
+}
+
+test_fou_mcast_encap() {
+	local count=100
+	local rx_before
+	local rx_after
+	local rx_delta
+
+	rx_before=$(get_rx_packets)
+	send_fou_gre_packets $count
+	sleep 1
+	rx_after=$(get_rx_packets)
+
+	rx_delta=$((rx_after - rx_before))
+
+	if [ "$rx_delta" -ge "$count" ]; then
+		echo "PASS: received $rx_delta/$count packets via multicast FOU/GRETAP"
+		return "$ksft_pass"
+	elif [ "$rx_delta" -gt 0 ]; then
+		echo "FAIL: only $rx_delta/$count packets received (partial delivery)"
+		return "$ksft_fail"
+	else
+		echo "FAIL: 0/$count packets received (multicast encap resubmit broken)"
+		return "$ksft_fail"
+	fi
+}
+
+echo "TEST: FOU/GRETAP multicast encapsulation resubmit"
+
+setup
+test_fou_mcast_encap
+exit $?
-- 
2.47.3



* [RFC PATCH net-next 1/2] udp: fix encapsulation packet resubmit in multicast deliver
From: Anton Danilov @ 2026-04-14 23:27 UTC
  To: netdev
  Cc: willemdebruijn.kernel, davem, dsahern, edumazet, kuba, pabeni,
	horms, shuah, linux-kselftest
In-Reply-To: <ad7MsSJOuUU6EGwS@dau-home-pc>

When a UDP encapsulation socket (e.g., FOU) receives a multicast
packet, __udp4_lib_mcast_deliver() and __udp6_lib_mcast_deliver()
incorrectly call consume_skb() when udp_queue_rcv_skb() returns a
positive value. A positive return value from udp_queue_rcv_skb()
indicates that the encap_rcv handler (e.g., fou_udp_recv) has
consumed the UDP header and wants the packet to be resubmitted to
the IP protocol handler for further processing (e.g., as a GRE
packet).

The unicast path in udp_unicast_rcv_skb() handles this correctly by
returning -ret, which propagates up to ip_protocol_deliver_rcu() for
resubmission. The GSO path in udp_queue_rcv_skb() also handles this
correctly by calling ip_protocol_deliver_rcu() directly. However, the
multicast path destroys the packet via consume_skb() instead of
resubmitting it, causing silent packet loss.

This bug affects any UDP encapsulation (FOU, GUE) combined with
multicast destination addresses. In practice, it causes ~50% packet
loss on FOU/GRETAP tunnels configured with multicast remote addresses,
with the exact ratio depending on the early demux cache hit rate
(packets that hit early demux take the unicast path and are handled
correctly).

Fix this by calling ip_protocol_deliver_rcu() (IPv4) or
ip6_protocol_deliver_rcu() (IPv6) instead of consume_skb() when the
return value is positive, matching the behavior of the GSO path.
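
The return-value convention described above can be modelled with a toy
userspace sketch. The types and the function name are hypothetical; the
only behaviour taken from this patch is that a positive return value is
passed on as the protocol argument of the resubmit call.

```c
/* What the multicast loop should do with one socket's return value. */
enum verdict {
	VERDICT_DONE,		/* packet queued or already freed */
	VERDICT_RESUBMIT	/* hand the packet to an inner protocol */
};

struct outcome {
	enum verdict verdict;
	int protocol;		/* meaningful only for VERDICT_RESUBMIT */
};

/*
 * Hypothetical helper: classify the value returned by the queueing
 * function. A positive value asks for resubmission and doubles as
 * the inner protocol number; anything else needs no further action.
 */
struct outcome classify_queue_ret(int ret)
{
	struct outcome out = { VERDICT_DONE, 0 };

	if (ret > 0) {
		out.verdict = VERDICT_RESUBMIT;
		out.protocol = ret;
	}

	return out;
}
```

For the FOU listener used in the selftest (ipproto 47), the model
would classify a return value of 47 as a resubmit to GRE.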

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Assisted-by: Claude:claude-opus-4-6
---
 net/ipv4/udp.c | 13 +++++++++----
 net/ipv6/udp.c | 13 +++++++++----
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e9e2ce9522ef..8c2d4367cba2 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2467,6 +2467,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	struct udp_hslot *hslot;
 	struct sk_buff *nskb;
 	bool use_hash2;
+	int ret;

 	hash2_any = 0;
 	hash2 = 0;
@@ -2500,8 +2501,10 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 			__UDP_INC_STATS(net, UDP_MIB_INERRORS);
 			continue;
 		}
-		if (udp_queue_rcv_skb(sk, nskb) > 0)
-			consume_skb(nskb);
+		ret = udp_queue_rcv_skb(sk, nskb);
+		if (ret > 0)
+			ip_protocol_deliver_rcu(dev_net(nskb->dev), nskb,
+						ret);
 	}

 	/* Also lookup *:port if we are using hash2 and haven't done so yet. */
@@ -2511,8 +2514,10 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	}

 	if (first) {
-		if (udp_queue_rcv_skb(first, skb) > 0)
-			consume_skb(skb);
+		ret = udp_queue_rcv_skb(first, skb);
+		if (ret > 0)
+			ip_protocol_deliver_rcu(dev_net(skb->dev), skb,
+						ret);
 	} else {
 		kfree_skb(skb);
 		__UDP_INC_STATS(net, UDP_MIB_IGNOREDMULTI);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 15e032194ecc..f74935d9f7d7 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -949,6 +949,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	struct udp_hslot *hslot;
 	struct sk_buff *nskb;
 	bool use_hash2;
+	int ret;

 	hash2_any = 0;
 	hash2 = 0;
@@ -987,8 +988,10 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 			continue;
 		}

-		if (udpv6_queue_rcv_skb(sk, nskb) > 0)
-			consume_skb(nskb);
+		ret = udpv6_queue_rcv_skb(sk, nskb);
+		if (ret > 0)
+			ip6_protocol_deliver_rcu(dev_net(nskb->dev), nskb,
+						 ret, true);
 	}

 	/* Also lookup *:port if we are using hash2 and haven't done so yet. */
@@ -998,8 +1001,10 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	}

 	if (first) {
-		if (udpv6_queue_rcv_skb(first, skb) > 0)
-			consume_skb(skb);
+		ret = udpv6_queue_rcv_skb(first, skb);
+		if (ret > 0)
+			ip6_protocol_deliver_rcu(dev_net(skb->dev), skb,
+						 ret, true);
 	} else {
 		kfree_skb(skb);
 		__UDP6_INC_STATS(net, UDP_MIB_IGNOREDMULTI);
-- 
2.47.3


* [RFC PATCH net-next 0/2] udp: fix FOU/GUE over multicast
From: Anton Danilov @ 2026-04-14 23:24 UTC
  To: netdev
  Cc: willemdebruijn.kernel, davem, dsahern, edumazet, kuba, pabeni,
	horms, shuah, linux-kselftest

UDP encapsulation (FOU, GUE) has never worked correctly with multicast
destination addresses. When a FOU-encapsulated packet arrives at a
multicast address, it enters __udp4_lib_mcast_deliver() which calls
consume_skb() on packets that need resubmission to the inner protocol
handler, silently dropping them instead.

The unicast delivery path and the GSO segmentation path both handle
this correctly, but the multicast path was never updated to support
UDP encapsulation resubmit.

This causes silent packet loss for FOU/GRETAP tunnels configured with
multicast remote addresses. The loss ratio depends on the early demux
cache hit rate - packets that hit early demux bypass the multicast path
and work correctly, masking the issue.

Reproducing the issue:

  ip netns add ns_a && ip netns add ns_b
  ip link add veth0 type veth peer name veth1
  ip link set veth0 netns ns_a && ip link set veth1 netns ns_b

  ip -n ns_a addr add 10.0.0.1/24 dev veth0 && ip -n ns_a link set veth0 up
  ip -n ns_b addr add 10.0.0.2/24 dev veth1 && ip -n ns_b link set veth1 up

  # Disable early demux to expose the bug (otherwise it's partially masked)
  ip netns exec ns_b sysctl -w net.ipv4.ip_early_demux=0

  # Join multicast group
  ip -n ns_b addr add 239.0.0.1/32 dev veth1 autojoin

  # FOU + GRETAP with multicast remote
  ip netns exec ns_b ip fou add port 4797 ipproto 47
  ip -n ns_b link add eoudp0 type gretap \
      remote 239.0.0.1 local 10.0.0.2 \
      encap fou encap-sport 4797 encap-dport 4797 key 239.0.0.1
  ip -n ns_b link set eoudp0 up

  # Send FOU/GRE packets to 239.0.0.1:4797 from ns_a
  # -> without this fix: 0 packets received on eoudp0
  # -> with this fix: all packets received on eoudp0

AI assistance (Claude, claude-opus-4-6) was used during root cause
analysis of the kernel source code (tracing the call chain from
udp_queue_rcv_skb through encap_rcv to ip_protocol_deliver_rcu,
comparing unicast/GSO/multicast paths) and during patch and selftest
authoring. The fix approach was identified by observing that the
unicast path (udp_unicast_rcv_skb) and the GSO path
(udp_queue_rcv_skb) both already handle encap resubmit correctly,
while the multicast path did not.

Anton Danilov (2):
  udp: fix encapsulation packet resubmit in multicast deliver
  selftests: net: add FOU multicast encapsulation resubmit test

 net/ipv4/udp.c                                |  13 +-
 net/ipv6/udp.c                                |  13 +-
 tools/testing/selftests/net/Makefile          |   1 +
 .../testing/selftests/net/fou_mcast_encap.sh  | 150 ++++++++++++++++++
 4 files changed, 169 insertions(+), 8 deletions(-)
 create mode 100755 tools/testing/selftests/net/fou_mcast_encap.sh

-- 
2.47.3


* Re: [PATCH net-next v9 01/10] net: phy: phy_link_topology: Add a helper for opportunistic alloc
From: Andrew Lunn @ 2026-04-14 23:23 UTC
  To: Maxime Chevallier
  Cc: davem, Jakub Kicinski, Eric Dumazet, Paolo Abeni, Russell King,
	Heiner Kallweit, netdev, linux-kernel, thomas.petazzoni,
	Christophe Leroy, Herve Codina, Florian Fainelli, Vladimir Oltean,
	Köry Maincent, Marek Behún, Oleksij Rempel,
	Nicolò Veronese, Simon Horman, mwojtas, Romain Gantois,
	Daniel Golle, Dimitri Fedrau
In-Reply-To: <20260403123755.175742-2-maxime.chevallier@bootlin.com>

On Fri, Apr 03, 2026 at 02:37:45PM +0200, Maxime Chevallier wrote:
> The phy_link_topology structure stores information about the PHY-related
> components connected to a net_device. It is opportunistically allocated,
> when we add the first item to the topology, as this is not relevant for
> all kinds of net_devices.
> 
> In preparation for the addition of phy_port tracking in the topology,
> let's make a dedicated helper for that allocation sequence.
> 
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew


* DMA issues with the SKGE drivers
From: Benoît Dufour @ 2026-04-14 23:23 UTC
  To: netdev


In 2024, I reported a bug in the SKGE driver; you can see it here:
https://bugzilla.kernel.org/show_bug.cgi?id=219270

Basically, the problem is that the Marvell 88E8001 on my ASUS A8V
motherboard only works with 32-bit DMA. When the driver tries to use
64-bit DMA, the NIC does not work at all, and after some time the
operating system becomes completely unresponsive (the on-screen tty
stops refreshing, and keyboard and mouse input stop working too).

The fix is quite easy:

At the very end of the SKGE driver source code, the ASUS A8V motherboard
(as well as many other boards, like the ASUS A8V Deluxe) should be added
to the list of 32-bit DMA boards:
https://github.com/torvalds/linux/blob/508fed6795411f5ab277fd1edc0d7adca4946f23/drivers/net/ethernet/marvell/skge.c#L4150

In the bug report I posted in 2024, I also posted a test case:
https://bugzilla.kernel.org/attachment.cgi?id=306873&action=diff

But I guess a better fix would be to test dynamically whether the NIC
supports 64-bit DMA or only 32-bit DMA. If 64-bit DMA is not supported,
the card should report this error (or a similar one):
PCI error cmd=0x117 status=0x22b0

Do you think it would be doable?

P.S.:
I know that NIC and those motherboards are very old, but I really wish
those kinds of bugs would be fixed at some point.

-- 
Benoît Dufour

Unfortunately still a student in Computer Science



* [PATCH v5] net/mlx5: Fix PTP event stack info leak and NULL ptp_clock use
From: Prathamesh Deshpande @ 2026-04-14 23:19 UTC
  To: Saeed Mahameed, Leon Romanovsky, Carolina Jubran
  Cc: Jakub Kicinski, David S . Miller, Tariq Toukan, Richard Cochran,
	Mark Bloch, netdev, linux-rdma, linux-kernel,
	Prathamesh Deshpande

In mlx5_pps_event(), ptp_event is not zero-initialized. Since it contains
a union, partial assignment can leave stale stack data in unused members.

Also, clock->ptp may be NULL if ptp_clock_register() fails.

Fix by zero-initializing ptp_event, using a local timestamp variable for
event data assignment, and guarding ptp_clock_event() with a NULL check.

Fixes: 7c39afb394c7 ("net/mlx5: PTP code migration to driver core section")
Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com>
---
v5:
- Drop MAX_PIN_NUM check per review.
- Drop pin_config local guard to keep this revision narrowly scoped.
v4:
- Validate pin index against MAX_PIN_NUM instead of n_pins.
v3:
- Fix union corruption by using a local timestamp variable.
- Validate pin index against n_pins with WARN_ON_ONCE.
- Remove redundant pin < 0 check and cleanup TODO comment.
v2:
- Zero-initialize ptp_event to prevent stack information leak.
- Add bounds check for hardware pin index to prevent OOB access.
- Add NULL guard for pin_config to handle initialization failures.
- Add NULL check for clock->ptp as originally intended.
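
For illustration only, a user-space model of the two fixes described in
the commit message: clear the whole event before filling it in, and skip
delivery when no clock is registered. All type and function names below
are hypothetical stand-ins, not the kernel's PTP structures, and the model
uses memset() where the patch uses an empty initializer.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

enum { EVT_EXTTS = 0, EVT_PPSUSR = 1 };

struct timespec_model {
	int64_t tv_sec;
	long tv_nsec;
};

/* Event with a union: only one member is meaningful per event type. */
struct clock_event_model {
	int type;
	int index;
	union {
		uint64_t timestamp;		/* used for EVT_EXTTS */
		struct timespec_model ts_real;	/* used for EVT_PPSUSR */
	} u;
};

/* Clear the whole event first, then write only the union member that
 * matches the event type, using a local nanosecond value as the source.
 */
void build_event(struct clock_event_model *ev, int pin, uint64_t ns,
		 int pps_enabled)
{
	memset(ev, 0, sizeof(*ev));	/* no stale bytes survive */
	ev->index = pin;
	if (pps_enabled) {
		ev->type = EVT_PPSUSR;
		ev->u.ts_real.tv_sec = (int64_t)(ns / 1000000000ULL);
		ev->u.ts_real.tv_nsec = (long)(ns % 1000000000ULL);
	} else {
		ev->type = EVT_EXTTS;
		ev->u.timestamp = ns;
	}
}

typedef void (*event_sink_fn)(const struct clock_event_model *ev);

/* Deliver only when a sink is registered; returns 1 if delivered. */
int deliver_event(event_sink_fn sink, const struct clock_event_model *ev)
{
	if (!sink)
		return 0;	/* registration failed earlier: nothing to call */
	sink(ev);
	return 1;
}
```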

 drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
index bd4e042077af..77d7b81a0a25 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
@@ -1164,7 +1164,7 @@ static int mlx5_pps_event(struct notifier_block *nb,
 							       pps_nb);
 	struct mlx5_core_dev *mdev = clock_state->mdev;
 	struct mlx5_clock *clock = mdev->clock;
-	struct ptp_clock_event ptp_event;
+	struct ptp_clock_event ptp_event = {};
 	struct mlx5_eqe *eqe = data;
 	int pin = eqe->data.pps.pin;
 	unsigned long flags;
@@ -1173,7 +1173,7 @@ static int mlx5_pps_event(struct notifier_block *nb,
 	switch (clock->ptp_info.pin_config[pin].func) {
 	case PTP_PF_EXTTS:
 		ptp_event.index = pin;
-		ptp_event.timestamp = mlx5_real_time_mode(mdev) ?
+		ns = mlx5_real_time_mode(mdev) ?
 			mlx5_real_time_cyc2time(clock,
 						be64_to_cpu(eqe->data.pps.time_stamp)) :
 			mlx5_timecounter_cyc2time(clock,
@@ -1181,12 +1181,13 @@ static int mlx5_pps_event(struct notifier_block *nb,
 		if (clock->pps_info.enabled) {
 			ptp_event.type = PTP_CLOCK_PPSUSR;
 			ptp_event.pps_times.ts_real =
-					ns_to_timespec64(ptp_event.timestamp);
+					ns_to_timespec64(ns);
 		} else {
 			ptp_event.type = PTP_CLOCK_EXTTS;
+			ptp_event.timestamp = ns;
 		}
-		/* TODOL clock->ptp can be NULL if ptp_clock_register fails */
-		ptp_clock_event(clock->ptp, &ptp_event);
+		if (clock->ptp)
+			ptp_clock_event(clock->ptp, &ptp_event);
 		break;
 	case PTP_PF_PEROUT:
 		if (clock->shared) {
-- 
2.43.0


^ permalink raw reply related

* [PATCH net] net: dsa: remove redundant netdev_lock_ops() from conduit ethtool ops
From: Stanislav Fomichev @ 2026-04-14 23:10 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, andrew, olteanv, horms, sdf,
	linux-kernel, Maxime Chevallier

DSA replaces the conduit (master) device's ethtool_ops with its own
wrappers that aggregate stats from both the conduit and DSA switch
ports. The ethtool core already holds the netdev instance lock when it
calls these wrappers, so taking the lock again inside them causes a
deadlock.

Stumbled upon this when booting qemu with fbnic and CONFIG_NET_DSA_LOOP=y
(which looks like some kind of testing device that auto-populates the ports
of eth0). `ethtool -i` is enough to deadlock. This means we have basically zero
coverage for DSA stuff with real ops locked devs.

Remove the redundant netdev_lock_ops()/netdev_unlock_ops() calls from
the DSA conduit ethtool wrappers.

Cc: Maxime Chevallier <maxime.chevallier@bootlin.com>
Fixes: 2bcf4772e45a ("net: ethtool: try to protect all callback with netdev instance lock")
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
(cherry picked from commit 1538c00ab3212273240112bd53692d54d95f2dd5)
---
 net/dsa/conduit.c | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/net/dsa/conduit.c b/net/dsa/conduit.c
index a1b044467bd6..8398d72d7e4d 100644
--- a/net/dsa/conduit.c
+++ b/net/dsa/conduit.c
@@ -27,9 +27,7 @@ static int dsa_conduit_get_regs_len(struct net_device *dev)
 	int len;
 
 	if (ops && ops->get_regs_len) {
-		netdev_lock_ops(dev);
 		len = ops->get_regs_len(dev);
-		netdev_unlock_ops(dev);
 		if (len < 0)
 			return len;
 		ret += len;
@@ -60,15 +58,11 @@ static void dsa_conduit_get_regs(struct net_device *dev,
 	int len;
 
 	if (ops && ops->get_regs_len && ops->get_regs) {
-		netdev_lock_ops(dev);
 		len = ops->get_regs_len(dev);
-		if (len < 0) {
-			netdev_unlock_ops(dev);
+		if (len < 0)
 			return;
-		}
 		regs->len = len;
 		ops->get_regs(dev, regs, data);
-		netdev_unlock_ops(dev);
 		data += regs->len;
 	}
 
@@ -115,10 +109,8 @@ static void dsa_conduit_get_ethtool_stats(struct net_device *dev,
 	int count, mcount = 0;
 
 	if (ops && ops->get_sset_count && ops->get_ethtool_stats) {
-		netdev_lock_ops(dev);
 		mcount = ops->get_sset_count(dev, ETH_SS_STATS);
 		ops->get_ethtool_stats(dev, stats, data);
-		netdev_unlock_ops(dev);
 	}
 
 	list_for_each_entry(dp, &dst->ports, list) {
@@ -149,10 +141,8 @@ static void dsa_conduit_get_ethtool_phy_stats(struct net_device *dev,
 		if (count >= 0)
 			phy_ethtool_get_stats(dev->phydev, stats, data);
 	} else if (ops && ops->get_sset_count && ops->get_ethtool_phy_stats) {
-		netdev_lock_ops(dev);
 		count = ops->get_sset_count(dev, ETH_SS_PHY_STATS);
 		ops->get_ethtool_phy_stats(dev, stats, data);
-		netdev_unlock_ops(dev);
 	}
 
 	if (count < 0)
@@ -176,13 +166,11 @@ static int dsa_conduit_get_sset_count(struct net_device *dev, int sset)
 	struct dsa_switch_tree *dst = cpu_dp->dst;
 	int count = 0;
 
-	netdev_lock_ops(dev);
 	if (sset == ETH_SS_PHY_STATS && dev->phydev &&
 	    (!ops || !ops->get_ethtool_phy_stats))
 		count = phy_ethtool_get_sset_count(dev->phydev);
 	else if (ops && ops->get_sset_count)
 		count = ops->get_sset_count(dev, sset);
-	netdev_unlock_ops(dev);
 
 	if (count < 0)
 		count = 0;
@@ -239,7 +227,6 @@ static void dsa_conduit_get_strings(struct net_device *dev, u32 stringset,
 	struct dsa_switch_tree *dst = cpu_dp->dst;
 	int count, mcount = 0;
 
-	netdev_lock_ops(dev);
 	if (stringset == ETH_SS_PHY_STATS && dev->phydev &&
 	    !ops->get_ethtool_phy_stats) {
 		mcount = phy_ethtool_get_sset_count(dev->phydev);
@@ -253,7 +240,6 @@ static void dsa_conduit_get_strings(struct net_device *dev, u32 stringset,
 			mcount = 0;
 		ops->get_strings(dev, stringset, data);
 	}
-	netdev_unlock_ops(dev);
 
 	list_for_each_entry(dp, &dst->ports, list) {
 		if (!dsa_port_is_dsa(dp) && !dsa_port_is_cpu(dp))
-- 
2.52.0


^ permalink raw reply related

* Re: [PATCH net v2 1/1] net: hsr: avoid learning unknown senders for local delivery
From: Yuan Tan @ 2026-04-14 23:05 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Felix Maurer, Ao Zhou
  Cc: yuantan098, netdev, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Murali Karicheri,
	Shaurya Rane, Ingo Molnar, Kees Cook, Yifan Wu, Juefei Pu,
	Xin Liu, Yuqi Xu, Haoze Xie
In-Reply-To: <20260414145926.0Lwoa8ca@linutronix.de>


On 4/14/2026 7:59 AM, Sebastian Andrzej Siewior wrote:
> On 2026-04-08 12:40:15 [+0200], Felix Maurer wrote:
>> IMHO, the only real way to prevent excessive resource use on our side is
>> to put a limit on these resources. In this case, limit the size of the
>> node table (bonus: make that limit configurable as Paolo suggested).
> I am slowly catching up. There was no follow-up on this one, right?


Hi,

Thank you for checking in. Please excuse the delay as we are only able
to work on this part-time.

We are still refining the fix approach based on the recent feedback to
ensure the fix is suitable and robust.

Thanks for your patience.

Best,

Yuan

>> Thanks,
>>    Felix
> Sebastian

^ permalink raw reply

* Re: [PATCH v5 net-next 0/8] dpll/ice: Add TXC DPLL type and full TX reference clock control for E825
From: Jakub Kicinski @ 2026-04-14 21:58 UTC (permalink / raw)
  To: Kubalewski, Arkadiusz
  Cc: Nitka, Grzegorz, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	Oros, Petr, richardcochran@gmail.com, andrew+netdev@lunn.ch,
	Kitszel, Przemyslaw, Nguyen, Anthony L,
	Prathosh.Satish@microchip.com, Vecera, Ivan, jiri@resnulli.us,
	vadim.fedorenko@linux.dev, donald.hunter@gmail.com,
	horms@kernel.org, pabeni@redhat.com, davem@davemloft.net,
	edumazet@google.com
In-Reply-To: <IA0PR11MB737882B384AE7279EBCD05C79B242@IA0PR11MB7378.namprd11.prod.outlook.com>

On Mon, 13 Apr 2026 08:19:30 +0000 Kubalewski, Arkadiusz wrote:
>> My concern is that I think this is a pretty run of the mill SyncE
>> design. If we need to pretend we have two DPLLs here if we really
>> only have one and a mux - then our APIs are mis-designed :(  
> 
> Well, the truth is that we did not anticipate per-port control of the
> TX clock source, as a single DPLL device could drive multiple of such.
> 
> It is not true that we pretend there is a second PLL - there is a
> PLL on each TX clock, maybe not a full DPLL, but still the loop with
> control over its sources is there and it has the same 2 external
> sources + default XO.

Don't we put that MAC PLL into bypass mode if we feed a clock from
the EEC DPLL?

> The mentioned attempt at adding a per-port MUX-type pin, just to give some
> control to the user, is where we wanted to simplify things, but in the end
> the API would have to be modified in a significant way (various paths
> related to pin registration and keeping correct references) just to make a
> working case for pin_on_pin_register and its internals. We decided that
> the burden and impact on the existing design was too high.
> 
> And that is why the TXC approach emerged, the change of DPLL is minimal,
> The model is still correct from user perspective, SyncE SW controller shall
> anticipate possibility that per-port TXC dpll is there 

We are starting to push into what was previously the domain of
drivers/clk, tho. IIUC the "ASIC PLL"s are usually integrated with
clock dividers. And cannot be "configured" after chip init / async
reset (which is why I presume you whack a reset in patch 7?).

> This particular device and driver doesn't implement any EEC-type DPLL
> device, the one could think that we can just change the type here and use
> EEC type instead of new one TXC - since we share pins from external dpll
> driver, which is EEC type, and our DPLL device would have different clock_id
> and module. But, further designs, where a single NIC is having control over
> both a EEC DPLL and ability to control each source per-port this would be
> problematic. At least one NIC Port driver would have to have 2 EEC-type DPLLs
> leaving user with extra confusion.

The distinction between TXC and EEC dpll is confusing. 
I thought EEC one _was_supposed_to_ drive the Tx clock?
What PPS means is obvious, what EEC means if not driving Tx clock is
unclear to me..

Let me summarize my concerns - we need to navigate the split between
drivers/clk and dpll. We need a distinction on what goes where, because
every ASIC has a bunch of PLLs which until now have been controlled by
device tree (if at all). If the main question we want to answer is
"which clock ref is used to drive internal clock" all we need is a MUX.
If we want to make dpll cover also ASIC PLLs for platforms without
device tree we need a more generic name than TXC, IMHO.

^ permalink raw reply

* Re: [PATCH iwl-next v2 2/2] idpf: implement pci error handlers
From: Tantilov, Emil S @ 2026-04-14 21:43 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: intel-wired-lan, netdev, przemyslaw.kitszel, jay.bhat,
	ivan.d.barrera, aleksandr.loktionov, larysa.zaremba,
	anthony.l.nguyen, andrew+netdev, davem, edumazet, kuba, pabeni,
	aleksander.lobakin, linux-pci, madhu.chittim, decot, willemb,
	sheenamo
In-Reply-To: <ad5ZoDCuSsPW0lKo@wunner.de>



On 4/14/2026 8:13 AM, Lukas Wunner wrote:
> On Mon, Apr 13, 2026 at 08:16:31PM -0700, Emil Tantilov wrote:
>> +static pci_ers_result_t
>> +idpf_pci_err_slot_reset(struct pci_dev *pdev)
>> +{
>> +	struct idpf_adapter *adapter = pci_get_drvdata(pdev);
>> +
>> +	pci_restore_state(pdev);
>> +	pci_set_master(pdev);
>> +	pci_wake_from_d3(pdev, false);
>> +	if (readl(adapter->reset_reg.rstat) != 0xFFFFFFFF)
>> +		return PCI_ERS_RESULT_RECOVERED;
> 
> FWIW, there's a PCI_POSSIBLE_ERROR() helper that you may find useful
> to check for an "all ones" MMIO read.

Will check it out.
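
For reference, a small user-space sketch of what an "all ones" check on a
32-bit register read amounts to. The function and enum names are
hypothetical, and the failure result is an assumption (the quoted hunk
only shows the recovered case); this is not the kernel helper itself.

```c
#include <stdint.h>

/* A read from a device that has dropped off the bus typically returns
 * all ones, so an all-ones value is treated as a possible error.
 */
int mmio_read_possible_error32(uint32_t val)
{
	return val == UINT32_MAX;
}

enum ers_result_model { ERS_RECOVERED, ERS_DISCONNECT };

/* Hypothetical slot-reset decision based on one status register read.
 * Returning ERS_DISCONNECT on failure is an assumption of this sketch.
 */
enum ers_result_model slot_reset_result(uint32_t rstat)
{
	return mmio_read_possible_error32(rstat) ? ERS_DISCONNECT
						 : ERS_RECOVERED;
}
```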

Thanks,
Emil

> 
> Thanks,
> 
> Lukas


^ permalink raw reply

* Re: [PATCH iwl-next v2 2/2] idpf: implement pci error handlers
From: Tantilov, Emil S @ 2026-04-14 21:42 UTC (permalink / raw)
  To: Lukas Wunner, Loktionov, Aleksandr
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	Kitszel, Przemyslaw, Bhat, Jay, Barrera, Ivan D, Zaremba, Larysa,
	Nguyen, Anthony L, andrew+netdev@lunn.ch, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	Lobakin, Aleksander, linux-pci@vger.kernel.org, Chittim, Madhu,
	decot@google.com, willemb@google.com, sheenamo@google.com
In-Reply-To: <ad5Y-gNBDvns-WAE@wunner.de>



On 4/14/2026 8:10 AM, Lukas Wunner wrote:
> On Tue, Apr 14, 2026 at 11:09:05AM +0000, Loktionov, Aleksandr wrote:
>>> From: Tantilov, Emil S <emil.s.tantilov@intel.com>
>>> .slot_reset is the callback attempting to restore the device, provided
>>> a PCI reset was initiated by the AER driver.
> 
> Just for clarity, those callbacks are invoked by PCI core error handling
> code and are shared by EEH, AER, DPC as well as s390 error recovery flows.
> So it's not only AER.

Understood. I can change the wording to be more generic.

> 
>>> +/**
>>> + * idpf_pci_err_resume - Resume operations after PCI error recovery
>>> + * @pdev: PCI device struct
>>> + */
>>> +static void idpf_pci_err_resume(struct pci_dev *pdev) {
>>> +	struct idpf_adapter *adapter = pci_get_drvdata(pdev);
>>> +
>>> +	/* Force a PFR when resuming from PCI error. */
>>> +	if (test_and_set_bit(IDPF_PCI_CB_RESET, adapter->flags))
>>> +		adapter->dev_ops.reg_ops.trigger_reset(adapter,
>>> IDPF_HR_FUNC_RESET);
>>
>> You say "Force a PFR", but PFR is only triggered on the AER path,
>> not on the FLR path.
> 
> And?  idpf_pci_err_resume() is only invoked in the error recovery path
> (aka AER path), not FLR path AFAICS.

The driver calls it in idpf_pci_err_reset_done():

<...>-86378   [009] ..... 342752.746321: idpf_pci_err_reset_prepare <-pci_dev_save_and_disable
bash-86378   [045] ..... 342756.748148: idpf_pci_err_reset_done <-pci_reset_function
bash-86378   [045] ..... 342756.748272: idpf_pci_err_resume <-pci_reset_function

Thanks,
Emil

> 
> Thanks,
> 
> Lukas


^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH v2] dpf: fix UAF and double free in idpf_plug_vport_aux_dev() error path
From: Jacob Keller @ 2026-04-14 21:02 UTC (permalink / raw)
  To: Guangshuo Li, Tony Nguyen, Przemek Kitszel, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Joshua Hay, Tatyana Nikolova, Madhu Chittim, intel-wired-lan,
	netdev, linux-kernel, Greg Kroah-Hartman
  Cc: stable
In-Reply-To: <20260413112030.2694563-1-lgs201920130244@gmail.com>

On 4/13/2026 4:20 AM, Guangshuo Li wrote:
> If auxiliary_device_add() fails, idpf_plug_vport_aux_dev() calls
> auxiliary_device_uninit(adev), whose release callback
> idpf_vport_adev_release() frees the containing
> struct iidc_rdma_vport_auxiliary_dev.
> 
> The current error path then accesses adev->id and later frees iadev
> again, which may lead to a use-after-free and double free.
> 
> The issue was identified by a static analysis tool I developed and
> confirmed by manual review.
> 
> Fix it by storing the allocated auxiliary device id in a local
> variable and avoiding direct freeing of iadev after
> auxiliary_device_uninit().
> 
> Fixes: be91128c579c ("idpf: implement RDMA vport auxiliary dev create, init, and destroy")
> Cc: stable@vger.kernel.org
> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
> ---

This doesn't look right. The commit message analysis seems to match this
fix from Greg KH:

https://lore.kernel.org/intel-wired-lan/2026041432-tapestry-condition-22ff@gregkh/

But the changes do not make any sense to me. It looks like a poorly done
AI-generated "fix" which is not correct. Greg's version does look like
it properly resolves this.

> v2:
>   - note that the issue was identified by my static analysis tool
>   - and confirmed by manual review
> 

What even is this change log?? I see that version was sent and everyone
else was sane enough to just silently reject or ignore the v1...

>  drivers/net/ethernet/intel/idpf/idpf_idc.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_idc.c b/drivers/net/ethernet/intel/idpf/idpf_idc.c
> index 6dad0593f7f2..2a18907643fc 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_idc.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_idc.c
> @@ -59,6 +59,7 @@ static int idpf_plug_vport_aux_dev(struct iidc_rdma_core_dev_info *cdev_info,
>  	char name[IDPF_IDC_MAX_ADEV_NAME_LEN];
>  	struct auxiliary_device *adev;
>  	int ret;
> +	int adev_id;
>  

You create a local variable here...

>  	iadev = kzalloc(sizeof(*iadev), GFP_KERNEL);
>  	if (!iadev)
> @@ -74,11 +75,14 @@ static int idpf_plug_vport_aux_dev(struct iidc_rdma_core_dev_info *cdev_info,
>  		goto err_ida_alloc;
>  	}
>  	adev->id = ret;
> +	adev->id = adev_id;

adev_id is never initialized, so you assign an uninitialized garbage
value. This is obviously wrong and will lead to worse errors than the
failed cleanup.

I'm rejecting this patch in favor of the clearly appropriate fix from Greg.

>  	adev->dev.release = idpf_vport_adev_release;
>  	adev->dev.parent = &cdev_info->pdev->dev;
>  	sprintf(name, "%04x.rdma.vdev", cdev_info->pdev->vendor);
>  	adev->name = name;
>  
> +	/* iadev is owned by the auxiliary device */
> +	iadev = NULL;
>  	ret = auxiliary_device_init(adev);
>  	if (ret)
>  		goto err_aux_dev_init;
> @@ -92,7 +96,7 @@ static int idpf_plug_vport_aux_dev(struct iidc_rdma_core_dev_info *cdev_info,
>  err_aux_dev_add:
>  	auxiliary_device_uninit(adev);
>  err_aux_dev_init:
> -	ida_free(&idpf_idc_ida, adev->id);
> +	ida_free(&idpf_idc_ida, adev_id);
>  err_ida_alloc:
>  	vdev_info->adev = NULL;
>  	kfree(iadev);


^ permalink raw reply

* Re: [RFC PATCH 2/4] nfs: add NFS_CAP_P2PDMA and detect transport support
From: Chuck Lever @ 2026-04-14 20:59 UTC (permalink / raw)
  To: Pranjal Shrivastava
  Cc: Trond Myklebust, Anna Schumaker, davem, Jakub Kicinski, edumazet,
	Paolo Abeni, Chuck Lever, Jeff Layton, Tom Talpey,
	Olga Kornievskaia, NeilBrown, Dai Ngo, linux-nfs, netdev
In-Reply-To: <ad6bkyA1ItA8ou9i@google.com>


On Tue, Apr 14, 2026, at 12:54 PM, Pranjal Shrivastava wrote:
> On Thu, Apr 02, 2026 at 09:11:04AM -0400, Chuck Lever wrote:
>> 
>> On Wed, Apr 1, 2026, at 3:44 PM, Pranjal Shrivastava wrote:
>> > The NFS server capabilities bitmask (server->caps) is currently full,
>> > utilizing all 32 bits of the existing unsigned int. Expand the bitmask
>> > to 64 bits (u64) to allow for new feature flags.
>> >
>> > Introduce a new capability bit, NFS_CAP_P2PDMA, to indicate that the
>> > local mount is backed by hardware and a transport capable of PCI
>> > Peer-to-Peer DMA.
>> >
>> > Update nfs_server_set_init_caps() to query the underlying SunRPC
>> > transport for P2PDMA support during the mount process. If the transport
>> > (e.g., RDMA) signals support, set the NFS_CAP_P2PDMA bit in the mount's
>> > capabilities. This allows the high-performance Direct I/O path to
>> > efficiently determine if it should allow P2P memory buffers.
>> 
>> > diff --git a/fs/nfs/client.c b/fs/nfs/client.c
>> > index be02bb227741..f177cf098d44 100644
>> > --- a/fs/nfs/client.c
>> > +++ b/fs/nfs/client.c
>> 
>> > @@ -725,6 +727,12 @@ void nfs_server_set_init_caps(struct nfs_server *server)
>> >  		nfs4_server_set_init_caps(server);
>> >  		break;
>> >  	}
>> > +
>> > +	rcu_read_lock();
>> > +	xprt = rcu_dereference(server->client->cl_xprt);
>> > +	if (xprt->ops->supports_p2pdma && xprt->ops->supports_p2pdma(xprt))
>> > +		server->caps |= NFS_CAP_P2PDMA;
>> > +	rcu_read_unlock();
>> >  }
>> >  EXPORT_SYMBOL_GPL(nfs_server_set_init_caps);
>> 
>> Is the transport even connected when the NFS client does this
>> test? If it isn't, xprtrdma and the RDMA core have not chosen
>> an underlying device yet.
>> 
>> Note that, even if this logic /is/ correct, if the transport
>> connection is lost the transport will reconnect automatically,
>> doing the RDMA CM dance again and possibly resolving to a
>> different device. The NFS client layer will be none-the-wiser
>> and the NFS_CAP_P2PDMA flag setting will be stale at that point,
>> and quite possibly incorrect if the new connection's device is
>> not P2P-enabled.
>> 
>> (Basically this is what happens when an RDMA device is removed).
>> 
>> So this detection has to be done as part of xprtrdma's connection
>> flow, and it needs to set a flag somewhere in the rpc_xprt. The
>> NFS direct I/O code path then has to look for that flag before
>> choosing the mechanism/flags it uses for each iov iter.
>> 
>
> Ack. I agree, so should we start with an initial cap and then update it
> in the event of a transport change / disconnect? Or shall we populate
> the cap only when a transport is connected?

IMO this flag does not belong in the NFS server CAPS, as it is a
capability associated with each RPC transport. How should
NFS_CAP_P2PDMA be set if there are two RPC transports, one with
P2PDMA enabled and one with it disabled? (Perhaps it should be a flag
in the transport switch instance rather than the transport instance).

Which mechanism to use has to be re-decided every time a dreq is
scheduled because the xprt can change between an original send
and a retransmission (if, say, the COMMIT verifier changes due to
a server reboot).
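
As a rough illustration of that last point, here is a user-space model in
which the capability lives on the transport and is consulted each time a
request is scheduled, rather than cached once at mount time. Every name
here is hypothetical; none of this is existing SunRPC or NFS code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag owned by the transport, set by its connect path. */
struct xprt_model {
	bool p2pdma_ok;
};

/* The client's current transport can change across reconnects. */
struct client_model {
	struct xprt_model *cur_xprt;
};

/* Called from the (modelled) connect flow once a device is chosen. */
void xprt_connected(struct xprt_model *x, bool device_supports_p2p)
{
	x->p2pdma_ok = device_supports_p2p;
}

/* Evaluated per request, so a retransmission on a different transport
 * sees that transport's flag instead of a stale mount-time value.
 */
bool dreq_may_use_p2pdma(const struct client_model *clnt)
{
	const struct xprt_model *x = clnt->cur_xprt;

	return x && x->p2pdma_ok;
}
```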

Trond and Anna will have the final say about how this works.


-- 
Chuck Lever

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH iwl-net v5] ice: fix missing dpll notifications for SW pins
From: Jacob Keller @ 2026-04-14 20:46 UTC (permalink / raw)
  To: Michal Schmidt, Petr Oros, netdev
  Cc: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arkadiusz Kubalewski, intel-wired-lan, linux-kernel
In-Reply-To: <68b5cc9d-81d2-4ff5-9d3e-a6d6746dcb3e@redhat.com>

On 4/14/2026 12:16 PM, Michal Schmidt wrote:
> On 4/9/26 12:25, Petr Oros wrote:
>> ---
>>   drivers/net/ethernet/intel/ice/ice_dpll.c | 74 +++++++++++++++++++----
>>   1 file changed, 63 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c b/drivers/net/
>> ethernet/intel/ice/ice_dpll.c
>> index 3f8cd5b8298b57..d817f17dcf1951 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_dpll.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_dpll.c
>> @@ -1154,6 +1154,30 @@ ice_dpll_input_state_get(const struct dpll_pin
>> *pin, void *pin_priv,
>>                         extack, ICE_DPLL_PIN_TYPE_INPUT);
>>   }
>>   +/**
>> + * ice_dpll_sw_pin_notify_peer - notify the paired SW pin after a
>> state change
>> + * @d: pointer to dplls struct
>> + * @changed: the SW pin that was explicitly changed (already notified
>> by dpll core)
>> + *
>> + * SMA and U.FL pins share physical signal paths in pairs (SMA1/U.FL1
>> and
>> + * SMA2/U.FL2).  When one pin's routing changes via the PCA9575 GPIO
>> + * expander, the paired pin's state may also change.  Send a change
>> + * notification for the peer pin so userspace consumers monitoring the
>> + * peer via dpll netlink learn about the update.
>> + *
>> + * Context: Can be called under pf->dplls.lock, dpll_pin_change_ntf()
>> is safe.
>> + */
>> +static void ice_dpll_sw_pin_notify_peer(struct ice_dplls *d,
>> +                    struct ice_dpll_pin *changed)
>> +{
>> +    struct ice_dpll_pin *peer;
>> +
>> +    peer = (changed >= d->sma && changed < d->sma +
>> ICE_DPLL_PIN_SW_NUM) ?
>> +        &d->ufl[changed->idx] : &d->sma[changed->idx];
>> +    if (peer->pin)
>> +        dpll_pin_change_ntf(peer->pin);
>> +}
>> +
>>   /**
>>    * ice_dpll_sma_direction_set - set direction of SMA pin
>>    * @p: pointer to a pin
>> @@ -1233,6 +1257,8 @@ static int ice_dpll_sma_direction_set(struct
>> ice_dpll_pin *p,
>>               ret = ice_dpll_pin_state_update(p->pf, target,
>>                               type, extack);
>>       }
>> +    if (!ret)
>> +        ice_dpll_sw_pin_notify_peer(d, p);
>>         return ret;
>>   }
> 
> ice_dpll_sma_direction_set() runs to process a DPLL_CMD_PIN_SET command
> from userspace. It runs with dpll_lock held - taken in dpll_pin_pre_doit().
> ice_dpll_sw_pin_notify_peer() -> dpll_pin_change_ntf() will take
> dpll_lock again and deadlock.
> 

Yep. I think you could use __dpll_pin_change_ntf(), which is the version
that assumes the lock is held, but that function is not exported
outside of drivers/dpll.
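
For anyone following along, the locked/unlocked split looks roughly like
the user-space model below. The names are hypothetical, and the toy lock
returns -EDEADLK where a real non-recursive mutex would simply hang.

```c
#include <errno.h>

/* Toy non-recursive lock: a second acquire reports the deadlock
 * instead of blocking forever.
 */
struct lock_model {
	int held;
};

int lock_acquire(struct lock_model *l)
{
	if (l->held)
		return -EDEADLK;
	l->held = 1;
	return 0;
}

void lock_release(struct lock_model *l) { l->held = 0; }

int notifications_sent;

/* Variant for callers that already hold the lock (the kernel would
 * typically spell this with a leading "__").
 */
void pin_change_notify_locked(struct lock_model *l)
{
	(void)l;
	notifications_sent++;
}

/* Variant for callers that do not hold the lock. */
int pin_change_notify(struct lock_model *l)
{
	int ret = lock_acquire(l);

	if (ret)
		return ret;
	pin_change_notify_locked(l);
	lock_release(l);
	return 0;
}

/* Model of a "set" handler that is entered with the lock already held. */
int pin_set_handler(struct lock_model *l, int use_locked_variant)
{
	if (use_locked_variant) {
		pin_change_notify_locked(l);
		return 0;
	}
	return pin_change_notify(l);	/* re-acquires: deadlock */
}
```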

Either way, this needs to be fixed somehow before I can apply it.

Thanks,
Jake

> Michal
> 


^ permalink raw reply

* Re: [PATCH RFC bpf-next 1/8] kasan: expose generic kasan helpers
From: Alexis Lothoré @ 2026-04-14 20:44 UTC (permalink / raw)
  To: Alexei Starovoitov, Alexis Lothoré
  Cc: Andrey Konovalov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	John Fastabend, David S. Miller, David Ahern, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin,
	Shuah Khan, Maxime Coquelin, Alexandre Torgue, Andrey Ryabinin,
	Alexander Potapenko, Dmitry Vyukov, Vincenzo Frascino,
	Andrew Morton, ebpf, Bastien Curutchet, Thomas Petazzoni,
	Xu Kuohai, bpf, LKML, Network Development,
	open list:KERNEL SELFTEST FRAMEWORK, linux-stm32,
	linux-arm-kernel, kasan-dev, linux-mm
In-Reply-To: <CAADnVQ+c9h_wuNwj8pjx885oNErGY7bxxCwKi+DiJ0XKSpyYfg@mail.gmail.com>

On Tue Apr 14, 2026 at 9:16 PM CEST, Alexei Starovoitov wrote:
> On Tue, Apr 14, 2026 at 11:41 AM Alexis Lothoré
> <alexis.lothore@bootlin.com> wrote:
>>
>> On Tue Apr 14, 2026 at 4:36 PM CEST, Alexei Starovoitov wrote:
>> > On Tue, Apr 14, 2026 at 6:13 AM Alexis Lothoré
>> > <alexis.lothore@bootlin.com> wrote:
>> >>
>> >> Hi Andrey, thanks for the prompt review !

[...]

>> > No. The performance penalty will be too high.
>>
>> Since we are mentioning it, I have not yet considered any performance
>> comparison/benchmarking (and I am not really familiar with usual bpf
>> performance validation practices for new bpf features). Is there any
>> existing test I should take a look at for this ? Maybe some specific
>> benches in tools/testing/selftests/bpf/bench ?
>
> So far everything in bpf/bench/ measures bpf infra like
> maps, kprobes, tracepoints, etc.
> We don't have benchmarks for bpf programs.
> So we don't know how well JITs are generating code
> and how much inlining done by the verifier, JITs actually helps.
>
> Puranjay is working on creating a SPECint like set of benchmarks.
>
> For this kasan work we should make the best decisions from
> performance point of view, like not wasting unnecessary call
> and not saving unnecessary registers. btw in the other patch
> I think you can skip saving of r10 and r11.

Noted, I'll do some checks and tests without those two.

> But we cannot quantify yet that avoiding extra call gives us N%.
>
> You can micro-benchmark, of course, but gotta be careful
> interpreting the results. It might be too easy to get into
> thinking that JIT must inline __asan_load() for the sake of performance.

Ok, interesting, thanks for those details

Alexis

-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH net] ice: fix missing SMA pin initialization in DPLL subsystem
From: Jacob Keller @ 2026-04-14 20:13 UTC (permalink / raw)
  To: Petr Oros, Rinitha, SX, netdev@vger.kernel.org
  Cc: Vecera, Ivan, Kitszel, Przemyslaw, Eric Dumazet,
	Kubalewski, Arkadiusz, Andrew Lunn, Nguyen, Anthony L,
	Simon Horman, intel-wired-lan@lists.osuosl.org, Jakub Kicinski,
	Paolo Abeni, David S. Miller, linux-kernel@vger.kernel.org
In-Reply-To: <d74a9071-336d-4fec-a061-bf9a3a444678@redhat.com>

On 4/8/2026 1:54 AM, Petr Oros wrote:
> 
> On 4/1/26 18:29, Rinitha, SX wrote:
>>> -----Original Message-----
>>> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
>>> Of Petr Oros
>>> Sent: 13 February 2026 19:47
>>> To: netdev@vger.kernel.org
>>> Cc: Vecera, Ivan <ivecera@redhat.com>; Kitszel, Przemyslaw
>>> <przemyslaw.kitszel@intel.com>; Eric Dumazet <edumazet@google.com>;
>>> Kubalewski, Arkadiusz <arkadiusz.kubalewski@intel.com>; Andrew Lunn
>>> <andrew+netdev@lunn.ch>; Nguyen, Anthony L
>>> <anthony.l.nguyen@intel.com>; Simon Horman <horms@kernel.org>; intel-
>>> wired-lan@lists.osuosl.org; Jakub Kicinski <kuba@kernel.org>; Paolo
>>> Abeni <pabeni@redhat.com>; David S. Miller <davem@davemloft.net>;
>>> linux-kernel@vger.kernel.org
>>> Subject: [Intel-wired-lan] [PATCH net] ice: fix missing SMA pin
>>> initialization in DPLL subsystem
>>>
>>> The DPLL SMA/U.FL pin redesign introduced
>>> ice_dpll_sw_pin_frequency_get() which gates frequency reporting on
>>> the pin's active flag. This flag is determined by
>>> ice_dpll_sw_pins_update() from the PCA9575 GPIO expander state.
>>> Before the redesign, SMA pins were exposed as direct HW input/output
>>> pins and ice_dpll_frequency_get() returned the CGU frequency
>>> unconditionally — the PCA9575 state was never consulted.
>>>
>>> The PCA9575 powers on with all outputs high, setting ICE_SMA1_DIR_EN,
>>> ICE_SMA1_TX_EN, ICE_SMA2_DIR_EN and ICE_SMA2_TX_EN. Nothing in the
>>> driver writes the register during initialization, so
>>> ice_dpll_sw_pins_update() sees all pins as inactive and
>>> ice_dpll_sw_pin_frequency_get() permanently returns 0 Hz for every SW
>>> pin.
>>>
>>> Fix this by writing a default SMA configuration in
>>> ice_dpll_init_info_sw_pins(): clear all SMA bits, then set SMA1 and
>>> SMA2 as active inputs (DIR_EN=0) with U.FL1 output and U.FL2 input
>>> disabled. Each SMA/U.FL pair shares a physical signal path so only
>>> one pin per pair can be active at a time. U.FL pins still report
>>> frequency 0 after this fix: U.FL1 (output-only) is disabled by
>>> ICE_SMA1_TX_EN which keeps the TX output buffer off, and U.FL2
>>> (input-only) is disabled by ICE_SMA2_UFL2_RX_DIS. They can be
>>> activated by changing the corresponding SMA pin direction via dpll
>>> netlink.
>>>
>>> Fixes: 2dd5d03c77e2 ("ice: redesign dpll sma/u.fl pins control")
>>> Signed-off-by: Petr Oros <poros@redhat.com>
>>> ---
>>> drivers/net/ethernet/intel/ice/ice_dpll.c | 17 +++++++++++++++++
>>> 1 file changed, 17 insertions(+)
>>>
>> When SMA1 is changed from output to input, U.FL1 (input) is expected
>> to get connected but is still disconnected.
>> Similarly, when SMA2 is changed from input to output, U.FL2 (output)
>> is still disconnected.
> 
> Hi Rinitha,
> 
> Thanks for testing this.
> The initialization patch itself is correct. After boot, the PCA9575
> register is written to a known-good default state and SMA1/SMA2
> properly report as active inputs with the expected frequency.
> 
> The behavior you describe (U.FL1/U.FL2 staying disconnected after
> SMA direction change) is a pre-existing issue in
> ice_dpll_sma_direction_set(), not in the initialization path.
> 
> I am addressing this in v2 of "[PATCH iwl-net] ice: fix U.FL pin
> state set affecting paired SMA pin" with an expanded scope that
> covers both directions of the SMA/U.FL pairing.
> 
> Is it OK like this?
> 

@Rinitha,

I agree with Petr's assessment here, that the SMA issue is pre-existing
and shouldn't block sending this patch. Could you please let me know if
you agree and we can resolve the issue you reported within Petr's other
patch? I'm hoping to put together a net series with several fixes that
have been waiting for some time.

Thanks,
Jake

> Regards,
> Petr
> 
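
For reference, the default state described in the quoted commit message
(clear all SMA bits, leave both DIR_EN bits at 0, then set the two
"disable" bits) can be sketched in plain C as below. This is only an
illustration: the bit positions and the names SMA1_DIR_EN, SMA1_TX_EN,
SMA2_DIR_EN, SMA2_UFL2_RX_DIS, ALL_SMA_MASK and sma_default_config() are
assumptions made for the example, not the driver's actual definitions or
code.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. The real values
 * are defined in the ice driver headers and may differ. */
#define SMA1_DIR_EN      (1u << 4) /* 0 = SMA1 acts as an input */
#define SMA1_TX_EN       (1u << 5) /* set = TX output buffer off, per the commit message */
#define SMA2_UFL2_RX_DIS (1u << 3) /* set = U.FL2 input disabled */
#define SMA2_DIR_EN      (1u << 6) /* 0 = SMA2 acts as an input */

#define ALL_SMA_MASK (SMA1_DIR_EN | SMA1_TX_EN | SMA2_UFL2_RX_DIS | SMA2_DIR_EN)

/* Return the register value with the SMA bits forced to the default:
 * SMA1/SMA2 as inputs (DIR_EN = 0), U.FL1 output and U.FL2 input disabled.
 * Bits outside ALL_SMA_MASK are preserved. */
uint8_t sma_default_config(uint8_t reg)
{
	reg = (uint8_t)(reg & ~ALL_SMA_MASK);
	reg = (uint8_t)(reg | SMA1_TX_EN | SMA2_UFL2_RX_DIS);
	return reg;
}
```

Clearing the whole mask first means the result does not depend on
whatever state the expander register held before initialization.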


^ permalink raw reply

* Re: [PATCH net-next v2 2/2] selftests/bpf: verify syncookie statistics in tcp_custom_syncookie
From: Martin KaFai Lau @ 2026-04-14 20:02 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Kuniyuki Iwashima, Jiayuan Chen, Eric Dumazet, Daniel Borkmann,
	netdev, Neal Cardwell, David S. Miller, Jakub Kicinski,
	Simon Horman, David Ahern, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	linux-kernel, bpf, linux-kselftest
In-Reply-To: <04cafd50-7af5-4835-b0df-67393d9f5a51@redhat.com>

On Tue, Apr 14, 2026 at 11:17:39AM +0200, Paolo Abeni wrote:
> On 4/14/26 11:08 AM, Paolo Abeni wrote:
> > On 4/14/26 7:50 AM, Kuniyuki Iwashima wrote:
> >> On Fri, Apr 10, 2026 at 6:32 PM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
> >>>
> >>> Add read_tcpext_snmp() helper to network_helpers which reads a
> >>> TcpExt SNMP counter via nstat, and use it in the tcp_custom_syncookie
> >>> test to verify that LINUX_MIB_SYNCOOKIESRECV is incremented and
> >>> LINUX_MIB_SYNCOOKIESFAILED stays unchanged across a successful
> >>> BPF custom syncookie validation.
> >>>
> >>> The delta is captured between start_server() and accept(), which
> >>> covers the full SYN/ACK/cookie-check path for one connection.
> >>>
> >>> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> >>> ---
> >>>  tools/testing/selftests/bpf/network_helpers.c | 22 +++++++++++++++++++
> >>>  tools/testing/selftests/bpf/network_helpers.h |  1 +
> >>>  .../bpf/prog_tests/tcp_custom_syncookie.c     | 20 +++++++++++++++++
> >>
> >> As you touch bpf selftest helper files, please rebase on bpf-next
> >> to avoid possible conflicts and tag bpf-next in the Subject.
> > 
> > To hopefully minimize conflict handling, I'm going to apply patch
> > 1/2 to net-next. Please resubmit patch 2/2 to bpf-next after the
> > relevant net core changes reach there.
> 
> Uhmm... the original feature went through the bpf tree, so I guess both
> patches could/should go via bpf-next. Hopefully conflicts in the tcp code
> should be minimal.

I think it is best to land both patches together. It seems the 7.1 pull-request
is out. We can take it to bpf-next/net after the merge window and then follow
up with a pull-request for the net-next tree as usual.
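
For reference, the counter check described in the quoted commit message
(SyncookiesRecv goes up, SyncookiesFailed stays put) can be sketched as
below. The patch itself reads the counters via nstat; this sketch instead
parses text in the /proc/net/netstat layout (a "TcpExt:" line of names
followed by a "TcpExt:" line of values). The names tcpext_counter() and
tcpext_delta() are hypothetical and are not the helper added by the
patch.

```c
#include <stdlib.h>
#include <string.h>

/* Look up TcpExt counter `name` in text laid out like /proc/net/netstat.
 * Returns the counter value, or -1 if the counter is not present. */
long long tcpext_counter(const char *text, const char *name)
{
	const char *prefix = "TcpExt: ";
	size_t plen = strlen(prefix);
	size_t nlen = strlen(name);
	const char *hdr, *val;

	/* The first "TcpExt:" line holds names, the next one holds values. */
	hdr = strstr(text, prefix);
	if (!hdr)
		return -1;
	val = strchr(hdr, '\n');
	if (!val)
		return -1;
	val++;
	if (strncmp(val, prefix, plen) != 0)
		return -1;
	hdr += plen;
	val += plen;

	/* Walk both lines in lockstep, one space-separated field at a time. */
	while (*hdr && *hdr != '\n' && *val && *val != '\n') {
		size_t hlen = strcspn(hdr, " \n");
		size_t vlen = strcspn(val, " \n");

		if (hlen == nlen && strncmp(hdr, name, nlen) == 0)
			return strtoll(val, NULL, 10);
		hdr += hlen;
		val += vlen;
		if (*hdr == ' ')
			hdr++;
		if (*val == ' ')
			val++;
	}
	return -1;
}

/* Difference of one counter between two snapshots of the same text format. */
long long tcpext_delta(const char *before, const char *after, const char *name)
{
	return tcpext_counter(after, name) - tcpext_counter(before, name);
}
```

A test along these lines would take one snapshot before start_server()
and one after accept(), then expect a positive delta for SyncookiesRecv
and a zero delta for SyncookiesFailed.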

^ permalink raw reply

* Re: [RFC PATCH 4/4] nfs: allow P2PDMA in direct I/O path
From: Pranjal Shrivastava @ 2026-04-14 20:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: trond.myklebust, anna, davem, kuba, edumazet, pabeni, chuck.lever,
	jlayton, tom, okorniev, neil, dai.ngo, linux-nfs, netdev
In-Reply-To: <ac35ICYHuw4lEOri@infradead.org>

On Wed, Apr 01, 2026 at 10:05:36PM -0700, Christoph Hellwig wrote:
> On Wed, Apr 01, 2026 at 07:45:00PM +0000, Pranjal Shrivastava wrote:
> > Migrate the NFS Direct I/O path from the legacy iov_iter_get_pages_alloc2()
> > API to the modern iov_iter_extract_pages() API. This migration enables
> > support for PCI Peer-to-Peer DMA (P2PDMA) by allowing the
> > ITER_ALLOW_P2PDMA flag to be set.
> > 
> > Pass ITER_ALLOW_P2PDMA to iov_iter_extract_pages() only if the local
> > mount indicates support via the NFS_CAP_P2PDMA capability bit (detected
> > at mount time for RDMA transports).
> 
> Please split the conversion to iov_iter_extract_pages into a separate
> preparation patch, or even series.  That is a long overdue change
> that fixes potential data corruption in XFS.
> 

Sure, I'll send out a series with the migration to
iov_iter_extract_pages. Should I club this with the pin-aware + folios
work for direct I/O, or send it as a separate series?

Thanks,
Praan

^ permalink raw reply

* Re: [RFC PATCH 3/4] nfs: make nfs_page pin-aware
From: Pranjal Shrivastava @ 2026-04-14 19:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: trond.myklebust, anna, davem, kuba, edumazet, pabeni, chuck.lever,
	jlayton, tom, okorniev, neil, dai.ngo, linux-nfs, netdev
In-Reply-To: <ac341x4RXKoShXsB@infradead.org>

On Wed, Apr 01, 2026 at 10:04:23PM -0700, Christoph Hellwig wrote:
> This conversion really should go first as it is badly needed independent
> of any P2P support.  And I wonder if it should go further - currently
> the NFS I/O code is using folios for buffered I/O, but pages for direct
> I/O, which makes larger I/O very inefficient.
> 

Ack. I'll send this out as a different series with migration to folios
for direct I/O as well.

> The iov_iter_extract_bvecs wrapper allows extracting bvecs instead, which
> might be a good choice here either by passing down the bvecs or
> converting to an nfs_page inline.  Or just open coding a variant of
> iov_iter_extract_bvecs that converts to nfs_page structures instead of
> bvecs.  This would pair with a helper similar to __bio_release_pages on
> the unlock side.

Ack. I'll attempt an open coded variant.

> 
> > +			req = nfs_page_create_from_page(dreq->ctx, pagevec[i], false,
> >  							pgbase, pos, req_len);
> >
> 
> A lot of this code reads pretty odd as it's overflowing the lines.
> 

Ahh, my bad. For some reason even checkpatch didn't catch this. I'll fix
this here and everywhere else.

Thanks,
Praan

^ permalink raw reply
