Netdev List
* Re: [PATCH net v4 1/2] tcp: call sk_data_ready() after listener migration
From: Eric Dumazet @ 2026-04-22 15:56 UTC
  To: Zhenzhong Wu
  Cc: netdev, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, stable
In-Reply-To: <20260422024554.130346-2-jt26wzz@gmail.com>

On Tue, Apr 21, 2026 at 7:46 PM Zhenzhong Wu <jt26wzz@gmail.com> wrote:
>
> When inet_csk_listen_stop() migrates an established child socket from
> a closing listener to another socket in the same SO_REUSEPORT group,
> the target listener gets a new accept-queue entry via
> inet_csk_reqsk_queue_add(), but that path never notifies the target
> listener's waiters. A nonblocking accept() still works because it
> checks the queue directly, but poll()/epoll_wait() waiters and
> blocking accept() callers can remain asleep indefinitely.
>
> Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
> in inet_csk_listen_stop().
>
> However, after inet_csk_reqsk_queue_add() succeeds, the ref acquired
> in reuseport_migrate_sock() is effectively transferred to
> nreq->rsk_listener. Another CPU can then dequeue nreq via accept()
> or listener shutdown, hit reqsk_put(), and drop that listener ref.
> Since listeners are SOCK_RCU_FREE, wrap the post-queue_add()
> dereferences of nsk in rcu_read_lock()/rcu_read_unlock(), which also
> covers the existing sock_net(nsk) access in that path.
>
> The reqsk_timer_handler() path does not need the same changes for two
> reasons: half-open requests become readable only after the final ACK,
> where tcp_child_process() already wakes the listener; and once nreq is
> visible via inet_ehash_insert(), the success path no longer touches
> nsk directly.
>
> Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
> Cc: stable@vger.kernel.org
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
> Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>
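
For readers following the thread, the missed-wakeup problem described in the
commit message can be sketched as a small userspace model. This is only an
illustration under stated assumptions: `model_listener`, `model_queue_add()`,
`model_data_ready()` and `model_migrate()` are hypothetical stand-ins invented
here, not kernel structures or functions, and the model ignores locking, RCU
and reference counting entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a listening socket. */
struct model_listener {
	int queued;          /* number of entries in the accept queue */
	bool waiter_asleep;  /* a poll()/blocking accept() waiter is parked */
	void (*data_ready)(struct model_listener *l); /* role of sk_data_ready */
};

/* Wake the parked waiter, as a data-ready callback would. */
void model_data_ready(struct model_listener *l)
{
	l->waiter_asleep = false;
}

/* Add one child to the target's accept queue (role of the queue_add step). */
bool model_queue_add(struct model_listener *l)
{
	l->queued++;
	return true;
}

/*
 * Migrate one child to nsk. With notify == false the entry is queued but
 * the waiter is never woken, which models the behavior before the patch.
 * With notify == true the callback runs after a successful queue add.
 */
void model_migrate(struct model_listener *nsk, bool notify)
{
	assert(nsk != NULL);
	if (!model_queue_add(nsk))
		return;
	if (notify && nsk->data_ready)
		nsk->data_ready(nsk);
}
```

Calling `model_migrate(&l, false)` leaves `l.queued == 1` with
`l.waiter_asleep` still true, while `model_migrate(&l, true)` clears the flag.
A nonblocking caller that inspects `queued` directly would succeed in both
cases, matching the observation in the commit message.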


* Re: [PATCH bpf-next] selftests/bpf: drop xdping tool
From: Alexis Lothoré @ 2026-04-22 15:48 UTC
  To: Alan Maguire, Alexis Lothoré (eBPF Foundation),
	Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev
  Cc: ebpf, Bastien Curutchet, Thomas Petazzoni, linux-kernel, bpf,
	linux-kselftest, netdev
In-Reply-To: <dc0f3c46-32e0-411d-9e50-498cf9b05505@oracle.com>

On Wed Apr 22, 2026 at 4:30 PM CEST, Alan Maguire wrote:
> On 17/04/2026 16:33, Alexis Lothoré (eBPF Foundation) wrote:
>> As part of a larger cleanup effort in the bpf selftests directory,
>> tests and scripts are either being converted to the test_progs framework
>> (so they are executed automatically in bpf CI), or removed if not
>> relevant for such integration.
>> 
>> The test_xdping.sh script (with the associated xdping.c) acts as an RTT
>> measurement tool, by attaching two small xdp programs to two interfaces.
>> Converting this test to test_progs may not make much sense:
>> - RTT measurement does not really fit in the scope of a functional test,
>>   this is rather about measuring some performance level.
>> - there are other existing tests in test_progs that actively validate
>>   XDP features like program attachment, return value processing, packet
>>   modification, etc.
>> 
>> Drop test_xdping.sh and the corresponding xdping.c userspace part. Keep
>> the eBPF part (xdping_kern.c), as it is used by another test integrated
>> in test_progs (btf_dump).
>> 
>> Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
>
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
>
> as discussed, switching to loading xdp_dummy.bpf.o in prog_tests/btf_dump.c 
> would be good too (feel free to retain the Reviewed-by: with that v2 change).

Great, I was hoping to get your feedback, as you are the one who
originally introduced the test. Thanks!

Alexis
>
> Thanks!
>
>  
>> ---
>>  tools/testing/selftests/bpf/.gitignore     |   1 -
>>  tools/testing/selftests/bpf/Makefile       |   3 -
>>  tools/testing/selftests/bpf/test_xdping.sh | 103 ------------
>>  tools/testing/selftests/bpf/xdping.c       | 254 -----------------------------
>>  4 files changed, 361 deletions(-)
>> 
>> diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
>> index bfdc5518ecc8..986a6389186b 100644
>> --- a/tools/testing/selftests/bpf/.gitignore
>> +++ b/tools/testing/selftests/bpf/.gitignore
>> @@ -21,7 +21,6 @@ test_lirc_mode2_user
>>  flow_dissector_load
>>  test_tcpnotify_user
>>  test_libbpf
>> -xdping
>>  test_cpp
>>  *.d
>>  *.subskel.h
>> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
>> index 78e60040811e..00a986a7d088 100644
>> --- a/tools/testing/selftests/bpf/Makefile
>> +++ b/tools/testing/selftests/bpf/Makefile
>> @@ -111,7 +111,6 @@ TEST_FILES = xsk_prereqs.sh $(wildcard progs/btf_dump_test_case_*.c)
>>  # Order correspond to 'make run_tests' order
>>  TEST_PROGS := test_kmod.sh \
>>  	test_lirc_mode2.sh \
>> -	test_xdping.sh \
>>  	test_bpftool_build.sh \
>>  	test_doc_build.sh \
>>  	test_xsk.sh \
>> @@ -134,7 +133,6 @@ TEST_GEN_PROGS_EXTENDED = \
>>  	xdp_features \
>>  	xdp_hw_metadata \
>>  	xdp_synproxy \
>> -	xdping \
>>  	xskxceiver
>>  
>>  TEST_GEN_FILES += $(TEST_KMODS) liburandom_read.so urandom_read sign-file uprobe_multi
>> @@ -320,7 +318,6 @@ $(OUTPUT)/test_tcpnotify_user: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(TRACE_HELP
>>  $(OUTPUT)/test_sock_fields: $(CGROUP_HELPERS) $(TESTING_HELPERS)
>>  $(OUTPUT)/test_tag: $(TESTING_HELPERS)
>>  $(OUTPUT)/test_lirc_mode2_user: $(TESTING_HELPERS)
>> -$(OUTPUT)/xdping: $(TESTING_HELPERS)
>>  $(OUTPUT)/flow_dissector_load: $(TESTING_HELPERS)
>>  $(OUTPUT)/test_maps: $(TESTING_HELPERS)
>>  $(OUTPUT)/test_verifier: $(TESTING_HELPERS) $(CAP_HELPERS) $(UNPRIV_HELPERS)
>> diff --git a/tools/testing/selftests/bpf/test_xdping.sh b/tools/testing/selftests/bpf/test_xdping.sh
>> deleted file mode 100755
>> index c3d82e0a7378..000000000000
>> --- a/tools/testing/selftests/bpf/test_xdping.sh
>> +++ /dev/null
>> @@ -1,103 +0,0 @@
>> -#!/bin/bash
>> -# SPDX-License-Identifier: GPL-2.0
>> -
>> -# xdping tests
>> -#   Here we setup and teardown configuration required to run
>> -#   xdping, exercising its options.
>> -#
>> -#   Setup is similar to test_tunnel tests but without the tunnel.
>> -#
>> -# Topology:
>> -# ---------
>> -#     root namespace   |     tc_ns0 namespace
>> -#                      |
>> -#      ----------      |     ----------
>> -#      |  veth1  | --------- |  veth0  |
>> -#      ----------    peer    ----------
>> -#
>> -# Device Configuration
>> -# --------------------
>> -# Root namespace with BPF
>> -# Device names and addresses:
>> -#	veth1 IP: 10.1.1.200
>> -#	xdp added to veth1, xdpings originate from here.
>> -#
>> -# Namespace tc_ns0 with BPF
>> -# Device names and addresses:
>> -#       veth0 IPv4: 10.1.1.100
>> -#	For some tests xdping run in server mode here.
>> -#
>> -
>> -readonly TARGET_IP="10.1.1.100"
>> -readonly TARGET_NS="xdp_ns0"
>> -
>> -readonly LOCAL_IP="10.1.1.200"
>> -
>> -setup()
>> -{
>> -	ip netns add $TARGET_NS
>> -	ip link add veth0 type veth peer name veth1
>> -	ip link set veth0 netns $TARGET_NS
>> -	ip netns exec $TARGET_NS ip addr add ${TARGET_IP}/24 dev veth0
>> -	ip addr add ${LOCAL_IP}/24 dev veth1
>> -	ip netns exec $TARGET_NS ip link set veth0 up
>> -	ip link set veth1 up
>> -}
>> -
>> -cleanup()
>> -{
>> -	set +e
>> -	ip netns delete $TARGET_NS 2>/dev/null
>> -	ip link del veth1 2>/dev/null
>> -	if [[ $server_pid -ne 0 ]]; then
>> -		kill -TERM $server_pid
>> -	fi
>> -}
>> -
>> -test()
>> -{
>> -	client_args="$1"
>> -	server_args="$2"
>> -
>> -	echo "Test client args '$client_args'; server args '$server_args'"
>> -
>> -	server_pid=0
>> -	if [[ -n "$server_args" ]]; then
>> -		ip netns exec $TARGET_NS ./xdping $server_args &
>> -		server_pid=$!
>> -		sleep 10
>> -	fi
>> -	./xdping $client_args $TARGET_IP
>> -
>> -	if [[ $server_pid -ne 0 ]]; then
>> -		kill -TERM $server_pid
>> -		server_pid=0
>> -	fi
>> -
>> -	echo "Test client args '$client_args'; server args '$server_args': PASS"
>> -}
>> -
>> -set -e
>> -
>> -server_pid=0
>> -
>> -trap cleanup EXIT
>> -
>> -setup
>> -
>> -for server_args in "" "-I veth0 -s -S" ; do
>> -	# client in skb mode
>> -	client_args="-I veth1 -S"
>> -	test "$client_args" "$server_args"
>> -
>> -	# client with count of 10 RTT measurements.
>> -	client_args="-I veth1 -S -c 10"
>> -	test "$client_args" "$server_args"
>> -done
>> -
>> -# Test drv mode
>> -test "-I veth1 -N" "-I veth0 -s -N"
>> -test "-I veth1 -N -c 10" "-I veth0 -s -N"
>> -
>> -echo "OK. All tests passed"
>> -exit 0
>> diff --git a/tools/testing/selftests/bpf/xdping.c b/tools/testing/selftests/bpf/xdping.c
>> deleted file mode 100644
>> index 9ed8c796645d..000000000000
>> --- a/tools/testing/selftests/bpf/xdping.c
>> +++ /dev/null
>> @@ -1,254 +0,0 @@
>> -// SPDX-License-Identifier: GPL-2.0
>> -/* Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved. */
>> -
>> -#include <linux/bpf.h>
>> -#include <linux/if_link.h>
>> -#include <arpa/inet.h>
>> -#include <assert.h>
>> -#include <errno.h>
>> -#include <signal.h>
>> -#include <stdio.h>
>> -#include <stdlib.h>
>> -#include <string.h>
>> -#include <unistd.h>
>> -#include <libgen.h>
>> -#include <net/if.h>
>> -#include <sys/types.h>
>> -#include <sys/socket.h>
>> -#include <netdb.h>
>> -
>> -#include "bpf/bpf.h"
>> -#include "bpf/libbpf.h"
>> -
>> -#include "xdping.h"
>> -#include "testing_helpers.h"
>> -
>> -static int ifindex;
>> -static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
>> -
>> -static void cleanup(int sig)
>> -{
>> -	bpf_xdp_detach(ifindex, xdp_flags, NULL);
>> -	if (sig)
>> -		exit(1);
>> -}
>> -
>> -static int get_stats(int fd, __u16 count, __u32 raddr)
>> -{
>> -	struct pinginfo pinginfo = { 0 };
>> -	char inaddrbuf[INET_ADDRSTRLEN];
>> -	struct in_addr inaddr;
>> -	__u16 i;
>> -
>> -	inaddr.s_addr = raddr;
>> -
>> -	printf("\nXDP RTT data:\n");
>> -
>> -	if (bpf_map_lookup_elem(fd, &raddr, &pinginfo)) {
>> -		perror("bpf_map_lookup elem");
>> -		return 1;
>> -	}
>> -
>> -	for (i = 0; i < count; i++) {
>> -		if (pinginfo.times[i] == 0)
>> -			break;
>> -
>> -		printf("64 bytes from %s: icmp_seq=%d ttl=64 time=%#.5f ms\n",
>> -		       inet_ntop(AF_INET, &inaddr, inaddrbuf,
>> -				 sizeof(inaddrbuf)),
>> -		       count + i + 1,
>> -		       (double)pinginfo.times[i]/1000000);
>> -	}
>> -
>> -	if (i < count) {
>> -		fprintf(stderr, "Expected %d samples, got %d.\n", count, i);
>> -		return 1;
>> -	}
>> -
>> -	bpf_map_delete_elem(fd, &raddr);
>> -
>> -	return 0;
>> -}
>> -
>> -static void show_usage(const char *prog)
>> -{
>> -	fprintf(stderr,
>> -		"usage: %s [OPTS] -I interface destination\n\n"
>> -		"OPTS:\n"
>> -		"    -c count		Stop after sending count requests\n"
>> -		"			(default %d, max %d)\n"
>> -		"    -I interface	interface name\n"
>> -		"    -N			Run in driver mode\n"
>> -		"    -s			Server mode\n"
>> -		"    -S			Run in skb mode\n",
>> -		prog, XDPING_DEFAULT_COUNT, XDPING_MAX_COUNT);
>> -}
>> -
>> -int main(int argc, char **argv)
>> -{
>> -	__u32 mode_flags = XDP_FLAGS_DRV_MODE | XDP_FLAGS_SKB_MODE;
>> -	struct addrinfo *a, hints = { .ai_family = AF_INET };
>> -	__u16 count = XDPING_DEFAULT_COUNT;
>> -	struct pinginfo pinginfo = { 0 };
>> -	const char *optstr = "c:I:NsS";
>> -	struct bpf_program *main_prog;
>> -	int prog_fd = -1, map_fd = -1;
>> -	struct sockaddr_in rin;
>> -	struct bpf_object *obj;
>> -	struct bpf_map *map;
>> -	char *ifname = NULL;
>> -	char filename[256];
>> -	int opt, ret = 1;
>> -	__u32 raddr = 0;
>> -	int server = 0;
>> -	char cmd[256];
>> -
>> -	while ((opt = getopt(argc, argv, optstr)) != -1) {
>> -		switch (opt) {
>> -		case 'c':
>> -			count = atoi(optarg);
>> -			if (count < 1 || count > XDPING_MAX_COUNT) {
>> -				fprintf(stderr,
>> -					"min count is 1, max count is %d\n",
>> -					XDPING_MAX_COUNT);
>> -				return 1;
>> -			}
>> -			break;
>> -		case 'I':
>> -			ifname = optarg;
>> -			ifindex = if_nametoindex(ifname);
>> -			if (!ifindex) {
>> -				fprintf(stderr, "Could not get interface %s\n",
>> -					ifname);
>> -				return 1;
>> -			}
>> -			break;
>> -		case 'N':
>> -			xdp_flags |= XDP_FLAGS_DRV_MODE;
>> -			break;
>> -		case 's':
>> -			/* use server program */
>> -			server = 1;
>> -			break;
>> -		case 'S':
>> -			xdp_flags |= XDP_FLAGS_SKB_MODE;
>> -			break;
>> -		default:
>> -			show_usage(basename(argv[0]));
>> -			return 1;
>> -		}
>> -	}
>> -
>> -	if (!ifname) {
>> -		show_usage(basename(argv[0]));
>> -		return 1;
>> -	}
>> -	if (!server && optind == argc) {
>> -		show_usage(basename(argv[0]));
>> -		return 1;
>> -	}
>> -
>> -	if ((xdp_flags & mode_flags) == mode_flags) {
>> -		fprintf(stderr, "-N or -S can be specified, not both.\n");
>> -		show_usage(basename(argv[0]));
>> -		return 1;
>> -	}
>> -
>> -	if (!server) {
>> -		/* Only supports IPv4; see hints initialization above. */
>> -		if (getaddrinfo(argv[optind], NULL, &hints, &a) || !a) {
>> -			fprintf(stderr, "Could not resolve %s\n", argv[optind]);
>> -			return 1;
>> -		}
>> -		memcpy(&rin, a->ai_addr, sizeof(rin));
>> -		raddr = rin.sin_addr.s_addr;
>> -		freeaddrinfo(a);
>> -	}
>> -
>> -	/* Use libbpf 1.0 API mode */
>> -	libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
>> -
>> -	snprintf(filename, sizeof(filename), "%s_kern.bpf.o", argv[0]);
>> -
>> -	if (bpf_prog_test_load(filename, BPF_PROG_TYPE_XDP, &obj, &prog_fd)) {
>> -		fprintf(stderr, "load of %s failed\n", filename);
>> -		return 1;
>> -	}
>> -
>> -	main_prog = bpf_object__find_program_by_name(obj,
>> -						     server ? "xdping_server" : "xdping_client");
>> -	if (main_prog)
>> -		prog_fd = bpf_program__fd(main_prog);
>> -	if (!main_prog || prog_fd < 0) {
>> -		fprintf(stderr, "could not find xdping program");
>> -		return 1;
>> -	}
>> -
>> -	map = bpf_object__next_map(obj, NULL);
>> -	if (map)
>> -		map_fd = bpf_map__fd(map);
>> -	if (!map || map_fd < 0) {
>> -		fprintf(stderr, "Could not find ping map");
>> -		goto done;
>> -	}
>> -
>> -	signal(SIGINT, cleanup);
>> -	signal(SIGTERM, cleanup);
>> -
>> -	printf("Setting up XDP for %s, please wait...\n", ifname);
>> -
>> -	printf("XDP setup disrupts network connectivity, hit Ctrl+C to quit\n");
>> -
>> -	if (bpf_xdp_attach(ifindex, prog_fd, xdp_flags, NULL) < 0) {
>> -		fprintf(stderr, "Link set xdp fd failed for %s\n", ifname);
>> -		goto done;
>> -	}
>> -
>> -	if (server) {
>> -		close(prog_fd);
>> -		close(map_fd);
>> -		printf("Running server on %s; press Ctrl+C to exit...\n",
>> -		       ifname);
>> -		do { } while (1);
>> -	}
>> -
>> -	/* Start xdping-ing from last regular ping reply, e.g. for a count
>> -	 * of 10 ICMP requests, we start xdping-ing using reply with seq number
>> -	 * 10.  The reason the last "real" ping RTT is much higher is that
>> -	 * the ping program sees the ICMP reply associated with the last
>> -	 * XDP-generated packet, so ping doesn't get a reply until XDP is done.
>> -	 */
>> -	pinginfo.seq = htons(count);
>> -	pinginfo.count = count;
>> -
>> -	if (bpf_map_update_elem(map_fd, &raddr, &pinginfo, BPF_ANY)) {
>> -		fprintf(stderr, "could not communicate with BPF map: %s\n",
>> -			strerror(errno));
>> -		cleanup(0);
>> -		goto done;
>> -	}
>> -
>> -	/* We need to wait for XDP setup to complete. */
>> -	sleep(10);
>> -
>> -	snprintf(cmd, sizeof(cmd), "ping -c %d -I %s %s",
>> -		 count, ifname, argv[optind]);
>> -
>> -	printf("\nNormal ping RTT data\n");
>> -	printf("[Ignore final RTT; it is distorted by XDP using the reply]\n");
>> -
>> -	ret = system(cmd);
>> -
>> -	if (!ret)
>> -		ret = get_stats(map_fd, count, raddr);
>> -
>> -	cleanup(0);
>> -
>> -done:
>> -	if (prog_fd > 0)
>> -		close(prog_fd);
>> -	if (map_fd > 0)
>> -		close(map_fd);
>> -
>> -	return ret;
>> -}
>> 
>> ---
>> base-commit: b7fb68124aa80db90394236a9a4a6add12f4425d
>> change-id: 20260417-xdping-5c2ef5a63899
>> 
>> Best regards,
>> --  
>> Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
>> 




-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



* [PATCH net 1/1] net/smc: avoid early lgr access in smc_clc_wait_msg
From: Ren Wei @ 2026-04-22 15:40 UTC
  To: linux-rdma, linux-s390, netdev
  Cc: alibuda, dust.li, sidraya, wenjia, mjambigi, tonylu, guwen, davem,
	edumazet, kuba, pabeni, horms, ubraun, yuantan098, yifanwucs,
	tomapufckgml, bird, ruijieli51, n05ec
In-Reply-To: <cover.1776850759.git.ruijieli51@gmail.com>

From: Ruijie Li <ruijieli51@gmail.com>

A CLC decline can be received while the handshake is still in an early
stage, before the connection has been associated with a link group.

The decline handling in smc_clc_wait_msg() updates link-group level sync
state for first-contact declines, but that state only exists after link
group setup has completed. Guard the link-group update accordingly and
keep the per-socket peer diagnosis handling unchanged.

This preserves the existing sync_err handling for established link-group
contexts and avoids touching link-group state before it is available.

Fixes: 0cfdd8f92cac ("smc: connection and link group creation")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Ruijie Li <ruijieli51@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
---
 net/smc/smc_clc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c
index c38fc7bf0a7e..014d527d5462 100644
--- a/net/smc/smc_clc.c
+++ b/net/smc/smc_clc.c
@@ -788,8 +788,8 @@ int smc_clc_wait_msg(struct smc_sock *smc, void *buf, int buflen,
 		dclc = (struct smc_clc_msg_decline *)clcm;
 		reason_code = SMC_CLC_DECL_PEERDECL;
 		smc->peer_diagnosis = ntohl(dclc->peer_diagnosis);
-		if (((struct smc_clc_msg_decline *)buf)->hdr.typev2 &
-						SMC_FIRST_CONTACT_MASK) {
+		if ((dclc->hdr.typev2 & SMC_FIRST_CONTACT_MASK) &&
+		    smc->conn.lgr) {
 			smc->conn.lgr->sync_err = 1;
 			smc_lgr_terminate_sched(smc->conn.lgr);
 		}
-- 
2.34.1
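
The guard added by the hunk above can be illustrated with a small userspace
sketch. Everything here is an assumption made for illustration: the `model_*`
types, `model_handle_decline()` and the value of `MODEL_FIRST_CONTACT_MASK`
are hypothetical and do not mirror the real SMC definitions, and byte-order
conversion of the diagnosis field is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit value, chosen only for this sketch. */
#define MODEL_FIRST_CONTACT_MASK 0x01u

struct model_lgr {
	int sync_err;
	bool terminate_scheduled;
};

struct model_conn {
	struct model_lgr *lgr; /* NULL until link group setup has completed */
};

struct model_sock {
	struct model_conn conn;
	uint32_t peer_diagnosis;
};

/*
 * Handle a decline message. The per-socket diagnosis is always recorded;
 * link-group state is touched only when a link group is attached.
 */
void model_handle_decline(struct model_sock *smc, uint8_t typev2,
			  uint32_t peer_diagnosis)
{
	assert(smc != NULL);
	smc->peer_diagnosis = peer_diagnosis;
	if ((typev2 & MODEL_FIRST_CONTACT_MASK) && smc->conn.lgr) {
		smc->conn.lgr->sync_err = 1;
		smc->conn.lgr->terminate_scheduled = true;
	}
}
```

With `conn.lgr == NULL` the call only updates `peer_diagnosis`; without the
second condition the same call would dereference a NULL pointer, which is the
early-handshake case the commit message describes.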



* [syzbot ci] Re: ipv6: udp: fix memory leak in udpv6_sendmsg error path
From: syzbot ci @ 2026-04-22 15:41 UTC
  To: 25181214217, davem, dsahern, edumazet, horms, kuba, linux-kernel,
	netdev, pabeni, willemdebruijn.kernel
  Cc: syzbot, syzkaller-bugs
In-Reply-To: <20260422105802.486216-1-25181214217@stu.xidian.edu.cn>

syzbot ci has tested the following series

[v1] ipv6: udp: fix memory leak in udpv6_sendmsg error path
https://lore.kernel.org/all/20260422105802.486216-1-25181214217@stu.xidian.edu.cn
* [PATCH] ipv6: udp: fix memory leak in udpv6_sendmsg error path

and found the following issues:
* KASAN: slab-use-after-free Read in ip6_pol_route
* KASAN: slab-use-after-free Write in rcuref_put
* WARNING in rcuref_put_slowpath

Full report is available here:
https://ci.syzbot.org/series/2abb21f1-6f46-4f6f-a074-0051111986db

***

KASAN: slab-use-after-free Read in ip6_pol_route

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      6596a02b207886e9e00bb0161c7fd59fea53c081
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/ad85c1df-394a-471c-b2ea-0e168bab3b26/config
syz repro: https://ci.syzbot.org/findings/66d12b42-aa7a-4da1-b456-d18de1a54007/syz_repro

==================================================================
BUG: KASAN: slab-use-after-free in rt6_get_pcpu_route net/ipv6/route.c:1446 [inline]
BUG: KASAN: slab-use-after-free in ip6_pol_route+0x12b5/0x13d0 net/ipv6/route.c:2316
Read of size 4 at addr ffff88810e948518 by task syz.0.26/6002

CPU: 0 UID: 0 PID: 6002 Comm: syz.0.26 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 rt6_get_pcpu_route net/ipv6/route.c:1446 [inline]
 ip6_pol_route+0x12b5/0x13d0 net/ipv6/route.c:2316
 pol_lookup_func include/net/ip6_fib.h:667 [inline]
 fib6_rule_lookup+0x222/0x730 net/ipv6/fib6_rules.c:123
 ip6_route_output_flags_noref net/ipv6/route.c:2699 [inline]
 ip6_route_output_flags+0x364/0x5d0 net/ipv6/route.c:2711
 ip6_route_output include/net/ip6_route.h:100 [inline]
 ip6_dst_lookup_tail+0x1c3/0x15a0 net/ipv6/ip6_output.c:1155
 ip6_dst_lookup_flow+0x89/0x150 net/ipv6/ip6_output.c:1288
 ip6_datagram_dst_update+0x73a/0xd20 net/ipv6/datagram.c:97
 __ip6_datagram_connect+0xbd1/0x1150 net/ipv6/datagram.c:256
 udpv6_connect+0x36/0x240 net/ipv6/udp.c:1297
 __sys_connect_file net/socket.c:2148 [inline]
 __sys_connect+0x312/0x450 net/socket.c:2167
 __do_sys_connect net/socket.c:2173 [inline]
 __se_sys_connect net/socket.c:2170 [inline]
 __x64_sys_connect+0x7a/0x90 net/socket.c:2170
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f999eb9c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f999fa0a028 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 00007f999ee15fa0 RCX: 00007f999eb9c819
RDX: 000000000000001c RSI: 00002000000002c0 RDI: 0000000000000003
RBP: 00007f999ec32c91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f999ee16038 R14: 00007f999ee15fa0 R15: 00007fff22f1ff88
 </TASK>

Allocated by task 30:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 unpoison_slab_object mm/kasan/common.c:340 [inline]
 __kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:366
 kasan_slab_alloc include/linux/kasan.h:253 [inline]
 slab_post_alloc_hook mm/slub.c:4569 [inline]
 slab_alloc_node mm/slub.c:4898 [inline]
 kmem_cache_alloc_noprof+0x2bc/0x650 mm/slub.c:4905
 dst_alloc+0x105/0x170 net/core/dst.c:90
 ip6_dst_alloc net/ipv6/route.c:342 [inline]
 ip6_rt_pcpu_alloc net/ipv6/route.c:1419 [inline]
 rt6_make_pcpu_route net/ipv6/route.c:1468 [inline]
 ip6_pol_route+0xafb/0x13d0 net/ipv6/route.c:2319
 pol_lookup_func include/net/ip6_fib.h:667 [inline]
 fib6_rule_lookup+0x222/0x730 net/ipv6/fib6_rules.c:123
 ip6_route_output_flags_noref net/ipv6/route.c:2699 [inline]
 ip6_route_output_flags+0x364/0x5d0 net/ipv6/route.c:2711
 ip6_route_output include/net/ip6_route.h:100 [inline]
 ip6_dst_lookup_tail+0x1c3/0x15a0 net/ipv6/ip6_output.c:1155
 ip6_dst_lookup_flow+0x89/0x150 net/ipv6/ip6_output.c:1288
 send6+0x4dc/0x910 drivers/net/wireguard/socket.c:139
 wg_socket_send_skb_to_peer+0x111/0x1d0 drivers/net/wireguard/socket.c:177
 wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
 wg_packet_handshake_send_worker+0x203/0x350 drivers/net/wireguard/send.c:51
 process_one_work kernel/workqueue.c:3302 [inline]
 process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3385
 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3466
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Freed by task 23:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2689 [inline]
 slab_free mm/slub.c:6246 [inline]
 kmem_cache_free+0x182/0x650 mm/slub.c:6373
 dst_destroy+0x235/0x350 net/core/dst.c:122
 rcu_do_batch kernel/rcu/tree.c:2617 [inline]
 rcu_core+0x7cd/0x1070 kernel/rcu/tree.c:2869
 handle_softirqs+0x22a/0x840 kernel/softirq.c:622
 run_ksoftirqd+0x36/0x60 kernel/softirq.c:1076
 smpboot_thread_fn+0x541/0xa50 kernel/smpboot.c:160
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Last potentially related work creation:
 kasan_save_stack+0x3e/0x60 mm/kasan/common.c:57
 kasan_record_aux_stack+0xbd/0xd0 mm/kasan/generic.c:556
 __call_rcu_common kernel/rcu/tree.c:3131 [inline]
 call_rcu+0xee/0x890 kernel/rcu/tree.c:3251
 inet_sock_destruct+0x564/0x740 net/ipv4/af_inet.c:165
 __sk_destruct+0x8d/0x9d0 net/core/sock.c:2352
 rcu_do_batch kernel/rcu/tree.c:2617 [inline]
 rcu_core+0x7cd/0x1070 kernel/rcu/tree.c:2869
 handle_softirqs+0x22a/0x840 kernel/softirq.c:622
 run_ksoftirqd+0x36/0x60 kernel/softirq.c:1076
 smpboot_thread_fn+0x541/0xa50 kernel/smpboot.c:160
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

The buggy address belongs to the object at ffff88810e948480
 which belongs to the cache ip6_dst_cache of size 232
The buggy address is located 152 bytes inside of
 freed 232-byte region [ffff88810e948480, ffff88810e948568)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10e948
head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff88810e9480f9
flags: 0x17ff00000000040(head|node=0|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 017ff00000000040 ffff8881772f6c80 dead000000000100 dead000000000122
raw: 0000000000000000 0000018000150015 00000000f5000000 ffff88810e9480f9
head: 017ff00000000040 ffff8881772f6c80 dead000000000100 dead000000000122
head: 0000000000000000 0000018000150015 00000000f5000000 ffff88810e9480f9
head: 017ff00000000001 ffffffffffffff81 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000002
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 1, migratetype Unmovable, gfp_mask 0xd2820(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5871, tgid 5871 (kworker/0:3), ts 86853962004, free_ts 86835725693
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x231/0x280 mm/page_alloc.c:1858
 prep_new_page mm/page_alloc.c:1866 [inline]
 get_page_from_freelist+0x24ba/0x2540 mm/page_alloc.c:3946
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5226
 alloc_slab_page mm/slub.c:3278 [inline]
 allocate_slab+0x77/0x660 mm/slub.c:3467
 new_slab mm/slub.c:3525 [inline]
 refill_objects+0x339/0x3d0 mm/slub.c:7251
 refill_sheaf mm/slub.c:2816 [inline]
 __pcs_replace_empty_main+0x321/0x720 mm/slub.c:4651
 alloc_from_pcs mm/slub.c:4749 [inline]
 slab_alloc_node mm/slub.c:4883 [inline]
 kmem_cache_alloc_noprof+0x37d/0x650 mm/slub.c:4905
 dst_alloc+0x105/0x170 net/core/dst.c:90
 ip6_dst_alloc net/ipv6/route.c:342 [inline]
 ip6_rt_pcpu_alloc net/ipv6/route.c:1419 [inline]
 rt6_make_pcpu_route net/ipv6/route.c:1468 [inline]
 ip6_pol_route+0xafb/0x13d0 net/ipv6/route.c:2319
 pol_lookup_func include/net/ip6_fib.h:667 [inline]
 fib6_rule_lookup+0x556/0x730 net/ipv6/fib6_rules.c:123
 ip6_route_input_lookup net/ipv6/route.c:2352 [inline]
 ip6_route_input+0x730/0xad0 net/ipv6/route.c:2655
 ip6_rcv_finish+0x141/0x280 net/ipv6/ip6_input.c:117
 NF_HOOK+0x336/0x3c0 include/linux/netfilter.h:318
 __netif_receive_skb_one_core net/core/dev.c:6209 [inline]
 __netif_receive_skb net/core/dev.c:6322 [inline]
 process_backlog+0x7dd/0x1950 net/core/dev.c:6673
 __napi_poll+0xae/0x340 net/core/dev.c:7737
 napi_poll net/core/dev.c:7800 [inline]
 net_rx_action+0x627/0xf70 net/core/dev.c:7957
page last free pid 5871 tgid 5871 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1402 [inline]
 __free_frozen_pages+0xbc7/0xd30 mm/page_alloc.c:2943
 rcu_do_batch kernel/rcu/tree.c:2617 [inline]
 rcu_core+0x7cd/0x1070 kernel/rcu/tree.c:2869
 handle_softirqs+0x22a/0x840 kernel/softirq.c:622
 do_softirq+0x76/0xd0 kernel/softirq.c:523
 __local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
 local_bh_enable include/linux/bottom_half.h:33 [inline]
 __alloc_skb+0x1aa/0x7d0 net/core/skbuff.c:697
 alloc_skb include/linux/skbuff.h:1383 [inline]
 mld_newpack+0x14c/0xc90 net/ipv6/mcast.c:1775
 add_grhead+0x5a/0x2a0 net/ipv6/mcast.c:1886
 add_grec+0x1452/0x1740 net/ipv6/mcast.c:2025
 mld_send_cr net/ipv6/mcast.c:2148 [inline]
 mld_ifc_work+0x6e6/0xe70 net/ipv6/mcast.c:2693
 process_one_work kernel/workqueue.c:3302 [inline]
 process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3385
 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3466
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Memory state around the buggy address:
 ffff88810e948400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88810e948480: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88810e948500: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
                            ^
 ffff88810e948580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88810e948600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
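
The numbers in the report above can be cross-checked with a little address
arithmetic. The sketch below assumes generic KASAN, where one shadow byte
covers 8 bytes of memory, and the 16-shadow-bytes-per-row layout visible in
the dump; the helper names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SHADOW_GRANULE   8u  /* bytes of memory per shadow byte (generic KASAN) */
#define SHADOW_ROW_BYTES 16u /* shadow bytes printed per row in the dump */

/* Offset of the faulting address inside the slab object. */
uint64_t offset_in_object(uint64_t addr, uint64_t obj_start)
{
	assert(addr >= obj_start);
	return addr - obj_start;
}

/* Column (0..15) of the shadow byte covering addr within a dump row. */
unsigned int shadow_column(uint64_t addr, uint64_t row_base)
{
	assert(addr >= row_base);
	assert(addr - row_base < (uint64_t)SHADOW_GRANULE * SHADOW_ROW_BYTES);
	return (unsigned int)((addr - row_base) / SHADOW_GRANULE);
}
```

For the first report, `offset_in_object(0xffff88810e948518, 0xffff88810e948480)`
gives 152 and the region `[…480, …568)` spans 232 bytes, matching the
"152 bytes inside of freed 232-byte region" line. `shadow_column()` for the
same address against the row base `0xffff88810e948500` gives 3, which is where
the `^` marker sits, and the object end `…568` maps to column 13, where the
`fb` bytes give way to `fc`.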


***

KASAN: slab-use-after-free Write in rcuref_put

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      6596a02b207886e9e00bb0161c7fd59fea53c081
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/ad85c1df-394a-471c-b2ea-0e168bab3b26/config
syz repro: https://ci.syzbot.org/findings/3d5ef30a-8158-4bce-901b-48b8fcc50925/syz_repro

==================================================================
BUG: KASAN: slab-use-after-free in instrument_atomic_read_write include/linux/instrumented.h:112 [inline]
BUG: KASAN: slab-use-after-free in atomic_sub_return_release include/linux/atomic/atomic-instrumented.h:326 [inline]
BUG: KASAN: slab-use-after-free in __rcuref_put include/linux/rcuref.h:109 [inline]
BUG: KASAN: slab-use-after-free in rcuref_put+0xf7/0x170 include/linux/rcuref.h:173
Write of size 4 at addr ffff8881130ec940 by task klogd/5265

CPU: 0 UID: 0 PID: 5265 Comm: klogd Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <IRQ>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 check_region_inline mm/kasan/generic.c:-1 [inline]
 kasan_check_range+0x264/0x2c0 mm/kasan/generic.c:200
 instrument_atomic_read_write include/linux/instrumented.h:112 [inline]
 atomic_sub_return_release include/linux/atomic/atomic-instrumented.h:326 [inline]
 __rcuref_put include/linux/rcuref.h:109 [inline]
 rcuref_put+0xf7/0x170 include/linux/rcuref.h:173
 dst_release+0x24/0x1b0 net/core/dst.c:168
 inet_sock_destruct+0x564/0x740 net/ipv4/af_inet.c:165
 __sk_destruct+0x8d/0x9d0 net/core/sock.c:2352
 rcu_do_batch kernel/rcu/tree.c:2617 [inline]
 rcu_core+0x7cd/0x1070 kernel/rcu/tree.c:2869
 handle_softirqs+0x22a/0x840 kernel/softirq.c:622
 do_softirq+0x76/0xd0 kernel/softirq.c:523
 </IRQ>
 <TASK>
 __local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
 local_bh_enable include/linux/bottom_half.h:33 [inline]
 __alloc_skb+0x1aa/0x7d0 net/core/skbuff.c:697
 alloc_skb include/linux/skbuff.h:1383 [inline]
 alloc_skb_with_frags+0xc8/0x760 net/core/skbuff.c:6734
 sock_alloc_send_pskb+0x878/0x990 net/core/sock.c:2998
 unix_dgram_sendmsg+0x460/0x18d0 net/unix/af_unix.c:2131
 sock_sendmsg_nosec net/socket.c:787 [inline]
 __sock_sendmsg net/socket.c:802 [inline]
 __sys_sendto+0x672/0x710 net/socket.c:2265
 __do_sys_sendto net/socket.c:2272 [inline]
 __se_sys_sendto net/socket.c:2268 [inline]
 __x64_sys_sendto+0xde/0x100 net/socket.c:2268
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f2da31309b5
Code: 8b 44 24 08 48 83 c4 28 48 98 c3 48 98 c3 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 26 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 76 7a 48 8b 15 44 c4 0c 00 f7 d8 64 89 02 48 83
RSP: 002b:00007fffcd483948 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f2da31309b5
RDX: 0000000000000071 RSI: 000055812f074f90 RDI: 0000000000000003
RBP: 000055812f070910 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000004000 R11: 0000000000000246 R12: 0000000000000013
R13: 00007f2da32be212 R14: 00007fffcd483a48 R15: 0000000000000000
 </TASK>

Allocated by task 3571:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 unpoison_slab_object mm/kasan/common.c:340 [inline]
 __kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:366
 kasan_slab_alloc include/linux/kasan.h:253 [inline]
 slab_post_alloc_hook mm/slub.c:4569 [inline]
 slab_alloc_node mm/slub.c:4898 [inline]
 kmem_cache_alloc_noprof+0x2bc/0x650 mm/slub.c:4905
 dst_alloc+0x105/0x170 net/core/dst.c:90
 ip6_dst_alloc net/ipv6/route.c:342 [inline]
 ip6_rt_pcpu_alloc net/ipv6/route.c:1419 [inline]
 rt6_make_pcpu_route net/ipv6/route.c:1468 [inline]
 ip6_pol_route+0xafb/0x13d0 net/ipv6/route.c:2319
 pol_lookup_func include/net/ip6_fib.h:667 [inline]
 fib6_rule_lookup+0x222/0x730 net/ipv6/fib6_rules.c:123
 ip6_route_output_flags_noref net/ipv6/route.c:2699 [inline]
 ip6_route_output_flags+0x364/0x5d0 net/ipv6/route.c:2711
 ip6_route_output include/net/ip6_route.h:100 [inline]
 ip6_dst_lookup_tail+0x1c3/0x15a0 net/ipv6/ip6_output.c:1155
 ip6_dst_lookup_flow+0x89/0x150 net/ipv6/ip6_output.c:1288
 send6+0x4dc/0x910 drivers/net/wireguard/socket.c:139
 wg_socket_send_skb_to_peer+0x111/0x1d0 drivers/net/wireguard/socket.c:177
 wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
 wg_packet_handshake_send_worker+0x203/0x350 drivers/net/wireguard/send.c:51
 process_one_work kernel/workqueue.c:3302 [inline]
 process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3385
 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3466
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Freed by task 5265:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2689 [inline]
 slab_free mm/slub.c:6246 [inline]
 kmem_cache_free+0x182/0x650 mm/slub.c:6373
 dst_destroy+0x235/0x350 net/core/dst.c:122
 rcu_do_batch kernel/rcu/tree.c:2617 [inline]
 rcu_core+0x7cd/0x1070 kernel/rcu/tree.c:2869
 handle_softirqs+0x22a/0x840 kernel/softirq.c:622
 do_softirq+0x76/0xd0 kernel/softirq.c:523
 __local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
 local_bh_enable include/linux/bottom_half.h:33 [inline]
 __alloc_skb+0x1aa/0x7d0 net/core/skbuff.c:697
 alloc_skb include/linux/skbuff.h:1383 [inline]
 alloc_skb_with_frags+0xc8/0x760 net/core/skbuff.c:6734
 sock_alloc_send_pskb+0x878/0x990 net/core/sock.c:2998
 unix_dgram_sendmsg+0x460/0x18d0 net/unix/af_unix.c:2131
 sock_sendmsg_nosec net/socket.c:787 [inline]
 __sock_sendmsg net/socket.c:802 [inline]
 __sys_sendto+0x672/0x710 net/socket.c:2265
 __do_sys_sendto net/socket.c:2272 [inline]
 __se_sys_sendto net/socket.c:2268 [inline]
 __x64_sys_sendto+0xde/0x100 net/socket.c:2268
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Last potentially related work creation:
 kasan_save_stack+0x3e/0x60 mm/kasan/common.c:57
 kasan_record_aux_stack+0xbd/0xd0 mm/kasan/generic.c:556
 __call_rcu_common kernel/rcu/tree.c:3131 [inline]
 call_rcu+0xee/0x890 kernel/rcu/tree.c:3251
 udpv6_sendmsg+0x1e9c/0x2690 net/ipv6/udp.c:1712
 sock_sendmsg_nosec net/socket.c:787 [inline]
 __sock_sendmsg net/socket.c:802 [inline]
 ____sys_sendmsg+0x5c7/0x9f0 net/socket.c:2698
 ___sys_sendmsg+0x2a5/0x360 net/socket.c:2752
 __sys_sendmmsg+0x27c/0x4e0 net/socket.c:2841
 __do_sys_sendmmsg net/socket.c:2868 [inline]
 __se_sys_sendmmsg net/socket.c:2865 [inline]
 __x64_sys_sendmmsg+0xa0/0xc0 net/socket.c:2865
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff8881130ec900
 which belongs to the cache ip6_dst_cache of size 232
The buggy address is located 64 bytes inside of
 freed 232-byte region [ffff8881130ec900, ffff8881130ec9e8)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1130ec
head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff8881130ec0f9
flags: 0x17ff00000000040(head|node=0|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 017ff00000000040 ffff888103f2c500 dead000000000100 dead000000000122
raw: 0000000000000000 0000018000150015 00000000f5000000 ffff8881130ec0f9
head: 017ff00000000040 ffff888103f2c500 dead000000000100 dead000000000122
head: 0000000000000000 0000018000150015 00000000f5000000 ffff8881130ec0f9
head: 017ff00000000001 ffffffffffffff81 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000002
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 1, migratetype Unmovable, gfp_mask 0xd2820(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 13, tgid 13 (kworker/u8:1), ts 74988667148, free_ts 74987085458
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x231/0x280 mm/page_alloc.c:1858
 prep_new_page mm/page_alloc.c:1866 [inline]
 get_page_from_freelist+0x24ba/0x2540 mm/page_alloc.c:3946
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5226
 alloc_slab_page mm/slub.c:3278 [inline]
 allocate_slab+0x77/0x660 mm/slub.c:3467
 new_slab mm/slub.c:3525 [inline]
 refill_objects+0x339/0x3d0 mm/slub.c:7251
 refill_sheaf mm/slub.c:2816 [inline]
 __pcs_replace_empty_main+0x321/0x720 mm/slub.c:4651
 alloc_from_pcs mm/slub.c:4749 [inline]
 slab_alloc_node mm/slub.c:4883 [inline]
 kmem_cache_alloc_noprof+0x37d/0x650 mm/slub.c:4905
 dst_alloc+0x105/0x170 net/core/dst.c:90
 ip6_dst_alloc net/ipv6/route.c:342 [inline]
 icmp6_dst_alloc+0x75/0x440 net/ipv6/route.c:3337
 ndisc_send_skb+0x44a/0x1670 net/ipv6/ndisc.c:491
 ndisc_send_ns+0xd7/0x160 net/ipv6/ndisc.c:671
 addrconf_dad_work+0xac4/0x14c0 net/ipv6/addrconf.c:4294
 process_one_work kernel/workqueue.c:3302 [inline]
 process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3385
 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3466
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
page last free pid 5952 tgid 5952 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1402 [inline]
 __free_frozen_pages+0xbc7/0xd30 mm/page_alloc.c:2943
 __slab_free+0x274/0x2c0 mm/slub.c:5608
 qlink_free mm/kasan/quarantine.c:163 [inline]
 qlist_free_all+0x99/0x100 mm/kasan/quarantine.c:179
 kasan_quarantine_reduce+0x148/0x160 mm/kasan/quarantine.c:286
 __kasan_slab_alloc+0x22/0x80 mm/kasan/common.c:350
 kasan_slab_alloc include/linux/kasan.h:253 [inline]
 slab_post_alloc_hook mm/slub.c:4569 [inline]
 slab_alloc_node mm/slub.c:4898 [inline]
 kmem_cache_alloc_noprof+0x2bc/0x650 mm/slub.c:4905
 alloc_empty_file+0x5b/0x1d0 fs/file_table.c:262
 alloc_file fs/file_table.c:396 [inline]
 alloc_file_pseudo+0x155/0x240 fs/file_table.c:425
 sock_alloc_file+0xb8/0x2e0 net/socket.c:543
 sock_map_fd net/socket.c:573 [inline]
 __sys_socket+0x13c/0x1b0 net/socket.c:1815
 __do_sys_socket net/socket.c:1820 [inline]
 __se_sys_socket net/socket.c:1818 [inline]
 __x64_sys_socket+0x7a/0x90 net/socket.c:1818
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Memory state around the buggy address:
 ffff8881130ec800: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
 ffff8881130ec880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8881130ec900: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
 ffff8881130ec980: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
 ffff8881130eca00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================


***

WARNING in rcuref_put_slowpath

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      6596a02b207886e9e00bb0161c7fd59fea53c081
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/ad85c1df-394a-471c-b2ea-0e168bab3b26/config
syz repro: https://ci.syzbot.org/findings/e5d6936b-9f45-45fd-88ab-8e917a818ac4/syz_repro

------------[ cut here ]------------
rcuref - imbalanced put()
WARNING: lib/rcuref.c:266 at rcuref_put_slowpath+0x16e/0x1d0 lib/rcuref.c:266, CPU#0: udevd/5862
Modules linked in:
CPU: 0 UID: 0 PID: 5862 Comm: udevd Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:rcuref_put_slowpath+0x16e/0x1d0 lib/rcuref.c:266
Code: c1 e8 03 42 0f b6 04 38 84 c0 75 48 c7 03 00 00 00 a0 31 c0 e9 6d ff ff ff e8 9e ef 06 07 e8 99 1c 14 fd 48 8d 3d 02 a8 8c 0b <67> 48 0f b9 3a 48 89 df be 04 00 00 00 e8 50 5a 7f fd 48 89 d8 48
RSP: 0018:ffffc90000007c20 EFLAGS: 00010246
RAX: ffffffff84b1b457 RBX: ffff88811617cac0 RCX: ffff8881706f1d80
RDX: 0000000000000100 RSI: 00000000dfffffff RDI: ffffffff903e5c60
RBP: ffffc90000007cb8 R08: ffff88811617cac3 R09: 1ffff11022c2f958
R10: dffffc0000000000 R11: ffffed1022c2f959 R12: 1ffff92000000f84
R13: ffff888116047a08 R14: 00000000dfffffff R15: dffffc0000000000
FS:  00007f0704ed8c80(0000) GS:ffff88818dc14000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f58f0b456b8 CR3: 000000016883c000 CR4: 00000000000006f0
Call Trace:
 <IRQ>
 __rcuref_put include/linux/rcuref.h:117 [inline]
 rcuref_put+0x15b/0x170 include/linux/rcuref.h:173
 dst_release+0x24/0x1b0 net/core/dst.c:168
 inet_sock_destruct+0x564/0x740 net/ipv4/af_inet.c:165
 __sk_destruct+0x8d/0x9d0 net/core/sock.c:2352
 rcu_do_batch kernel/rcu/tree.c:2617 [inline]
 rcu_core+0x7cd/0x1070 kernel/rcu/tree.c:2869
 handle_softirqs+0x22a/0x840 kernel/softirq.c:622
 __do_softirq kernel/softirq.c:656 [inline]
 invoke_softirq kernel/softirq.c:496 [inline]
 __irq_exit_rcu+0xca/0x220 kernel/softirq.c:735
 irq_exit_rcu+0x9/0x30 kernel/softirq.c:752
 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1061 [inline]
 sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1061
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
RIP: 0010:lock_acquire+0x221/0x350 kernel/locking/lockdep.c:5872
Code: ff ff ff e8 a1 d7 16 0a f7 44 24 08 00 02 00 00 0f 84 3a ff ff ff 65 48 8b 05 8b f2 9e 11 48 3b 44 24 58 75 33 fb 48 83 c4 60 <5b> 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc 48 8d 3d 58 0a 95
RSP: 0018:ffffc900032075b0 EFLAGS: 00000282
RAX: 5df8ffef7d258800 RBX: 0000000000000000 RCX: 0000000000000046
RDX: 00000000c1e3b5cc RSI: ffffffff8e24e50c RDI: ffffffff8c289ee0
RBP: ffffffff81d5c3d6 R08: ffffffff81d5c3d6 R09: ffffffff8e95cce0
R10: ffffc900032076d8 R11: ffffffff81b105c0 R12: 0000000000000002
R13: ffffffff8e95cce0 R14: 0000000000000000 R15: 0000000000000246
 rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
 rcu_read_lock include/linux/rcupdate.h:838 [inline]
 is_bpf_text_address+0x47/0x2b0 kernel/bpf/core.c:747
 kernel_text_address+0xa5/0xe0 kernel/extable.c:125
 __kernel_text_address+0xd/0x30 kernel/extable.c:79
 unwind_get_return_address+0x4d/0x90 arch/x86/kernel/unwind_orc.c:385
 arch_stack_walk+0xfb/0x150 arch/x86/kernel/stacktrace.c:26
 stack_trace_save+0xa9/0x100 kernel/stacktrace.c:122
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2689 [inline]
 slab_free mm/slub.c:6246 [inline]
 kfree+0x1c5/0x640 mm/slub.c:6561
 tomoyo_path_perm+0x403/0x560 security/tomoyo/file.c:847
 security_inode_getattr+0x12b/0x310 security/security.c:1895
 vfs_getattr fs/stat.c:259 [inline]
 vfs_fstat fs/stat.c:281 [inline]
 vfs_fstatat+0xb4/0x170 fs/stat.c:371
 __do_sys_newfstatat fs/stat.c:538 [inline]
 __se_sys_newfstatat fs/stat.c:532 [inline]
 __x64_sys_newfstatat+0x151/0x200 fs/stat.c:532
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f0704b165f4
Code: 64 c7 00 09 00 00 00 83 c8 ff c3 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff e9 00 00 00 00 41 89 ca b8 06 01 00 00 0f 05 <45> 31 c0 3d 00 f0 ff ff 76 10 48 8b 15 03 a8 0d 00 f7 d8 41 83 c8
RSP: 002b:00007ffff3316628 EFLAGS: 00000206 ORIG_RAX: 0000000000000106
RAX: ffffffffffffffda RBX: 00007f0704bee460 RCX: 00007f0704b165f4
RDX: 00007ffff3316630 RSI: 00007f0704bb3130 RDI: 0000000000000009
RBP: 0000563dde263be0 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000001000 R11: 0000000000000206 R12: 0000000000000002
R13: 0000000000000002 R14: 0000563dde263be0 R15: 0000563db7af4ea6
 </TASK>
----------------
Code disassembly (best guess):
   0:	c1 e8 03             	shr    $0x3,%eax
   3:	42 0f b6 04 38       	movzbl (%rax,%r15,1),%eax
   8:	84 c0                	test   %al,%al
   a:	75 48                	jne    0x54
   c:	c7 03 00 00 00 a0    	movl   $0xa0000000,(%rbx)
  12:	31 c0                	xor    %eax,%eax
  14:	e9 6d ff ff ff       	jmp    0xffffff86
  19:	e8 9e ef 06 07       	call   0x706efbc
  1e:	e8 99 1c 14 fd       	call   0xfd141cbc
  23:	48 8d 3d 02 a8 8c 0b 	lea    0xb8ca802(%rip),%rdi        # 0xb8ca82c
* 2a:	67 48 0f b9 3a       	ud1    (%edx),%rdi <-- trapping instruction
  2f:	48 89 df             	mov    %rbx,%rdi
  32:	be 04 00 00 00       	mov    $0x4,%esi
  37:	e8 50 5a 7f fd       	call   0xfd7f5a8c
  3c:	48 89 d8             	mov    %rbx,%rax
  3f:	48                   	rex.W


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).

The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.


* Re: [PATCH] smb: smbdirect: move fs/smb/common/smbdirect/ to fs/smb/smbdirect/
From: Linus Torvalds @ 2026-04-22 15:36 UTC (permalink / raw)
  To: Steve French
  Cc: Stefan Metzmacher, Christoph Hellwig, linux-cifs, linux-rdma,
	netdev, samba-technical, Tom Talpey, Namjae Jeon
In-Reply-To: <CAH2r5msb3-HiPSv+HgBknEwDXGsv0xU=TGCxHdmc-VCLKzYCmw@mail.gmail.com>

On Wed, 22 Apr 2026 at 07:49, Steve French <smfrench@gmail.com> wrote:
>
> On Wed, Apr 22, 2026 at 3:16 AM Stefan Metzmacher <metze@samba.org> wrote:
> > >
> > > Why is this not in net/smbdirect/ or driver/infiniband/ulp/smdirect?
> >
> > Yes, I also thought about net/smbdirect.
>
> I would prefer to leave it in fs/smb for the time being, since it makes it
> easier to track since fs/smb/server and fs/smb/client have dependencies
> on it.   In the long run, I don't mind moving it, if it starts being
> used outside of smb client and server.

I personally have no hugely strong opinions, but I think Christoph's
very question, which gives two different alternative locations, argues
for just leaving it in fs/smb/

That driver/infiniband/ulp/smdirect location in particular is just a
disgusting path.

It sure as hell is *not* a driver, it just uses the rdma infrastructure.

If rdma were to eventually split itself up into driver code and
non-driver code (like networking does), that might change things,
but that's not happening now.

And as long as we expect smbdirect code to go through the smb
maintainer, I'd rather have the location be about that clear situation
rather than some arbitrary "it uses the rdma code" or "it's
networking".

Because that code is not primarily about networking or about rdma.
That code is primarily about smb.

So while I have no *strong* opinions and can deal with whatever
maintainers find convenient, I think fs/smb/smbdirect is at least
currently the sane location.

          Linus


* Re: [PATCH v1 1/2] vfio: add callback to get tph info for dma-buf
From: Alex Williamson @ 2026-04-22 15:23 UTC (permalink / raw)
  To: Zhiping Zhang
  Cc: Stanislav Fomichev, Keith Busch, Jason Gunthorpe, Leon Romanovsky,
	Bjorn Helgaas, linux-rdma, linux-pci, netdev, dri-devel,
	Yochai Cohen, Yishai Hadas, alex
In-Reply-To: <20260420183920.3626389-2-zhipingz@meta.com>

On Mon, 20 Apr 2026 11:39:15 -0700
Zhiping Zhang <zhipingz@meta.com> wrote:

> Add a dma-buf callback that returns raw TPH metadata from the exporter
> so peer devices can reuse the steering tag and processing hint
> associated with a VFIO-exported buffer.
> 
> Keep the existing VFIO_DEVICE_FEATURE_DMA_BUF uAPI layout intact by
> using a flag plus one extra trailing entries[] object for the optional
> TPH metadata. Rename the uAPI field dma_ranges to entries. The
> nr_ranges field remains the DMA range count; when VFIO_DMABUF_FLAG_TPH
> is set the kernel reads one extra entry beyond nr_ranges for the TPH
> metadata.
> 
> Add an st_width parameter to get_tph() so the exporter can reject
> steering tags that exceed the consumer's supported width (8 vs 16 bit).
> When no TPH metadata was supplied, make get_tph() return -EOPNOTSUPP.
> 
> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> ---
>  drivers/vfio/pci/vfio_pci_dmabuf.c | 62 +++++++++++++++++++++++-------
>  include/linux/dma-buf.h            | 17 ++++++++
>  include/uapi/linux/vfio.h          | 28 ++++++++++++--
>  3 files changed, 89 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> index b1d658b8f7b5..fdc05f9ab3ae 100644
> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> @@ -17,6 +17,9 @@ struct vfio_pci_dma_buf {
>  	struct phys_vec *phys_vec;
>  	struct p2pdma_provider *provider;
>  	u32 nr_ranges;
> +	u16 steering_tag;
> +	u8 ph;
> +	u8 tph_present : 1;
>  	u8 revoked : 1;
>  };
>  
> @@ -60,6 +63,22 @@ vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
>  				       priv->size, dir);
>  }
>  
> +static int vfio_pci_dma_buf_get_tph(struct dma_buf *dmabuf, u16 *steering_tag,
> +				    u8 *ph, u8 st_width)
> +{
> +	struct vfio_pci_dma_buf *priv = dmabuf->priv;
> +
> +	if (!priv->tph_present)
> +		return -EOPNOTSUPP;
> +
> +	if (st_width < 16 && priv->steering_tag > ((1U << st_width) - 1))
> +		return -EINVAL;
> +
> +	*steering_tag = priv->steering_tag;
> +	*ph = priv->ph;
> +	return 0;
> +}
> +
>  static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
>  				   struct sg_table *sgt,
>  				   enum dma_data_direction dir)
> @@ -89,6 +108,7 @@ static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
>  	.pin = vfio_pci_dma_buf_pin,
>  	.unpin = vfio_pci_dma_buf_unpin,
>  	.attach = vfio_pci_dma_buf_attach,
> +	.get_tph = vfio_pci_dma_buf_get_tph,
>  	.map_dma_buf = vfio_pci_dma_buf_map,
>  	.unmap_dma_buf = vfio_pci_dma_buf_unmap,
>  	.release = vfio_pci_dma_buf_release,
> @@ -211,7 +231,9 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
>  				  size_t argsz)
>  {
>  	struct vfio_device_feature_dma_buf get_dma_buf = {};
> -	struct vfio_region_dma_range *dma_ranges;
> +	bool tph_supplied;
> +	u32 tph_index;
> +	struct vfio_region_dma_range *entries;
>  	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
>  	struct vfio_pci_dma_buf *priv;
>  	size_t length;
> @@ -228,7 +250,10 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
>  	if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf)))
>  		return -EFAULT;
>  
> -	if (!get_dma_buf.nr_ranges || get_dma_buf.flags)
> +	tph_supplied = !!(get_dma_buf.flags & VFIO_DMABUF_FLAG_TPH);
> +	tph_index = get_dma_buf.nr_ranges;
> +	if (!get_dma_buf.nr_ranges ||
> +	    (get_dma_buf.flags & ~VFIO_DMABUF_FLAG_TPH))
>  		return -EINVAL;
>  
>  	/*
> @@ -237,19 +262,21 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
>  	if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX)
>  		return -ENODEV;
>  
> -	dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges,
> -				       sizeof(*dma_ranges));
> -	if (IS_ERR(dma_ranges))
> -		return PTR_ERR(dma_ranges);
> +	entries = memdup_array_user(&arg->entries,
> +				    get_dma_buf.nr_ranges +
> +					(tph_supplied ? 1 : 0),
> +				    sizeof(*entries));
> +	if (IS_ERR(entries))
> +		return PTR_ERR(entries);
>  
> -	ret = validate_dmabuf_input(&get_dma_buf, dma_ranges, &length);
> +	ret = validate_dmabuf_input(&get_dma_buf, entries, &length);
>  	if (ret)
> -		goto err_free_ranges;
> +		goto err_free_entries;
>  
>  	priv = kzalloc_obj(*priv);
>  	if (!priv) {
>  		ret = -ENOMEM;
> -		goto err_free_ranges;
> +		goto err_free_entries;
>  	}
>  	priv->phys_vec = kzalloc_objs(*priv->phys_vec, get_dma_buf.nr_ranges);
>  	if (!priv->phys_vec) {
> @@ -260,15 +287,22 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
>  	priv->vdev = vdev;
>  	priv->nr_ranges = get_dma_buf.nr_ranges;
>  	priv->size = length;
> +
> +	if (tph_supplied) {
> +		priv->steering_tag = entries[tph_index].tph.steering_tag;
> +		priv->ph = entries[tph_index].tph.ph;
> +		priv->tph_present = 1;
> +	}
> +
>  	ret = vdev->pci_ops->get_dmabuf_phys(vdev, &priv->provider,
>  					     get_dma_buf.region_index,
> -					     priv->phys_vec, dma_ranges,
> +					     priv->phys_vec, entries,
>  					     priv->nr_ranges);
>  	if (ret)
>  		goto err_free_phys;
>  
> -	kfree(dma_ranges);
> -	dma_ranges = NULL;
> +	kfree(entries);
> +	entries = NULL;
>  
>  	if (!vfio_device_try_get_registration(&vdev->vdev)) {
>  		ret = -ENODEV;
> @@ -311,8 +345,8 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
>  	kfree(priv->phys_vec);
>  err_free_priv:
>  	kfree(priv);
> -err_free_ranges:
> -	kfree(dma_ranges);
> +err_free_entries:
> +	kfree(entries);
>  	return ret;
>  }
>  
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index 133b9e637b55..b0a79ccbe100 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -113,6 +113,23 @@ struct dma_buf_ops {
>  	 */
>  	void (*unpin)(struct dma_buf_attachment *attach);
>  
> +	/**
> +	 * @get_tph:
> +	 * @dmabuf: DMA buffer for which to retrieve TPH metadata
> +	 * @steering_tag: Returns the raw TPH steering tag
> +	 * @ph: Returns the TPH processing hint
> +	 * @st_width: Consumer's supported steering tag width in bits (8 or 16)
> +	 *
> +	 * Return the TPH (TLP Processing Hints) metadata associated with this
> +	 * DMA buffer. Exporters that do not provide TPH metadata should return
> +	 * -EOPNOTSUPP. If the steering tag exceeds @st_width bits, return
> +	 * -EINVAL.
> +	 *
> +	 * This callback is optional.
> +	 */
> +	int (*get_tph)(struct dma_buf *dmabuf, u16 *steering_tag, u8 *ph,
> +		       u8 st_width);
> +
>  	/**
>  	 * @map_dma_buf:
>  	 *
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index bb7b89330d35..a0bd24623c52 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1490,16 +1490,36 @@ struct vfio_device_feature_bus_master {
>   * open_flags are the typical flags passed to open(2), eg O_RDWR, O_CLOEXEC,
>   * etc. offset/length specify a slice of the region to create the dmabuf from.
>   * nr_ranges is the total number of (P2P DMA) ranges that comprise the dmabuf.
> + * When VFIO_DMABUF_FLAG_TPH is set, entries[] contains one extra trailing
> + * object after the nr_ranges DMA ranges carrying the TPH steering tag and
> + * processing hint.

I really don't think we want to design an API where entries[] is
implicitly off by one from what nr_ranges says is there.  This feeds
back into the removal of the __counted_by attribute below, which is a
red flag that this is the wrong approach.

In general though, I'm really hoping that someone interested in
enabling TPH as an interface through vfio actually decides to take
resource targeting and revocation seriously.  There's no validation of
the steering tag here relative to what the user has access to and no
mechanism to revoke those tags if access changes.  In fact, there's not
even a proposed mechanism allowing the user to derive valid steering
tags.  Does the user implicitly know the value and the kernel just
allows it because... yolo?  Thanks,

Alex

>   *
> - * flags should be 0.
> + * flags should be 0 or VFIO_DMABUF_FLAG_TPH.
>   *
>   * Return: The fd number on success, -1 and errno is set on failure.
>   */
>  #define VFIO_DEVICE_FEATURE_DMA_BUF 11
>  
> +enum vfio_device_feature_dma_buf_flags {
> +	VFIO_DMABUF_FLAG_TPH = 1 << 0,
> +};
> +
> +struct vfio_region_dma_tph {
> +	__u16 steering_tag;
> +	__u8 ph;
> +	__u8 reserved;
> +	__u32 reserved2;
> +};
> +
>  struct vfio_region_dma_range {
> -	__u64 offset;
> -	__u64 length;
> +	union {
> +		__u64 offset;
> +		struct vfio_region_dma_tph tph;
> +	};
> +	union {
> +		__u64 length;
> +		__u64 reserved;
> +	};
>  };
>  
>  struct vfio_device_feature_dma_buf {
> @@ -1507,7 +1527,7 @@ struct vfio_device_feature_dma_buf {
>  	__u32	open_flags;
>  	__u32   flags;
>  	__u32   nr_ranges;
> -	struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
> +	struct vfio_region_dma_range entries[];
>  };
>  
>  /* -------- API for Type1 VFIO IOMMU -------- */
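
As a reading aid for the width check and the entries[] count discussed
above, here is a minimal userspace sketch in C. It only models the logic
visible in the quoted patch; struct tph_state, model_get_tph() and
model_entries_to_copy() are hypothetical names made up for illustration
and are not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical userspace model of the exporter state in the quoted patch. */
struct tph_state {
	uint16_t steering_tag;
	uint8_t ph;
	uint8_t tph_present;
};

/*
 * Models the logic of vfio_pci_dma_buf_get_tph() as quoted:
 * -EOPNOTSUPP when no TPH metadata was supplied, -EINVAL when the tag
 * does not fit in st_width bits, 0 otherwise.
 */
static int model_get_tph(const struct tph_state *s, uint16_t *steering_tag,
			 uint8_t *ph, uint8_t st_width)
{
	assert(s && steering_tag && ph);

	if (!s->tph_present)
		return -EOPNOTSUPP;

	/* For st_width >= 16 every 16-bit tag fits, so no mask is needed. */
	if (st_width < 16 && s->steering_tag > ((1U << st_width) - 1))
		return -EINVAL;

	*steering_tag = s->steering_tag;
	*ph = s->ph;
	return 0;
}

/*
 * Number of entries[] elements the patch copies from userspace:
 * nr_ranges DMA ranges plus one trailing TPH entry when the flag is set.
 */
static uint32_t model_entries_to_copy(uint32_t nr_ranges, int tph_supplied)
{
	return nr_ranges + (tph_supplied ? 1 : 0);
}
```

With a tag of 0x1ff, the model rejects an 8-bit consumer and accepts a
16-bit one. The second helper shows the point raised in the review: the
array length is no longer nr_ranges alone, so a __counted_by(nr_ranges)
annotation would no longer describe it.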



* Help with PCIe TPH on ConnectX-7
From: Bruce Merry @ 2026-04-22 15:22 UTC (permalink / raw)
  To: netdev

Hello

I'm hoping someone from NVIDIA Networking can help with this; I wasn't
sure of the best way to get in contact with the engineers working on
mlx5 kernel code so thought I'd try here. I'm trying to write some
(userspace) code using ibv_reg_mr_ex to register a memory region using
TLP Processing Hints (TPH) to improve performance on an EPYC Turin
(Zen 5) system. However, in the kernel mlx5_st_create is bailing out
because of this check:

        if (!MLX5_CAP_GEN(dev, mkey_pcie_tph))
                return NULL;

As far as I can tell, that is checking a capability bit returned by
the firmware.

Before I spend a lot more time debugging, can you tell me whether
ConnectX-7 supports this feature at all (I'm on the latest 28.48.1000
firmware)? I see it mentioned in release notes for ConnectX-8
firmware, but not in ConnectX-7 release notes.

If ConnectX-7 does support it, do you have any tips on what might
cause that capability bit to be false and how to determine what the
cause is? For example, could that happen if the motherboard BIOS
doesn't support TPH?

Thanks
Bruce
-- 
Dr Bruce Merry
bmerry <@> gmail <.> com
http://www.brucemerry.org.za/
http://blog.brucemerry.org.za/



* Re: [PATCH] net: phonet: fix BUG_ON() in pn_socket_autobind()
From: Rémi Denis-Courmont @ 2026-04-22 15:13 UTC (permalink / raw)
  To: courmisch, davem, edumazet, kuba, pabeni, horms,
	Deepanshu Kartikey
  Cc: netdev, linux-kernel, Deepanshu Kartikey,
	syzbot+706f5eb79044e686c794
In-Reply-To: <20260422021533.16987-1-kartikey406@gmail.com>

Hi,

On Wednesday, 22 April 2026 at 5:15:33 Eastern European Summer Time,
Deepanshu Kartikey wrote:
> pn_socket_autobind() calls pn_socket_bind() and treats
> -EINVAL as a signal that the socket was already bound,
> then uses BUG_ON() to verify it:
> 
>     if (err != -EINVAL)
>         return err;
>     BUG_ON(!pn_port(pn_sk(sock->sk)->sobject));
> 
> However, pn_socket_bind() returns -EINVAL in multiple
> cases:
> 
>   1. address length too short
>   2. socket not in TCP_CLOSE state
>   3. socket already bound  <- only intended case
> 
> When -EINVAL comes from cases 1 or 2, sobject is still
> zero (never assigned), causing BUG_ON to fire and crash
> the kernel.
> 
> Fix this by checking the bound state directly via
> pn_port(sobject) BEFORE calling pn_socket_bind(),
> eliminating the ambiguous -EINVAL interpretation
> entirely.
> 
> Reported-by: syzbot+706f5eb79044e686c794@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=706f5eb79044e686c794
> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
> ---
>  net/phonet/socket.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/net/phonet/socket.c b/net/phonet/socket.c
> index c4af26357144..5a55e7d14e85 100644
> --- a/net/phonet/socket.c
> +++ b/net/phonet/socket.c
> @@ -204,14 +204,14 @@ static int pn_socket_autobind(struct socket *sock)
>  	struct sockaddr_pn sa;
>  	int err;
> 
> +	if (pn_port(pn_sk(sock->sk)->sobject))
> +		return 0; /* socket was already bound */
> +

This was almost 20 years ago, but IIRC, we did not do it that way back then 
because it results in a data race on sobject if another task binds the socket 
in parallel.
 
>  	memset(&sa, 0, sizeof(sa));
>  	sa.spn_family = AF_PHONET;
>  	err = pn_socket_bind(sock, (struct sockaddr_unsized *)&sa,
>  			     sizeof(struct sockaddr_pn));
> -	if (err != -EINVAL)
> -		return err;
> -	BUG_ON(!pn_port(pn_sk(sock->sk)->sobject));
> -	return 0; /* socket was already bound */
> +	return err;
>  }
> 
>  static int pn_socket_connect(struct socket *sock, struct sockaddr_unsized
> *addr,
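
The ambiguity described in the commit message can be sketched in plain
userspace C. This is only a single-threaded model under stated
assumptions: struct model_sock, model_bind(), MODEL_ADDR_LEN and the port
value 42 are hypothetical stand-ins, not the phonet code, and the model
does not reproduce the parallel-bind data race mentioned in the reply.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the socket state that matters here. */
struct model_sock {
	int closed;        /* 1 if the socket is in a TCP_CLOSE-like state */
	unsigned int port; /* 0 means "not bound", like pn_port(sobject) == 0 */
};

#define MODEL_ADDR_LEN 16 /* assumed minimum address length */

/* Returns -EINVAL for three unrelated reasons, as the commit message lists. */
static int model_bind(struct model_sock *sk, size_t addr_len, unsigned int port)
{
	assert(sk != NULL);

	if (addr_len < MODEL_ADDR_LEN)
		return -EINVAL; /* case 1: address too short */
	if (!sk->closed)
		return -EINVAL; /* case 2: wrong socket state */
	if (sk->port)
		return -EINVAL; /* case 3: already bound */
	sk->port = port;
	return 0;
}

/*
 * Old-style autobind: assumes -EINVAL always means "already bound".
 * Returns 1 if the BUG_ON() condition from the old code would have fired.
 */
static int model_old_autobind_would_bug(struct model_sock *sk)
{
	int err = model_bind(sk, MODEL_ADDR_LEN, 42);

	if (err != -EINVAL)
		return 0;
	return sk->port == 0; /* models BUG_ON(!pn_port(...)) */
}

/* Patched-style autobind: check the bound state first, then pass errors up. */
static int model_new_autobind(struct model_sock *sk)
{
	if (sk->port)
		return 0; /* socket was already bound */
	return model_bind(sk, MODEL_ADDR_LEN, 42);
}
```

In this model, an unbound socket that is not in the closed state makes
the old logic hit its BUG_ON() condition, while the patched logic simply
returns -EINVAL. The unlocked check of sk->port is exactly the spot where
the concern about a concurrent bind would apply in real code.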


-- 
德尼-库尔蒙‧雷米
https://www.remlab.net/





* Re: [PATCH net 00/18] Remove a number of ISA and PCMCIA Ethernet drivers
From: Byron Stanoszek @ 2026-04-22 15:19 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	linux-kernel, netdev, linux-doc
In-Reply-To: <41d9fe43-9aa5-49b4-89cd-9aa13e4e4ea9@lunn.ch>

On Wed, 22 Apr 2026, Andrew Lunn wrote:

> On Tue, Apr 21, 2026 at 11:03:28PM -0400, Byron Stanoszek wrote:
>> On Wed, 22 Apr 2026, Andrew Lunn wrote:
>>>
>>> Could you live with v6.18, which has an expected EOL of December 2028?
>>> If you are only updating once per year, security is not an issue, you
>>> just want stability.
>>
>> I could for the time being, but this hasn't worked for me in the past.
>> Usually what happens is the PC breaks down, and the customer swaps in a new
>> backplane+SBC and moves all their PCI cards over. I then find I need to
>> update the kernel just to get the Intel DRM to work properly on the new CPU.
>> Some of these systems were installed back in the Linux 2.6 era, so I've gone
>> through several "Intel DRM not working" steps ever since CPUs started
>> getting integrated graphics. 2028 will come fast.
>
> Hi Byron
>
> I will drop this driver from the patchset.

Andrew,

Thank you very much.

  -Byron



* [PATCH net-next v2 3/3] net/ethernet/zte/dinghai: add hardware register access and PCI capability scanning
From: Junyang Han @ 2026-04-22 14:49 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew+netdev, edumazet, kuba, pabeni, han.junyang,
	ran.ming, han.chengfei, zhang.yanze
In-Reply-To: <20260422144901.2403456-1-han.junyang@zte.com.cn>


[-- Attachment #1.1.1: Type: text/plain, Size: 18761 bytes --]

Implement PCI configuration space access, BAR mapping, capability
scanning (common/notify/device), and hardware queue register
definitions for the DingHai PF device.
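
As an illustration of the capability scan only, here is a minimal user-space sketch that walks a mock 256-byte config space. The function name find_vndr_cap and the mock buffer are hypothetical. The field offsets mirror struct zxdh_pf_pci_cap from this patch, and the sketch leaves out the BAR length and resource-flag checks that the driver performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAP_PTR_REG	0x34	/* PCI capabilities pointer register */
#define CAP_ID_VNDR	0x09	/* vendor-specific capability ID */
#define MAX_BAR		5	/* BAR indices above this are reserved */

/* Field offsets inside one capability, mirroring struct zxdh_pf_pci_cap. */
enum { CAP_ID = 0, CAP_NEXT = 1, CAP_CFG_TYPE = 3, CAP_BAR = 4 };

/* Return the config-space offset of the first vendor capability whose
 * cfg_type matches and whose BAR index is valid, or 0 if none is found. */
static int find_vndr_cap(const uint8_t cfg[256], uint8_t cfg_type)
{
	unsigned int pos;
	int hops;

	assert(cfg != NULL);
	pos = cfg[CAP_PTR_REG];

	/* Bound the walk so a malformed, looping list cannot hang us. */
	for (hops = 0; hops < 48; hops++) {
		/* Capabilities live above the 64-byte header; stop on 0 or junk. */
		if (pos < 0x40 || pos > 256 - 16)
			break;
		if (cfg[pos + CAP_ID] == CAP_ID_VNDR &&
		    cfg[pos + CAP_CFG_TYPE] == cfg_type &&
		    cfg[pos + CAP_BAR] <= MAX_BAR)
			return (int)pos;
		pos = cfg[pos + CAP_NEXT];
	}
	return 0;
}
```

Entries with a reserved BAR index are skipped rather than treated as errors, matching the "ignore structures with reserved BAR values" rule in the driver code.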

Signed-off-by: Junyang Han <han.junyang@zte.com.cn>
---
 drivers/net/ethernet/zte/dinghai/dh_queue.h |  71 ++++
 drivers/net/ethernet/zte/dinghai/en_pf.c    | 410 ++++++++++++++++++++
 drivers/net/ethernet/zte/dinghai/en_pf.h    |  38 ++
 3 files changed, 519 insertions(+)
 create mode 100644 drivers/net/ethernet/zte/dinghai/dh_queue.h

diff --git a/drivers/net/ethernet/zte/dinghai/dh_queue.h b/drivers/net/ethernet/zte/dinghai/dh_queue.h
new file mode 100644
index 000000000000..5067c73fed33
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/dh_queue.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * ZTE DingHai Ethernet driver - PCI capability definitions
+ * Copyright (c) 2022-2026, ZTE Corporation.
+ */
+
+#ifndef __DH_QUEUE_H__
+#define __DH_QUEUE_H__
+
+/* Vector value used to disable MSI for queue */
+#define ZXDH_MSI_NO_VECTOR      0xff
+
+/* Status byte for guest to report progress, and synchronize features */
+/* We have seen device and processed generic fields */
+#define ZXDH_CONFIG_S_ACKNOWLEDGE 1
+/* We have found a driver for the device. */
+#define ZXDH_CONFIG_S_DRIVER      2
+/* Driver has used its parts of the config, and is happy */
+#define ZXDH_CONFIG_S_DRIVER_OK   4
+/* Driver has finished configuring features */
+#define ZXDH_CONFIG_S_FEATURES_OK 8
+/* Device entered invalid state, driver must reset it */
+#define ZXDH_CONFIG_S_NEEDS_RESET 0x40
+/* We've given up on this device */
+#define ZXDH_CONFIG_S_FAILED      0x80
+
+/* This is the PCI capability header: */
+struct zxdh_pf_pci_cap {
+	__u8 cap_vndr;		/* Generic PCI field: PCI_CAP_ID_VNDR */
+	__u8 cap_next;		/* Generic PCI field: next ptr. */
+	__u8 cap_len;		/* Generic PCI field: capability length */
+	__u8 cfg_type;		/* Identifies the structure. */
+	__u8 bar;		/* Where to find it. */
+	__u8 id;		/* Multiple capabilities of the same type */
+	__u8 padding[2];		/* Pad to full dword. */
+	__le32 offset;		/* Offset within bar. */
+	__le32 length;		/* Length of the structure, in bytes. */
+};
+
+/* Fields in ZXDH_PF_PCI_CAP_COMMON_CFG: */
+struct zxdh_pf_pci_common_cfg {
+	/* About the whole device. */
+	__le32 device_feature_select; /* read-write */
+	__le32 device_feature;	/* read-only */
+	__le32 guest_feature_select; /* read-write */
+	__le32 guest_feature;		/* read-write */
+	__le16 msix_config;		/* read-write */
+	__le16 num_queues;		/* read-only */
+	__u8 device_status;		/* read-write */
+	__u8 config_generation;	/* read-only */
+
+	/* About a specific virtqueue. */
+	__le16 queue_select;		/* read-write */
+	__le16 queue_size;		/* read-write, power of 2. */
+	__le16 queue_msix_vector;	/* read-write */
+	__le16 queue_enable;		/* read-write */
+	__le16 queue_notify_off;	/* read-only */
+	__le32 queue_desc_lo;		/* read-write */
+	__le32 queue_desc_hi;		/* read-write */
+	__le32 queue_avail_lo;		/* read-write */
+	__le32 queue_avail_hi;		/* read-write */
+	__le32 queue_used_lo;		/* read-write */
+	__le32 queue_used_hi;		/* read-write */
+};
+
+struct zxdh_pf_pci_notify_cap {
+	struct zxdh_pf_pci_cap cap;
+	__le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
+};
+
+#endif /* __DH_QUEUE_H__ */
diff --git a/drivers/net/ethernet/zte/dinghai/en_pf.c b/drivers/net/ethernet/zte/dinghai/en_pf.c
index 70dad28de544..0dd4dcbdefb0 100644
--- a/drivers/net/ethernet/zte/dinghai/en_pf.c
+++ b/drivers/net/ethernet/zte/dinghai/en_pf.c
@@ -9,6 +9,7 @@
 #include <net/devlink.h>
 #include "en_pf.h"
 #include "dh_log.h"
+#include "dh_queue.h"
 
 MODULE_AUTHOR("Junyang Han <han.junyang@zte.com.cn>");
 MODULE_DESCRIPTION("ZTE Corporation network adapters (DingHai series) Ethernet driver");
@@ -92,6 +93,415 @@ void dh_pf_pci_close(struct dh_core_dev *dev)
 	pci_disable_device(dev->pdev);
 }
 
+int32_t zxdh_pf_pci_find_capability(struct pci_dev *pdev, uint8_t cfg_type,
+				    uint32_t ioresource_types, int32_t *bars)
+{
+	int32_t pos = 0;
+	uint8_t type = 0;
+	uint8_t bar = 0;
+
+	for (pos = pci_find_capability(pdev, PCI_CAP_ID_VNDR); pos > 0;
+	     pos = pci_find_next_capability(pdev, pos, PCI_CAP_ID_VNDR)) {
+		pci_read_config_byte(pdev, pos + offsetof(struct zxdh_pf_pci_cap, cfg_type), &type);
+		pci_read_config_byte(pdev, pos + offsetof(struct zxdh_pf_pci_cap, bar), &bar);
+
+		/* ignore structures with reserved BAR values */
+		if (bar > ZXDH_PF_MAX_BAR_VAL)
+			continue;
+
+		if (type == cfg_type) {
+			if (pci_resource_len(pdev, bar) &&
+			    pci_resource_flags(pdev, bar) & ioresource_types) {
+				*bars |= (1 << bar);
+				return pos;
+			}
+		}
+	}
+
+	return 0;
+}
+
+void __iomem *zxdh_pf_map_capability(struct dh_core_dev *dh_dev, int32_t off,
+				     size_t minlen, uint32_t align,
+				     uint32_t start, uint32_t size,
+				     size_t *len, resource_size_t *pa,
+				     uint32_t *bar_off)
+{
+	struct pci_dev *pdev = dh_dev->pdev;
+	uint8_t bar = 0;
+	uint32_t offset = 0;
+	uint32_t length = 0;
+	void __iomem *p = NULL;
+
+	pci_read_config_byte(pdev, off + offsetof(struct zxdh_pf_pci_cap, bar), &bar);
+	pci_read_config_dword(pdev, off + offsetof(struct zxdh_pf_pci_cap, offset), &offset);
+	pci_read_config_dword(pdev, off + offsetof(struct zxdh_pf_pci_cap, length), &length);
+
+	if (bar_off)
+		*bar_off = offset;
+
+	if (length <= start) {
+		LOG_ERR(dh_dev, "bad capability len %u (>%u expected)\n", length, start);
+		return NULL;
+	}
+
+	if (length - start < minlen) {
+		LOG_ERR(dh_dev, "bad capability len %u (>=%zu expected)\n", length, minlen);
+		return NULL;
+	}
+
+	length -= start;
+	if (start + offset < offset) {
+		LOG_ERR(dh_dev, "map wrap-around %u+%u\n", start, offset);
+		return NULL;
+	}
+
+	offset += start;
+	if (offset & (align - 1)) {
+		LOG_ERR(dh_dev, "offset %u not aligned to %u\n", offset, align);
+		return NULL;
+	}
+
+	if (length > size)
+		length = size;
+
+	if (len)
+		*len = length;
+
+	if (minlen + offset < minlen || minlen + offset > pci_resource_len(pdev, bar)) {
+		LOG_ERR(dh_dev, "map custom queue %zu@%u out of range on bar %i length %lu\n",
+			minlen, offset, bar, (unsigned long)pci_resource_len(pdev, bar));
+		return NULL;
+	}
+
+	p = pci_iomap_range(pdev, bar, offset, length);
+	if (!p) {
+		LOG_ERR(dh_dev, "unable to map custom queue %u@%u on bar %i\n", length, offset, bar);
+	} else if (pa) {
+		*pa = pci_resource_start(pdev, bar) + offset;
+	}
+
+	return p;
+}
+
+int32_t zxdh_pf_common_cfg_init(struct dh_core_dev *dh_dev)
+{
+	int32_t common = 0;
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+	struct pci_dev *pdev = dh_dev->pdev;
+
+	/* check for a common config: if not, use legacy mode (bar 0). */
+	common = zxdh_pf_pci_find_capability(pdev, ZXDH_PCI_CAP_COMMON_CFG,
+					     IORESOURCE_IO | IORESOURCE_MEM,
+					     &pf_dev->modern_bars);
+	if (common == 0) {
+		LOG_ERR(dh_dev, "missing capabilities %i, leaving for legacy driver\n", common);
+		return -ENODEV;
+	}
+
+	pf_dev->common = zxdh_pf_map_capability(dh_dev, common,
+						sizeof(struct zxdh_pf_pci_common_cfg),
+						ZXDH_PF_ALIGN4, 0,
+						sizeof(struct zxdh_pf_pci_common_cfg),
+						NULL, NULL, NULL);
+	if (!pf_dev->common) {
+		LOG_ERR(dh_dev, "pf_dev->common is null\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int32_t zxdh_pf_notify_cfg_init(struct dh_core_dev *dh_dev)
+{
+	int32_t notify = 0;
+	uint32_t notify_length = 0;
+	uint32_t notify_offset = 0;
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+	struct pci_dev *pdev = dh_dev->pdev;
+
+	/* If common is there, these should be too... */
+	notify = zxdh_pf_pci_find_capability(pdev, ZXDH_PCI_CAP_NOTIFY_CFG,
+					     IORESOURCE_IO | IORESOURCE_MEM,
+					     &pf_dev->modern_bars);
+	if (notify == 0) {
+		LOG_ERR(dh_dev, "missing capabilities %i\n", notify);
+		return -EINVAL;
+	}
+
+	pci_read_config_dword(pdev, notify + offsetof(struct zxdh_pf_pci_notify_cap,
+				notify_off_multiplier), &pf_dev->notify_offset_multiplier);
+	pci_read_config_dword(pdev, notify + offsetof(struct zxdh_pf_pci_notify_cap,
+				cap.length), &notify_length);
+	pci_read_config_dword(pdev, notify + offsetof(struct zxdh_pf_pci_notify_cap,
+				cap.offset), &notify_offset);
+
+	/* We don't know ahead of time how many VQs we'll map.
+	 * If the notify length is small, map it all now. Otherwise, map each VQ individually later.
+	 */
+	if (notify_length + (notify_offset % PAGE_SIZE) <= PAGE_SIZE) {
+		pf_dev->notify_base = zxdh_pf_map_capability(dh_dev, notify,
+							    ZXDH_PF_MAP_MINLEN2,
+							    ZXDH_PF_ALIGN2, 0,
+							    notify_length,
+							    &pf_dev->notify_len,
+							    &pf_dev->notify_pa, NULL);
+		if (!pf_dev->notify_base) {
+			LOG_ERR(dh_dev, "pf_dev->notify_base is null\n");
+			return -EINVAL;
+		}
+	} else {
+		pf_dev->notify_map_cap = notify;
+	}
+
+	return 0;
+}
+
+int32_t zxdh_pf_device_cfg_init(struct dh_core_dev *dh_dev)
+{
+	int32_t device = 0;
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+	struct pci_dev *pdev = dh_dev->pdev;
+
+	/* Device capability is only mandatory for devices that have device-specific configuration. */
+	device = zxdh_pf_pci_find_capability(pdev, ZXDH_PCI_CAP_DEVICE_CFG,
+					     IORESOURCE_IO | IORESOURCE_MEM,
+					     &pf_dev->modern_bars);
+
+	/* we don't know how much we should map, but PAGE_SIZE is more than enough for all existing devices. */
+	if (device) {
+		pf_dev->device = zxdh_pf_map_capability(dh_dev, device, 0,
+						       ZXDH_PF_ALIGN4, 0, PAGE_SIZE,
+						       &pf_dev->device_len, NULL,
+						       &pf_dev->dev_cfg_bar_off);
+		if (!pf_dev->device) {
+			LOG_ERR(dh_dev, "pf_dev->device is null\n");
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+void zxdh_pf_modern_cfg_uninit(struct dh_core_dev *dh_dev)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+	struct pci_dev *pdev = dh_dev->pdev;
+
+	if (pf_dev->device)
+		pci_iounmap(pdev, pf_dev->device);
+	if (pf_dev->notify_base)
+		pci_iounmap(pdev, pf_dev->notify_base);
+	pci_iounmap(pdev, pf_dev->common);
+}
+
+int32_t zxdh_pf_modern_cfg_init(struct dh_core_dev *dh_dev)
+{
+	int32_t ret = 0;
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+	struct pci_dev *pdev = dh_dev->pdev;
+
+	ret = zxdh_pf_common_cfg_init(dh_dev);
+	if (ret) {
+		LOG_ERR(dh_dev, "zxdh_pf_common_cfg_init failed: %d\n", ret);
+		return -EINVAL;
+	}
+
+	ret = zxdh_pf_notify_cfg_init(dh_dev);
+	if (ret) {
+		LOG_ERR(dh_dev, "zxdh_pf_notify_cfg_init failed: %d\n", ret);
+		goto err_map_notify;
+	}
+
+	ret = zxdh_pf_device_cfg_init(dh_dev);
+	if (ret) {
+		LOG_ERR(dh_dev, "zxdh_pf_device_cfg_init failed: %d\n", ret);
+		goto err_map_device;
+	}
+
+	return 0;
+
+err_map_device:
+	if (pf_dev->notify_base)
+		pci_iounmap(pdev, pf_dev->notify_base);
+err_map_notify:
+	pci_iounmap(pdev, pf_dev->common);
+	return -EINVAL;
+}
+
+uint16_t zxdh_pf_get_queue_notify_off(struct dh_core_dev *dh_dev,
+				      uint16_t phy_index, uint16_t index)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	if (pf_dev->packed_status)
+		iowrite16(phy_index, &pf_dev->common->queue_select);
+	else
+		iowrite16(index, &pf_dev->common->queue_select);
+
+	return ioread16(&pf_dev->common->queue_notify_off);
+}
+
+void __iomem *zxdh_pf_map_vq_notify(struct dh_core_dev *dh_dev,
+				     uint16_t phy_index, uint16_t index,
+				     resource_size_t *pa)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+	uint16_t off = 0;
+
+	off = zxdh_pf_get_queue_notify_off(dh_dev, phy_index, index);
+
+	if (pf_dev->notify_base) {
+		/* offset should not wrap */
+		if ((uint64_t)off * pf_dev->notify_offset_multiplier + 2 > pf_dev->notify_len) {
+			LOG_ERR(dh_dev, "bad notification offset %u (x %u) for queue %u > %zu\n",
+				off, pf_dev->notify_offset_multiplier, phy_index,
+				pf_dev->notify_len);
+			return NULL;
+		}
+
+		if (pa)
+			*pa = pf_dev->notify_pa + off * pf_dev->notify_offset_multiplier;
+
+		return pf_dev->notify_base + off * pf_dev->notify_offset_multiplier;
+	} else {
+		return zxdh_pf_map_capability(dh_dev, pf_dev->notify_map_cap, 2, 2,
+					      off * pf_dev->notify_offset_multiplier,
+					      2, NULL, pa, NULL);
+	}
+}
+
+void zxdh_pf_unmap_vq_notify(struct dh_core_dev *dh_dev, void *priv)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	if (!pf_dev->notify_base)
+		pci_iounmap(dh_dev->pdev, priv);
+}
+
+void zxdh_pf_set_status(struct dh_core_dev *dh_dev, uint8_t status)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	iowrite8(status, &pf_dev->common->device_status);
+}
+
+uint8_t zxdh_pf_get_status(struct dh_core_dev *dh_dev)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	return ioread8(&pf_dev->common->device_status);
+}
+
+static uint8_t zxdh_pf_get_cfg_gen(struct dh_core_dev *dh_dev)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+	uint8_t config_generation = 0;
+
+	config_generation = ioread8(&pf_dev->common->config_generation);
+	LOG_INFO(dh_dev, "config_generation is %d\n", config_generation);
+
+	return config_generation;
+}
+
+void zxdh_pf_get_vf_mac(struct dh_core_dev *dh_dev, uint8_t *mac, int32_t vf_id)
+{
+	uint32_t DEV_MAC_L = 0;
+	uint16_t DEV_MAC_H = 0;
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	if (pf_dev->pf_sriov_cap_base) {
+		DEV_MAC_L = ioread32(pf_dev->pf_sriov_cap_base +
+				     (pf_dev->sriov_bar_size) * vf_id +
+				     pf_dev->dev_cfg_bar_off);
+		mac[0] = DEV_MAC_L & 0xff;
+		mac[1] = (DEV_MAC_L >> 8) & 0xff;
+		mac[2] = (DEV_MAC_L >> 16) & 0xff;
+		mac[3] = (DEV_MAC_L >> 24) & 0xff;
+		DEV_MAC_H = ioread16(pf_dev->pf_sriov_cap_base +
+				      (pf_dev->sriov_bar_size) * vf_id +
+				      pf_dev->dev_cfg_bar_off +
+				      ZXDH_DEV_MAC_HIGH_OFFSET);
+		mac[4] = DEV_MAC_H & 0xff;
+		mac[5] = (DEV_MAC_H >> 8) & 0xff;
+	}
+}
+
+void zxdh_pf_set_vf_mac_reg(struct zxdh_pf_device *pf_dev, uint8_t *mac, int32_t vf_id)
+{
+	uint32_t DEV_MAC_L = 0;
+	uint16_t DEV_MAC_H = 0;
+
+	if (pf_dev->pf_sriov_cap_base) {
+		DEV_MAC_L = mac[0] | (mac[1] << 8) | (mac[2] << 16) | (mac[3] << 24);
+		DEV_MAC_H = mac[4] | (mac[5] << 8);
+		iowrite32(DEV_MAC_L, (pf_dev->pf_sriov_cap_base +
+			  (pf_dev->sriov_bar_size) * vf_id +
+			  pf_dev->dev_cfg_bar_off));
+		iowrite16(DEV_MAC_H, (pf_dev->pf_sriov_cap_base +
+			  (pf_dev->sriov_bar_size) * vf_id +
+			  pf_dev->dev_cfg_bar_off +
+			  ZXDH_DEV_MAC_HIGH_OFFSET));
+	}
+}
+
+void zxdh_pf_set_vf_mac(struct dh_core_dev *dh_dev, uint8_t *mac, int32_t vf_id)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	zxdh_pf_set_vf_mac_reg(pf_dev, mac, vf_id);
+}
+
+void zxdh_set_mac(struct dh_core_dev *dh_dev, uint8_t *mac)
+{
+	uint32_t DEV_MAC_L = 0;
+	uint16_t DEV_MAC_H = 0;
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	DEV_MAC_L = mac[0] | (mac[1] << 8) | (mac[2] << 16) | (mac[3] << 24);
+	DEV_MAC_H = mac[4] | (mac[5] << 8);
+	iowrite32(DEV_MAC_L, pf_dev->device);
+	iowrite16(DEV_MAC_H, pf_dev->device + ZXDH_DEV_MAC_HIGH_OFFSET);
+}
+
+void zxdh_get_mac(struct dh_core_dev *dh_dev, uint8_t *mac)
+{
+	uint32_t DEV_MAC_L = 0;
+	uint16_t DEV_MAC_H = 0;
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	DEV_MAC_L = ioread32(pf_dev->device);
+	mac[0] = DEV_MAC_L & 0xff;
+	mac[1] = (DEV_MAC_L >> 8) & 0xff;
+	mac[2] = (DEV_MAC_L >> 16) & 0xff;
+	mac[3] = (DEV_MAC_L >> 24) & 0xff;
+	DEV_MAC_H = ioread16(pf_dev->device + ZXDH_DEV_MAC_HIGH_OFFSET);
+	mac[4] = DEV_MAC_H & 0xff;
+	mac[5] = (DEV_MAC_H >> 8) & 0xff;
+}
+
+uint64_t zxdh_pf_get_features(struct dh_core_dev *dh_dev)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+	uint64_t device_feature = 0;
+
+	iowrite32(0, &pf_dev->common->device_feature_select);
+	device_feature = ioread32(&pf_dev->common->device_feature);
+	iowrite32(1, &pf_dev->common->device_feature_select);
+	device_feature |= ((uint64_t)ioread32(&pf_dev->common->device_feature) << 32);
+
+	return device_feature;
+}
+
+void zxdh_pf_set_features(struct dh_core_dev *dh_dev, uint64_t features)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	iowrite32(0, &pf_dev->common->guest_feature_select);
+	iowrite32((uint32_t)features, &pf_dev->common->guest_feature);
+	iowrite32(1, &pf_dev->common->guest_feature_select);
+	iowrite32(features >> 32, &pf_dev->common->guest_feature);
+}
+
 static int dh_pf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct dh_core_dev *dh_dev = NULL;
diff --git a/drivers/net/ethernet/zte/dinghai/en_pf.h b/drivers/net/ethernet/zte/dinghai/en_pf.h
index a8b324adb948..0c4172c513a9 100644
--- a/drivers/net/ethernet/zte/dinghai/en_pf.h
+++ b/drivers/net/ethernet/zte/dinghai/en_pf.h
@@ -15,6 +15,24 @@
 #define ZXDH_PF_DEVICE_ID	0x8040
 #define ZXDH_VF_DEVICE_ID	0x8041
 
+/* Common configuration */
+#define ZXDH_PCI_CAP_COMMON_CFG	1
+/* Notifications */
+#define ZXDH_PCI_CAP_NOTIFY_CFG	2
+/* ISR access */
+#define ZXDH_PCI_CAP_ISR_CFG		3
+/* Device specific configuration */
+#define ZXDH_PCI_CAP_DEVICE_CFG	4
+/* PCI configuration access */
+#define ZXDH_PCI_CAP_PCI_CFG		5
+
+#define ZXDH_PF_MAX_BAR_VAL		0x5
+#define ZXDH_PF_ALIGN4			4
+#define ZXDH_PF_ALIGN2			2
+#define ZXDH_PF_MAP_MINLEN2		2
+
+#define ZXDH_DEV_MAC_HIGH_OFFSET	4
+
 enum dh_coredev_type {
 	DH_COREDEV_PF,
 	DH_COREDEV_VF,
@@ -34,7 +52,27 @@ struct dh_core_dev {
 };
 
 struct zxdh_pf_device {
+	struct zxdh_pf_pci_common_cfg __iomem *common;
+	/* Device-specific data (non-legacy mode) */
+	void __iomem *device;
+	/* Base of vq notifications (non-legacy mode). */
+	void __iomem *notify_base;
+	void __iomem *pf_sriov_cap_base;
+	/* Physical base of vq notifications */
+	resource_size_t notify_pa;
+	/* So we can sanity-check accesses. */
+	size_t notify_len;
+	size_t device_len;
+	/* Capability for when we need to map notifications per-vq. */
+	int32_t notify_map_cap;
+	/* Multiply queue_notify_off by this value. (non-legacy mode). */
+	uint32_t notify_offset_multiplier;
+	int32_t modern_bars;
+
 	uint64_t pci_ioremap_addr[6];
+	uint64_t sriov_bar_size;
+	uint32_t dev_cfg_bar_off;
+	bool packed_status;
 	bool bar_chan_valid;
 	bool vepa;
 	struct mutex irq_lock; /* Protects IRQ operations */
-- 
2.27.0



* [PATCH net-next v2 2/3] net/ethernet/zte/dinghai: add logging infrastructure
From: Junyang Han @ 2026-04-22 14:49 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew+netdev, edumazet, kuba, pabeni, han.junyang,
	ran.ming, han.chengfei, zhang.yanze
In-Reply-To: <20260422144901.2403456-1-han.junyang@zte.com.cn>


[-- Attachment #1.1.1: Type: text/plain, Size: 7392 bytes --]

Introduce logging macros (DH_LOG_EMERG/ALERT/CRIT/ERR/WARNING/INFO/DEBUG)
and helper definitions for ZTE DingHai driver debugging.
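
For readers unfamiliar with the pattern, here is a minimal user-space sketch of the same prefixing scheme. SKETCH_LOG, SKETCH_PF_LOG, log_buf and report_failure are hypothetical names. The driver macros call dev_err() and friends, whereas this sketch formats into a buffer so the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sink standing in for the kernel log. */
static char log_buf[256];

/* Prefix every message with [module][function][line], then the caller's
 * format. ##__VA_ARGS__ (a GNU extension, also used in the kernel) drops
 * the trailing comma when no arguments are given. */
#define SKETCH_LOG(module, fmt, ...)					\
	snprintf(log_buf, sizeof(log_buf), "[%s][%s][%d] " fmt,	\
		 module, __func__, __LINE__, ##__VA_ARGS__)

/* Per-module convenience wrapper, like LOG_ERR() wrapping DH_LOG_ERR(). */
#define SKETCH_PF_LOG(fmt, ...) SKETCH_LOG("zxdh_pf", fmt, ##__VA_ARGS__)

static void report_failure(int err)
{
	int n = SKETCH_PF_LOG("probe failed: %d\n", err);

	assert(n > 0);	/* snprintf returns the formatted length */
}
```

Because __func__ and __LINE__ expand at the call site, the prefix names the function that logged the message, not the macro definition.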

Signed-off-by: Junyang Han <han.junyang@zte.com.cn>
---
 drivers/net/ethernet/zte/dinghai/dh_log.h | 60 +++++++++++++++++++++++
 drivers/net/ethernet/zte/dinghai/en_pf.c  | 49 +++++++++++++++---
 2 files changed, 101 insertions(+), 8 deletions(-)
 create mode 100644 drivers/net/ethernet/zte/dinghai/dh_log.h

diff --git a/drivers/net/ethernet/zte/dinghai/dh_log.h b/drivers/net/ethernet/zte/dinghai/dh_log.h
new file mode 100644
index 000000000000..488c1968ae73
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/dh_log.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * ZTE DingHai Ethernet driver - logging infrastructure
+ * Copyright (c) 2022-2026, ZTE Corporation.
+ */
+
+#ifndef __DH_LOG_H__
+#define __DH_LOG_H__
+
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/printk.h>
+
+#define MODULE_CMD		"zxdh_cmd"
+#define MODULE_NP		"zxdh_np"
+#define MODULE_PF		"zxdh_pf"
+#define MODULE_PTP		"zxdh_ptp"
+#define MODULE_TSN		"zxdh_tsn"
+#define MODULE_LAG		"zxdh_lag"
+#define MODULE_DHTOOLS		"zxdh_tool"
+#define MODULE_SEC		"zxdh_sec"
+#define MODULE_MPF		"zxdh_mpf"
+#define MODULE_FUC_HP		"zxdh_func_hp"
+#define MODULE_UACCE		"zxdh_uacce"
+#define MODULE_HEAL		"zxdh_health"
+
+#define DH_LOG_EMERG(module, __dev, fmt, arg...)			\
+	dev_emerg((__dev)->device, "[%s][%s][%d] " fmt,			\
+		  module, __func__, __LINE__, ##arg)
+
+#define DH_LOG_ALERT(module, __dev, fmt, arg...)			\
+	dev_alert((__dev)->device, "[%s][%s][%d] " fmt,			\
+		  module, __func__, __LINE__, ##arg)
+
+#define DH_LOG_CRIT(module, __dev, fmt, arg...)			\
+	dev_crit((__dev)->device, "[%s][%s][%d] " fmt,			\
+		  module, __func__, __LINE__, ##arg)
+
+#define DH_LOG_ERR(module, __dev, fmt, arg...)			\
+	dev_err((__dev)->device, "[%s][%s][%d] " fmt,			\
+		 module, __func__, __LINE__, ##arg)
+
+#define DH_LOG_WARNING(module, __dev, fmt, arg...)		\
+	dev_warn((__dev)->device, "[%s][%s][%d] " fmt,			\
+		 module, __func__, __LINE__, ##arg)
+
+#define DH_LOG_INFO(module, __dev, fmt, arg...)			\
+	dev_info((__dev)->device, "[%s][%s][%d] " fmt,			\
+		  module, __func__, __LINE__, ##arg)
+
+#define DH_LOG_DEBUG(module, __dev, fmt, arg...)			\
+	dev_dbg((__dev)->device, "[%s][%s][%d] " fmt,			\
+		 module, __func__, __LINE__, ##arg)
+
+#define LOG_ERR(__dev, fmt, arg...)		DH_LOG_ERR(MODULE_PF, __dev, fmt, ##arg)
+#define LOG_INFO(__dev, fmt, arg...)		DH_LOG_INFO(MODULE_PF, __dev, fmt, ##arg)
+#define LOG_DEBUG(__dev, fmt, arg...)		DH_LOG_DEBUG(MODULE_PF, __dev, fmt, ##arg)
+#define LOG_WARN(__dev, fmt, arg...)		DH_LOG_WARNING(MODULE_PF, __dev, fmt, ##arg)
+
+#endif /* __DH_LOG_H__ */
diff --git a/drivers/net/ethernet/zte/dinghai/en_pf.c b/drivers/net/ethernet/zte/dinghai/en_pf.c
index 5e13a8c24a28..70dad28de544 100644
--- a/drivers/net/ethernet/zte/dinghai/en_pf.c
+++ b/drivers/net/ethernet/zte/dinghai/en_pf.c
@@ -8,6 +8,7 @@
 #include <linux/pci.h>
 #include <net/devlink.h>
 #include "en_pf.h"
+#include "dh_log.h"
 
 MODULE_AUTHOR("Junyang Han <han.junyang@zte.com.cn>");
 MODULE_DESCRIPTION("ZTE Corporation network adapters (DingHai series) Ethernet driver");
@@ -31,26 +32,34 @@ static int dh_pf_pci_init(struct dh_core_dev *dev)
 	pci_set_drvdata(dev->pdev, dev);
 
 	ret = pci_enable_device(dev->pdev);
-	if (ret)
+	if (ret) {
+		LOG_ERR(dev, "pci_enable_device failed: %d\n", ret);
 		return -ENOMEM;
+	}
 
 	ret = dma_set_mask_and_coherent(dev->device, DMA_BIT_MASK(64));
 	if (ret) {
 		ret = dma_set_mask_and_coherent(dev->device, DMA_BIT_MASK(32));
-		if (ret)
+		if (ret) {
+			LOG_ERR(dev, "dma_set_mask_and_coherent failed: %d\n", ret);
 			goto err_pci;
+		}
 	}
 
 	ret = pci_request_selected_regions(dev->pdev,
 					   pci_select_bars(dev->pdev, IORESOURCE_MEM),
 					   "dh-pf");
-	if (ret)
+	if (ret) {
+		LOG_ERR(dev, "pci_request_selected_regions failed: %d\n", ret);
 		goto err_pci;
+	}
 
 	pci_set_master(dev->pdev);
 	ret = pci_save_state(dev->pdev);
-	if (ret)
+	if (ret) {
+		LOG_ERR(dev, "pci_save_state failed: %d\n", ret);
 		goto err_pci_save_state;
+	}
 
 	pf_dev = dev->priv;
 	pf_dev->pci_ioremap_addr[0] =
@@ -58,6 +67,9 @@ static int dh_pf_pci_init(struct dh_core_dev *dev)
 				  pci_resource_len(dev->pdev, 0));
 	if (!pf_dev->pci_ioremap_addr[0]) {
 		ret = -ENOMEM;
+		LOG_ERR(dev, "ioremap(0x%llx, 0x%llx) failed\n",
+			(unsigned long long)pci_resource_start(dev->pdev, 0),
+			(unsigned long long)pci_resource_len(dev->pdev, 0));
 		goto err_pci_save_state;
 	}
 
@@ -87,10 +99,13 @@ static int dh_pf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	struct devlink *devlink = NULL;
 	int ret = 0;
 
+	dev_info(&pdev->dev, "pf level start\n");
 	devlink = devlink_alloc(&dh_pf_devlink_ops, sizeof(struct dh_core_dev),
 				&pdev->dev);
-	if (!devlink)
+	if (!devlink) {
+		dev_err(&pdev->dev, "devlink alloc failed\n");
 		return -ENOMEM;
+	}
 
 	dh_dev = devlink_priv(devlink);
 	dh_dev->device = &pdev->dev;
@@ -98,8 +113,10 @@ static int dh_pf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	dh_dev->devlink = devlink;
 
 	pf_dev = dh_core_alloc_priv(dh_dev, sizeof(*pf_dev));
-	if (!pf_dev)
-		return -ENOMEM;
+	if (!pf_dev) {
+		LOG_ERR(dh_dev, "zxdh_pf_dev alloc failed\n");
+		goto err_pf_dev;
+	}
 
 	pf_dev->bar_chan_valid = false;
 	pf_dev->vepa = false;
@@ -107,10 +124,17 @@ static int dh_pf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	mutex_init(&pf_dev->irq_lock);
 
 	dh_dev->coredev_type = GET_COREDEV_TYPE(pdev);
+	LOG_DEBUG(dh_dev, "%s device: %s\n",
+		  (dh_dev->coredev_type == DH_COREDEV_PF) ? "PF" : "VF",
+		  pci_name(pdev));
 
 	ret = dh_pf_pci_init(dh_dev);
-	if (ret)
+	if (ret) {
+		LOG_ERR(dh_dev, "dh_pf_pci_init failed: %d\n", ret);
 		goto err_cfg_init;
+	}
+
+	LOG_INFO(dh_dev, "pf level completed\n");
 
 	return 0;
 
@@ -118,6 +142,7 @@ static int dh_pf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	mutex_destroy(&pf_dev->irq_lock);
 	mutex_destroy(&dh_dev->lock);
 	dh_core_free_priv(dh_dev);
+err_pf_dev:
 	devlink_free(devlink);
 	return -EPERM;
 }
@@ -128,12 +153,16 @@ static void dh_pf_remove(struct pci_dev *pdev)
 	struct devlink *devlink = priv_to_devlink(dh_dev);
 	struct zxdh_pf_device *pf_dev = dh_dev->priv;
 
+	LOG_INFO(dh_dev, "pf level start\n");
+
 	dh_pf_pci_close(dh_dev);
 	mutex_destroy(&pf_dev->irq_lock);
 	mutex_destroy(&dh_dev->lock);
 	dh_core_free_priv(dh_dev);
 	devlink_free(devlink);
 	pci_set_drvdata(pdev, NULL);
+
+	dev_info(&pdev->dev, "pf level completed\n"); /* dh_dev was freed with devlink */
 }
 
 static void dh_pf_shutdown(struct pci_dev *pdev)
@@ -142,6 +171,8 @@ static void dh_pf_shutdown(struct pci_dev *pdev)
 	struct devlink *devlink = priv_to_devlink(dh_dev);
 	struct zxdh_pf_device *pf_dev = dh_dev->priv;
 
+	LOG_INFO(dh_dev, "pf level start\n");
+
 	dh_pf_pci_close(dh_dev);
 	mutex_destroy(&pf_dev->irq_lock);
 	mutex_destroy(&dh_dev->lock);
@@ -149,6 +180,8 @@ static void dh_pf_shutdown(struct pci_dev *pdev)
 	devlink_free(devlink);
 
 	pci_set_drvdata(pdev, NULL);
+
+	dev_info(&pdev->dev, "pf level completed\n"); /* dh_dev was freed with devlink */
 }
 
 static struct pci_driver dh_pf_driver = {
-- 
2.27.0



* [PATCH net-next v2 1/3] net/ethernet: add ZTE network driver support
From: Junyang Han @ 2026-04-22 14:48 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew+netdev, edumazet, kuba, pabeni, han.junyang,
	ran.ming, han.chengfei, zhang.yanze
In-Reply-To: <20260422144901.2403456-1-han.junyang@zte.com.cn>


[-- Attachment #1.1.1: Type: text/plain, Size: 11298 bytes --]

Add a basic framework for the ZTE DingHai Ethernet PF driver, including
Kconfig/Makefile build support and a PCIe device probe/remove skeleton.
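
A probe skeleton of this kind usually acquires resources in order and releases them in reverse on failure. The following is a minimal user-space sketch of that goto-unwind shape. All names here (fake_dev, acquire, release, the RES_* bits) are hypothetical and stand in for steps such as enabling the device, requesting regions and mapping a BAR.

```c
#include <assert.h>

/* Hypothetical resource tracker: each bit marks one acquired resource. */
enum { RES_ENABLE = 1 << 0, RES_REGIONS = 1 << 1, RES_MAP = 1 << 2 };

struct fake_dev {
	unsigned int held;	/* bitmask of resources currently held */
	unsigned int fail_at;	/* resource bit whose acquisition fails, or 0 */
};

static int acquire(struct fake_dev *d, unsigned int res)
{
	if (d->fail_at == res)
		return -1;
	d->held |= res;
	return 0;
}

static void release(struct fake_dev *d, unsigned int res)
{
	assert(d->held & res);	/* releasing something never acquired is a bug */
	d->held &= ~res;
}

static int fake_probe(struct fake_dev *d)
{
	int ret;

	ret = acquire(d, RES_ENABLE);
	if (ret)
		return ret;
	ret = acquire(d, RES_REGIONS);
	if (ret)
		goto err_disable;
	ret = acquire(d, RES_MAP);
	if (ret)
		goto err_release;
	return 0;

err_release:
	release(d, RES_REGIONS);
err_disable:
	release(d, RES_ENABLE);
	return ret;	/* propagate the original error code */
}

/* Remove mirrors a successful probe, in reverse order. */
static void fake_remove(struct fake_dev *d)
{
	release(d, RES_MAP);
	release(d, RES_REGIONS);
	release(d, RES_ENABLE);
}
```

Each label undoes exactly the steps that succeeded before the failing one, so a failed probe leaves nothing held.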

Signed-off-by: Junyang Han <han.junyang@zte.com.cn>
---
 MAINTAINERS                               |   6 +
 drivers/net/ethernet/Kconfig              |   1 +
 drivers/net/ethernet/Makefile             |   1 +
 drivers/net/ethernet/zte/Kconfig          |  20 +++
 drivers/net/ethernet/zte/Makefile         |   6 +
 drivers/net/ethernet/zte/dinghai/Kconfig  |  34 +++++
 drivers/net/ethernet/zte/dinghai/Makefile |  10 ++
 drivers/net/ethernet/zte/dinghai/en_pf.c  | 162 ++++++++++++++++++++++
 drivers/net/ethernet/zte/dinghai/en_pf.h  |  62 +++++++++
 9 files changed, 302 insertions(+)
 create mode 100644 drivers/net/ethernet/zte/Kconfig
 create mode 100644 drivers/net/ethernet/zte/Makefile
 create mode 100644 drivers/net/ethernet/zte/dinghai/Kconfig
 create mode 100644 drivers/net/ethernet/zte/dinghai/Makefile
 create mode 100644 drivers/net/ethernet/zte/dinghai/en_pf.c
 create mode 100644 drivers/net/ethernet/zte/dinghai/en_pf.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 65902b97f5df..92ddac4bb310 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -29210,6 +29210,12 @@ S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git
 F:	sound/hda/codecs/senarytech.c
 
+ZTE DINGHAI ETHERNET DRIVER
+M:	Junyang Han <han.junyang@zte.com.cn>
+L:	netdev@vger.kernel.org
+S:	Maintained
+F:	drivers/net/ethernet/zte/
+
 THE REST
 M:	Linus Torvalds <torvalds@linux-foundation.org>
 L:	linux-kernel@vger.kernel.org
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index bdc29d143160..ecc6fbb01510 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -190,5 +190,6 @@ source "drivers/net/ethernet/wangxun/Kconfig"
 source "drivers/net/ethernet/wiznet/Kconfig"
 source "drivers/net/ethernet/xilinx/Kconfig"
 source "drivers/net/ethernet/xircom/Kconfig"
+source "drivers/net/ethernet/zte/Kconfig"
 
 endif # ETHERNET
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 6bffb60ba644..7476af77d6c8 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -106,3 +106,4 @@ obj-$(CONFIG_NET_VENDOR_XIRCOM) += xircom/
 obj-$(CONFIG_NET_VENDOR_SYNOPSYS) += synopsys/
 obj-$(CONFIG_NET_VENDOR_PENSANDO) += pensando/
 obj-$(CONFIG_OA_TC6) += oa_tc6.o
+obj-$(CONFIG_NET_VENDOR_ZTE) += zte/
diff --git a/drivers/net/ethernet/zte/Kconfig b/drivers/net/ethernet/zte/Kconfig
new file mode 100644
index 000000000000..b95c2fc7db77
--- /dev/null
+++ b/drivers/net/ethernet/zte/Kconfig
@@ -0,0 +1,20 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# ZTE driver configuration
+#
+
+config NET_VENDOR_ZTE
+    bool "ZTE devices"
+    default y
+    help
+      If you have a network (Ethernet) card belonging to this class, say Y.
+      Note that the answer to this question doesn't directly affect the
+      kernel: saying N will just cause the configurator to skip all
+      the questions about ZTE cards. If you say Y, you will be asked
+      for your specific card in the following questions.
+
+if NET_VENDOR_ZTE
+
+source "drivers/net/ethernet/zte/dinghai/Kconfig"
+
+endif # NET_VENDOR_ZTE
diff --git a/drivers/net/ethernet/zte/Makefile b/drivers/net/ethernet/zte/Makefile
new file mode 100644
index 000000000000..cd9929b61559
--- /dev/null
+++ b/drivers/net/ethernet/zte/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the ZTE device drivers
+#
+
+obj-$(CONFIG_DINGHAI) += dinghai/
diff --git a/drivers/net/ethernet/zte/dinghai/Kconfig b/drivers/net/ethernet/zte/dinghai/Kconfig
new file mode 100644
index 000000000000..94b5bd9b3c50
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/Kconfig
@@ -0,0 +1,34 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# ZTE DingHai Ethernet driver configuration
+#
+
+config DINGHAI
+    bool "ZTE DingHai Ethernet driver"
+    depends on NET_VENDOR_ZTE && PCI
+    select NET_DEVLINK
+    help
+      This driver supports ZTE DingHai Ethernet devices.
+
+      DingHai is a high-performance Ethernet controller that supports
+      multiple features including hardware offloading, SR-IOV, and
+      advanced virtualization capabilities.
+
+      If you say Y here, you can select specific driver variants below.
+
+      If unsure, say N.
+
+if DINGHAI
+
+config DINGHAI_PF
+    tristate "ZTE DingHai PF (Physical Function) driver"
+    help
+      This driver supports ZTE DingHai PCI Express Ethernet
+      adapters (PF).
+
+      To compile this driver as a module, choose M here. The module
+      will be named dinghai10e.
+
+      If unsure, say N.
+
+endif # DINGHAI
diff --git a/drivers/net/ethernet/zte/dinghai/Makefile b/drivers/net/ethernet/zte/dinghai/Makefile
new file mode 100644
index 000000000000..f55a8de518be
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for ZTE DingHai Ethernet driver
+#
+
+ccflags-y += -I$(src)
+
+obj-$(CONFIG_DINGHAI_PF) += dinghai10e.o
+dinghai10e-y := en_pf.o
+
diff --git a/drivers/net/ethernet/zte/dinghai/en_pf.c b/drivers/net/ethernet/zte/dinghai/en_pf.c
new file mode 100644
index 000000000000..5e13a8c24a28
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/en_pf.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * ZTE DingHai Ethernet driver
+ * Copyright (c) 2022-2026, ZTE Corporation.
+ */
+
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <net/devlink.h>
+#include "en_pf.h"
+
+MODULE_AUTHOR("Junyang Han <han.junyang@zte.com.cn>");
+MODULE_DESCRIPTION("ZTE Corporation network adapters (DingHai series) Ethernet driver");
+MODULE_LICENSE("GPL");
+
+static const struct devlink_ops dh_pf_devlink_ops = {};
+
+const struct pci_device_id dh_pf_pci_table[] = {
+	{ PCI_DEVICE(ZXDH_PF_VENDOR_ID, ZXDH_PF_DEVICE_ID), 0 },
+	{ PCI_DEVICE(ZXDH_PF_VENDOR_ID, ZXDH_VF_DEVICE_ID), 0 },
+	{ 0, }
+};
+
+MODULE_DEVICE_TABLE(pci, dh_pf_pci_table);
+
+static int dh_pf_pci_init(struct dh_core_dev *dev)
+{
+	struct zxdh_pf_device *pf_dev = NULL;
+	int ret = 0;
+
+	pci_set_drvdata(dev->pdev, dev);
+
+	ret = pci_enable_device(dev->pdev);
+	if (ret)
+		return ret;
+
+	ret = dma_set_mask_and_coherent(dev->device, DMA_BIT_MASK(64));
+	if (ret) {
+		ret = dma_set_mask_and_coherent(dev->device, DMA_BIT_MASK(32));
+		if (ret)
+			goto err_pci;
+	}
+
+	ret = pci_request_selected_regions(dev->pdev,
+					   pci_select_bars(dev->pdev, IORESOURCE_MEM),
+					   "dh-pf");
+	if (ret)
+		goto err_pci;
+
+	pci_set_master(dev->pdev);
+	ret = pci_save_state(dev->pdev);
+	if (ret)
+		goto err_pci_save_state;
+
+	pf_dev = dev->priv;
+	pf_dev->pci_ioremap_addr[0] =
+		(uint64_t)ioremap(pci_resource_start(dev->pdev, 0),
+				  pci_resource_len(dev->pdev, 0));
+	if (!pf_dev->pci_ioremap_addr[0]) {
+		ret = -ENOMEM;
+		goto err_pci_save_state;
+	}
+
+	return 0;
+
+err_pci_save_state:
+	pci_release_selected_regions(dev->pdev, pci_select_bars(dev->pdev, IORESOURCE_MEM));
+err_pci:
+	pci_disable_device(dev->pdev);
+	return ret;
+}
+
+void dh_pf_pci_close(struct dh_core_dev *dev)
+{
+	struct zxdh_pf_device *pf_dev = NULL;
+
+	pf_dev = dev->priv;
+	iounmap((void *)pf_dev->pci_ioremap_addr[0]);
+	pci_release_selected_regions(dev->pdev, pci_select_bars(dev->pdev, IORESOURCE_MEM));
+	pci_disable_device(dev->pdev);
+}
+
+static int dh_pf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	struct dh_core_dev *dh_dev = NULL;
+	struct zxdh_pf_device *pf_dev = NULL;
+	struct devlink *devlink = NULL;
+	int ret = 0;
+
+	devlink = devlink_alloc(&dh_pf_devlink_ops, sizeof(struct dh_core_dev),
+				&pdev->dev);
+	if (!devlink)
+		return -ENOMEM;
+
+	dh_dev = devlink_priv(devlink);
+	dh_dev->device = &pdev->dev;
+	dh_dev->pdev = pdev;
+	dh_dev->devlink = devlink;
+
+	pf_dev = dh_core_alloc_priv(dh_dev, sizeof(*pf_dev));
+	if (!pf_dev)
+		return -ENOMEM;
+
+	pf_dev->bar_chan_valid = false;
+	pf_dev->vepa = false;
+	mutex_init(&dh_dev->lock);
+	mutex_init(&pf_dev->irq_lock);
+
+	dh_dev->coredev_type = GET_COREDEV_TYPE(pdev);
+
+	ret = dh_pf_pci_init(dh_dev);
+	if (ret)
+		goto err_cfg_init;
+
+	return 0;
+
+err_cfg_init:
+	mutex_destroy(&pf_dev->irq_lock);
+	mutex_destroy(&dh_dev->lock);
+	dh_core_free_priv(dh_dev);
+	devlink_free(devlink);
+	return ret;
+}
+
+static void dh_pf_remove(struct pci_dev *pdev)
+{
+	struct dh_core_dev *dh_dev = pci_get_drvdata(pdev);
+	struct devlink *devlink = priv_to_devlink(dh_dev);
+	struct zxdh_pf_device *pf_dev = dh_dev->priv;
+
+	dh_pf_pci_close(dh_dev);
+	mutex_destroy(&pf_dev->irq_lock);
+	mutex_destroy(&dh_dev->lock);
+	dh_core_free_priv(dh_dev);
+	devlink_free(devlink);
+	pci_set_drvdata(pdev, NULL);
+}
+
+static void dh_pf_shutdown(struct pci_dev *pdev)
+{
+	struct dh_core_dev *dh_dev = pci_get_drvdata(pdev);
+	struct devlink *devlink = priv_to_devlink(dh_dev);
+	struct zxdh_pf_device *pf_dev = dh_dev->priv;
+
+	dh_pf_pci_close(dh_dev);
+	mutex_destroy(&pf_dev->irq_lock);
+	mutex_destroy(&dh_dev->lock);
+	dh_core_free_priv(dh_dev);
+	devlink_free(devlink);
+
+	pci_set_drvdata(pdev, NULL);
+}
+
+static struct pci_driver dh_pf_driver = {
+	.name = "dinghai10e",
+	.id_table = dh_pf_pci_table,
+	.probe = dh_pf_probe,
+	.remove = dh_pf_remove,
+	.shutdown = dh_pf_shutdown,
+};
+
+module_pci_driver(dh_pf_driver);
diff --git a/drivers/net/ethernet/zte/dinghai/en_pf.h b/drivers/net/ethernet/zte/dinghai/en_pf.h
new file mode 100644
index 000000000000..a8b324adb948
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/en_pf.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * ZTE DingHai Ethernet driver - PF header
+ * Copyright (c) 2022-2026, ZTE Corporation.
+ */
+
+#ifndef __ZXDH_EN_PF_H__
+#define __ZXDH_EN_PF_H__
+
+#include <linux/types.h>
+#include <linux/pci.h>
+#include <linux/mutex.h>
+
+#define ZXDH_PF_VENDOR_ID	0x1cf2
+#define ZXDH_PF_DEVICE_ID	0x8040
+#define ZXDH_VF_DEVICE_ID	0x8041
+
+enum dh_coredev_type {
+	DH_COREDEV_PF,
+	DH_COREDEV_VF,
+	DH_COREDEV_SF,
+	DH_COREDEV_MPF
+};
+
+struct devlink;
+
+struct dh_core_dev {
+	struct device *device;
+	enum dh_coredev_type coredev_type;
+	struct pci_dev *pdev;
+	struct devlink *devlink;
+	struct mutex lock; /* Protects device configuration */
+	void *priv;
+};
+
+struct zxdh_pf_device {
+	uint64_t pci_ioremap_addr[6];
+	bool bar_chan_valid;
+	bool vepa;
+	struct mutex irq_lock; /* Protects IRQ operations */
+};
+
+static inline void *dh_core_alloc_priv(struct dh_core_dev *dh_dev, size_t size)
+{
+	void *priv = kzalloc(size, GFP_KERNEL);
+
+	if (priv)
+		dh_dev->priv = priv;
+	return priv;
+}
+
+static inline void dh_core_free_priv(struct dh_core_dev *dh_dev)
+{
+	kfree(dh_dev->priv);
+	dh_dev->priv = NULL;
+}
+
+#define GET_COREDEV_TYPE(pdev) \
+	((pdev)->device == ZXDH_VF_DEVICE_ID ? DH_COREDEV_VF : DH_COREDEV_PF)
+
+#endif
+
-- 
2.27.0

^ permalink raw reply related

* [PATCH net-next v2 0/3] Add ZTE DingHai Ethernet PF driver
From: Junyang Han @ 2026-04-22 14:48 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew+netdev, edumazet, kuba, pabeni, han.junyang,
	ran.ming, han.chengfei, zhang.yanze


This series adds initial support for the ZTE DingHai Ethernet controller,
a high-performance PCIe Ethernet device supporting SR-IOV, hardware
offloading, and advanced virtualization features.

Changes from v1 (addressing feedback from Andrew Lunn):
- Update copyright years to 2022-2026
- Remove DRV_VERSION, MODULE_VERSION and related boilerplate
- Fix MODULE_AUTHOR to use person with email address
- Use module_pci_driver() instead of manual init/exit
- Remove empty suspend/resume callbacks
- Replace char priv[] flexible array with void *priv + kzalloc
- Switch logging from printk wrappers to dev_*() based macros
- Remove dh_helper.h and dh_log.c, simplify to dh_log.h only
- Fix variable declaration ordering (reverse Christmas tree)
- Remove unnecessary NULL check in remove and pf_dev=NULL in probe
- Fix indentation and remove unnecessary type casts
- Use kernel idiomatic "if (ret)" style

This patch series is organized as follows:
- Patch 1: Add basic driver framework (Kconfig/Makefile, probe/remove skeleton)
- Patch 2: Add logging infrastructure using kernel standard dev_* macros
- Patch 3: Add hardware register access and PCI capability scanning

This is the initial submission and only includes the PF (Physical Function)
driver. The VF (Virtual Function) driver will be submitted separately.

Junyang Han (3):
  net/ethernet: add ZTE network driver support
  net/ethernet/zte/dinghai: add logging infrastructure
  net/ethernet/zte/dinghai: add hardware register access and PCI
    capability scanning

 MAINTAINERS                                 |   6 +
 drivers/net/ethernet/Kconfig                |   1 +
 drivers/net/ethernet/Makefile               |   1 +
 drivers/net/ethernet/zte/Kconfig            |  20 +
 drivers/net/ethernet/zte/Makefile           |   6 +
 drivers/net/ethernet/zte/dinghai/Kconfig    |  34 ++
 drivers/net/ethernet/zte/dinghai/Makefile   |  10 +
 drivers/net/ethernet/zte/dinghai/dh_log.h   |  60 ++
 drivers/net/ethernet/zte/dinghai/dh_queue.h |  71 +++
 drivers/net/ethernet/zte/dinghai/en_pf.c    | 605 ++++++++++++++++++++
 drivers/net/ethernet/zte/dinghai/en_pf.h    | 100 ++++
 11 files changed, 914 insertions(+)
 create mode 100644 drivers/net/ethernet/zte/Kconfig
 create mode 100644 drivers/net/ethernet/zte/Makefile
 create mode 100644 drivers/net/ethernet/zte/dinghai/Kconfig
 create mode 100644 drivers/net/ethernet/zte/dinghai/Makefile
 create mode 100644 drivers/net/ethernet/zte/dinghai/dh_log.h
 create mode 100644 drivers/net/ethernet/zte/dinghai/dh_queue.h
 create mode 100644 drivers/net/ethernet/zte/dinghai/en_pf.c
 create mode 100644 drivers/net/ethernet/zte/dinghai/en_pf.h

-- 
2.27.0

^ permalink raw reply

* Re: [PATCH net-next 3/3] net/ethernet/zte/dinghai: add hardware register access and PCI capability scanning
From: Junyang Han @ 2026-04-22 14:47 UTC (permalink / raw)
  To: andrew+netdev
  Cc: davem, netdev, edumazet, kuba, pabeni, han.junyang, ran.ming,
	han.chengfei, zhang.yanze
In-Reply-To: <20260415015334.2018453-3-han.junyang@zte.com.cn>


[-- Attachment #1.1.1: Type: text/plain, Size: 4596 bytes --]

Hi Andrew,

Thank you for reviewing patch 3/3. Please see my responses below.

> +int32_t zxdh_pf_pci_find_capability(struct pci_dev *pdev, uint8_t cfg_type,
> +                    uint32_t ioresource_types, int32_t *bars)
> +{
> +    int32_t pos = 0;
> +    uint8_t type = 0;
> +    uint8_t bar = 0;
> +
> +    for (pos = pci_find_capability(pdev, PCI_CAP_ID_VNDR); pos > 0;
> +         pos = pci_find_next_capability(pdev, pos, PCI_CAP_ID_VNDR)) {
> +        pci_read_config_byte(pdev, pos + offsetof
> (struct zxdh_pf_pci_cap, cfg_type), &type);
> +        pci_read_config_byte(pdev, pos + offsetof
> (struct zxdh_pf_pci_cap, bar), &bar);

> Something odd going on with indentation? Has the mailer corrupted it?

The indentation in the original patch was correct (4-space). The mailer
appears to have corrupted the formatting in the display. Code is properly
indented in the actual patch.

> +
> +        /* ignore structures with reserved BAR values */
> +        if (bar > ZXDH_PF_MAX_BAR_VAL)
> +            continue;
> +
> +        if (type == cfg_type) {
> +            if (pci_resource_len(pdev, bar) &&
> +                pci_resource_flags(pdev, bar) & ioresource_types) {
> +                *bars |= (1 << bar);
> +                return pos;
> +            }
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +void __iomem *zxdh_pf_map_capability(struct dh_core_dev *dh_dev, int32_t off,
> +                     size_t minlen, uint32_t align,
> +                     uint32_t start, uint32_t size,
> +                     size_t *len, resource_size_t *pa,
> +                     uint32_t *bar_off)
> +    p = pci_iomap_range(pdev, bar, offset, length);
> +    if (unlikely(!p)) {

> Is this hot path? Please only use unlikely() when dealing with frames
> in the hot path.

Removed unlikely(). It's not in a hot path.

> +int32_t zxdh_pf_common_cfg_init(struct dh_core_dev *dh_dev)
> +{
> +    int32_t common = 0;
> +    struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
> +    struct pci_dev *pdev = dh_dev->pdev;
> +
> +    /* check for a common config: if not, use legacy mode (bar 0). */
> +    common = zxdh_pf_pci_find_capability(pdev, ZXDH_PCI_CAP_COMMON_CFG,
> +                         IORESOURCE_IO | IORESOURCE_MEM,
> +                         &pf_dev->modern_bars);
> +    if (common == 0) {
> +        LOG_ERR("missing capabilities %i, leaving for legacy driver\n", common);
> +        return -ENODEV;
> +    }
> +
> +    pf_dev->common = zxdh_pf_map_capability(dh_dev, common,
> +                        sizeof(struct zxdh_pf_pci_common_cfg),
> +                        ZXDH_PF_ALIGN4, 0,
> +                        sizeof(struct zxdh_pf_pci_common_cfg),
> +                        NULL, NULL, NULL);
> +    if (unlikely(!pf_dev->common)) {
> +        LOG_ERR("pf_dev->common is null\n");
> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}

> +int32_t zxdh_pf_notify_cfg_init(struct dh_core_dev *dh_dev)
> +{
> +    /* We don't know how many VQs we'll map, ahead of the time.
> +     * If notify length is small, map it all now. Otherwise, map each VQ individually later.
> +     */
> +    if ((uint64_t)notify_length + (notify_offset % PAGE_SIZE) <= PAGE_SIZE) {

> Please try to avoid casts. They suggest the types are wrong. You will
> probably have better code if you don't need the cast.

Fixed. Changed notify_length and notify_offset to size_t type.
The cast is no longer needed:
if (notify_length + (notify_offset % PAGE_SIZE) <= PAGE_SIZE)
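
A minimal userspace sketch of the page-fit check discussed above, for illustration only. SKETCH_PAGE_SIZE and notify_fits_in_one_page() are hypothetical names, not part of the driver. One caveat worth stating: with size_t operands the addition form is wrap-free only when size_t is wider than the values read from the device; on a 32-bit build it could still wrap. The subtraction form below is one way to avoid a cast and the wrap at the same time; it is a suggestion, not what the v2 patch does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/*
 * Return true when a region of notify_length bytes that starts at
 * notify_offset fits inside the page containing notify_offset.
 *
 * Equivalent to:
 *   notify_length + (notify_offset % PAGE_SIZE) <= PAGE_SIZE
 * but written as a subtraction so the left-hand side cannot wrap.
 */
static bool notify_fits_in_one_page(size_t notify_length, size_t notify_offset)
{
	size_t in_page = notify_offset % SKETCH_PAGE_SIZE;

	/* in_page < SKETCH_PAGE_SIZE, so the subtraction never underflows. */
	return notify_length <= SKETCH_PAGE_SIZE - in_page;
}
```

With this shape, a length close to the maximum of size_t is rejected instead of wrapping around to a small value.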

> +int32_t zxdh_pf_modern_cfg_init(struct dh_core_dev *dh_dev)
> +{
> +    int32_t ret = 0;
> +    struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
> +    struct pci_dev *pdev = dh_dev->pdev;
> +
> +    ret = zxdh_pf_common_cfg_init(dh_dev);
> +    if (ret != 0) {

> if (ret)

> would be more normal.

Fixed. Changed to "if (ret)".

> +void zxdh_pf_get_vf_mac(struct dh_core_dev *dh_dev, uint8_t *mac, int32_t vf_id)
> +{
> +    uint32_t DEV_MAC_L = 0;
> +    uint16_t DEV_MAC_H = 0;
> +    struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
> +
> +    if (pf_dev->pf_sriov_cap_base) {
> +        DEV_MAC_L = ioread32((void __iomem *)(pf_dev->pf_sriov_cap_base +
> +                     (pf_dev->sriov_bar_size) * vf_id +
> +                     pf_dev->dev_cfg_bar_off));

> Is the cast needed? pf_dev->pf_sriov_cap_base should already be void *
> __iomem.

Fixed. Removed the cast. pf_sriov_cap_base is already void __iomem *.

Best regards,
Junyang Han

^ permalink raw reply

* Re: [PATCH] smb: smbdirect: move fs/smb/common/smbdirect/ to fs/smb/smbdirect/
From: Paulo Alcantara @ 2026-04-22 15:16 UTC (permalink / raw)
  To: Steve French, Stefan Metzmacher
  Cc: Christoph Hellwig, linux-cifs, linux-rdma, netdev,
	samba-technical, Tom Talpey, Linus Torvalds, Namjae Jeon
In-Reply-To: <CAH2r5msb3-HiPSv+HgBknEwDXGsv0xU=TGCxHdmc-VCLKzYCmw@mail.gmail.com>

Steve French <smfrench@gmail.com> writes:

> On Wed, Apr 22, 2026 at 3:16 AM Stefan Metzmacher <metze@samba.org> wrote:
>>
>> Hi Christoph,
>>
>> >> diff --git a/fs/smb/Makefile b/fs/smb/Makefile
>> >> index 9a1bf59a1a65..353b1c2eefc4 100644
>> >> --- a/fs/smb/Makefile
>> >> +++ b/fs/smb/Makefile
>> >> @@ -1,5 +1,6 @@
>> >>   # SPDX-License-Identifier: GPL-2.0
>> >>
>> >>   obj-$(CONFIG_SMBFS)                += common/
>> >> +obj-$(CONFIG_SMBDIRECT)             += smbdirect/
>> >
>> > Why is this not in net/smbdirect/ or driver/infiniband/ulp/smdirect?
>>
>> Yes, I also thought about net/smbdirect.
>
> I would prefer to leave it in fs/smb for the time being, since it makes it
> easier to track since fs/smb/server and fs/smb/client have dependencies
> on it. In the long run, I don't mind moving it, if it starts being used
> outside of smb client and server.

Please let's not break backporting any further.  Decide where it will
end up at once.  We don't want the "fs/cifs -> fs/smb/client" history
all over again.

Won't samba be using it?  If so, you could consider that a user outside
fs/smb/{client,server} and then leave it in net/ instead, as hch
suggested.

^ permalink raw reply

* Re: [PATCH net-next 1/3] net/ethernet: add ZTE network driver support
From: Junyang Han @ 2026-04-22 14:46 UTC (permalink / raw)
  To: andrew+netdev
  Cc: davem, netdev, edumazet, kuba, pabeni, han.junyang, ran.ming,
	han.chengfei, zhang.yanze
In-Reply-To: <20260415015334.2018453-1-han.junyang@zte.com.cn>


[-- Attachment #1.1.1: Type: text/plain, Size: 4071 bytes --]

Hi Andrew,

Thank you for your detailed review. Please see my responses below.

> Please always include a patch 0/X in a patch set, explaining the big
> picture.

A cover letter (patch 0/X) explaining the big picture has been added in v2.

> + * ZTE DingHai Ethernet driver
> + * Copyright (c) 2022-2024, ZTE Corporation.

> And the last two years?

Copyright year updated to 2022-2026.

> +#define DRV_VERSION "1.0-1"

> Driver versions are generally useless. What does this actually mean
> for the given very limited driver? Are you going to change the version
> with each patchset?

Removed. Driver version is not needed for upstream submission.

> +#define DRV_SUMMARY "ZTE(R) zxdh-net driver"
> +
> +const char zxdh_pf_driver_version[] = DRV_VERSION;
> +static const char zxdh_pf_driver_string[] = DRV_SUMMARY;
> +static const char zxdh_pf_copyright[] = "Copyright (c) 2022-2024, ZTE Corporation.";

> You don't need this, you have the copyright above.

Removed all these. The copyright in the file header is sufficient.

> +MODULE_AUTHOR("ZTE");

> Author is a person, with an email address.

Fixed. Now using:
MODULE_AUTHOR("Junyang Han <han.junyang@zte.com.cn>");

> +MODULE_DESCRIPTION(DRV_SUMMARY);

> Please just put the string here, not #define.

Fixed. Now using:
MODULE_DESCRIPTION("ZTE Corporation network adapters (DingHai series) Ethernet driver");

> +MODULE_VERSION(DRV_VERSION);
> +MODULE_LICENSE("GPL");
> +static int dh_pf_pci_init(struct dh_core_dev *dev)
> +{
> +    int ret = 0;
> +    struct zxdh_pf_device *pf_dev = NULL;

> Reverse Christmas tree. This applies everywhere for a netdev driver.

Fixed. Function variable declarations now follow reverse Christmas tree.

> +static int dh_pf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> +{

> +err_cfg_init:
> +    mutex_destroy(&pf_dev->irq_lock);
> +    mutex_destroy(&dh_dev->lock);
> +    devlink_free(devlink);
> +    pf_dev = NULL;

> Since this is a probe function, do you really need to set pf_dev to
> NULL? How is it going to keep a value over EPROBE_DEFER cycles?

Removed. pf_dev doesn't need to be set to NULL.

> +static void dh_pf_remove(struct pci_dev *pdev)
> +{
> +    struct dh_core_dev *dh_dev = pci_get_drvdata(pdev);
> +    struct devlink *devlink = priv_to_devlink(dh_dev);
> +    struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
> +
> +    if (!dh_dev)
> +        return;

> How does that happen?

Removed the NULL check. 

> +    dh_pf_pci_close(dh_dev);
> +    mutex_destroy(&pf_dev->irq_lock);
> +    mutex_destroy(&dh_dev->lock);
> +    devlink_free(devlink);
> +    pci_set_drvdata(pdev, NULL);
> +}

> +static int dh_pf_suspend(struct pci_dev *pdev, pm_message_t state)
> +{
> +    return 0;
> +}
> +
> +static int dh_pf_resume(struct pci_dev *pdev)
> +{
> +    return 0;
> +}

> If they do nothing, don't provide them. You can add them when you add
> suspend/resume support.

Removed. Suspend/resume will be added when needed.

> +static int __init dh_pf_pci_init_module(void)
> +{
> +    return pci_register_driver(&dh_pf_driver);
> +}
> +
> +static void __exit dh_pf_pci_exit_module(void)
> +{
> +    pci_unregister_driver(&dh_pf_driver);
> +}
> +
> +module_init(dh_pf_pci_init_module);
> +module_exit(dh_pf_pci_exit_module);

> The PCI subsystem offers a wrapper to do this.

Fixed. Now using module_pci_driver(dh_pf_driver).

> +struct dh_core_dev {
> +    struct device *device;
> +    enum dh_coredev_type coredev_type;
> +    struct pci_dev *pdev;
> +    struct devlink *devlink;
> +    struct mutex lock; /* Protects device configuration */
> +    char priv[] __aligned(32);

> That is unusual. priv is usually a void * and allocated. If you want
> an actual array, you might want to have a second member indicate the
> size of the array, look at all the work done recently on flexible
> arrays.

Fixed. Changed to void *priv, allocated dynamically with kzalloc().  
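
For comparison, a rough userspace sketch of the other option the review mentions: a flexible array member paired with a length field. This is not what v2 does (v2 uses void *priv plus kzalloc()). The names struct core_dev_flex and core_dev_flex_alloc() are hypothetical, and calloc() stands in for the kernel allocator. In kernel code, helpers such as struct_size() and the __counted_by() annotation exist for this pattern.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical device structure with inline private data. */
struct core_dev_flex {
	int type;
	size_t priv_len;        /* number of bytes available in priv[] */
	unsigned char priv[];   /* C99 flexible array member, must be last */
};

/*
 * Allocate the structure and priv_len bytes of zeroed private data in
 * one allocation. Returns NULL on overflow or allocation failure.
 */
static struct core_dev_flex *core_dev_flex_alloc(size_t priv_len)
{
	struct core_dev_flex *dev;

	/* Reject sizes whose total would not fit in size_t. */
	if (priv_len > (size_t)-1 - sizeof(*dev))
		return NULL;

	dev = calloc(1, sizeof(*dev) + priv_len);
	if (dev)
		dev->priv_len = priv_len;
	return dev;
}
```

The single allocation means one free() releases everything, at the cost of fixing the private size at allocation time.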

Best regards,
Junyang Han

^ permalink raw reply

* Re: [PATCH net 1/2] net/mlx5e: psp: Fix invalid access on PSP dev registration fail
From: Cosmin Ratiu @ 2026-04-22 15:13 UTC (permalink / raw)
  To: kuba@kernel.org
  Cc: Boris Pismenny, willemdebruijn.kernel@gmail.com,
	andrew+netdev@lunn.ch, daniel.zahka@gmail.com,
	davem@davemloft.net, leon@kernel.org,
	linux-kernel@vger.kernel.org, edumazet@google.com,
	linux-rdma@vger.kernel.org, Rahul Rameshbabu, Raed Salem,
	Dragos Tatulea, kees@kernel.org, Mark Bloch, pabeni@redhat.com,
	Tariq Toukan, Saeed Mahameed, netdev@vger.kernel.org,
	Gal Pressman
In-Reply-To: <e9d10b11f73c0ff212a5dee0b08d9ca90eca5407.camel@nvidia.com>

On Wed, 2026-04-22 at 09:25 +0000, Cosmin Ratiu wrote:
> On Tue, 2026-04-21 at 11:32 -0700, Jakub Kicinski wrote:
> > On Tue, 21 Apr 2026 17:34:32 +0000 Cosmin Ratiu wrote:
> > > > No, the normal thing to do is to propagate errors.
> > > > If you want to diverge from that _you_ should have a reason,
> > > > a better reason than a vague "kernel can fail".
> > > > I'd prefer for the driver to fail in an obvious way.
> > > > Which will be immediately spotted by the operator, not 2 weeks
> > > > later when 10% of the fleet is upgraded already.
> > > > The only exception I'd make is to keep devlink registered in
> > > > case the fix is to flash a different FW.  
> > > 
> > > > In this case, PSP not working would be spotted on the next PSP
> > > > dev-get op which produces zilch instead of working devices.
> > 
> > > When you have X vendors times Y device generations times Z FW
> > > versions in your fleet dev-get returning nothing is not a failure.
> > > It just means you're running on a machine that's not capable. Best
> > > you can do to spot a buggy kernel is to notice that the fraction of
> > > PSP traffic is decreasing over time. After significant portion of
> > > the fleet is already on the bad kernel.
> > 
> > > > But I understand what you want. You'd like the netdevice to
> > > > either be fully initialized with all supported+configured
> > > > protocols or fail the open operation. No intermediate/partial
> > > > states. This is a non-trivial refactor for mlx5, because
> > > > mlx5_nic_enable() returns nothing. Refactoring seems possible
> > > > though, its only caller is mlx5e_attach_netdev(), which returns
> > > > errors. It's certainly not something that should be done for a
> > > > net fix though.
> > > > 
> > > > I have a series pending for net-next where the PSP configuration
> > > > is hooked to mlx5e_psp_set_config(). I will look into
> > > > implementing what you propose there and propagate errors.
> > > > 
> > > > Meanwhile, do you want to take these fixes (1 and 2) or maybe
> > > > just 2 for net or not?
> > 
> > Can you call mlx5e_psp_cleanup() when register fails for now?
> 
> Done for the next version, currently undergoing testing.

There's a snag: priv->psp may be accessed concurrently from
mlx5e_get_stats() -> mlx5e_fold_sw_stats64() so we'd need to play
tricks with RCU and that goes beyond what a net fix should be: It's a
redesign of how priv->psp is handled in the driver. There's a risk we
are missing things, or it becomes more intrusive than what a fix should
be.

I would like to ask you: let's please not do this redesign of priv->psp
in a rush, and leave it for the net-next series I mentioned...

To reiterate, would you like to take patch 2?

Cosmin.

^ permalink raw reply

* Re: [PATCH] ipv6: udp: fix memory leak in udpv6_sendmsg error path
From: Jakub Kicinski @ 2026-04-22 15:04 UTC (permalink / raw)
  To: Mingyu Wang
  Cc: willemdebruijn.kernel, davem, dsahern, edumazet, pabeni, horms,
	netdev, linux-kernel
In-Reply-To: <20260422105802.486216-1-25181214217@stu.xidian.edu.cn>

On Wed, 22 Apr 2026 18:58:02 +0800 Mingyu Wang wrote:
> During fuzzing with failslab enabled, a memory leak was observed in the
> IPv6 UDP send path.
> 
> When sending via the lockless fast path (!corkreq), udpv6_sendmsg()
> calls ip6_make_skb() and assumes that the routing entry (dst_entry)
> reference has been stolen by the callee. However, if ip6_make_skb()
> fails early (e.g., due to an ENOMEM from memory allocation failure),
> it returns an error pointer without consuming the dst reference.
> 
> Since udpv6_sendmsg() unconditionally jumps to the 'out_no_dst' label,
> the unconsumed dst_entry is never released, resulting in a memory leak.
> 
> Fix this by explicitly calling dst_release(dst) when ip6_make_skb()
> returns an error.

Test this with cmsg_ip.sh on a debug-enabled kernel before you repost.
I think it causes crashes there.
-- 
pw-bot: cr

^ permalink raw reply

* [PATCH net 1/1] net: rds: fix MR cleanup on copy error
From: Ren Wei @ 2026-04-22 14:52 UTC (permalink / raw)
  To: netdev, linux-rdma, rds-devel
  Cc: achender, davem, edumazet, kuba, pabeni, horms, leon,
	santosh.shilimkar, jhubbard, yuantan098, yifanwucs, tomapufckgml,
	bird, draw51280, n05ec
In-Reply-To: <cover.1776764247.git.draw51280@163.com>

From: Ao Zhou <draw51280@163.com>

__rds_rdma_map() hands sg/pages ownership to the transport after
get_mr() succeeds. If copying the generated cookie back to user space
fails after that point, the error path must not free those resources
again before dropping the MR reference.

Remove the duplicate unpin/free from the put_user() failure branch so
that MR teardown is handled only through the existing final cleanup
path.
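
The ownership rule described above can be modelled with a small userspace sketch. Everything here is hypothetical (struct sketch_pages, struct sketch_mr, sketch_map() and the release counter); it is not the RDS code, only an illustration of why the error branch must drop its reference and nothing else once the resources have been handed over.

```c
#include <stddef.h>

/* Stands in for the pinned pages and the scatterlist. */
struct sketch_pages {
	int release_count;   /* how many times the resources were released */
};

/* Stands in for the MR, which owns the pages after the handoff. */
struct sketch_mr {
	int refs;
	struct sketch_pages *pages;
};

static void sketch_pages_release(struct sketch_pages *p)
{
	p->release_count++;
}

/* Drop one reference; the final put releases whatever the MR owns. */
static void sketch_mr_put(struct sketch_mr *mr)
{
	if (--mr->refs == 0 && mr->pages) {
		sketch_pages_release(mr->pages);
		mr->pages = NULL;
	}
}

/*
 * copy_ok models whether copying the cookie back to the caller worked.
 * Returns 0 on success (the MR keeps its reference), -1 on failure.
 */
static int sketch_map(struct sketch_mr *mr, struct sketch_pages *pages,
		      int copy_ok)
{
	mr->refs = 1;
	mr->pages = pages;   /* ownership moves to the MR here */

	if (!copy_ok) {
		/* No direct release of pages: the MR teardown does it. */
		sketch_mr_put(mr);
		return -1;
	}
	return 0;
}
```

In this model, releasing the pages in the failure branch as well as through the final put would bump release_count to 2, which is the double release the patch removes.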

Fixes: 0d4597c8c5ab ("net/rds: Track user mapped pages through special API")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Ao Zhou <draw51280@163.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
---
 net/rds/rdma.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/net/rds/rdma.c b/net/rds/rdma.c
index aa6465dc742c..61fb6e45281b 100644
--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -326,10 +326,6 @@ static int __rds_rdma_map(struct rds_sock *rs, struct rds_get_mr_args *args,
 
 	if (args->cookie_addr &&
 	    put_user(cookie, (u64 __user *)(unsigned long)args->cookie_addr)) {
-		if (!need_odp) {
-			unpin_user_pages(pages, nr_pages);
-			kfree(sg);
-		}
 		ret = -EFAULT;
 		goto out;
 	}
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH] smb: smbdirect: move fs/smb/common/smbdirect/ to fs/smb/smbdirect/
From: Steve French @ 2026-04-22 14:49 UTC (permalink / raw)
  To: Stefan Metzmacher
  Cc: Christoph Hellwig, linux-cifs, linux-rdma, netdev,
	samba-technical, Tom Talpey, Linus Torvalds, Namjae Jeon
In-Reply-To: <9cb0901c-18c5-4858-941c-3b37ee112af9@samba.org>

On Wed, Apr 22, 2026 at 3:16 AM Stefan Metzmacher <metze@samba.org> wrote:
>
> Hi Christoph,
>
> >> diff --git a/fs/smb/Makefile b/fs/smb/Makefile
> >> index 9a1bf59a1a65..353b1c2eefc4 100644
> >> --- a/fs/smb/Makefile
> >> +++ b/fs/smb/Makefile
> >> @@ -1,5 +1,6 @@
> >>   # SPDX-License-Identifier: GPL-2.0
> >>
> >>   obj-$(CONFIG_SMBFS)                += common/
> >> +obj-$(CONFIG_SMBDIRECT)             += smbdirect/
> >
> > Why is this not in net/smbdirect/ or driver/infiniband/ulp/smdirect?
>
> Yes, I also thought about net/smbdirect.

I would prefer to leave it in fs/smb for the time being, since it makes it
easier to track since fs/smb/server and fs/smb/client have dependencies
on it. In the long run, I don't mind moving it, if it starts being used
outside of smb client and server.


> As IPPROTO_SMBDIRECT or PF_SMBDIRECT will be the next step,
> see the open discussion here:
> https://lore.kernel.org/linux-cifs/cover.1775571957.git.metze@samba.org/
> (I'll follow with that discussion soon)
>
> I was just unsure about the consequences, e.g. would
> the maintainer/pull request flow have to change in that case?
> Or would Steve be able to take the changes via his trees?
> And I also didn't want to offend anybody, so I just took
> what Linus proposed.
>
> Using driver/infiniband/ulp/smdirect would also work,
> if everybody prefers that.
>
> > As far as I can tell there is zero file system logic in this code.
> >
> >> -#include "../common/smbdirect/smbdirect_public.h"
> >> +#include "../smbdirect/public.h"
> >
> > And all these relative includes suggest you really want a
> > include/linux/smdirect/ instead.
>
> Yes, that's also my goal in the next steps.
>
> > While we're at it: __SMBDIRECT_EXPORT_SYMBOL__ is really odd.
> > One thing is the __ pre- and postfix that make it look weird.
>
> Yes, the __SMBDIRECT_EXPORT_SYMBOL__ was mainly a temporary
> thing, now it's useless and I'll remove it.
>
> > The other is that EXPORT_SYMBOL_FOR_MODULES is for very specific
> > symbols that really should not exported.  What this warrants instead
> > is a normal EXPORT_SYMBOL_NS_GPL.
>
> I want the exported functions to be minimal, as most of
> it should go via the socket layer instead.
>
> If EXPORT_SYMBOL_NS_GPL(func, "smbdirect") is better than
> EXPORT_SYMBOL_FOR_MODULES() I can change that.
>
> It means cifs.ko and ksmbd.ko would need MODULE_IMPORT_NS("smbdirect"), correct?
>
> Thanks!
> metze



-- 
Thanks,

Steve

^ permalink raw reply

* Re: [PATCH] mptcp: do not drop partial packets
From: Shardul Bankar @ 2026-04-22 14:40 UTC (permalink / raw)
  To: Paolo Abeni, matttbe, martineau
  Cc: geliang, davem, edumazet, kuba, horms, netdev, mptcp,
	linux-kernel, janak, kalpan.jani
In-Reply-To: <fc2ca10a-1fa8-48f8-b140-6c59595fc08b@redhat.com>

On Wed, 2026-04-22 at 15:51 +0200, Paolo Abeni wrote:
> On 4/22/26 2:09 PM, Shardul Bankar wrote:
> > +
> > +       /* Partial packet: map_seq < ack_seq < end_seq.
> > +        * Skip the already-acked bytes and enqueue the new data.
> >          */
> > -       MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_DUPDATA);
> > -       mptcp_drop(sk, skb);
> > -       return false;
> > +       copy_len = MPTCP_SKB_CB(skb)->end_seq - msk->ack_seq;
> > +       MPTCP_SKB_CB(skb)->offset += msk->ack_seq - MPTCP_SKB_CB(skb)->map_seq;
> 
> here MPTCP_SKB_CB(skb)->offset is always != 0 ...
> 
> > +       msk->bytes_received += copy_len;
> > +       WRITE_ONCE(msk->ack_seq, msk->ack_seq + copy_len);
> > +       tail = skb_peek_tail(&sk->sk_receive_queue);
> > +       if (tail && mptcp_try_coalesce(sk, tail, skb))
> 
> ... so mptcp_try_coalesce() will always fail.

Thanks, good catch. Fixed in v2.

Regards,
Shardul

^ permalink raw reply

* [PATCH v2] mptcp: do not drop partial packets
From: Shardul Bankar @ 2026-04-22 14:39 UTC (permalink / raw)
  To: pabeni, matttbe, martineau
  Cc: geliang, davem, edumazet, kuba, horms, netdev, mptcp,
	linux-kernel, janak, kalpan.jani, shardulsb08, Shardul Bankar

When a packet arrives with map_seq < ack_seq < end_seq, the beginning
of the packet has already been acknowledged but the end contains new
data.  Currently the entire packet is dropped as "old data," forcing
the sender to retransmit.

Instead, skip the already-acked bytes by adjusting the skb offset and
enqueue only the new portion.  Update bytes_received and ack_seq to
reflect the new data consumed.
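
The sequence arithmetic for the partial case can be illustrated with a small userspace sketch. The names struct sketch_seg, sketch_after64() and sketch_trim_partial() are hypothetical and only model the bookkeeping; the sketch assumes the caller has already established map_seq < ack_seq, as the earlier branches of __mptcp_move_skb() do in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-skb mapping state. */
struct sketch_seg {
	uint64_t map_seq;   /* first data sequence number in the segment */
	uint64_t end_seq;   /* one past the last sequence number */
	uint32_t offset;    /* bytes at the front to skip when reading */
};

/* Wrap-safe comparison: true when a is after b in sequence space. */
static bool sketch_after64(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) < 0;
}

/*
 * Trim the already-acked prefix of a segment that straddles *ack_seq.
 * Returns the number of new bytes accepted and advances *ack_seq, or
 * returns 0 and leaves everything untouched when the segment is
 * completely old (end_seq is not after *ack_seq).
 */
static uint64_t sketch_trim_partial(struct sketch_seg *seg, uint64_t *ack_seq)
{
	uint64_t new_bytes;

	if (!sketch_after64(seg->end_seq, *ack_seq))
		return 0;

	new_bytes = seg->end_seq - *ack_seq;
	seg->offset += (uint32_t)(*ack_seq - seg->map_seq);
	*ack_seq += new_bytes;
	return new_bytes;
}
```

For a segment covering [100, 200) with ack_seq at 150, the first 50 bytes are skipped via offset and the remaining 50 are accepted, which moves ack_seq to 200.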

A previous attempt at this fix (commit 1d2ce718811a ("mptcp: do not
drop partial packets"), reverted in commit bf39160c4218 ("Revert
"mptcp: do not drop partial packets"")) also added a zero-window
check and changed rcv_wnd_sent initialization, which caused test
regressions.  This version addresses only the partial packet handling
without modifying receive window accounting.

Fixes: ab174ad8ef76 ("mptcp: move ooo skbs into msk out of order queue.")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/600
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
---
v2: Drop the mptcp_try_coalesce() attempt for partial packets, since
    non-zero offset always prevents coalescing (Paolo).

 net/mptcp/protocol.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 614c3f583ca0..73ec6563ab2b 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -397,12 +397,24 @@ static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
 		return false;
 	}
 
-	/* old data, keep it simple and drop the whole pkt, sender
-	 * will retransmit as needed, if needed.
+	/* Completely old data? */
+	if (!after64(MPTCP_SKB_CB(skb)->end_seq, msk->ack_seq)) {
+		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_DUPDATA);
+		mptcp_drop(sk, skb);
+		return false;
+	}
+
+	/* Partial packet: map_seq < ack_seq < end_seq.
+	 * Skip the already-acked bytes and enqueue the new data.
 	 */
-	MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_DUPDATA);
-	mptcp_drop(sk, skb);
-	return false;
+	copy_len = MPTCP_SKB_CB(skb)->end_seq - msk->ack_seq;
+	MPTCP_SKB_CB(skb)->offset += msk->ack_seq - MPTCP_SKB_CB(skb)->map_seq;
+	msk->bytes_received += copy_len;
+	WRITE_ONCE(msk->ack_seq, msk->ack_seq + copy_len);
+
+	skb_set_owner_r(skb, sk);
+	__skb_queue_tail(&sk->sk_receive_queue, skb);
+	return true;
 }
 
 static void mptcp_stop_rtx_timer(struct sock *sk)
-- 
2.34.1


* Re: [PATCH net-next v2 14/14] net: macb: use context swapping in .ndo_change_mtu()
From: Théo Lebrun @ 2026-04-22 14:35 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: nicolas.ferre, claudiu.beznea, andrew+netdev, davem, edumazet,
	pabeni, richardcochran, linux, pvalerio, conor, nb,
	vladimir.kondratiev, gregory.clement, benoit.monin, tawfik.bayouk,
	thomas.petazzoni, maxime.chevallier, netdev, linux-kernel
In-Reply-To: <20260414005657.3767049-1-kuba@kernel.org>

Hello Jakub,

On Tue Apr 14, 2026 at 2:56 AM CEST, Jakub Kicinski wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> ---
> net: macb: use context swapping in .ndo_change_mtu()
>
> This commit introduces a mechanism to perform context swapping when changing
> the MTU while the interface is running. It allocates a new context,
> reconfigures the hardware via a start and end swap process, and then frees
> the old context to avoid PHY reinitialization and gracefully handle memory
> pressure.
>
>> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
>> index d1536cb1e40bb..79371718f8392 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -3430,11 +3430,25 @@ static int macb_close(struct net_device *netdev)
>>  
>>  static int macb_change_mtu(struct net_device *netdev, int new_mtu)
>>  {
>> -	if (netif_running(netdev))
>> -		return -EBUSY;
>> +	struct macb *bp = netdev_priv(netdev);
>> +	bool running = netif_running(netdev);
>> +	struct macb_context *new_ctx;
>> +
>> +	if (running) {
>> +		new_ctx = macb_context_alloc(bp, new_mtu,
>> +					     bp->configured_rx_ring_size,
>> +					     bp->configured_tx_ring_size);
>> +		if (IS_ERR(new_ctx))
>> +			return PTR_ERR(new_ctx);
>> +
>> +		macb_context_swap_start(bp);
>
> Does macb_context_swap_start() gracefully halt the transmitter before the
> context is freed?
>
> It appears to attempt disabling the hardware transmitter by clearing the
> TE and RE bits in the NCR register, but does not poll the hardware to ensure
> it has reached an idle state.
>
> Immediately after, macb_context_swap_end() frees the descriptor rings and data
> buffers via macb_free_consistent().
>
> If the MAC DMA engine is still mid-transfer, could it read or write to kernel
> memory that has already been freed or reallocated?

Yes, and we already have the solution in macb_halt_tx(), used by
macb_tx_error_task(). Close and swap should also use it. Will fix.

Though from the manual's description of TE, it sounds super unlikely.

Rx isn't an issue; modifying RE in NCR is the way to go.

> Can this code cause an AB-BA deadlock between napi_disable() and
> cancel_work_sync()?
>
> In macb_context_swap_start(), the code calls napi_disable() for all queues,
> and then immediately calls cancel_work_sync() on the error tasks.
>
> Both of these work items internally invoke napi_disable().
>
> If either error task runs concurrently, it will block forever inside its
> napi_disable() waiting for the NAPI state bits (which were already set by the
> first call in macb_context_swap_start()) to clear.
>
> Consequently, could cancel_work_sync() hang indefinitely waiting for the
> blocked work task to finish?

This one is wrong, because napi_disable() grabs the netdev->lock mutex.

>>  	}
>>  
>>  	WRITE_ONCE(netdev->mtu, new_mtu);
>>  
>> +	if (running)
>> +		macb_context_swap_end(bp, new_ctx);
>
> Does macb_context_swap_end() leak pending TX buffers and their DMA mappings?
>
> When changing the MTU while the interface is running, macb_context_swap_end()
> calls macb_free_consistent() to free the old DMA context. While RX buffers are
> properly freed, for the TX queues it appears to only free the tracking array
> pointer without iterating through the ring to unmap the DMA mappings and free
> the actual buffers.
>
> Any packets queued in the hardware ring at the moment of the MTU swap might
> be permanently leaked, which could lead to memory and IOMMU space exhaustion
> under active traffic.

Yes, I have a pending patch for this. Will send soon. It is a bugfix
for the close codepath, and we'll reuse the code in the swap
operation.

> Will MTU changes on EMAC devices permanently disable hardware interrupts?
>
> macb_context_swap_start() disables all interrupts unconditionally by writing
> -1 to the IDR register.
>
> However, macb_context_swap_end() restores interrupts via the IER register
> inside an explicit check:
>
>     if (!(bp->caps & MACB_CAPS_MACB_IS_EMAC)) {
>         ...
>     }
>
> Since there is no else branch to re-enable interrupts for EMAC devices,
> could changing the MTU on these devices leave interrupts permanently masked
> and cause the network interface to silently stall?

Ignoring this one, because we will refuse the swap for EMAC devices.
Their state is too different; they have different ndo callbacks for
everything and don't use NAPI.

> Is there a missing spinlock when modifying the IER and NCR registers in
> macb_context_swap_end()?
>
> Modifications to the NCR and interrupt registers in macb_context_swap_start()
> are protected by the spinlock. However, macb_context_swap_end() performs
> lockless read-modify-write operations on both IER and NCR:
>
>     ctrl = macb_readl(bp, NCR);
>     macb_writel(bp, NCR, ctrl | MACB_BIT(RE) | MACB_BIT(TE));
>
> If concurrent hardware events or background tasks take the lock and modify
> NCR during this window, could their updates be silently overwritten?

Yes, swap_end needs to wrap its accesses in the bp->lock spinlock so that
the read-modify-write operation is atomic.

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


* Re: [PATCH bpf-next] selftests/bpf: drop xdping tool
From: Alan Maguire @ 2026-04-22 14:30 UTC (permalink / raw)
  To: Alexis Lothoré (eBPF Foundation), Andrii Nakryiko,
	Eduard Zingerman, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kumar Kartikeya Dwivedi, Song Liu,
	Yonghong Song, Jiri Olsa, Shuah Khan, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev
  Cc: ebpf, Bastien Curutchet, Thomas Petazzoni, linux-kernel, bpf,
	linux-kselftest, netdev
In-Reply-To: <20260417-xdping-v1-1-9b0ce0e7adf8@bootlin.com>

On 17/04/2026 16:33, Alexis Lothoré (eBPF Foundation) wrote:
> As part of a larger cleanup effort in the bpf selftests directory,
> tests and scripts are either being converted to the test_progs framework
> (so they are executed automatically in bpf CI), or removed if not
> relevant for such integration.
> 
> The test_xdping.sh script (with the associated xdping.c) acts as an RTT
> measurement tool, by attaching two small xdp programs to two interfaces.
> Converting this test to test_progs may not make much sense:
> - RTT measurement does not really fit in the scope of a functional test,
>   this is rather about measuring some performance level.
> - there are other existing tests in test_progs that actively validate
>   XDP features like program attachment, return value processing, packet
>   modification, etc
> 
> Drop test_xdping.sh and the corresponding xdping.c userspace part. Keep
> the ebpf part (xdping_kern.c), as it is used by another test integrated
> in test_progs (btf_dump).
> 
> Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>

As discussed, switching to loading xdp_dummy.bpf.o in prog_tests/btf_dump.c
would be good too (feel free to retain the Reviewed-by: with that v2 change).

Thanks!

 
> ---
>  tools/testing/selftests/bpf/.gitignore     |   1 -
>  tools/testing/selftests/bpf/Makefile       |   3 -
>  tools/testing/selftests/bpf/test_xdping.sh | 103 ------------
>  tools/testing/selftests/bpf/xdping.c       | 254 -----------------------------
>  4 files changed, 361 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
> index bfdc5518ecc8..986a6389186b 100644
> --- a/tools/testing/selftests/bpf/.gitignore
> +++ b/tools/testing/selftests/bpf/.gitignore
> @@ -21,7 +21,6 @@ test_lirc_mode2_user
>  flow_dissector_load
>  test_tcpnotify_user
>  test_libbpf
> -xdping
>  test_cpp
>  *.d
>  *.subskel.h
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 78e60040811e..00a986a7d088 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -111,7 +111,6 @@ TEST_FILES = xsk_prereqs.sh $(wildcard progs/btf_dump_test_case_*.c)
>  # Order correspond to 'make run_tests' order
>  TEST_PROGS := test_kmod.sh \
>  	test_lirc_mode2.sh \
> -	test_xdping.sh \
>  	test_bpftool_build.sh \
>  	test_doc_build.sh \
>  	test_xsk.sh \
> @@ -134,7 +133,6 @@ TEST_GEN_PROGS_EXTENDED = \
>  	xdp_features \
>  	xdp_hw_metadata \
>  	xdp_synproxy \
> -	xdping \
>  	xskxceiver
>  
>  TEST_GEN_FILES += $(TEST_KMODS) liburandom_read.so urandom_read sign-file uprobe_multi
> @@ -320,7 +318,6 @@ $(OUTPUT)/test_tcpnotify_user: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(TRACE_HELP
>  $(OUTPUT)/test_sock_fields: $(CGROUP_HELPERS) $(TESTING_HELPERS)
>  $(OUTPUT)/test_tag: $(TESTING_HELPERS)
>  $(OUTPUT)/test_lirc_mode2_user: $(TESTING_HELPERS)
> -$(OUTPUT)/xdping: $(TESTING_HELPERS)
>  $(OUTPUT)/flow_dissector_load: $(TESTING_HELPERS)
>  $(OUTPUT)/test_maps: $(TESTING_HELPERS)
>  $(OUTPUT)/test_verifier: $(TESTING_HELPERS) $(CAP_HELPERS) $(UNPRIV_HELPERS)
> diff --git a/tools/testing/selftests/bpf/test_xdping.sh b/tools/testing/selftests/bpf/test_xdping.sh
> deleted file mode 100755
> index c3d82e0a7378..000000000000
> --- a/tools/testing/selftests/bpf/test_xdping.sh
> +++ /dev/null
> @@ -1,103 +0,0 @@
> -#!/bin/bash
> -# SPDX-License-Identifier: GPL-2.0
> -
> -# xdping tests
> -#   Here we setup and teardown configuration required to run
> -#   xdping, exercising its options.
> -#
> -#   Setup is similar to test_tunnel tests but without the tunnel.
> -#
> -# Topology:
> -# ---------
> -#     root namespace   |     tc_ns0 namespace
> -#                      |
> -#      ----------      |     ----------
> -#      |  veth1  | --------- |  veth0  |
> -#      ----------    peer    ----------
> -#
> -# Device Configuration
> -# --------------------
> -# Root namespace with BPF
> -# Device names and addresses:
> -#	veth1 IP: 10.1.1.200
> -#	xdp added to veth1, xdpings originate from here.
> -#
> -# Namespace tc_ns0 with BPF
> -# Device names and addresses:
> -#       veth0 IPv4: 10.1.1.100
> -#	For some tests xdping run in server mode here.
> -#
> -
> -readonly TARGET_IP="10.1.1.100"
> -readonly TARGET_NS="xdp_ns0"
> -
> -readonly LOCAL_IP="10.1.1.200"
> -
> -setup()
> -{
> -	ip netns add $TARGET_NS
> -	ip link add veth0 type veth peer name veth1
> -	ip link set veth0 netns $TARGET_NS
> -	ip netns exec $TARGET_NS ip addr add ${TARGET_IP}/24 dev veth0
> -	ip addr add ${LOCAL_IP}/24 dev veth1
> -	ip netns exec $TARGET_NS ip link set veth0 up
> -	ip link set veth1 up
> -}
> -
> -cleanup()
> -{
> -	set +e
> -	ip netns delete $TARGET_NS 2>/dev/null
> -	ip link del veth1 2>/dev/null
> -	if [[ $server_pid -ne 0 ]]; then
> -		kill -TERM $server_pid
> -	fi
> -}
> -
> -test()
> -{
> -	client_args="$1"
> -	server_args="$2"
> -
> -	echo "Test client args '$client_args'; server args '$server_args'"
> -
> -	server_pid=0
> -	if [[ -n "$server_args" ]]; then
> -		ip netns exec $TARGET_NS ./xdping $server_args &
> -		server_pid=$!
> -		sleep 10
> -	fi
> -	./xdping $client_args $TARGET_IP
> -
> -	if [[ $server_pid -ne 0 ]]; then
> -		kill -TERM $server_pid
> -		server_pid=0
> -	fi
> -
> -	echo "Test client args '$client_args'; server args '$server_args': PASS"
> -}
> -
> -set -e
> -
> -server_pid=0
> -
> -trap cleanup EXIT
> -
> -setup
> -
> -for server_args in "" "-I veth0 -s -S" ; do
> -	# client in skb mode
> -	client_args="-I veth1 -S"
> -	test "$client_args" "$server_args"
> -
> -	# client with count of 10 RTT measurements.
> -	client_args="-I veth1 -S -c 10"
> -	test "$client_args" "$server_args"
> -done
> -
> -# Test drv mode
> -test "-I veth1 -N" "-I veth0 -s -N"
> -test "-I veth1 -N -c 10" "-I veth0 -s -N"
> -
> -echo "OK. All tests passed"
> -exit 0
> diff --git a/tools/testing/selftests/bpf/xdping.c b/tools/testing/selftests/bpf/xdping.c
> deleted file mode 100644
> index 9ed8c796645d..000000000000
> --- a/tools/testing/selftests/bpf/xdping.c
> +++ /dev/null
> @@ -1,254 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -/* Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved. */
> -
> -#include <linux/bpf.h>
> -#include <linux/if_link.h>
> -#include <arpa/inet.h>
> -#include <assert.h>
> -#include <errno.h>
> -#include <signal.h>
> -#include <stdio.h>
> -#include <stdlib.h>
> -#include <string.h>
> -#include <unistd.h>
> -#include <libgen.h>
> -#include <net/if.h>
> -#include <sys/types.h>
> -#include <sys/socket.h>
> -#include <netdb.h>
> -
> -#include "bpf/bpf.h"
> -#include "bpf/libbpf.h"
> -
> -#include "xdping.h"
> -#include "testing_helpers.h"
> -
> -static int ifindex;
> -static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
> -
> -static void cleanup(int sig)
> -{
> -	bpf_xdp_detach(ifindex, xdp_flags, NULL);
> -	if (sig)
> -		exit(1);
> -}
> -
> -static int get_stats(int fd, __u16 count, __u32 raddr)
> -{
> -	struct pinginfo pinginfo = { 0 };
> -	char inaddrbuf[INET_ADDRSTRLEN];
> -	struct in_addr inaddr;
> -	__u16 i;
> -
> -	inaddr.s_addr = raddr;
> -
> -	printf("\nXDP RTT data:\n");
> -
> -	if (bpf_map_lookup_elem(fd, &raddr, &pinginfo)) {
> -		perror("bpf_map_lookup elem");
> -		return 1;
> -	}
> -
> -	for (i = 0; i < count; i++) {
> -		if (pinginfo.times[i] == 0)
> -			break;
> -
> -		printf("64 bytes from %s: icmp_seq=%d ttl=64 time=%#.5f ms\n",
> -		       inet_ntop(AF_INET, &inaddr, inaddrbuf,
> -				 sizeof(inaddrbuf)),
> -		       count + i + 1,
> -		       (double)pinginfo.times[i]/1000000);
> -	}
> -
> -	if (i < count) {
> -		fprintf(stderr, "Expected %d samples, got %d.\n", count, i);
> -		return 1;
> -	}
> -
> -	bpf_map_delete_elem(fd, &raddr);
> -
> -	return 0;
> -}
> -
> -static void show_usage(const char *prog)
> -{
> -	fprintf(stderr,
> -		"usage: %s [OPTS] -I interface destination\n\n"
> -		"OPTS:\n"
> -		"    -c count		Stop after sending count requests\n"
> -		"			(default %d, max %d)\n"
> -		"    -I interface	interface name\n"
> -		"    -N			Run in driver mode\n"
> -		"    -s			Server mode\n"
> -		"    -S			Run in skb mode\n",
> -		prog, XDPING_DEFAULT_COUNT, XDPING_MAX_COUNT);
> -}
> -
> -int main(int argc, char **argv)
> -{
> -	__u32 mode_flags = XDP_FLAGS_DRV_MODE | XDP_FLAGS_SKB_MODE;
> -	struct addrinfo *a, hints = { .ai_family = AF_INET };
> -	__u16 count = XDPING_DEFAULT_COUNT;
> -	struct pinginfo pinginfo = { 0 };
> -	const char *optstr = "c:I:NsS";
> -	struct bpf_program *main_prog;
> -	int prog_fd = -1, map_fd = -1;
> -	struct sockaddr_in rin;
> -	struct bpf_object *obj;
> -	struct bpf_map *map;
> -	char *ifname = NULL;
> -	char filename[256];
> -	int opt, ret = 1;
> -	__u32 raddr = 0;
> -	int server = 0;
> -	char cmd[256];
> -
> -	while ((opt = getopt(argc, argv, optstr)) != -1) {
> -		switch (opt) {
> -		case 'c':
> -			count = atoi(optarg);
> -			if (count < 1 || count > XDPING_MAX_COUNT) {
> -				fprintf(stderr,
> -					"min count is 1, max count is %d\n",
> -					XDPING_MAX_COUNT);
> -				return 1;
> -			}
> -			break;
> -		case 'I':
> -			ifname = optarg;
> -			ifindex = if_nametoindex(ifname);
> -			if (!ifindex) {
> -				fprintf(stderr, "Could not get interface %s\n",
> -					ifname);
> -				return 1;
> -			}
> -			break;
> -		case 'N':
> -			xdp_flags |= XDP_FLAGS_DRV_MODE;
> -			break;
> -		case 's':
> -			/* use server program */
> -			server = 1;
> -			break;
> -		case 'S':
> -			xdp_flags |= XDP_FLAGS_SKB_MODE;
> -			break;
> -		default:
> -			show_usage(basename(argv[0]));
> -			return 1;
> -		}
> -	}
> -
> -	if (!ifname) {
> -		show_usage(basename(argv[0]));
> -		return 1;
> -	}
> -	if (!server && optind == argc) {
> -		show_usage(basename(argv[0]));
> -		return 1;
> -	}
> -
> -	if ((xdp_flags & mode_flags) == mode_flags) {
> -		fprintf(stderr, "-N or -S can be specified, not both.\n");
> -		show_usage(basename(argv[0]));
> -		return 1;
> -	}
> -
> -	if (!server) {
> -		/* Only supports IPv4; see hints initialization above. */
> -		if (getaddrinfo(argv[optind], NULL, &hints, &a) || !a) {
> -			fprintf(stderr, "Could not resolve %s\n", argv[optind]);
> -			return 1;
> -		}
> -		memcpy(&rin, a->ai_addr, sizeof(rin));
> -		raddr = rin.sin_addr.s_addr;
> -		freeaddrinfo(a);
> -	}
> -
> -	/* Use libbpf 1.0 API mode */
> -	libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
> -
> -	snprintf(filename, sizeof(filename), "%s_kern.bpf.o", argv[0]);
> -
> -	if (bpf_prog_test_load(filename, BPF_PROG_TYPE_XDP, &obj, &prog_fd)) {
> -		fprintf(stderr, "load of %s failed\n", filename);
> -		return 1;
> -	}
> -
> -	main_prog = bpf_object__find_program_by_name(obj,
> -						     server ? "xdping_server" : "xdping_client");
> -	if (main_prog)
> -		prog_fd = bpf_program__fd(main_prog);
> -	if (!main_prog || prog_fd < 0) {
> -		fprintf(stderr, "could not find xdping program");
> -		return 1;
> -	}
> -
> -	map = bpf_object__next_map(obj, NULL);
> -	if (map)
> -		map_fd = bpf_map__fd(map);
> -	if (!map || map_fd < 0) {
> -		fprintf(stderr, "Could not find ping map");
> -		goto done;
> -	}
> -
> -	signal(SIGINT, cleanup);
> -	signal(SIGTERM, cleanup);
> -
> -	printf("Setting up XDP for %s, please wait...\n", ifname);
> -
> -	printf("XDP setup disrupts network connectivity, hit Ctrl+C to quit\n");
> -
> -	if (bpf_xdp_attach(ifindex, prog_fd, xdp_flags, NULL) < 0) {
> -		fprintf(stderr, "Link set xdp fd failed for %s\n", ifname);
> -		goto done;
> -	}
> -
> -	if (server) {
> -		close(prog_fd);
> -		close(map_fd);
> -		printf("Running server on %s; press Ctrl+C to exit...\n",
> -		       ifname);
> -		do { } while (1);
> -	}
> -
> -	/* Start xdping-ing from last regular ping reply, e.g. for a count
> -	 * of 10 ICMP requests, we start xdping-ing using reply with seq number
> -	 * 10.  The reason the last "real" ping RTT is much higher is that
> -	 * the ping program sees the ICMP reply associated with the last
> -	 * XDP-generated packet, so ping doesn't get a reply until XDP is done.
> -	 */
> -	pinginfo.seq = htons(count);
> -	pinginfo.count = count;
> -
> -	if (bpf_map_update_elem(map_fd, &raddr, &pinginfo, BPF_ANY)) {
> -		fprintf(stderr, "could not communicate with BPF map: %s\n",
> -			strerror(errno));
> -		cleanup(0);
> -		goto done;
> -	}
> -
> -	/* We need to wait for XDP setup to complete. */
> -	sleep(10);
> -
> -	snprintf(cmd, sizeof(cmd), "ping -c %d -I %s %s",
> -		 count, ifname, argv[optind]);
> -
> -	printf("\nNormal ping RTT data\n");
> -	printf("[Ignore final RTT; it is distorted by XDP using the reply]\n");
> -
> -	ret = system(cmd);
> -
> -	if (!ret)
> -		ret = get_stats(map_fd, count, raddr);
> -
> -	cleanup(0);
> -
> -done:
> -	if (prog_fd > 0)
> -		close(prog_fd);
> -	if (map_fd > 0)
> -		close(map_fd);
> -
> -	return ret;
> -}
> 
> ---
> base-commit: b7fb68124aa80db90394236a9a4a6add12f4425d
> change-id: 20260417-xdping-5c2ef5a63899
> 
> Best regards,
> --  
> Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
> 


* Re: [PATCH] net/stmmac: Fix typos: 'tx_undeflow_irq' -> 'tx_underflow_irq'
From: Jakub Raczynski @ 2026-04-22 14:15 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, linux-kernel, kuba, davem, andrew+netdev, kernel-janitors,
	linux-arm-kernel, linux-stm32
In-Reply-To: <f1d51362-ca8f-481a-b9c1-400ab6422686@lunn.ch>

[-- Attachment #1: Type: text/plain, Size: 1403 bytes --]

On Wed, Apr 22, 2026 at 02:47:38PM +0200, Andrew Lunn wrote:
> > I don't see anything wrong with it?
> > - naming is correct, same as stmmac_extra_stats from common.h, as it
> >   wouldn't compile otherwise
> > - string length is ok, as max name length is ETH_GSTRING_LEN=32 and it is
> >   not close
> > - ethtool just polls data from driver and in my tests it is ok
> > - all instances of 'undeflow' are changed
> > - 'underflow' semantic is ok, 'undeflow' is just not correct
> > 
> > Please correct me if I am wrong, but imo no issues with this patch.
> 
> ABI
> 
> This name is published as part of the kAPI. You are changing its
> name. User space could be looking for this name, even though it has a
> typo in it.
> 
>      Andrew
>
I don't think it is? This is part of the extra stats (struct stmmac_extra_stats)
and is not part of the standard ABI from
Documentation/ABI/testing/sysfs-class-net-statistics
nor is mentioned in
Documentation/networking/device_drivers/ethernet/stmicro/stmmac.rst

These extra stats are specific to the stmmac driver, and most of them go
beyond the standard set in
https://www.kernel.org/doc/html/v7.0/networking/statistics.html#c.rtnl_link_stats64
This name does not exist outside the stmmac driver, so while some application
(an stmmac-specific one) may expect it, the question is: should this typo stick?

This type of typo is even mentioned in scripts/spelling.txt.

Regards
Jakub Raczynski

