netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v3 0/3] net: netpoll: fix a memleak and create a selftest
@ 2025-09-05 17:25 Breno Leitao
  2025-09-05 17:25 ` [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup Breno Leitao
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Breno Leitao @ 2025-09-05 17:25 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Shuah Khan, Simon Horman, david decotigny
  Cc: linux-kernel, netdev, linux-kselftest, asantostc, efault, calvin,
	kernel-team, Breno Leitao, stable, jv

Fix a memory leak issue on netpoll and create a netconsole test that exposes
the problem, when run with kmemleak enabled.

This is a merge of two patches I've sent individually and are merged on
the same patchset[1][2].

Link: https://lore.kernel.org/all/20250904-netconsole_torture-v2-0-5775ed5dc366@debian.org/ [1]
Link: https://lore.kernel.org/all/20250902165426.6d6cd172@kernel.org/ [2]

Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v3:
- this patchset is a merge of the fix and the selftest together as
recommended by Jakub.

Changes in v2:
- Reuse the netconsole creation from lib_netcons.sh. Thus, refactoring
  the create_dynamic_target() (Jakub)
- Move the "wait" to after all the messages has been sent.
- Link to v1: https://lore.kernel.org/r/20250902-netconsole_torture-v1-1-03c6066598e9@debian.org

---
Breno Leitao (3):
      netpoll: fix incorrect refcount handling causing incorrect cleanup
      selftest: netcons: refactor target creation
      selftest: netcons: create a torture test

 net/core/netpoll.c                                 |   7 +-
 tools/testing/selftests/drivers/net/Makefile       |   1 +
 .../selftests/drivers/net/lib/sh/lib_netcons.sh    |  30 +++--
 .../selftests/drivers/net/netcons_torture.sh       | 127 +++++++++++++++++++++
 4 files changed, 152 insertions(+), 13 deletions(-)
---
base-commit: d69eb204c255c35abd9e8cb621484e8074c75eaa
change-id: 20250902-netconsole_torture-8fc23f0aca99

Best regards,
--  
Breno Leitao <leitao@debian.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup
  2025-09-05 17:25 [PATCH net v3 0/3] net: netpoll: fix a memleak and create a selftest Breno Leitao
@ 2025-09-05 17:25 ` Breno Leitao
  2025-09-08 10:12   ` Simon Horman
  2025-09-08 20:47   ` Calvin Owens
  2025-09-05 17:25 ` [PATCH net v3 2/3] selftest: netcons: refactor target creation Breno Leitao
  2025-09-05 17:25 ` [PATCH net v3 3/3] selftest: netcons: create a torture test Breno Leitao
  2 siblings, 2 replies; 18+ messages in thread
From: Breno Leitao @ 2025-09-05 17:25 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Shuah Khan, Simon Horman, david decotigny
  Cc: linux-kernel, netdev, linux-kselftest, asantostc, efault, calvin,
	kernel-team, Breno Leitao, stable, jv

commit efa95b01da18 ("netpoll: fix use after free") incorrectly
ignored the refcount and prematurely set dev->npinfo to NULL during
netpoll cleanup, leading to improper behavior and memory leaks.

Scenario causing lack of proper cleanup:

1) A netpoll is associated with a NIC (e.g., eth0) and netdev->npinfo is
   allocated, and refcnt = 1
   - Keep in mind that npinfo is shared among all netpoll instances. In
     this case, there is just one.

2) Another netpoll is also associated with the same NIC and
   npinfo->refcnt += 1.
   - Now dev->npinfo->refcnt = 2;
   - There is just one npinfo associated to the netdev.

3) When the first netpolls goes to clean up:
   - The first cleanup succeeds and clears np->dev->npinfo, ignoring
     refcnt.
     - It basically calls `RCU_INIT_POINTER(np->dev->npinfo, NULL);`
   - Set dev->npinfo = NULL, without proper cleanup
   - No ->ndo_netpoll_cleanup() is either called

4) Now the second target tries to clean up
   - The second cleanup fails because np->dev->npinfo is already NULL.
     * In this case, ops->ndo_netpoll_cleanup() was never called, and
       the skb pool is not cleaned as well (for the second netpoll
       instance)
  - This leaks npinfo and skbpool skbs, which is clearly reported by
    kmemleak.

Revert commit efa95b01da18 ("netpoll: fix use after free") and adds
clarifying comments emphasizing that npinfo cleanup should only happen
once the refcount reaches zero, ensuring stable and correct netpoll
behavior.

Cc: stable@vger.kernel.org
Cc: jv@jvosburgh.net
Fixes: efa95b01da18 ("netpoll: fix use after free")
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 net/core/netpoll.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 5f65b62346d4e..19676cd379640 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -815,6 +815,10 @@ static void __netpoll_cleanup(struct netpoll *np)
 	if (!npinfo)
 		return;
 
+	/* At this point, there is a single npinfo instance per netdevice, and
+	 * its refcnt tracks how many netpoll structures are linked to it. We
+	 * only perform npinfo cleanup when the refcnt decrements to zero.
+	 */
 	if (refcount_dec_and_test(&npinfo->refcnt)) {
 		const struct net_device_ops *ops;
 
@@ -824,8 +828,7 @@ static void __netpoll_cleanup(struct netpoll *np)
 
 		RCU_INIT_POINTER(np->dev->npinfo, NULL);
 		call_rcu(&npinfo->rcu, rcu_cleanup_netpoll_info);
-	} else
-		RCU_INIT_POINTER(np->dev->npinfo, NULL);
+	}
 
 	skb_pool_flush(np);
 }

-- 
2.47.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net v3 2/3] selftest: netcons: refactor target creation
  2025-09-05 17:25 [PATCH net v3 0/3] net: netpoll: fix a memleak and create a selftest Breno Leitao
  2025-09-05 17:25 ` [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup Breno Leitao
@ 2025-09-05 17:25 ` Breno Leitao
  2025-09-08 10:13   ` Simon Horman
  2025-09-05 17:25 ` [PATCH net v3 3/3] selftest: netcons: create a torture test Breno Leitao
  2 siblings, 1 reply; 18+ messages in thread
From: Breno Leitao @ 2025-09-05 17:25 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Shuah Khan, Simon Horman, david decotigny
  Cc: linux-kernel, netdev, linux-kselftest, asantostc, efault, calvin,
	kernel-team, Breno Leitao, stable

Extract the netconsole target creation from create_dynamic_target(), by
moving it from create_dynamic_target() into a new helper function. This
enables other tests to use the creation of netconsole targets with
arbitrary parameters and no sleep.

The new helper will be utilized by forthcoming torture-type selftests
that require dynamic target management.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 .../selftests/drivers/net/lib/sh/lib_netcons.sh    | 30 ++++++++++++++--------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
index b6071e80ebbb6..4fc102407e3a6 100644
--- a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
+++ b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
@@ -113,31 +113,39 @@ function set_network() {
 	configure_ip
 }
 
-function create_dynamic_target() {
-	local FORMAT=${1:-"extended"}
+function _create_dynamic_target() {
+	local FORMAT="${1:?FORMAT parameter required}"
+	local NCPATH="${2:?NCPATH parameter required}"
 
 	DSTMAC=$(ip netns exec "${NAMESPACE}" \
 		 ip link show "${DSTIF}" | awk '/ether/ {print $2}')
 
 	# Create a dynamic target
-	mkdir "${NETCONS_PATH}"
+	mkdir "${NCPATH}"
 
-	echo "${DSTIP}" > "${NETCONS_PATH}"/remote_ip
-	echo "${SRCIP}" > "${NETCONS_PATH}"/local_ip
-	echo "${DSTMAC}" > "${NETCONS_PATH}"/remote_mac
-	echo "${SRCIF}" > "${NETCONS_PATH}"/dev_name
+	echo "${DSTIP}" > "${NCPATH}"/remote_ip
+	echo "${SRCIP}" > "${NCPATH}"/local_ip
+	echo "${DSTMAC}" > "${NCPATH}"/remote_mac
+	echo "${SRCIF}" > "${NCPATH}"/dev_name
 
 	if [ "${FORMAT}" == "basic" ]
 	then
 		# Basic target does not support release
-		echo 0 > "${NETCONS_PATH}"/release
-		echo 0 > "${NETCONS_PATH}"/extended
+		echo 0 > "${NCPATH}"/release
+		echo 0 > "${NCPATH}"/extended
 	elif [ "${FORMAT}" == "extended" ]
 	then
-		echo 1 > "${NETCONS_PATH}"/extended
+		echo 1 > "${NCPATH}"/extended
 	fi
 
-	echo 1 > "${NETCONS_PATH}"/enabled
+	echo 1 > "${NCPATH}"/enabled
+
+}
+
+function create_dynamic_target() {
+	local FORMAT=${1:-"extended"}
+	local NCPATH=${2:-"$NETCONS_PATH"}
+	_create_dynamic_target "${FORMAT}" "${NCPATH}"
 
 	# This will make sure that the kernel was able to
 	# load the netconsole driver configuration. The console message

-- 
2.47.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net v3 3/3] selftest: netcons: create a torture test
  2025-09-05 17:25 [PATCH net v3 0/3] net: netpoll: fix a memleak and create a selftest Breno Leitao
  2025-09-05 17:25 ` [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup Breno Leitao
  2025-09-05 17:25 ` [PATCH net v3 2/3] selftest: netcons: refactor target creation Breno Leitao
@ 2025-09-05 17:25 ` Breno Leitao
  2025-09-08 10:13   ` Simon Horman
  2 siblings, 1 reply; 18+ messages in thread
From: Breno Leitao @ 2025-09-05 17:25 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Shuah Khan, Simon Horman, david decotigny
  Cc: linux-kernel, netdev, linux-kselftest, asantostc, efault, calvin,
	kernel-team, Breno Leitao, stable

Create a netconsole test that puts a lot of pressure on the netconsole
list manipulation. Do it by creating dynamic targets and deleting
targets while messages are being sent. Also put interface down while the
messages are being sent, as creating parallel targets.

The code launches three background jobs on distinct schedules:

 * Toggle netcons target every 30 iterations
 * create and delete random_target every 50 iterations
 * toggle iface every 70 iterations

This creates multiple concurrency sources that interact with netconsole
states. This is good practice to simulate stress, and exercise netpoll
and netconsole locks.

This test already found an issue as reported in [1]

Link: https://lore.kernel.org/all/20250901-netpoll_memleak-v1-1-34a181977dfc@debian.org/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Andre Carvalho <asantostc@gmail.com>
---
 tools/testing/selftests/drivers/net/Makefile       |   1 +
 .../selftests/drivers/net/netcons_torture.sh       | 127 +++++++++++++++++++++
 2 files changed, 128 insertions(+)

diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile
index 984ece05f7f92..2b253b1ff4f38 100644
--- a/tools/testing/selftests/drivers/net/Makefile
+++ b/tools/testing/selftests/drivers/net/Makefile
@@ -17,6 +17,7 @@ TEST_PROGS := \
 	netcons_fragmented_msg.sh \
 	netcons_overflow.sh \
 	netcons_sysdata.sh \
+	netcons_torture.sh \
 	netpoll_basic.py \
 	ping.py \
 	queues.py \
diff --git a/tools/testing/selftests/drivers/net/netcons_torture.sh b/tools/testing/selftests/drivers/net/netcons_torture.sh
new file mode 100755
index 0000000000000..3608051d475ff
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/netcons_torture.sh
@@ -0,0 +1,127 @@
+#!/usr/bin/env bash
+# SPDX-License-Identifier: GPL-2.0
+
+# Repeatedly send kernel messages, toggles netconsole targets on and off,
+# creates and deletes targets in parallel, and toggles the source interface to
+# simulate stress conditions.
+#
+# This test aims to verify the robustness of netconsole under dynamic
+# configurations and concurrent operations.
+#
+# The major goal is to run this test with LOCKDEP, Kmemleak and KASAN to make
+# sure no issues is reported.
+#
+# Author: Breno Leitao <leitao@debian.org>
+
+set -euo pipefail
+
+SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")
+
+source "${SCRIPTDIR}"/lib/sh/lib_netcons.sh
+
+# Number of times the main loop run
+ITERATIONS=${1:-1000}
+
+# Only test extended format
+FORMAT="extended"
+# And ipv6 only
+IP_VERSION="ipv6"
+
+# Create, enable and delete some targets.
+create_and_delete_random_target() {
+	COUNT=2
+	RND_PREFIX=$(mktemp -u netcons_rnd_XXXX_)
+
+	if [ -d "${NETCONS_CONFIGFS}/${RND_PREFIX}${COUNT}"  ] || \
+	   [ -d "${NETCONS_CONFIGFS}/${RND_PREFIX}0" ]; then
+		echo "Function didn't finish yet, skipping it." >&2
+		return
+	fi
+
+	# enable COUNT targets
+	for i in $(seq ${COUNT})
+	do
+		RND_TARGET="${RND_PREFIX}"${i}
+		RND_TARGET_PATH="${NETCONS_CONFIGFS}"/"${RND_TARGET}"
+
+		# Basic population so the target can come up
+		_create_dynamic_target "${FORMAT}" "${RND_TARGET_PATH}"
+	done
+
+	echo "netconsole selftest: ${COUNT} additional targets were created" > /dev/kmsg
+	# disable them all
+	for i in $(seq ${COUNT})
+	do
+		RND_TARGET="${RND_PREFIX}"${i}
+		RND_TARGET_PATH="${NETCONS_CONFIGFS}"/"${RND_TARGET}"
+		echo 0 > "${RND_TARGET_PATH}"/enabled
+		rmdir "${RND_TARGET_PATH}"
+	done
+}
+
+# Disable and enable the target mid-air, while messages
+# are being transmitted.
+toggle_netcons_target() {
+	for i in $(seq 2)
+	do
+		if [ ! -d "${NETCONS_PATH}" ]
+		then
+			break
+		fi
+		echo 0 > "${NETCONS_PATH}"/enabled 2> /dev/null || true
+		# Try to enable a bit harder, given it might fail to enable
+		# Write to `enabled` might fail depending on the lock, which is
+		# highly contentious here
+		for _ in $(seq 5)
+		do
+			echo 1 > "${NETCONS_PATH}"/enabled 2> /dev/null || true
+		done
+	done
+}
+
+toggle_iface(){
+	ip link set "${SRCIF}" down
+	ip link set "${SRCIF}" up
+}
+
+# Start here
+
+modprobe netdevsim 2> /dev/null || true
+modprobe netconsole 2> /dev/null || true
+
+# Check for basic system dependency and exit if not found
+check_for_dependencies
+# Set current loglevel to KERN_INFO(6), and default to KERN_NOTICE(5)
+echo "6 5" > /proc/sys/kernel/printk
+# Remove the namespace, interfaces and netconsole target on exit
+trap cleanup EXIT
+# Create one namespace and two interfaces
+set_network "${IP_VERSION}"
+# Create a dynamic target for netconsole
+create_dynamic_target "${FORMAT}"
+
+for i in $(seq "$ITERATIONS")
+do
+	for _ in $(seq 10)
+	do
+		echo "${MSG}: ${TARGET} ${i}" > /dev/kmsg
+	done
+	wait
+
+	if (( i % 30 == 0 )); then
+		toggle_netcons_target &
+	fi
+
+	if (( i % 50 == 0 )); then
+		# create some targets, enable them, send msg and disable
+		# all in a parallel thread
+		create_and_delete_random_target &
+	fi
+
+	if (( i % 70 == 0 )); then
+		toggle_iface &
+	fi
+done
+wait
+
+exit "${ksft_pass}"

-- 
2.47.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup
  2025-09-05 17:25 ` [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup Breno Leitao
@ 2025-09-08 10:12   ` Simon Horman
  2025-09-08 20:47   ` Calvin Owens
  1 sibling, 0 replies; 18+ messages in thread
From: Simon Horman @ 2025-09-08 10:12 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Shuah Khan, david decotigny, linux-kernel, netdev,
	linux-kselftest, asantostc, efault, calvin, kernel-team, stable,
	jv

On Fri, Sep 05, 2025 at 10:25:07AM -0700, Breno Leitao wrote:
> commit efa95b01da18 ("netpoll: fix use after free") incorrectly
> ignored the refcount and prematurely set dev->npinfo to NULL during
> netpoll cleanup, leading to improper behavior and memory leaks.
> 
> Scenario causing lack of proper cleanup:
> 
> 1) A netpoll is associated with a NIC (e.g., eth0) and netdev->npinfo is
>    allocated, and refcnt = 1
>    - Keep in mind that npinfo is shared among all netpoll instances. In
>      this case, there is just one.
> 
> 2) Another netpoll is also associated with the same NIC and
>    npinfo->refcnt += 1.
>    - Now dev->npinfo->refcnt = 2;
>    - There is just one npinfo associated to the netdev.
> 
> 3) When the first netpolls goes to clean up:
>    - The first cleanup succeeds and clears np->dev->npinfo, ignoring
>      refcnt.
>      - It basically calls `RCU_INIT_POINTER(np->dev->npinfo, NULL);`
>    - Set dev->npinfo = NULL, without proper cleanup
>    - No ->ndo_netpoll_cleanup() is either called
> 
> 4) Now the second target tries to clean up
>    - The second cleanup fails because np->dev->npinfo is already NULL.
>      * In this case, ops->ndo_netpoll_cleanup() was never called, and
>        the skb pool is not cleaned as well (for the second netpoll
>        instance)
>   - This leaks npinfo and skbpool skbs, which is clearly reported by
>     kmemleak.
> 
> Revert commit efa95b01da18 ("netpoll: fix use after free") and adds
> clarifying comments emphasizing that npinfo cleanup should only happen
> once the refcount reaches zero, ensuring stable and correct netpoll
> behavior.
> 
> Cc: stable@vger.kernel.org
> Cc: jv@jvosburgh.net
> Fixes: efa95b01da18 ("netpoll: fix use after free")
> Signed-off-by: Breno Leitao <leitao@debian.org>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v3 2/3] selftest: netcons: refactor target creation
  2025-09-05 17:25 ` [PATCH net v3 2/3] selftest: netcons: refactor target creation Breno Leitao
@ 2025-09-08 10:13   ` Simon Horman
  0 siblings, 0 replies; 18+ messages in thread
From: Simon Horman @ 2025-09-08 10:13 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Shuah Khan, david decotigny, linux-kernel, netdev,
	linux-kselftest, asantostc, efault, calvin, kernel-team, stable

On Fri, Sep 05, 2025 at 10:25:08AM -0700, Breno Leitao wrote:
> Extract the netconsole target creation from create_dynamic_target(), by
> moving it from create_dynamic_target() into a new helper function. This
> enables other tests to use the creation of netconsole targets with
> arbitrary parameters and no sleep.
> 
> The new helper will be utilized by forthcoming torture-type selftests
> that require dynamic target management.
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v3 3/3] selftest: netcons: create a torture test
  2025-09-05 17:25 ` [PATCH net v3 3/3] selftest: netcons: create a torture test Breno Leitao
@ 2025-09-08 10:13   ` Simon Horman
  0 siblings, 0 replies; 18+ messages in thread
From: Simon Horman @ 2025-09-08 10:13 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Shuah Khan, david decotigny, linux-kernel, netdev,
	linux-kselftest, asantostc, efault, calvin, kernel-team, stable

On Fri, Sep 05, 2025 at 10:25:09AM -0700, Breno Leitao wrote:
> Create a netconsole test that puts a lot of pressure on the netconsole
> list manipulation. Do it by creating dynamic targets and deleting
> targets while messages are being sent. Also put interface down while the
> messages are being sent, as creating parallel targets.
> 
> The code launches three background jobs on distinct schedules:
> 
>  * Toggle netcons target every 30 iterations
>  * create and delete random_target every 50 iterations
>  * toggle iface every 70 iterations
> 
> This creates multiple concurrency sources that interact with netconsole
> states. This is good practice to simulate stress, and exercise netpoll
> and netconsole locks.
> 
> This test already found an issue as reported in [1]
> 
> Link: https://lore.kernel.org/all/20250901-netpoll_memleak-v1-1-34a181977dfc@debian.org/ [1]
> Signed-off-by: Breno Leitao <leitao@debian.org>
> Reviewed-by: Andre Carvalho <asantostc@gmail.com>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup
  2025-09-05 17:25 ` [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup Breno Leitao
  2025-09-08 10:12   ` Simon Horman
@ 2025-09-08 20:47   ` Calvin Owens
  2025-09-09  1:29     ` Jakub Kicinski
  2025-09-09 14:05     ` Breno Leitao
  1 sibling, 2 replies; 18+ messages in thread
From: Calvin Owens @ 2025-09-08 20:47 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Shuah Khan, Simon Horman, david decotigny,
	linux-kernel, netdev, linux-kselftest, asantostc, efault,
	kernel-team, stable, jv

On Friday 09/05 at 10:25 -0700, Breno Leitao wrote:
> commit efa95b01da18 ("netpoll: fix use after free") incorrectly
> ignored the refcount and prematurely set dev->npinfo to NULL during
> netpoll cleanup, leading to improper behavior and memory leaks.
> 
> Scenario causing lack of proper cleanup:
> 
> 1) A netpoll is associated with a NIC (e.g., eth0) and netdev->npinfo is
>    allocated, and refcnt = 1
>    - Keep in mind that npinfo is shared among all netpoll instances. In
>      this case, there is just one.
> 
> 2) Another netpoll is also associated with the same NIC and
>    npinfo->refcnt += 1.
>    - Now dev->npinfo->refcnt = 2;
>    - There is just one npinfo associated to the netdev.
> 
> 3) When the first netpolls goes to clean up:
>    - The first cleanup succeeds and clears np->dev->npinfo, ignoring
>      refcnt.
>      - It basically calls `RCU_INIT_POINTER(np->dev->npinfo, NULL);`
>    - Set dev->npinfo = NULL, without proper cleanup
>    - No ->ndo_netpoll_cleanup() is either called
> 
> 4) Now the second target tries to clean up
>    - The second cleanup fails because np->dev->npinfo is already NULL.
>      * In this case, ops->ndo_netpoll_cleanup() was never called, and
>        the skb pool is not cleaned as well (for the second netpoll
>        instance)
>   - This leaks npinfo and skbpool skbs, which is clearly reported by
>     kmemleak.
> 
> Revert commit efa95b01da18 ("netpoll: fix use after free") and adds
> clarifying comments emphasizing that npinfo cleanup should only happen
> once the refcount reaches zero, ensuring stable and correct netpoll
> behavior.

This makes sense to me.

Just curious, did you try the original OOPS reproducer?
https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.1404857349.git.decot@googlers.com/

I wonder if there might be a demon lurking in bonding+netpoll that this
was papering over? Not a reason not to fix the leaks IMO, I'm just
curious, I don't want to spend time on it if you already did :)

The discussion on v1 isn't enlightening either:
https://lore.kernel.org/lkml/0f692012238337f2c40893319830ae042523ce18.1404172155.git.decot@googlers.com/

Thanks,
Calvin

> Cc: stable@vger.kernel.org
> Cc: jv@jvosburgh.net
> Fixes: efa95b01da18 ("netpoll: fix use after free")
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  net/core/netpoll.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 5f65b62346d4e..19676cd379640 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -815,6 +815,10 @@ static void __netpoll_cleanup(struct netpoll *np)
>  	if (!npinfo)
>  		return;
>  
> +	/* At this point, there is a single npinfo instance per netdevice, and
> +	 * its refcnt tracks how many netpoll structures are linked to it. We
> +	 * only perform npinfo cleanup when the refcnt decrements to zero.
> +	 */
>  	if (refcount_dec_and_test(&npinfo->refcnt)) {
>  		const struct net_device_ops *ops;
>  
> @@ -824,8 +828,7 @@ static void __netpoll_cleanup(struct netpoll *np)
>  
>  		RCU_INIT_POINTER(np->dev->npinfo, NULL);
>  		call_rcu(&npinfo->rcu, rcu_cleanup_netpoll_info);
> -	} else
> -		RCU_INIT_POINTER(np->dev->npinfo, NULL);
> +	}
>  
>  	skb_pool_flush(np);
>  }
> 
> -- 
> 2.47.3
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup
  2025-09-08 20:47   ` Calvin Owens
@ 2025-09-09  1:29     ` Jakub Kicinski
  2025-09-09 20:17       ` Breno Leitao
  2025-09-10  0:18       ` Jay Vosburgh
  2025-09-09 14:05     ` Breno Leitao
  1 sibling, 2 replies; 18+ messages in thread
From: Jakub Kicinski @ 2025-09-09  1:29 UTC (permalink / raw)
  To: Calvin Owens
  Cc: Breno Leitao, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Shuah Khan, Simon Horman, david decotigny,
	linux-kernel, netdev, linux-kselftest, asantostc, efault,
	kernel-team, stable, jv

On Mon, 8 Sep 2025 13:47:24 -0700 Calvin Owens wrote:
> I wonder if there might be a demon lurking in bonding+netpoll that this
> was papering over? Not a reason not to fix the leaks IMO, I'm just
> curious, I don't want to spend time on it if you already did :)

+1, I also feel like it'd be good to have some bonding tests in place
when we're removing a hack added specifically for bonding.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup
  2025-09-08 20:47   ` Calvin Owens
  2025-09-09  1:29     ` Jakub Kicinski
@ 2025-09-09 14:05     ` Breno Leitao
  2025-09-10  0:40       ` Calvin Owens
  1 sibling, 1 reply; 18+ messages in thread
From: Breno Leitao @ 2025-09-09 14:05 UTC (permalink / raw)
  To: Calvin Owens
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Shuah Khan, Simon Horman, david decotigny,
	linux-kernel, netdev, linux-kselftest, asantostc, efault,
	kernel-team, stable, jv

On Mon, Sep 08, 2025 at 01:47:24PM -0700, Calvin Owens wrote:
> On Friday 09/05 at 10:25 -0700, Breno Leitao wrote:
> > commit efa95b01da18 ("netpoll: fix use after free") incorrectly
> > ignored the refcount and prematurely set dev->npinfo to NULL during
> > netpoll cleanup, leading to improper behavior and memory leaks.
> > 
> > Scenario causing lack of proper cleanup:
> > 
> > 1) A netpoll is associated with a NIC (e.g., eth0) and netdev->npinfo is
> >    allocated, and refcnt = 1
> >    - Keep in mind that npinfo is shared among all netpoll instances. In
> >      this case, there is just one.
> > 
> > 2) Another netpoll is also associated with the same NIC and
> >    npinfo->refcnt += 1.
> >    - Now dev->npinfo->refcnt = 2;
> >    - There is just one npinfo associated to the netdev.
> > 
> > 3) When the first netpolls goes to clean up:
> >    - The first cleanup succeeds and clears np->dev->npinfo, ignoring
> >      refcnt.
> >      - It basically calls `RCU_INIT_POINTER(np->dev->npinfo, NULL);`
> >    - Set dev->npinfo = NULL, without proper cleanup
> >    - No ->ndo_netpoll_cleanup() is either called
> > 
> > 4) Now the second target tries to clean up
> >    - The second cleanup fails because np->dev->npinfo is already NULL.
> >      * In this case, ops->ndo_netpoll_cleanup() was never called, and
> >        the skb pool is not cleaned as well (for the second netpoll
> >        instance)
> >   - This leaks npinfo and skbpool skbs, which is clearly reported by
> >     kmemleak.
> > 
> > Revert commit efa95b01da18 ("netpoll: fix use after free") and adds
> > clarifying comments emphasizing that npinfo cleanup should only happen
> > once the refcount reaches zero, ensuring stable and correct netpoll
> > behavior.
> 
> This makes sense to me.
> 
> Just curious, did you try the original OOPS reproducer?
> https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.1404857349.git.decot@googlers.com/

Yes, but I have not been able to reproduce the problem at all.
I've have tested it using netdevsim, and here is a quick log of what I
run:

	+ modprobe netconsole
	+ modprobe bonding mode=4
	[   86.540950] Warning: miimon must be specified, otherwise bonding will not detect link failure, speed and duplex which are essential for 802.3ad operation
	[   86.541617] Forcing miimon to 100msec
	[   86.541893] MII link monitoring set to 100 ms
	+ echo +bond0
	[   86.547802] bonding: bond0 is being created...
	+ ifconfig bond0 192.168.56.3 up
	+ mkdir /sys/kernel/config/netconsole/blah
	+ echo 0
	[   86.614772] netconsole: network logging has already stopped
	./run.sh: line 19: echo: write error: Invalid argument
	+ echo bond0
	+ echo 192.168.56.42
	+ echo 1
	[   86.622318] netconsole: netconsole: local port 6665
	[   86.622550] netconsole: netconsole: local IPv4 address 0.0.0.0
	[   86.622819] netconsole: netconsole: interface name 'bond0'
	[   86.623038] netconsole: netconsole: local ethernet address '00:00:00:00:00:00'
	[   86.623466] netconsole: netconsole: remote port 6666
	[   86.623675] netconsole: netconsole: remote IPv4 address 192.168.56.42
	[   86.623924] netconsole: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
	[   86.624264] netpoll: netconsole: local IP 192.168.56.3
	[   86.643174] netconsole: network logging started
	+ ifenslave bond0 eth1
	[   86.659899] bond0: (slave eth1): Enslaving as a backup interface with a down link
	+ ifenslave bond0 eth2
	[   86.687630] bond0: (slave eth2): Enslaving as a backup interface with a down link
	+ sleep 3
	+ ifenslave -d bond0 eth1
	[   89.735701] bond0: (slave eth1): Releasing backup interface
	[   89.737239] bond0: (slave eth1): the permanent HWaddr of slave - 06:44:84:94:87:c7 - is still in use by bond - set the HWaddr of slave to a different address to avoid conflicts
	+ sleep 1
	+ echo -bond0
	[   90.798676] bonding: bond0 is being deleted...
	[   90.815595] netconsole: network logging stopped on interface bond0 as it unregistered
	[   90.816416] bond0 (unregistering): (slave eth2): Releasing backup interface
	[   90.863054] bond0 (unregistering): Released all slaves
	+ ls -lR /
	+ tail -30
	<snip>

	+ echo +bond0
	./run.sh: line 39: /sys/class/net/bonding_masters: Permission denied
	+ ifconfig bond0 192.168.56.3 up
	SIOCSIFADDR: No such device
	bond0: ERROR while getting interface flags: No such device
	bond0: ERROR while

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup
  2025-09-09  1:29     ` Jakub Kicinski
@ 2025-09-09 20:17       ` Breno Leitao
  2025-09-09 23:16         ` Jakub Kicinski
  2025-09-10  0:18       ` Jay Vosburgh
  1 sibling, 1 reply; 18+ messages in thread
From: Breno Leitao @ 2025-09-09 20:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Calvin Owens, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Shuah Khan, Simon Horman, david decotigny,
	linux-kernel, netdev, linux-kselftest, asantostc, efault,
	kernel-team, stable, jv

On Mon, Sep 08, 2025 at 06:29:58PM -0700, Jakub Kicinski wrote:
> On Mon, 8 Sep 2025 13:47:24 -0700 Calvin Owens wrote:
> > I wonder if there might be a demon lurking in bonding+netpoll that this
> > was papering over? Not a reason not to fix the leaks IMO, I'm just
> > curious, I don't want to spend time on it if you already did :)
> 
> +1, I also feel like it'd be good to have some bonding tests in place
> when we're removing a hack added specifically for bonding.

Do you prefer to have a separated bonding selftest, or, is it better to
add some bond operations in the torture selftest?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup
  2025-09-09 20:17       ` Breno Leitao
@ 2025-09-09 23:16         ` Jakub Kicinski
  2025-09-10 14:12           ` Breno Leitao
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2025-09-09 23:16 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Calvin Owens, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Shuah Khan, Simon Horman, david decotigny,
	linux-kernel, netdev, linux-kselftest, asantostc, efault,
	kernel-team, stable, jv

On Tue, 9 Sep 2025 13:17:27 -0700 Breno Leitao wrote:
> On Mon, Sep 08, 2025 at 06:29:58PM -0700, Jakub Kicinski wrote:
> > On Mon, 8 Sep 2025 13:47:24 -0700 Calvin Owens wrote:  
> > > I wonder if there might be a demon lurking in bonding+netpoll that this
> > > was papering over? Not a reason not to fix the leaks IMO, I'm just
> > > curious, I don't want to spend time on it if you already did :)  
> > 
> > +1, I also feel like it'd be good to have some bonding tests in place
> > when we're removing a hack added specifically for bonding.  
> 
> Do you prefer to have a separated bonding selftest, or, is it better to
> add some bond operations in the torture selftest?

Normal test is preferable, given the flakiness rate and patch volume
I'm a bit scared of randomized testing as part of CI.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup
  2025-09-09  1:29     ` Jakub Kicinski
  2025-09-09 20:17       ` Breno Leitao
@ 2025-09-10  0:18       ` Jay Vosburgh
  2025-09-10 14:07         ` Breno Leitao
  1 sibling, 1 reply; 18+ messages in thread
From: Jay Vosburgh @ 2025-09-10  0:18 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Calvin Owens, Breno Leitao, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni, Shuah Khan, Simon Horman,
	david decotigny, linux-kernel, netdev, linux-kselftest, asantostc,
	efault, kernel-team, stable

Jakub Kicinski <kuba@kernel.org> wrote:

>On Mon, 8 Sep 2025 13:47:24 -0700 Calvin Owens wrote:
>> I wonder if there might be a demon lurking in bonding+netpoll that this
>> was papering over? Not a reason not to fix the leaks IMO, I'm just
>> curious, I don't want to spend time on it if you already did :)
>
>+1, I also feel like it'd be good to have some bonding tests in place
>when we're removing a hack added specifically for bonding.

	I'll disclaimer this by saying up front that I'm not super
familiar with the innards of netpoll.

	That said, I looked at commit efa95b01da18 ("netpoll: fix use
after free") and the relevant upstream discussion, and I'm not sure the
assertion that "After a bonding master reclaims the netpoll info struct,
slaves could still hold a pointer to the reclaimed data" is correct.

	I'm not sure the efa9 patch's reference count math is
correct (more on that below).

	Second, I'm a bit unsure what's going on with the struct netpoll
*np parameter of __netpoll_setup for the second and subsequent netpoll
instances (i.e., second and later call), as the function will
unconditionally do

	npinfo->netpoll = np;

	which it seems like would overwrite the "np" supplied by any
prior calls to __netpoll_setup.  In bonding, slave_enable_netpoll()
stashes the "np" it allocates as slave->np, and slave_disable_netpoll
relies on __netpoll_free to free it, so I don't think it's lost, but it
seems like netpoll internally only tracks one of these at a time,
regardless of the reference count.

	On the reference counting, the upstream example from the prior
discussion includes:

    mkdir /sys/kernel/config/netconsole/blah
    echo 0 > /sys/kernel/config/netconsole/blah/enabled
    echo bond0 > /sys/kernel/config/netconsole/blah/dev_name
    echo 192.168.56.42 > /sys/kernel/config/netconsole/blah/remote_ip
    echo 1 > /sys/kernel/config/netconsole/blah/enabled
    # npinfo refcnt ->1
    ifenslave bond0 eth1
    # npinfo refcnt ->2
    ifenslave bond0 eth0
    # (this should be optional, preventing ndo_cleanup_nepoll below)
    # npinfo refcnt ->3

	I'm suspicious of the refcnt values here; both then and now, the
npinfo for each of the relevant interfaces is a separate per-interface
allocation in __netpoll_setup, so I'm not sure what exactly is supposed
to be getting a refcnt of 3.

	If there are two netpoll instances using the slave in question
(either directly or via the bond itself), then clearing the
np->dev->npinfo pointer looks like the wrong thing to do until the last
reference is released.

	-J

---
	-Jay Vosburgh, jv@jvosburgh.net

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup
  2025-09-09 14:05     ` Breno Leitao
@ 2025-09-10  0:40       ` Calvin Owens
  0 siblings, 0 replies; 18+ messages in thread
From: Calvin Owens @ 2025-09-10  0:40 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Shuah Khan, Simon Horman, david decotigny,
	linux-kernel, netdev, linux-kselftest, asantostc, efault,
	kernel-team, stable, jv

On Tuesday 09/09 at 07:05 -0700, Breno Leitao wrote:
> On Mon, Sep 08, 2025 at 01:47:24PM -0700, Calvin Owens wrote:
> > On Friday 09/05 at 10:25 -0700, Breno Leitao wrote:
> > > commit efa95b01da18 ("netpoll: fix use after free") incorrectly
> > > ignored the refcount and prematurely set dev->npinfo to NULL during
> > > netpoll cleanup, leading to improper behavior and memory leaks.
> > > 
> > > Scenario causing lack of proper cleanup:
> > > 
> > > 1) A netpoll is associated with a NIC (e.g., eth0) and netdev->npinfo is
> > >    allocated, and refcnt = 1
> > >    - Keep in mind that npinfo is shared among all netpoll instances. In
> > >      this case, there is just one.
> > > 
> > > 2) Another netpoll is also associated with the same NIC and
> > >    npinfo->refcnt += 1.
> > >    - Now dev->npinfo->refcnt = 2;
> > >    - There is just one npinfo associated to the netdev.
> > > 
> > > 3) When the first netpolls goes to clean up:
> > >    - The first cleanup succeeds and clears np->dev->npinfo, ignoring
> > >      refcnt.
> > >      - It basically calls `RCU_INIT_POINTER(np->dev->npinfo, NULL);`
> > >    - Set dev->npinfo = NULL, without proper cleanup
> > >    - No ->ndo_netpoll_cleanup() is either called
> > > 
> > > 4) Now the second target tries to clean up
> > >    - The second cleanup fails because np->dev->npinfo is already NULL.
> > >      * In this case, ops->ndo_netpoll_cleanup() was never called, and
> > >        the skb pool is not cleaned as well (for the second netpoll
> > >        instance)
> > >   - This leaks npinfo and skbpool skbs, which is clearly reported by
> > >     kmemleak.
> > > 
> > > Revert commit efa95b01da18 ("netpoll: fix use after free") and adds
> > > clarifying comments emphasizing that npinfo cleanup should only happen
> > > once the refcount reaches zero, ensuring stable and correct netpoll
> > > behavior.
> > 
> > This makes sense to me.
> > 
> > Just curious, did you try the original OOPS reproducer?
> > https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.1404857349.git.decot@googlers.com/
> 
> Yes, but I have not been able to reproduce the problem at all.
> I've have tested it using netdevsim, and here is a quick log of what I
> run:

Nice, thanks for clarifying.

I also tried reverting a few commits like [1] around the time that smell
vaguely related, on top of your fix, but the repro still never triggers
anything for me either. I was using virtio interfaces in KVM.

The world may never know :)

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=69b0216ac255

> 	+ modprobe netconsole
> 	+ modprobe bonding mode=4
> 	[   86.540950] Warning: miimon must be specified, otherwise bonding will not detect link failure, speed and duplex which are essential for 802.3ad operation
> 	[   86.541617] Forcing miimon to 100msec
> 	[   86.541893] MII link monitoring set to 100 ms
> 	+ echo +bond0
> 	[   86.547802] bonding: bond0 is being created...
> 	+ ifconfig bond0 192.168.56.3 up
> 	+ mkdir /sys/kernel/config/netconsole/blah
> 	+ echo 0
> 	[   86.614772] netconsole: network logging has already stopped
> 	./run.sh: line 19: echo: write error: Invalid argument
> 	+ echo bond0
> 	+ echo 192.168.56.42
> 	+ echo 1
> 	[   86.622318] netconsole: netconsole: local port 6665
> 	[   86.622550] netconsole: netconsole: local IPv4 address 0.0.0.0
> 	[   86.622819] netconsole: netconsole: interface name 'bond0'
> 	[   86.623038] netconsole: netconsole: local ethernet address '00:00:00:00:00:00'
> 	[   86.623466] netconsole: netconsole: remote port 6666
> 	[   86.623675] netconsole: netconsole: remote IPv4 address 192.168.56.42
> 	[   86.623924] netconsole: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
> 	[   86.624264] netpoll: netconsole: local IP 192.168.56.3
> 	[   86.643174] netconsole: network logging started
> 	+ ifenslave bond0 eth1
> 	[   86.659899] bond0: (slave eth1): Enslaving as a backup interface with a down link
> 	+ ifenslave bond0 eth2
> 	[   86.687630] bond0: (slave eth2): Enslaving as a backup interface with a down link
> 	+ sleep 3
> 	+ ifenslave -d bond0 eth1
> 	[   89.735701] bond0: (slave eth1): Releasing backup interface
> 	[   89.737239] bond0: (slave eth1): the permanent HWaddr of slave - 06:44:84:94:87:c7 - is still in use by bond - set the HWaddr of slave to a different address to avoid conflicts
> 	+ sleep 1
> 	+ echo -bond0
> 	[   90.798676] bonding: bond0 is being deleted...
> 	[   90.815595] netconsole: network logging stopped on interface bond0 as it unregistered
> 	[   90.816416] bond0 (unregistering): (slave eth2): Releasing backup interface
> 	[   90.863054] bond0 (unregistering): Released all slaves
> 	+ ls -lR /
> 	+ tail -30
> 	<snip>
> 
> 	+ echo +bond0
> 	./run.sh: line 39: /sys/class/net/bonding_masters: Permission denied

I don't get -EACCES here like you seem to, but nothing interesting
happens either.

> 	+ ifconfig bond0 192.168.56.3 up
> 	SIOCSIFADDR: No such device
> 	bond0: ERROR while getting interface flags: No such device
> 	bond0: ERROR while

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup
  2025-09-10  0:18       ` Jay Vosburgh
@ 2025-09-10 14:07         ` Breno Leitao
  0 siblings, 0 replies; 18+ messages in thread
From: Breno Leitao @ 2025-09-10 14:07 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Jakub Kicinski, Calvin Owens, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni, Shuah Khan, Simon Horman,
	david decotigny, linux-kernel, netdev, linux-kselftest, asantostc,
	efault, kernel-team, stable

Hello Jay,

On Tue, Sep 09, 2025 at 05:18:26PM -0700, Jay Vosburgh wrote:
> 	Second, I'm a bit unsure what's going on with the struct netpoll
> *np parameter of __netpoll_setup for the second and subsequent netpoll
> instances (i.e., second and later call), as the function will
> unconditionally do
> 
> 	npinfo->netpoll = np;
> 
> 	which it seems like would overwrite the "np" supplied by any
> prior calls to __netpoll_setup. 

This is clearly a bug. Trying to understand where this field is used, it
seems it is not used at all.

I was not able to find any usage of it, and removing it and compiling
with `allyesconfig` didn't complain about any other users also.

I think we should remove it.

commit 5da5611575ce94ca557c194d4147ae3011cedb6f
Author: Breno Leitao <leitao@debian.org>
Date:   Wed Sep 10 06:32:25 2025 -0700

    netpoll: remove unused netpoll pointer from netpoll_info
    
    The netpoll_info structure contained a useless pointer back to its
    associated netpoll. This field is never used, and the assignment in
    __netpoll_setup() is does not comtemplate multiple instances, as
    reported by Jay[1].
    
    Drop both the member and its initialization to simplify the structure.
    
    Link: https://lore.kernel.org/all/2930648.1757463506@famine/ [1]
    Reported-by: Jay Vosburgh <jv@jvosburgh.net>
    Signed-off-by: Breno Leitao <leitao@debian.org>

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index b5ea9882eda8b..f22eec4660405 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -55,7 +55,6 @@ struct netpoll_info {
 
 	struct delayed_work tx_work;
 
-	struct netpoll *netpoll;
 	struct rcu_head rcu;
 };
 
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 5f65b62346d4e..c58faa7471650 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -591,7 +591,6 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
 
 	np->dev = ndev;
 	strscpy(np->dev_name, ndev->name, IFNAMSIZ);
-	npinfo->netpoll = np;
 
 	/* fill up the skb queue */
 	refill_skbs(np);

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup
  2025-09-09 23:16         ` Jakub Kicinski
@ 2025-09-10 14:12           ` Breno Leitao
  2025-09-10 17:58             ` Jakub Kicinski
  0 siblings, 1 reply; 18+ messages in thread
From: Breno Leitao @ 2025-09-10 14:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Calvin Owens, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Shuah Khan, Simon Horman, david decotigny,
	linux-kernel, netdev, linux-kselftest, asantostc, efault,
	kernel-team, stable, jv

On Tue, Sep 09, 2025 at 04:16:25PM -0700, Jakub Kicinski wrote:
> On Tue, 9 Sep 2025 13:17:27 -0700 Breno Leitao wrote:
> > On Mon, Sep 08, 2025 at 06:29:58PM -0700, Jakub Kicinski wrote:
> > > On Mon, 8 Sep 2025 13:47:24 -0700 Calvin Owens wrote:  
> > > > I wonder if there might be a demon lurking in bonding+netpoll that this
> > > > was papering over? Not a reason not to fix the leaks IMO, I'm just
> > > > curious, I don't want to spend time on it if you already did :)  
> > > 
> > > +1, I also feel like it'd be good to have some bonding tests in place
> > > when we're removing a hack added specifically for bonding.  
> > 
> > Do you prefer to have a separated bonding selftest, or, is it better to
> > add some bond operations in the torture selftest?
> 
> Normal test is preferable, given the flakiness rate and patch volume
> I'm a bit scared of randomized testing as part of CI.

Ok, I will create a selftest to cover the netpoll part of bonding, as
soon as my understanding is good enough. I don't think it will be quick,
but, it is on my hi-pri todo list.

Do you want to have the selftest done before merging this patch, or, can
they go in parallel?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup
  2025-09-10 14:12           ` Breno Leitao
@ 2025-09-10 17:58             ` Jakub Kicinski
  2025-09-10 18:50               ` Breno Leitao
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2025-09-10 17:58 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Calvin Owens, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Shuah Khan, Simon Horman, david decotigny,
	linux-kernel, netdev, linux-kselftest, asantostc, efault,
	kernel-team, stable, jv

On Wed, 10 Sep 2025 07:12:03 -0700 Breno Leitao wrote:
> On Tue, Sep 09, 2025 at 04:16:25PM -0700, Jakub Kicinski wrote:
> > On Tue, 9 Sep 2025 13:17:27 -0700 Breno Leitao wrote:  
> > > On Mon, Sep 08, 2025 at 06:29:58PM -0700, Jakub Kicinski wrote:  
> > > > On Mon, 8 Sep 2025 13:47:24 -0700 Calvin Owens wrote:    
> > > > > I wonder if there might be a demon lurking in bonding+netpoll that this
> > > > > was papering over? Not a reason not to fix the leaks IMO, I'm just
> > > > > curious, I don't want to spend time on it if you already did :)    
> > > > 
> > > > +1, I also feel like it'd be good to have some bonding tests in place
> > > > when we're removing a hack added specifically for bonding.    
> > > 
> > > Do you prefer to have a separated bonding selftest, or, is it better to
> > > add some bond operations in the torture selftest?  
> > 
> > Normal test is preferable, given the flakiness rate and patch volume
> > I'm a bit scared of randomized testing as part of CI.  
> 
> Ok, I will create a selftest to cover the netpoll part of bonding, as
> soon as my understanding is good enough. I don't think it will be quick,
> but, it is on my hi-pri todo list.
> 
> Do you want to have the selftest done before merging this patch, or, can
> they go in parallel?

I said "it'd be good to have some bonding tests in place when we're
removing a hack added specifically for bonding."
"In place" means part of CI when we're merging this fix.
Please read emails more carefully.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup
  2025-09-10 17:58             ` Jakub Kicinski
@ 2025-09-10 18:50               ` Breno Leitao
  0 siblings, 0 replies; 18+ messages in thread
From: Breno Leitao @ 2025-09-10 18:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Calvin Owens, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Shuah Khan, Simon Horman, david decotigny,
	linux-kernel, netdev, linux-kselftest, asantostc, efault,
	kernel-team, stable, jv

On Wed, Sep 10, 2025 at 10:58:58AM -0700, Jakub Kicinski wrote:
> On Wed, 10 Sep 2025 07:12:03 -0700 Breno Leitao wrote:
> > On Tue, Sep 09, 2025 at 04:16:25PM -0700, Jakub Kicinski wrote:
> > > On Tue, 9 Sep 2025 13:17:27 -0700 Breno Leitao wrote:  
> > > > On Mon, Sep 08, 2025 at 06:29:58PM -0700, Jakub Kicinski wrote:  
> > > > > On Mon, 8 Sep 2025 13:47:24 -0700 Calvin Owens wrote:    
> > > > > > I wonder if there might be a demon lurking in bonding+netpoll that this
> > > > > > was papering over? Not a reason not to fix the leaks IMO, I'm just
> > > > > > curious, I don't want to spend time on it if you already did :)    
> > > > > 
> > > > > +1, I also feel like it'd be good to have some bonding tests in place
> > > > > when we're removing a hack added specifically for bonding.    
> > > > 
> > > > Do you prefer to have a separated bonding selftest, or, is it better to
> > > > add some bond operations in the torture selftest?  
> > > 
> > > Normal test is preferable, given the flakiness rate and patch volume
> > > I'm a bit scared of randomized testing as part of CI.  
> > 
> > Ok, I will create a selftest to cover the netpoll part of bonding, as
> > soon as my understanding is good enough. I don't think it will be quick,
> > but, it is on my hi-pri todo list.
> > 
> > Do you want to have the selftest done before merging this patch, or, can
> > they go in parallel?
> 
> I said "it'd be good to have some bonding tests in place when we're
> removing a hack added specifically for bonding."
> "In place" means part of CI when we're merging this fix.
> Please read emails more carefully.

Apologies for the misunderstanding, It was unclear that the bonding
selftest should come before the fix. Thanks for the clarification.

I am planning to create a selftest similar to the original reported to cause
the issue[1], where I create a bond device with two netdevsim, and bind
netconsole to it.

[1] https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.1404857349.git.decot@googlers.com/

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-09-10 18:50 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-05 17:25 [PATCH net v3 0/3] net: netpoll: fix a memleak and create a selftest Breno Leitao
2025-09-05 17:25 ` [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup Breno Leitao
2025-09-08 10:12   ` Simon Horman
2025-09-08 20:47   ` Calvin Owens
2025-09-09  1:29     ` Jakub Kicinski
2025-09-09 20:17       ` Breno Leitao
2025-09-09 23:16         ` Jakub Kicinski
2025-09-10 14:12           ` Breno Leitao
2025-09-10 17:58             ` Jakub Kicinski
2025-09-10 18:50               ` Breno Leitao
2025-09-10  0:18       ` Jay Vosburgh
2025-09-10 14:07         ` Breno Leitao
2025-09-09 14:05     ` Breno Leitao
2025-09-10  0:40       ` Calvin Owens
2025-09-05 17:25 ` [PATCH net v3 2/3] selftest: netcons: refactor target creation Breno Leitao
2025-09-08 10:13   ` Simon Horman
2025-09-05 17:25 ` [PATCH net v3 3/3] selftest: netcons: create a torture test Breno Leitao
2025-09-08 10:13   ` Simon Horman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).