netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/6] net: netconsole: Fix netconsole unsafe locking
@ 2024-08-01 16:11 Breno Leitao
  2024-08-01 16:11 ` [PATCH net-next 1/6] net: netconsole: selftests: Create a new netconsole selftest Breno Leitao
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Breno Leitao @ 2024-08-01 16:11 UTC (permalink / raw)
  To: kuba, davem, edumazet, pabeni
  Cc: thevlad, thepacketgeek, riel, horms, netdev, linux-kernel,
	paulmck, davej

Problem:
=======

The current locking mechanism in netconsole is unsafe and suboptimal due
to the following issues:

1) Lock Release and Reacquisition Mid-Loop:

In netconsole_netdev_event(), the target_list_lock is released and
reacquired within a loop, potentially causing collisions and cleaning up
targets that are being enabled.

	int netconsole_netdev_event()
	{
	...
		spin_lock_irqsave(&target_list_lock, flags);
		list_for_each_entry(nt, &target_list, list) {
			spin_unlock_irqrestore(&target_list_lock, flags);
			__netpoll_cleanup(&nt->np);
			spin_lock_irqsave(&target_list_lock, flags);
		}
		spin_lock_irqsave(&target_list_lock, flags);
	...
	}

2) Non-Atomic Cleanup Operations:

In enabled_store(), the cleanup of structures is not atomic, risking
cleanup of structures that are in the process of being enabled.

	size_t enabled_store()
	{
	...
		spin_lock_irqsave(&target_list_lock, flags);
		nt->enabled = false;
		spin_unlock_irqrestore(&target_list_lock, flags);
		netpoll_cleanup(&nt->np);
	...
	}


These issues stem from the following limitations in netconsole's locking
design:

1) write_{ext_}msg() functions:

	a) Cannot sleep
	b) Must iterate through targets and send messages to all enabled entries.
	c) List iteration is protected by target_list_lock spinlock.

2) Network event handling in netconsole_netdev_event():

	a) Needs to sleep
	b) Requires iteration over the target list (holding
	   target_list_lock spinlock).
	c) Some events necessitate netpoll struct cleanup, which *needs*
	   to sleep.

The target_list_lock needs to be used by non-sleepable functions while
also protecting operations that may sleep, leading to the current unsafe
design.


Solution:
========

1) Dual Locking Mechanism:
	- Retain current target_list_lock for non-sleepable use cases.
	- Introduce target_cleanup_list_lock (mutex) for sleepable
	  operations.

2) Deferred Cleanup:
	- Implement atomic, deferred cleanup of structures using the new
	  mutex (target_cleanup_list_lock).
	- Avoid the `goto` in the middle of the list_for_each_entry

3) Separate Cleanup List:
	- Create target_cleanup_list for deferred cleanup, protected by
	  target_cleanup_list_lock.
	- This allows cleanup() to sleep without affecting message
	  transmission.
	- When iterating over targets, move devices needing cleanup to
	  target_cleanup_list.
	- Handle cleanup under the target_cleanup_list_lock mutex.

4) Make a clear locking hierarchy

	- The target_cleanup_list_lock takes precedence over target_list_lock.

	- Major Workflow Locking Sequences:
		a) Network Event Affecting Netpoll (netconsole_netdev_event):
			rtnl -> target_cleanup_list_lock -> target_list_lock

		b) Message Writing (write_msg()):
			console_lock -> target_list_lock

		c) Configfs Target Enable/Disable (enabled_store()):
			dynamic_netconsole_mutex -> target_cleanup_list_lock -> target_list_lock


This hierarchy ensures consistent lock acquisition order across
different operations, preventing deadlocks and maintaining proper
synchronization. The target_cleanup_list_lock's higher priority allows
for safe deferred cleanup operations without interfering with regular
message transmission protected by target_list_lock.  Each workflow
follows a specific locking sequence, ensuring that operations like
network event handling, message writing, and target management are
properly synchronized and do not conflict with each other.


Testing:
=======

This patchset contains a basic version of the test I am using to test
netconsole. I hope to include it into NIPA also.

I've also tested with some parallel dynamic removal/add, and network
events, and I haven't seen any issue. Tested with KASAN and LOCKDEP.


Breno Leitao (6):
  net: netconsole: selftests: Create a new netconsole selftest
  net: netpoll: extract core of netpoll_cleanup
  net: netconsole: Correct mismatched return types
  net: netconsole: Standardize variable naming
  net: netconsole: Unify Function Return Paths
  net: netconsole: Defer netpoll cleanup to avoid lock release during
    list traversal

 MAINTAINERS                                   |   1 +
 drivers/net/netconsole.c                      | 173 +++++++++++-------
 include/linux/netpoll.h                       |   1 +
 net/core/netpoll.c                            |  12 +-
 .../net/netconsole/basic_integration_test.sh  | 153 ++++++++++++++++
 5 files changed, 272 insertions(+), 68 deletions(-)
 create mode 100755 tools/testing/selftests/net/netconsole/basic_integration_test.sh

-- 
2.43.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net-next 1/6] net: netconsole: selftests: Create a new netconsole selftest
  2024-08-01 16:11 [PATCH net-next 0/6] net: netconsole: Fix netconsole unsafe locking Breno Leitao
@ 2024-08-01 16:11 ` Breno Leitao
  2024-08-01 16:53   ` Jakub Kicinski
  2024-08-01 16:11 ` [PATCH net-next 2/6] net: netpoll: extract core of netpoll_cleanup Breno Leitao
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 11+ messages in thread
From: Breno Leitao @ 2024-08-01 16:11 UTC (permalink / raw)
  To: kuba, davem, edumazet, pabeni, Shuah Khan
  Cc: thevlad, thepacketgeek, riel, horms, netdev, linux-kernel,
	paulmck, davej, open list:KERNEL SELFTEST FRAMEWORK

Adds a selftest that creates two virtual interfaces, assigns one to a
new namespace, and assigns IP addresses to both.

It listens on the destination interface using netcat and configures a
dynamic target on netconsole, pointing to the destination IP address.

The test then checks if the message was received properly on the
destination interface.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 MAINTAINERS                                   |   1 +
 .../net/netconsole/basic_integration_test.sh  | 153 ++++++++++++++++++
 2 files changed, 154 insertions(+)
 create mode 100755 tools/testing/selftests/net/netconsole/basic_integration_test.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index c0a3d9e93689..59207365c9f5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15768,6 +15768,7 @@ M:	Breno Leitao <leitao@debian.org>
 S:	Maintained
 F:	Documentation/networking/netconsole.rst
 F:	drivers/net/netconsole.c
+F:	tools/testing/selftests/net/netconsole/
 
 NETDEVSIM
 M:	Jakub Kicinski <kuba@kernel.org>
diff --git a/tools/testing/selftests/net/netconsole/basic_integration_test.sh b/tools/testing/selftests/net/netconsole/basic_integration_test.sh
new file mode 100755
index 000000000000..fbabbc633451
--- /dev/null
+++ b/tools/testing/selftests/net/netconsole/basic_integration_test.sh
@@ -0,0 +1,153 @@
+#!/usr/bin/env bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test creates two virtual interfaces, assigns one of them (the "destination
+# interface") to a new namespace, and assigns IP addresses to both interfaces.
+#
+# It listens on the destination interface using netcat (nc) and configures a
+# dynamic target on netconsole, pointing to the destination IP address.
+#
+# Finally, it checks whether the message was received properly on the
+# destination interface.  Note that this test may pollute the kernel log buffer
+# (dmesg) and relies on dynamic configuration and namespaces being configured."
+#
+# Author: Breno Leitao <leitao@debian.org>
+
+SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")
+
+# Simple script to test dynamic targets in netconsole
+SRCIF="veth0"
+SRCIP=192.168.1.1
+
+DSTIF="veth1"
+DSTIP=192.168.1.2
+
+PORT="6666"
+MSG="netconsole selftest"
+TARGET=$(mktemp -u netcons_XXXXX)
+NETCONS_CONFIGFS="/sys/kernel/config/netconsole"
+FULLPATH="${NETCONS_CONFIGFS}"/"${TARGET}"
+# This will be have some tmp values appened to it in set_network()
+NAMESPACE="netconsns"
+
+# Used to create and delete namespaces
+source "${SCRIPTDIR}"/../lib.sh
+
+function set_network() {
+	# this is coming from lib.sh
+	setup_ns "${NAMESPACE}"
+	NAMESPACE=${NS_LIST[0]}
+	ip link add "${SRCIF}" type veth peer name "${DSTIF}"
+
+	# "${DSTIF}"
+	ip link set "${DSTIF}" netns "${NAMESPACE}"
+	ip netns exec "${NAMESPACE}" ip addr add "${DSTIP}"/24 dev "${DSTIF}"
+	ip netns exec "${NAMESPACE}" ip link set "${DSTIF}" up
+
+	# later, configure "${SRCIF}"
+	ip addr add "${SRCIP}"/24 dev "${SRCIF}"
+	ip link set "${SRCIF}" up
+}
+
+function create_dynamic_target() {
+	DSTMAC=$(ip netns exec "${NAMESPACE}" ip link show "${DSTIF}" | awk '/ether/ {print $2}')
+
+	# Create a dynamic target
+	mkdir ${FULLPATH}
+
+	echo ${DSTIP} > ${FULLPATH}/remote_ip
+	echo ${SRCIP} > ${FULLPATH}/local_ip
+	echo ${DSTMAC} > ${FULLPATH}/remote_mac
+	echo "${SRCIF}" > ${FULLPATH}/dev_name
+
+	echo 1 > ${FULLPATH}/enabled
+}
+
+function cleanup() {
+	echo 0 > "${FULLPATH}"/enabled
+	rmdir "${FULLPATH}"
+	# This will delete DSTIF also
+	ip link del "${SRCIF}"
+	# this is coming from lib.sh
+	cleanup_all_ns
+}
+
+function listen_port() {
+	OUTPUT=${1}
+	echo "Saving content in ${OUTPUT}"
+	timeout 2 ip netns exec "${NAMESPACE}" nc -u -l "${PORT}" | sed '/^$/q' > ${OUTPUT}
+}
+
+function validate_result() {
+	TMPFILENAME=/tmp/"${TARGET}"
+
+	# sleep until the file isc reated
+	sleep 1
+	# Check if the file exists
+	if [ ! -f "$TMPFILENAME" ]; then
+	    echo "FAIL: File was not generate." >&2
+	    return ${ksft_fail}
+	fi
+
+	if ! grep -q "${MSG}" "${TMPFILENAME}"; then
+	    echo "FAIL: ${MSG} not found in ${TMPFILENAME}" >&2
+	    cat ${TMPFILENAME} >&2
+	    return ${ksft_fail}
+	fi
+
+	rm ${TMPFILENAME}
+	return ${ksft_pass}
+}
+
+function check_for_dependencies() {
+	if [ "$(id -u)" -ne 0 ]; then
+		echo "This script must be run as root" >&2
+		exit 1
+	fi
+
+	if ! which nc > /dev/null ; then
+		echo "SKIP: nc(1) is not available" >&2
+		exit ${ksft_skip}
+	fi
+
+	if ! which ip > /dev/null ; then
+		echo "SKIP: ip(1) is not available" >&2
+		exit ${ksft_skip}
+	fi
+
+	if [ ! -d "${NETCONS_CONFIGFS}" ]; then
+		echo "SKIP: directory ${NETCONS_CONFIGFS} does not exist. Check if NETCONSOLE_DYNAMIC is enabled" >&2
+		exit ${ksft_skip}
+	fi
+
+	if ip link show veth0 2> /dev/null; then
+		echo "SKIP: interface veth0 exists in the system. Not overwriting it."
+		exit ${ksft_skip}
+	fi
+}
+
+# ========== #
+# Start here #
+# ========== #
+
+# Check for basic system dependency and exit if not found
+check_for_dependencies
+# Create one namespace and two interfaces
+set_network
+# Create a dynamic target for netconsole
+create_dynamic_target
+# Listed for netconsole port inside the namespace and destination interface
+listen_port /tmp/"${TARGET}" &
+# Wait for nc to start and listen to the port.
+sleep 1
+
+# Send the message
+echo "${MSG}: ${TARGET}" > /dev/kmsg
+
+# Make sure the message was received in the dst part
+validate_result
+ret=$?
+# Remove the namespace, interfaces and netconsole target
+cleanup
+
+exit ${ret}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next 2/6] net: netpoll: extract core of netpoll_cleanup
  2024-08-01 16:11 [PATCH net-next 0/6] net: netconsole: Fix netconsole unsafe locking Breno Leitao
  2024-08-01 16:11 ` [PATCH net-next 1/6] net: netconsole: selftests: Create a new netconsole selftest Breno Leitao
@ 2024-08-01 16:11 ` Breno Leitao
  2024-08-01 16:12 ` [PATCH net-next 3/6] net: netconsole: Correct mismatched return types Breno Leitao
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Breno Leitao @ 2024-08-01 16:11 UTC (permalink / raw)
  To: kuba, davem, edumazet, pabeni
  Cc: thevlad, thepacketgeek, riel, horms, netdev, linux-kernel,
	paulmck, davej

Extract the core part of netpoll_cleanup(), so, it could be called from
a caller that has the rtnl lock already.

Netconsole uses this in a weird way right now:

	__netpoll_cleanup(&nt->np);
	spin_lock_irqsave(&target_list_lock, flags);
	netdev_put(nt->np.dev, &nt->np.dev_tracker);
	nt->np.dev = NULL;
	nt->enabled = false;

This will be replaced by do_netpoll_cleanup() as the locking situation
is overhauled.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
---
 include/linux/netpoll.h |  1 +
 net/core/netpoll.c      | 12 +++++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index bd19c4b91e31..cd4e28db0cbd 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -64,6 +64,7 @@ int netpoll_setup(struct netpoll *np);
 void __netpoll_cleanup(struct netpoll *np);
 void __netpoll_free(struct netpoll *np);
 void netpoll_cleanup(struct netpoll *np);
+void do_netpoll_cleanup(struct netpoll *np);
 netdev_tx_t netpoll_send_skb(struct netpoll *np, struct sk_buff *skb);
 
 #ifdef CONFIG_NETPOLL
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 55bcacf67df3..a58ea724790c 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -853,14 +853,20 @@ void __netpoll_free(struct netpoll *np)
 }
 EXPORT_SYMBOL_GPL(__netpoll_free);
 
+void do_netpoll_cleanup(struct netpoll *np)
+{
+	__netpoll_cleanup(np);
+	netdev_put(np->dev, &np->dev_tracker);
+	np->dev = NULL;
+}
+EXPORT_SYMBOL(do_netpoll_cleanup);
+
 void netpoll_cleanup(struct netpoll *np)
 {
 	rtnl_lock();
 	if (!np->dev)
 		goto out;
-	__netpoll_cleanup(np);
-	netdev_put(np->dev, &np->dev_tracker);
-	np->dev = NULL;
+	do_netpoll_cleanup(np);
 out:
 	rtnl_unlock();
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next 3/6] net: netconsole: Correct mismatched return types
  2024-08-01 16:11 [PATCH net-next 0/6] net: netconsole: Fix netconsole unsafe locking Breno Leitao
  2024-08-01 16:11 ` [PATCH net-next 1/6] net: netconsole: selftests: Create a new netconsole selftest Breno Leitao
  2024-08-01 16:11 ` [PATCH net-next 2/6] net: netpoll: extract core of netpoll_cleanup Breno Leitao
@ 2024-08-01 16:12 ` Breno Leitao
  2024-08-01 16:12 ` [PATCH net-next 4/6] net: netconsole: Standardize variable naming Breno Leitao
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Breno Leitao @ 2024-08-01 16:12 UTC (permalink / raw)
  To: kuba, davem, edumazet, pabeni
  Cc: thevlad, thepacketgeek, riel, horms, netdev, linux-kernel,
	paulmck, davej

netconsole incorrectly mixes int and ssize_t types by using int for
return variables in functions that should return ssize_t.

This is fixed by updating the return variables to the appropriate
ssize_t type, ensuring consistency across the function definitions.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 drivers/net/netconsole.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 9c09293b5258..c0ad4df7252f 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -336,7 +336,7 @@ static ssize_t enabled_store(struct config_item *item,
 	struct netconsole_target *nt = to_target(item);
 	unsigned long flags;
 	bool enabled;
-	int err;
+	ssize_t err;
 
 	mutex_lock(&dynamic_netconsole_mutex);
 	err = kstrtobool(buf, &enabled);
@@ -394,7 +394,7 @@ static ssize_t release_store(struct config_item *item, const char *buf,
 {
 	struct netconsole_target *nt = to_target(item);
 	bool release;
-	int err;
+	ssize_t err;
 
 	mutex_lock(&dynamic_netconsole_mutex);
 	if (nt->enabled) {
@@ -422,7 +422,7 @@ static ssize_t extended_store(struct config_item *item, const char *buf,
 {
 	struct netconsole_target *nt = to_target(item);
 	bool extended;
-	int err;
+	ssize_t err;
 
 	mutex_lock(&dynamic_netconsole_mutex);
 	if (nt->enabled) {
@@ -469,7 +469,7 @@ static ssize_t local_port_store(struct config_item *item, const char *buf,
 		size_t count)
 {
 	struct netconsole_target *nt = to_target(item);
-	int rv = -EINVAL;
+	ssize_t rv = -EINVAL;
 
 	mutex_lock(&dynamic_netconsole_mutex);
 	if (nt->enabled) {
@@ -492,7 +492,7 @@ static ssize_t remote_port_store(struct config_item *item,
 		const char *buf, size_t count)
 {
 	struct netconsole_target *nt = to_target(item);
-	int rv = -EINVAL;
+	ssize_t rv = -EINVAL;
 
 	mutex_lock(&dynamic_netconsole_mutex);
 	if (nt->enabled) {
@@ -685,7 +685,7 @@ static ssize_t userdatum_value_store(struct config_item *item, const char *buf,
 	struct userdatum *udm = to_userdatum(item);
 	struct netconsole_target *nt;
 	struct userdata *ud;
-	int ret;
+	ssize_t ret;
 
 	if (count > MAX_USERDATA_VALUE_LENGTH)
 		return -EMSGSIZE;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next 4/6] net: netconsole: Standardize variable naming
  2024-08-01 16:11 [PATCH net-next 0/6] net: netconsole: Fix netconsole unsafe locking Breno Leitao
                   ` (2 preceding siblings ...)
  2024-08-01 16:12 ` [PATCH net-next 3/6] net: netconsole: Correct mismatched return types Breno Leitao
@ 2024-08-01 16:12 ` Breno Leitao
  2024-08-01 16:12 ` [PATCH net-next 5/6] net: netconsole: Unify Function Return Paths Breno Leitao
  2024-08-01 16:12 ` [PATCH net-next 6/6] net: netconsole: Defer netpoll cleanup to avoid lock release during list traversal Breno Leitao
  5 siblings, 0 replies; 11+ messages in thread
From: Breno Leitao @ 2024-08-01 16:12 UTC (permalink / raw)
  To: kuba, davem, edumazet, pabeni
  Cc: thevlad, thepacketgeek, riel, horms, netdev, linux-kernel,
	paulmck, davej

Update variable names from err to ret in cases where the variable may
return non-error values.

This change facilitates a forthcoming patch that relies on ret being
used consistently to handle return values, regardless of whether they
indicate an error or not.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 drivers/net/netconsole.c | 50 ++++++++++++++++++++--------------------
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index c0ad4df7252f..0980a61f8775 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -336,14 +336,14 @@ static ssize_t enabled_store(struct config_item *item,
 	struct netconsole_target *nt = to_target(item);
 	unsigned long flags;
 	bool enabled;
-	ssize_t err;
+	ssize_t ret;
 
 	mutex_lock(&dynamic_netconsole_mutex);
-	err = kstrtobool(buf, &enabled);
-	if (err)
+	ret = kstrtobool(buf, &enabled);
+	if (ret)
 		goto out_unlock;
 
-	err = -EINVAL;
+	ret = -EINVAL;
 	if (enabled == nt->enabled) {
 		pr_info("network logging has already %s\n",
 			nt->enabled ? "started" : "stopped");
@@ -365,8 +365,8 @@ static ssize_t enabled_store(struct config_item *item,
 		 */
 		netpoll_print_options(&nt->np);
 
-		err = netpoll_setup(&nt->np);
-		if (err)
+		ret = netpoll_setup(&nt->np);
+		if (ret)
 			goto out_unlock;
 
 		nt->enabled = true;
@@ -386,7 +386,7 @@ static ssize_t enabled_store(struct config_item *item,
 	return strnlen(buf, count);
 out_unlock:
 	mutex_unlock(&dynamic_netconsole_mutex);
-	return err;
+	return ret;
 }
 
 static ssize_t release_store(struct config_item *item, const char *buf,
@@ -394,18 +394,18 @@ static ssize_t release_store(struct config_item *item, const char *buf,
 {
 	struct netconsole_target *nt = to_target(item);
 	bool release;
-	ssize_t err;
+	ssize_t ret;
 
 	mutex_lock(&dynamic_netconsole_mutex);
 	if (nt->enabled) {
 		pr_err("target (%s) is enabled, disable to update parameters\n",
 		       config_item_name(&nt->group.cg_item));
-		err = -EINVAL;
+		ret = -EINVAL;
 		goto out_unlock;
 	}
 
-	err = kstrtobool(buf, &release);
-	if (err)
+	ret = kstrtobool(buf, &release);
+	if (ret)
 		goto out_unlock;
 
 	nt->release = release;
@@ -414,7 +414,7 @@ static ssize_t release_store(struct config_item *item, const char *buf,
 	return strnlen(buf, count);
 out_unlock:
 	mutex_unlock(&dynamic_netconsole_mutex);
-	return err;
+	return ret;
 }
 
 static ssize_t extended_store(struct config_item *item, const char *buf,
@@ -422,18 +422,18 @@ static ssize_t extended_store(struct config_item *item, const char *buf,
 {
 	struct netconsole_target *nt = to_target(item);
 	bool extended;
-	ssize_t err;
+	ssize_t ret;
 
 	mutex_lock(&dynamic_netconsole_mutex);
 	if (nt->enabled) {
 		pr_err("target (%s) is enabled, disable to update parameters\n",
 		       config_item_name(&nt->group.cg_item));
-		err = -EINVAL;
+		ret = -EINVAL;
 		goto out_unlock;
 	}
 
-	err = kstrtobool(buf, &extended);
-	if (err)
+	ret = kstrtobool(buf, &extended);
+	if (ret)
 		goto out_unlock;
 
 	nt->extended = extended;
@@ -442,7 +442,7 @@ static ssize_t extended_store(struct config_item *item, const char *buf,
 	return strnlen(buf, count);
 out_unlock:
 	mutex_unlock(&dynamic_netconsole_mutex);
-	return err;
+	return ret;
 }
 
 static ssize_t dev_name_store(struct config_item *item, const char *buf,
@@ -469,7 +469,7 @@ static ssize_t local_port_store(struct config_item *item, const char *buf,
 		size_t count)
 {
 	struct netconsole_target *nt = to_target(item);
-	ssize_t rv = -EINVAL;
+	ssize_t ret = -EINVAL;
 
 	mutex_lock(&dynamic_netconsole_mutex);
 	if (nt->enabled) {
@@ -478,21 +478,21 @@ static ssize_t local_port_store(struct config_item *item, const char *buf,
 		goto out_unlock;
 	}
 
-	rv = kstrtou16(buf, 10, &nt->np.local_port);
-	if (rv < 0)
+	ret = kstrtou16(buf, 10, &nt->np.local_port);
+	if (ret < 0)
 		goto out_unlock;
 	mutex_unlock(&dynamic_netconsole_mutex);
 	return strnlen(buf, count);
 out_unlock:
 	mutex_unlock(&dynamic_netconsole_mutex);
-	return rv;
+	return ret;
 }
 
 static ssize_t remote_port_store(struct config_item *item,
 		const char *buf, size_t count)
 {
 	struct netconsole_target *nt = to_target(item);
-	ssize_t rv = -EINVAL;
+	ssize_t ret = -EINVAL;
 
 	mutex_lock(&dynamic_netconsole_mutex);
 	if (nt->enabled) {
@@ -501,14 +501,14 @@ static ssize_t remote_port_store(struct config_item *item,
 		goto out_unlock;
 	}
 
-	rv = kstrtou16(buf, 10, &nt->np.remote_port);
-	if (rv < 0)
+	ret = kstrtou16(buf, 10, &nt->np.remote_port);
+	if (ret < 0)
 		goto out_unlock;
 	mutex_unlock(&dynamic_netconsole_mutex);
 	return strnlen(buf, count);
 out_unlock:
 	mutex_unlock(&dynamic_netconsole_mutex);
-	return rv;
+	return ret;
 }
 
 static ssize_t local_ip_store(struct config_item *item, const char *buf,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next 5/6] net: netconsole: Unify Function Return Paths
  2024-08-01 16:11 [PATCH net-next 0/6] net: netconsole: Fix netconsole unsafe locking Breno Leitao
                   ` (3 preceding siblings ...)
  2024-08-01 16:12 ` [PATCH net-next 4/6] net: netconsole: Standardize variable naming Breno Leitao
@ 2024-08-01 16:12 ` Breno Leitao
  2024-08-01 16:12 ` [PATCH net-next 6/6] net: netconsole: Defer netpoll cleanup to avoid lock release during list traversal Breno Leitao
  5 siblings, 0 replies; 11+ messages in thread
From: Breno Leitao @ 2024-08-01 16:12 UTC (permalink / raw)
  To: kuba, davem, edumazet, pabeni
  Cc: thevlad, thepacketgeek, riel, horms, netdev, linux-kernel,
	paulmck, davej

The return flow in netconsole's dynamic functions is currently
inconsistent. This patch aims to streamline and standardize the process
by ensuring that the mutex is unlocked before returning the ret value.

Additionally, this update includes a minor functional change where
certain strnlen() operations are performed with the
dynamic_netconsole_mutex locked. This adjustment is not anticipated to
cause any issues, however, it is crucial to document this change for
clarity.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 drivers/net/netconsole.c | 38 +++++++++++++++-----------------------
 1 file changed, 15 insertions(+), 23 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 0980a61f8775..eb9799edb95b 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -382,8 +382,7 @@ static ssize_t enabled_store(struct config_item *item,
 		netpoll_cleanup(&nt->np);
 	}
 
-	mutex_unlock(&dynamic_netconsole_mutex);
-	return strnlen(buf, count);
+	ret = strnlen(buf, count);
 out_unlock:
 	mutex_unlock(&dynamic_netconsole_mutex);
 	return ret;
@@ -410,8 +409,7 @@ static ssize_t release_store(struct config_item *item, const char *buf,
 
 	nt->release = release;
 
-	mutex_unlock(&dynamic_netconsole_mutex);
-	return strnlen(buf, count);
+	ret = strnlen(buf, count);
 out_unlock:
 	mutex_unlock(&dynamic_netconsole_mutex);
 	return ret;
@@ -437,9 +435,7 @@ static ssize_t extended_store(struct config_item *item, const char *buf,
 		goto out_unlock;
 
 	nt->extended = extended;
-
-	mutex_unlock(&dynamic_netconsole_mutex);
-	return strnlen(buf, count);
+	ret = strnlen(buf, count);
 out_unlock:
 	mutex_unlock(&dynamic_netconsole_mutex);
 	return ret;
@@ -481,8 +477,7 @@ static ssize_t local_port_store(struct config_item *item, const char *buf,
 	ret = kstrtou16(buf, 10, &nt->np.local_port);
 	if (ret < 0)
 		goto out_unlock;
-	mutex_unlock(&dynamic_netconsole_mutex);
-	return strnlen(buf, count);
+	ret = strnlen(buf, count);
 out_unlock:
 	mutex_unlock(&dynamic_netconsole_mutex);
 	return ret;
@@ -504,8 +499,7 @@ static ssize_t remote_port_store(struct config_item *item,
 	ret = kstrtou16(buf, 10, &nt->np.remote_port);
 	if (ret < 0)
 		goto out_unlock;
-	mutex_unlock(&dynamic_netconsole_mutex);
-	return strnlen(buf, count);
+	ret = strnlen(buf, count);
 out_unlock:
 	mutex_unlock(&dynamic_netconsole_mutex);
 	return ret;
@@ -515,6 +509,7 @@ static ssize_t local_ip_store(struct config_item *item, const char *buf,
 		size_t count)
 {
 	struct netconsole_target *nt = to_target(item);
+	ssize_t ret = -EINVAL;
 
 	mutex_lock(&dynamic_netconsole_mutex);
 	if (nt->enabled) {
@@ -541,17 +536,17 @@ static ssize_t local_ip_store(struct config_item *item, const char *buf,
 			goto out_unlock;
 	}
 
-	mutex_unlock(&dynamic_netconsole_mutex);
-	return strnlen(buf, count);
+	ret = strnlen(buf, count);
 out_unlock:
 	mutex_unlock(&dynamic_netconsole_mutex);
-	return -EINVAL;
+	return ret;
 }
 
 static ssize_t remote_ip_store(struct config_item *item, const char *buf,
 	       size_t count)
 {
 	struct netconsole_target *nt = to_target(item);
+	ssize_t ret = -EINVAL;
 
 	mutex_lock(&dynamic_netconsole_mutex);
 	if (nt->enabled) {
@@ -578,11 +573,10 @@ static ssize_t remote_ip_store(struct config_item *item, const char *buf,
 			goto out_unlock;
 	}
 
-	mutex_unlock(&dynamic_netconsole_mutex);
-	return strnlen(buf, count);
+	ret = strnlen(buf, count);
 out_unlock:
 	mutex_unlock(&dynamic_netconsole_mutex);
-	return -EINVAL;
+	return ret;
 }
 
 static ssize_t remote_mac_store(struct config_item *item, const char *buf,
@@ -590,6 +584,7 @@ static ssize_t remote_mac_store(struct config_item *item, const char *buf,
 {
 	struct netconsole_target *nt = to_target(item);
 	u8 remote_mac[ETH_ALEN];
+	ssize_t ret = -EINVAL;
 
 	mutex_lock(&dynamic_netconsole_mutex);
 	if (nt->enabled) {
@@ -604,11 +599,10 @@ static ssize_t remote_mac_store(struct config_item *item, const char *buf,
 		goto out_unlock;
 	memcpy(nt->np.remote_mac, remote_mac, ETH_ALEN);
 
-	mutex_unlock(&dynamic_netconsole_mutex);
-	return strnlen(buf, count);
+	ret = strnlen(buf, count);
 out_unlock:
 	mutex_unlock(&dynamic_netconsole_mutex);
-	return -EINVAL;
+	return ret;
 }
 
 struct userdatum {
@@ -700,9 +694,7 @@ static ssize_t userdatum_value_store(struct config_item *item, const char *buf,
 	ud = to_userdata(item->ci_parent);
 	nt = userdata_to_target(ud);
 	update_userdata(nt);
-
-	mutex_unlock(&dynamic_netconsole_mutex);
-	return count;
+	ret = count;
 out_unlock:
 	mutex_unlock(&dynamic_netconsole_mutex);
 	return ret;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next 6/6] net: netconsole: Defer netpoll cleanup to avoid lock release during list traversal
  2024-08-01 16:11 [PATCH net-next 0/6] net: netconsole: Fix netconsole unsafe locking Breno Leitao
                   ` (4 preceding siblings ...)
  2024-08-01 16:12 ` [PATCH net-next 5/6] net: netconsole: Unify Function Return Paths Breno Leitao
@ 2024-08-01 16:12 ` Breno Leitao
  2024-08-01 16:21   ` Stephen Hemminger
  5 siblings, 1 reply; 11+ messages in thread
From: Breno Leitao @ 2024-08-01 16:12 UTC (permalink / raw)
  To: kuba, davem, edumazet, pabeni
  Cc: thevlad, thepacketgeek, riel, horms, netdev, linux-kernel,
	paulmck, davej

Current issue:
- The `target_list_lock` spinlock is held while iterating over
  target_list() entries.
- Mid-loop, the lock is released to call __netpoll_cleanup(), then
  reacquired.
- This practice compromises the protection provided by
  `target_list_lock`.

Reason for current design:
1. __netpoll_cleanup() may sleep, incompatible with holding a spinlock.
2. target_list_lock must be a spinlock because write_msg() cannot sleep.
   (See commit b5427c27173e ("[NET] netconsole: Support multiple logging
    targets"))

Defer the cleanup of the netpoll structure to outside the
target_list_lock() protected area. Create another list
(target_cleanup_list) to hold the entries that need to be cleaned up,
and clean them using a mutex (target_cleanup_list_lock).

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 drivers/net/netconsole.c | 83 ++++++++++++++++++++++++++++++++--------
 1 file changed, 67 insertions(+), 16 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index eb9799edb95b..dd984ee4a564 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -37,6 +37,7 @@
 #include <linux/configfs.h>
 #include <linux/etherdevice.h>
 #include <linux/utsname.h>
+#include <linux/rtnetlink.h>
 
 MODULE_AUTHOR("Maintainer: Matt Mackall <mpm@selenic.com>");
 MODULE_DESCRIPTION("Console driver for network interfaces");
@@ -72,9 +73,16 @@ __setup("netconsole=", option_setup);
 
 /* Linked list of all configured targets */
 static LIST_HEAD(target_list);
+/* target_cleanup_list is used to track targets that need to be cleaned outside
+ * of target_list_lock. It should be cleaned in the same function it is
+ * populated.
+ */
+static LIST_HEAD(target_cleanup_list);
 
 /* This needs to be a spinlock because write_msg() cannot sleep */
 static DEFINE_SPINLOCK(target_list_lock);
+/* This needs to be a mutex because netpoll_cleanup might sleep */
+static DEFINE_MUTEX(target_cleanup_list_lock);
 
 /*
  * Console driver for extended netconsoles.  Registered on the first use to
@@ -210,6 +218,46 @@ static struct netconsole_target *alloc_and_init(void)
 	return nt;
 }
 
+/* Clean up every target in the cleanup_list and move the clean targets back to
+ * the main target_list.
+ */
+static void netconsole_process_cleanups_core(void)
+{
+	struct netconsole_target *nt, *tmp;
+	unsigned long flags;
+
+	/* The cleanup needs RTNL locked */
+	ASSERT_RTNL();
+
+	mutex_lock(&target_cleanup_list_lock);
+	list_for_each_entry_safe(nt, tmp, &target_cleanup_list, list) {
+		/* all entries in the cleanup_list needs to be disabled */
+		WARN_ON_ONCE(nt->enabled);
+		do_netpoll_cleanup(&nt->np);
+		/* moved the cleaned target to target_list. Need to hold both
+		 * locks
+		 */
+		spin_lock_irqsave(&target_list_lock, flags);
+		list_move(&nt->list, &target_list);
+		spin_unlock_irqrestore(&target_list_lock, flags);
+	}
+	WARN_ON_ONCE(!list_empty(&target_cleanup_list));
+	mutex_unlock(&target_cleanup_list_lock);
+}
+
+/* Do the list cleanup with the rtnl lock hold.  rtnl lock is necessary because
+ * netdev might be cleaned-up by calling __netpoll_cleanup(),
+ */
+static void netconsole_process_cleanups(void)
+{
+	/* rtnl lock is called here, because it has precedence over
+	 * target_cleanup_list_lock mutex and target_cleanup_list
+	 */
+	rtnl_lock();
+	netconsole_process_cleanups_core();
+	rtnl_unlock();
+}
+
 #ifdef	CONFIG_NETCONSOLE_DYNAMIC
 
 /*
@@ -376,13 +424,20 @@ static ssize_t enabled_store(struct config_item *item,
 		 * otherwise we might end up in write_msg() with
 		 * nt->np.dev == NULL and nt->enabled == true
 		 */
+		mutex_lock(&target_cleanup_list_lock);
 		spin_lock_irqsave(&target_list_lock, flags);
 		nt->enabled = false;
+		/* Remove the target from the list, while holding
+		 * target_list_lock
+		 */
+		list_move(&nt->list, &target_cleanup_list);
 		spin_unlock_irqrestore(&target_list_lock, flags);
-		netpoll_cleanup(&nt->np);
+		mutex_unlock(&target_cleanup_list_lock);
 	}
 
 	ret = strnlen(buf, count);
+	/* Deferred cleanup */
+	netconsole_process_cleanups();
 out_unlock:
 	mutex_unlock(&dynamic_netconsole_mutex);
 	return ret;
@@ -942,7 +997,7 @@ static int netconsole_netdev_event(struct notifier_block *this,
 				   unsigned long event, void *ptr)
 {
 	unsigned long flags;
-	struct netconsole_target *nt;
+	struct netconsole_target *nt, *tmp;
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
 	bool stopped = false;
 
@@ -950,9 +1005,9 @@ static int netconsole_netdev_event(struct notifier_block *this,
 	      event == NETDEV_RELEASE || event == NETDEV_JOIN))
 		goto done;
 
+	mutex_lock(&target_cleanup_list_lock);
 	spin_lock_irqsave(&target_list_lock, flags);
-restart:
-	list_for_each_entry(nt, &target_list, list) {
+	list_for_each_entry_safe(nt, tmp, &target_list, list) {
 		netconsole_target_get(nt);
 		if (nt->np.dev == dev) {
 			switch (event) {
@@ -962,25 +1017,16 @@ static int netconsole_netdev_event(struct notifier_block *this,
 			case NETDEV_RELEASE:
 			case NETDEV_JOIN:
 			case NETDEV_UNREGISTER:
-				/* rtnl_lock already held
-				 * we might sleep in __netpoll_cleanup()
-				 */
 				nt->enabled = false;
-				spin_unlock_irqrestore(&target_list_lock, flags);
-
-				__netpoll_cleanup(&nt->np);
-
-				spin_lock_irqsave(&target_list_lock, flags);
-				netdev_put(nt->np.dev, &nt->np.dev_tracker);
-				nt->np.dev = NULL;
+				list_move(&nt->list, &target_cleanup_list);
 				stopped = true;
-				netconsole_target_put(nt);
-				goto restart;
 			}
 		}
 		netconsole_target_put(nt);
 	}
 	spin_unlock_irqrestore(&target_list_lock, flags);
+	mutex_unlock(&target_cleanup_list_lock);
+
 	if (stopped) {
 		const char *msg = "had an event";
 
@@ -999,6 +1045,11 @@ static int netconsole_netdev_event(struct notifier_block *this,
 			dev->name, msg);
 	}
 
+	/* Process target_cleanup_list entries. By the end, target_cleanup_list
+	 * should be empty
+	 */
+	netconsole_process_cleanups_core();
+
 done:
 	return NOTIFY_DONE;
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next 6/6] net: netconsole: Defer netpoll cleanup to avoid lock release during list traversal
  2024-08-01 16:12 ` [PATCH net-next 6/6] net: netconsole: Defer netpoll cleanup to avoid lock release during list traversal Breno Leitao
@ 2024-08-01 16:21   ` Stephen Hemminger
  2024-08-01 16:29     ` Breno Leitao
  0 siblings, 1 reply; 11+ messages in thread
From: Stephen Hemminger @ 2024-08-01 16:21 UTC (permalink / raw)
  To: Breno Leitao
  Cc: kuba, davem, edumazet, pabeni, thevlad, thepacketgeek, riel,
	horms, netdev, linux-kernel, paulmck, davej

On Thu,  1 Aug 2024 09:12:03 -0700
Breno Leitao <leitao@debian.org> wrote:

> diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
> index eb9799edb95b..dd984ee4a564 100644
> --- a/drivers/net/netconsole.c
> +++ b/drivers/net/netconsole.c
> @@ -37,6 +37,7 @@
>  #include <linux/configfs.h>
>  #include <linux/etherdevice.h>
>  #include <linux/utsname.h>
> +#include <linux/rtnetlink.h>
>  
>  MODULE_AUTHOR("Maintainer: Matt Mackall <mpm@selenic.com>");

Should the Maintainer part be removed from the AUTHOR string by now?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next 6/6] net: netconsole: Defer netpoll cleanup to avoid lock release during list traversal
  2024-08-01 16:21   ` Stephen Hemminger
@ 2024-08-01 16:29     ` Breno Leitao
  0 siblings, 0 replies; 11+ messages in thread
From: Breno Leitao @ 2024-08-01 16:29 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: kuba, davem, edumazet, pabeni, thevlad, thepacketgeek, riel,
	horms, netdev, linux-kernel, paulmck, davej

Hello Stephen,

On Thu, Aug 01, 2024 at 09:21:56AM -0700, Stephen Hemminger wrote:
> On Thu,  1 Aug 2024 09:12:03 -0700
> Breno Leitao <leitao@debian.org> wrote:
> 
> > diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
> > index eb9799edb95b..dd984ee4a564 100644
> > --- a/drivers/net/netconsole.c
> > +++ b/drivers/net/netconsole.c
> > @@ -37,6 +37,7 @@
> >  #include <linux/configfs.h>
> >  #include <linux/etherdevice.h>
> >  #include <linux/utsname.h>
> > +#include <linux/rtnetlink.h>
> >  
> >  MODULE_AUTHOR("Maintainer: Matt Mackall <mpm@selenic.com>");
> 
> Should the Maintainer part be removed from the AUTHOR string by now?

Yes, I think this is a good idea. I will send a patch soon.

--breno

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next 1/6] net: netconsole: selftests: Create a new netconsole selftest
  2024-08-01 16:11 ` [PATCH net-next 1/6] net: netconsole: selftests: Create a new netconsole selftest Breno Leitao
@ 2024-08-01 16:53   ` Jakub Kicinski
  2024-08-02  8:09     ` Breno Leitao
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2024-08-01 16:53 UTC (permalink / raw)
  To: Breno Leitao
  Cc: davem, edumazet, pabeni, Shuah Khan, thevlad, thepacketgeek, riel,
	horms, netdev, linux-kernel, paulmck, davej,
	open list:KERNEL SELFTEST FRAMEWORK

On Thu,  1 Aug 2024 09:11:58 -0700 Breno Leitao wrote:
>  .../net/netconsole/basic_integration_test.sh  | 153 ++++++++++++++++++

It needs to be included in a Makefile
If we only have one script I'd put it directly in .../net/, 
or drivers/netdevsim/? each target should technically have
a Kconfig, Makefile, settings, no point for a single script.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next 1/6] net: netconsole: selftests: Create a new netconsole selftest
  2024-08-01 16:53   ` Jakub Kicinski
@ 2024-08-02  8:09     ` Breno Leitao
  0 siblings, 0 replies; 11+ messages in thread
From: Breno Leitao @ 2024-08-02  8:09 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, edumazet, pabeni, Shuah Khan, thevlad, thepacketgeek, riel,
	horms, netdev, linux-kernel, paulmck, davej,
	open list:KERNEL SELFTEST FRAMEWORK

Hello Jakub,

On Thu, Aug 01, 2024 at 09:53:22AM -0700, Jakub Kicinski wrote:
> On Thu,  1 Aug 2024 09:11:58 -0700 Breno Leitao wrote:
> >  .../net/netconsole/basic_integration_test.sh  | 153 ++++++++++++++++++
> 
> It needs to be included in a Makefile
> If we only have one script I'd put it directly in .../net/, 
> or drivers/netdevsim/? each target should technically have
> a Kconfig, Makefile, settings, no point for a single script.

Thanks for the feedback, I will wait a bit for more feedback, and then
send a v2 where I would move the script back to .../net.

--breno

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-08-02  8:09 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-01 16:11 [PATCH net-next 0/6] net: netconsole: Fix netconsole unsafe locking Breno Leitao
2024-08-01 16:11 ` [PATCH net-next 1/6] net: netconsole: selftests: Create a new netconsole selftest Breno Leitao
2024-08-01 16:53   ` Jakub Kicinski
2024-08-02  8:09     ` Breno Leitao
2024-08-01 16:11 ` [PATCH net-next 2/6] net: netpoll: extract core of netpoll_cleanup Breno Leitao
2024-08-01 16:12 ` [PATCH net-next 3/6] net: netconsole: Correct mismatched return types Breno Leitao
2024-08-01 16:12 ` [PATCH net-next 4/6] net: netconsole: Standardize variable naming Breno Leitao
2024-08-01 16:12 ` [PATCH net-next 5/6] net: netconsole: Unify Function Return Paths Breno Leitao
2024-08-01 16:12 ` [PATCH net-next 6/6] net: netconsole: Defer netpoll cleanup to avoid lock release during list traversal Breno Leitao
2024-08-01 16:21   ` Stephen Hemminger
2024-08-01 16:29     ` Breno Leitao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).