* [PATCH net v2 1/2] netconsole: Fix race condition in between reader and writer of userdata
2025-10-22 17:39 [PATCH net v2 0/2] netconsole: Fix userdata race condition Gustavo Luiz Duarte
@ 2025-10-22 17:39 ` Gustavo Luiz Duarte
2025-10-22 17:39 ` [PATCH net v2 2/2] selftests: netconsole: Add race condition test for userdata corruption Gustavo Luiz Duarte
2025-10-23 1:01 ` [PATCH net v2 0/2] netconsole: Fix userdata race condition Jakub Kicinski
2 siblings, 0 replies; 7+ messages in thread
From: Gustavo Luiz Duarte @ 2025-10-22 17:39 UTC (permalink / raw)
To: Andre Carvalho, Simon Horman, Breno Leitao, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Matthew Wood, Shuah Khan
Cc: netdev, linux-kernel, linux-kselftest, Gustavo Luiz Duarte
The update_userdata() function constructs the complete userdata string
in nt->extradata_complete and updates nt->userdata_length. This data
is then read by write_msg() and write_ext_msg() when sending netconsole
messages. However, update_userdata() was not holding target_list_lock
during this process, allowing concurrent message transmission to read
partially updated userdata.
This race condition could result in netconsole messages containing
incomplete or inconsistent userdata - for example, reading the old
userdata_length with new extradata_complete content, or vice versa,
leading to truncated or corrupted output.
Fix this by acquiring target_list_lock with spin_lock_irqsave() before
updating extradata_complete and userdata_length, and releasing it after
both fields are fully updated. This ensures that readers see a
consistent view of the userdata, preventing corruption during concurrent
access.
The fix aligns with the existing locking pattern used throughout the
netconsole code, where target_list_lock protects access to target
fields including buf[] and msgcounter that are accessed during message
transmission.
Fixes: df03f830d099 ("net: netconsole: cache userdata formatted string in netconsole_target")
Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
---
drivers/net/netconsole.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 194570443493..1f9cf6b12dfc 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -888,6 +888,9 @@ static void update_userdata(struct netconsole_target *nt)
{
int complete_idx = 0, child_count = 0;
struct list_head *entry;
+ unsigned long flags;
+
+ spin_lock_irqsave(&target_list_lock, flags);
/* Clear the current string in case the last userdatum was deleted */
nt->userdata_length = 0;
@@ -918,6 +921,8 @@ static void update_userdata(struct netconsole_target *nt)
}
nt->userdata_length = strnlen(nt->extradata_complete,
sizeof(nt->extradata_complete));
+
+ spin_unlock_irqrestore(&target_list_lock, flags);
}
static ssize_t userdatum_value_store(struct config_item *item, const char *buf,
--
2.47.3
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH net v2 2/2] selftests: netconsole: Add race condition test for userdata corruption
2025-10-22 17:39 [PATCH net v2 0/2] netconsole: Fix userdata race condition Gustavo Luiz Duarte
2025-10-22 17:39 ` [PATCH net v2 1/2] netconsole: Fix race condition in between reader and writer of userdata Gustavo Luiz Duarte
@ 2025-10-22 17:39 ` Gustavo Luiz Duarte
2025-10-23 1:01 ` [PATCH net v2 0/2] netconsole: Fix userdata race condition Jakub Kicinski
2 siblings, 0 replies; 7+ messages in thread
From: Gustavo Luiz Duarte @ 2025-10-22 17:39 UTC (permalink / raw)
To: Andre Carvalho, Simon Horman, Breno Leitao, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Matthew Wood, Shuah Khan
Cc: netdev, linux-kernel, linux-kselftest, Gustavo Luiz Duarte
Add a test to verify that netconsole userdata handling is properly
synchronized under concurrent read/write operations. The test creates
two competing loops: one continuously sending netconsole messages
(which read userdata), and another rapidly alternating userdata values
between two distinct 198-byte patterns filled with 'A' and 'B'
characters.
Without proper synchronization, concurrent reads and writes could result
in torn reads where a message contains mixed userdata (e.g., starting
with 'A' but containing 'B', or vice versa). The test monitors 10,000
messages and fails if it detects any such corruption, ensuring that the
netconsole implementation maintains data consistency through proper
locking mechanisms.
This test validates the fix for potential race conditions in the
netconsole userdata path and serves as a regression test to prevent
similar issues in the future.
Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
---
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/netcons_race_userdata.sh | 87 ++++++++++++++++++++++
2 files changed, 88 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile
index 6e41635bd55a..ba7dedc54711 100644
--- a/tools/testing/selftests/drivers/net/Makefile
+++ b/tools/testing/selftests/drivers/net/Makefile
@@ -17,6 +17,7 @@ TEST_PROGS := \
netcons_cmdline.sh \
netcons_fragmented_msg.sh \
netcons_overflow.sh \
+ netcons_race_userdata.sh \
netcons_sysdata.sh \
netpoll_basic.py \
ping.py \
diff --git a/tools/testing/selftests/drivers/net/netcons_race_userdata.sh b/tools/testing/selftests/drivers/net/netcons_race_userdata.sh
new file mode 100755
index 000000000000..017ee042eb75
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/netcons_race_userdata.sh
@@ -0,0 +1,87 @@
+#!/usr/bin/env bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test verifies that netconsole userdata remains consistent under concurrent
+# read/write operations. It creates two loops: one continuously writing netconsole
+# messages (which read userdata) and another rapidly alternating userdata values
+# between two distinct patterns. The test checks that no message contains corrupted
+# or mixed userdata, ensuring proper synchronization in the netconsole implementation.
+#
+# Author: Gustavo Luiz Duarte <gustavold@gmail.com>
+
+set -euo pipefail
+
+SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")
+
+source "${SCRIPTDIR}"/lib/sh/lib_netcons.sh
+
+function loop_set_userdata() {
+ MSGA=$(printf 'A%.0s' {1..198})
+ MSGB=$(printf 'B%.0s' {1..198})
+
+ while true; do
+ echo "$MSGA" > "${NETCONS_PATH}/userdata/${USERDATA_KEY}/value"
+ echo "$MSGB" > "${NETCONS_PATH}/userdata/${USERDATA_KEY}/value"
+ done
+}
+
+function loop_print_msg() {
+ while true; do
+ echo "test msg" > /dev/kmsg
+ done
+}
+
+cleanup_children() {
+ pkill_socat
+ kill "$child1" "$child2" 2> /dev/null || true
+ wait "$child1" "$child2" 2> /dev/null || true
+ # Remove the namespace, interfaces and netconsole target
+ cleanup
+}
+
+modprobe netdevsim 2> /dev/null || true
+modprobe netconsole 2> /dev/null || true
+
+OUTPUT_FILE="stdout"
+# Check for basic system dependency and exit if not found
+check_for_dependencies
+# Set current loglevel to KERN_INFO(6), and default to KERN_NOTICE(5)
+echo "6 5" > /proc/sys/kernel/printk
+# kill child processes and remove interfaces on exit
+trap cleanup_children EXIT
+
+# Create one namespace and two interfaces
+set_network
+# Create a dynamic target for netconsole
+create_dynamic_target
+# Set userdata "key" with the "value" value
+set_user_data
+
+# Start userdata read loop (printk)
+loop_print_msg &
+child1=$!
+
+# Start userdata write loop
+loop_set_userdata &
+child2=$!
+
+# Start socat to listen for netconsole messages and check for corrupted userdata.
+MAX_COUNT=10000
+i=0
+while read -r line; do
+ if [ $i -ge $MAX_COUNT ]; then
+ echo "Test passed."
+ exit "${ksft_pass}"
+ fi
+
+ if [[ "$line" == "key=A"* && "$line" == *"B"* ||
+ "$line" == "key=B"* && "$line" == *"A"* ]]; then
+ echo "Test failed. Found corrupted userdata: $line"
+ exit "${ksft_fail}"
+ fi
+
+ i=$((i + 1))
+done < <(listen_port_and_save_to ${OUTPUT_FILE} 2> /dev/null)
+
+echo "socat died before we could check $MAX_COUNT messages. Skipping test."
+exit "${ksft_skip}"
--
2.47.3
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH net v2 0/2] netconsole: Fix userdata race condition
2025-10-22 17:39 [PATCH net v2 0/2] netconsole: Fix userdata race condition Gustavo Luiz Duarte
2025-10-22 17:39 ` [PATCH net v2 1/2] netconsole: Fix race condition in between reader and writer of userdata Gustavo Luiz Duarte
2025-10-22 17:39 ` [PATCH net v2 2/2] selftests: netconsole: Add race condition test for userdata corruption Gustavo Luiz Duarte
@ 2025-10-23 1:01 ` Jakub Kicinski
2025-10-24 21:10 ` Gustavo Luiz Duarte
2 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2025-10-23 1:01 UTC (permalink / raw)
To: Gustavo Luiz Duarte
Cc: Andre Carvalho, Simon Horman, Breno Leitao, Andrew Lunn,
David S. Miller, Eric Dumazet, Paolo Abeni, Matthew Wood,
Shuah Khan, netdev, linux-kernel, linux-kselftest
On Wed, 22 Oct 2025 10:39:56 -0700 Gustavo Luiz Duarte wrote:
> This series fixes a race condition in netconsole's userdata handling
> where concurrent message transmission could read partially updated
> userdata fields, resulting in corrupted netconsole output.
>
> The first patch adds a selftest that reproduces the race condition by
> continuously sending messages while rapidly changing userdata values,
> detecting any torn reads in the output.
>
> The second patch fixes the issue by ensuring update_userdata() holds
> the target_list_lock while updating both extradata_complete and
> userdata_length, preventing readers from seeing inconsistent state.
>
> This targets net tree as it fixes a bug introduced in commit df03f830d099
> ("net: netconsole: cache userdata formatted string in netconsole_target").
This test is skipping on debug kernel builds in netdev CI.
TAP version 13
1..1
# overriding timeout to 360
# selftests: drivers/net: netcons_race_userdata.sh
# socat died before we could check 10000 messages. Skipping test.
ok 1 selftests: drivers/net: netcons_race_userdata.sh # SKIP
We can't have skips for SW tests.
I think Breno was fighting with a similar problem in the past.
Not sure what he ended up doing. Maybe just leave it at the print?
Don't actually mark the test as skipped?
Slightly more advanced option is to only do that if KSFT_MACHINE_SLOW
per:
https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style#dealing-with-slow-runners-in-performancelatency-tests
--
pw-bot: cr
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH net v2 0/2] netconsole: Fix userdata race condition
2025-10-23 1:01 ` [PATCH net v2 0/2] netconsole: Fix userdata race condition Jakub Kicinski
@ 2025-10-24 21:10 ` Gustavo Luiz Duarte
2025-10-24 23:42 ` Jakub Kicinski
0 siblings, 1 reply; 7+ messages in thread
From: Gustavo Luiz Duarte @ 2025-10-24 21:10 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Andre Carvalho, Simon Horman, Breno Leitao, Andrew Lunn,
David S. Miller, Eric Dumazet, Paolo Abeni, Matthew Wood,
Shuah Khan, netdev, linux-kernel, linux-kselftest
On Wed, Oct 22, 2025 at 10:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 22 Oct 2025 10:39:56 -0700 Gustavo Luiz Duarte wrote:
> > This series fixes a race condition in netconsole's userdata handling
> > where concurrent message transmission could read partially updated
> > userdata fields, resulting in corrupted netconsole output.
> >
> > The first patch adds a selftest that reproduces the race condition by
> > continuously sending messages while rapidly changing userdata values,
> > detecting any torn reads in the output.
> >
> > The second patch fixes the issue by ensuring update_userdata() holds
> > the target_list_lock while updating both extradata_complete and
> > userdata_length, preventing readers from seeing inconsistent state.
> >
> > This targets net tree as it fixes a bug introduced in commit df03f830d099
> > ("net: netconsole: cache userdata formatted string in netconsole_target").
>
> This test is skipping on debug kernel builds in netdev CI.
>
> TAP version 13
> 1..1
> # overriding timeout to 360
> # selftests: drivers/net: netcons_race_userdata.sh
> # socat died before we could check 10000 messages. Skipping test.
> ok 1 selftests: drivers/net: netcons_race_userdata.sh # SKIP
>
> We can't have skips for SW tests.
>
> I think Breno was fighting with a similar problem in the past.
> Not sure what he ended up doing. Maybe just leave it at the print?
> Don't actually mark the test as skipped?
>
> Slightly more advanced option is to only do that if KSFT_MACHINE_SLOW
> per:
> https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style#dealing-with-slow-runners-in-performancelatency-tests
There are two reasons for hitting this skip.
1. The hardcoded 2s timeout in listen_port_and_save_to() expired
2. socat died or failed to start for mysterious reasons
#1 should probably be a success (we ran the test for this long and no
corruption found), and for #2 we can try to return whatever exit code
socat give us.
Retrieving socat return code is a bit tricky because we are running it
in a subshell, but we can save it in a temp file.
I can also send a follow up patch to use a longer timeout in
listen_port_and_save_to() if KSFT_MACHINE_SLOW
> --
> pw-bot: cr
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH net v2 0/2] netconsole: Fix userdata race condition
2025-10-24 21:10 ` Gustavo Luiz Duarte
@ 2025-10-24 23:42 ` Jakub Kicinski
2025-10-27 15:56 ` Gustavo Luiz Duarte
0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2025-10-24 23:42 UTC (permalink / raw)
To: Gustavo Luiz Duarte
Cc: Andre Carvalho, Simon Horman, Breno Leitao, Andrew Lunn,
David S. Miller, Eric Dumazet, Paolo Abeni, Matthew Wood,
Shuah Khan, netdev, linux-kernel, linux-kselftest
On Fri, 24 Oct 2025 18:10:06 -0300 Gustavo Luiz Duarte wrote:
> There are two reasons for hitting this skip.
> 1. The hardcoded 2s timeout in listen_port_and_save_to() expired
> 2. socat died or failed to start for mysterious reasons
>
> #1 should probably be a success (we ran the test for this long and no
> corruption found), and for #2 we can try to return whatever exit code
> socat give us.
> Retrieving socat return code is a bit tricky because we are running it
> in a subshell, but we can save it in a temp file.
>
> I can also send a follow up patch to use a longer timeout in
> listen_port_and_save_to() if KSFT_MACHINE_SLOW
Frankly I'm not sure this test is worth the compute cycles it will burn.
It's a direct repro for a very specific problem. The changes it will
occur again for the same field a pretty low. Maybe just repost patch 1?
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net v2 0/2] netconsole: Fix userdata race condition
2025-10-24 23:42 ` Jakub Kicinski
@ 2025-10-27 15:56 ` Gustavo Luiz Duarte
0 siblings, 0 replies; 7+ messages in thread
From: Gustavo Luiz Duarte @ 2025-10-27 15:56 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Andre Carvalho, Simon Horman, Breno Leitao, Andrew Lunn,
David S. Miller, Eric Dumazet, Paolo Abeni, Matthew Wood,
Shuah Khan, netdev, linux-kernel, linux-kselftest
On Fri, Oct 24, 2025 at 8:42 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Frankly I'm not sure this test is worth the compute cycles it will burn.
> It's a direct repro for a very specific problem. The changes it will
> occur again for the same field a pretty low. Maybe just repost patch 1?
Fair enough. I will repost with only patch 1 then.
^ permalink raw reply [flat|nested] 7+ messages in thread