From: Shuhao Fu <sfual@cse.ust.hk>
To: Robin van der Gracht <robin@protonic.nl>,
Oleksij Rempel <o.rempel@pengutronix.de>,
linux-can@vger.kernel.org
Cc: kernel@pengutronix.de, Oliver Hartkopp <socketcan@hartkopp.net>,
Marc Kleine-Budde <mkl@pengutronix.de>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH net] can: j1939: fix lockless local-destination check
Date: Sun, 19 Apr 2026 22:46:00 +0800 [thread overview]
Message-ID: <20260419144600.GA4091724@chcpu16> (raw)
In-Reply-To: <20260419140614.GA4041240@chcpu16>
Hi,
Here is the userspace-triggered KCSAN setup I used locally to reproduce
the warning.
This is with a small KCSAN-only repro setup that makes the race easier
to observe, and the mechanism is explicit. On the writer side, right
after the real `priv->ents[0x80].nusers` update in
`j1939_local_ecu_get()` / `j1939_local_ecu_put()`, the test hook calls
`kcsan_check_write()` on that exact `nusers` slot, sets a
`writer_armed` flag, and spins for up to 20000 `cpu_relax()` iterations
waiting for a reader to show up. On the reader side, right before the
real `priv->ents[0x80].nusers` load in `j1939_tp_send()`, the matching
hook checks `writer_armed`, sets `reader_seen`, and calls
`kcsan_check_read()` on the same address. So the repro does not invent a
fake field or a fake access path; it keeps the real writer visible for
longer and asks KCSAN to watch the exact `nusers` slot, which makes the
existing race much easier to catch.
1. Build the kernel with the dedicated J1939 KCSAN config:
./tools/testing/kunit/kunit.py build \
--arch=x86_64 \
--kunitconfig=kernel/kcsan/j1939_userspace_race.kunitconfig \
--build_dir=../out-j1939-userspace-kcsan \
--make_options CC=clang-20 \
--make_options LD=ld.bfd \
--make_options AR=llvm-ar-20 \
--make_options NM=llvm-nm-20 \
--make_options OBJCOPY=llvm-objcopy-20 \
--make_options READELF=llvm-readelf-20 \
--make_options LLVM_IAS=1 \
--jobs 8
2. Prepare a minimal initramfs with a static userspace helper and an
`/init` script that enables KCSAN, narrows reporting to the J1939
paths of interest, and runs the helper:
mkdir -p /tmp/j1939-kcsan-userspace/initramfs/{bin,sbin,usr/bin,usr/sbin,proc,sys,dev,tmp}
gcc -static -O2 -Wall -Wextra -pthread \
-o /tmp/j1939-kcsan-userspace/initramfs/bin/j1939_kcsan_repro \
tools/testing/selftests/net/can/j1939_kcsan_repro.c
cp /usr/bin/busybox /tmp/j1939-kcsan-userspace/initramfs/bin/busybox
for app in sh mount cat echo sync mkdir poweroff reboot; do
ln -sf busybox /tmp/j1939-kcsan-userspace/initramfs/bin/$app
done
cp tools/testing/selftests/net/can/j1939_kcsan_init.sh \
/tmp/j1939-kcsan-userspace/initramfs/init
chmod +x /tmp/j1939-kcsan-userspace/initramfs/init
(cd /tmp/j1939-kcsan-userspace/initramfs && \
find . -print0 | cpio --null -o --format=newc | gzip -9 > ../initramfs.cpio.gz)
3. Boot the guest under QEMU:
timeout 180 qemu-system-x86_64 \
-accel tcg \
-smp 4 \
-m 2048 \
-kernel out-j1939-userspace-kcsan/arch/x86/boot/bzImage \
-initrd /tmp/j1939-kcsan-userspace/initramfs.cpio.gz \
-append "console=ttyS0 rdinit=/init loglevel=7 ignore_loglevel panic=-1 kunit.enable=0 kcsan.early_enable=0" \
-nographic \
-no-reboot
4. The userspace helper then creates and brings up `vcan0`, runs one
thread that repeatedly opens/binds/closes a J1939 socket on source
address `0x80`, and runs two sender threads that bind to `0x81`,
connect to destination `0x80`, and send 64-byte payloads so the
kernel takes the TP path.
The exact trigger code is in
`tools/testing/selftests/net/can/j1939_kcsan_repro.c`. The core of it
is:
static void *writer_thread(void *arg)
{
while (!stop_flag) {
int fd = open_j1939_socket();
bind_writer_socket(fd, WRITER_SRC_ADDR);
close(fd);
}
}
static void *sender_thread(void *arg)
{
fd = open_j1939_socket();
bind_sender_socket(fd, SENDER_SRC_ADDR);
connect_sender_socket(fd, DEST_ADDR);
while (!stop_flag)
send(fd, payload, sizeof(payload), 0);
}
int main(void)
{
pthread_create(&writer, NULL, writer_thread, NULL);
pthread_create(&sender1, NULL, sender_thread, NULL);
pthread_create(&sender2, NULL, sender_thread, NULL);
nanosleep(&req, NULL);
}
With that setup I reproduced:
BUG: KCSAN: data-race in j1939_local_ecu_put / j1939_tp_send
In the actual rerun log I got locally, this was not a one-off hit. The
same warning appeared twice in one run, first at line 9430 and then
again at line 9625, both on the same 4-byte address
`0xffffa432c216c828`.
The first hit was:
read-write to 0xffffa432c216c828 of 4 bytes by task 66 on cpu 2:
j1939_local_ecu_put+0x50/0x2d0
j1939_sk_release+0x2e2/0x450
sock_close+0x57/0x120
__fput+0x218/0x480
fput_close_sync+0x9c/0x130
__x64_sys_close+0x51/0x90
read to 0xffffa432c216c828 of 4 bytes by task 68 on cpu 3:
j1939_tp_send+0x1ae/0x3d0
j1939_sk_sendmsg+0x57b/0x8a0
__sys_sendto+0x274/0x280
__x64_sys_sendto+0x71/0x90
x64_sys_call+0x2d35/0x3020
do_syscall_64+0xc7/0x300
The second hit in the same run reported the same pair again:
read-write to 0xffffa432c216c828 of 4 bytes by task 66 on cpu 2:
j1939_local_ecu_put+0x50/0x2d0
j1939_sk_release+0x2e2/0x450
sock_close+0x57/0x120
__fput+0x218/0x480
read to 0xffffa432c216c828 of 4 bytes by task 68 on cpu 3:
j1939_tp_send+0x1ae/0x3d0
j1939_sk_sendmsg+0x57b/0x8a0
__sys_sendto+0x274/0x280
__x64_sys_sendto+0x71/0x90
The outer `timeout` eventually kills QEMU, but these KCSAN reports are
emitted well before that.
Thanks,
Shuhao
On Sun, Apr 19, 2026 at 10:06:35PM +0800, Shuhao Fu wrote:
> j1939_priv.ents[].nusers is documented as protected by priv->lock, and
> its updates already happen under that lock. j1939_can_recv() also reads
> it under read_lock_bh(). However, j1939_session_skb_queue() and
> j1939_tp_send() still read priv->ents[da].nusers without taking the
> lock.
>
> Those transport-side checks decide whether to set J1939_ECU_LOCAL_DST, so
> they can race with j1939_local_ecu_get() and j1939_local_ecu_put() while
> userspace is binding or releasing sockets concurrently with TP traffic.
> This can misclassify TP/ETP sessions as local or remote and take the wrong
> transport path.
>
> Fix both transport paths by routing the destination-locality check through
> a helper that reads ents[].nusers under read_lock_bh(&priv->lock).
>
> Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
> Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
> ---
> net/can/j1939/transport.c | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c
> index df93d57907da7e..8a31cb23bc76d0 100644
> --- a/net/can/j1939/transport.c
> +++ b/net/can/j1939/transport.c
> @@ -351,6 +351,18 @@ static void j1939_session_skb_drop_old(struct j1939_session *session)
> }
> }
>
> +static bool j1939_address_is_local(struct j1939_priv *priv, u8 addr)
> +{
> + bool local = false;
> +
> + read_lock_bh(&priv->lock);
> + if (j1939_address_is_unicast(addr) && priv->ents[addr].nusers)
> + local = true;
> + read_unlock_bh(&priv->lock);
> +
> + return local;
> +}
> +
> void j1939_session_skb_queue(struct j1939_session *session,
> struct sk_buff *skb)
> {
> @@ -359,8 +371,7 @@ void j1939_session_skb_queue(struct j1939_session *session,
>
> j1939_ac_fixup(priv, skb);
>
> - if (j1939_address_is_unicast(skcb->addr.da) &&
> - priv->ents[skcb->addr.da].nusers)
> + if (j1939_address_is_local(priv, skcb->addr.da))
> skcb->flags |= J1939_ECU_LOCAL_DST;
>
> skcb->flags |= J1939_ECU_LOCAL_SRC;
> @@ -2038,8 +2049,7 @@ struct j1939_session *j1939_tp_send(struct j1939_priv *priv,
> return ERR_PTR(ret);
>
> /* fix DST flags, it may be used there soon */
> - if (j1939_address_is_unicast(skcb->addr.da) &&
> - priv->ents[skcb->addr.da].nusers)
> + if (j1939_address_is_local(priv, skcb->addr.da))
> skcb->flags |= J1939_ECU_LOCAL_DST;
>
> /* src is always local, I'm sending ... */
> --
> 2.25.1
next prev parent reply other threads:[~2026-04-19 14:46 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-19 14:06 [PATCH net] can: j1939: fix lockless local-destination check Shuhao Fu
2026-04-19 14:46 ` Shuhao Fu [this message]
2026-04-20 12:18 ` Oleksij Rempel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260419144600.GA4091724@chcpu16 \
--to=sfual@cse.ust.hk \
--cc=kernel@pengutronix.de \
--cc=linux-can@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mkl@pengutronix.de \
--cc=o.rempel@pengutronix.de \
--cc=robin@protonic.nl \
--cc=socketcan@hartkopp.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox