public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Shuhao Fu <sfual@cse.ust.hk>
To: Robin van der Gracht <robin@protonic.nl>,
	Oleksij Rempel <o.rempel@pengutronix.de>,
	linux-can@vger.kernel.org
Cc: kernel@pengutronix.de, Oliver Hartkopp <socketcan@hartkopp.net>,
	Marc Kleine-Budde <mkl@pengutronix.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH net] can: j1939: fix lockless local-destination check
Date: Sun, 19 Apr 2026 22:46:00 +0800	[thread overview]
Message-ID: <20260419144600.GA4091724@chcpu16> (raw)
In-Reply-To: <20260419140614.GA4041240@chcpu16>

Hi,

Here is the userspace-triggered KCSAN setup I used locally to reproduce
the warning.

This is with a small KCSAN-only repro setup that makes the race easier
to observe, and the mechanism is explicit. On the writer side, right
after the real `priv->ents[0x80].nusers` update in
`j1939_local_ecu_get()` / `j1939_local_ecu_put()`, the test hook calls
`kcsan_check_write()` on that exact `nusers` slot, sets a
`writer_armed` flag, and spins for up to 20000 `cpu_relax()` iterations
waiting for a reader to show up. On the reader side, right before the
real `priv->ents[0x80].nusers` load in `j1939_tp_send()`, the matching
hook checks `writer_armed`, sets `reader_seen`, and calls
`kcsan_check_read()` on the same address. So the repro does not invent a
fake field or a fake access path; it keeps the real writer visible for
longer and asks KCSAN to watch the exact `nusers` slot, which makes the
existing race much easier to catch.

1. Build the kernel with the dedicated J1939 KCSAN config:

   ./tools/testing/kunit/kunit.py build \
     --arch=x86_64 \
     --kunitconfig=kernel/kcsan/j1939_userspace_race.kunitconfig \
     --build_dir=../out-j1939-userspace-kcsan \
     --make_options CC=clang-20 \
     --make_options LD=ld.bfd \
     --make_options AR=llvm-ar-20 \
     --make_options NM=llvm-nm-20 \
     --make_options OBJCOPY=llvm-objcopy-20 \
     --make_options READELF=llvm-readelf-20 \
     --make_options LLVM_IAS=1 \
     --jobs 8

2. Prepare a minimal initramfs with a static userspace helper and an
   `/init` script that enables KCSAN, narrows reporting to the J1939
   paths of interest, and runs the helper:

   mkdir -p /tmp/j1939-kcsan-userspace/initramfs/{bin,sbin,usr/bin,usr/sbin,proc,sys,dev,tmp}
   gcc -static -O2 -Wall -Wextra -pthread \
     -o /tmp/j1939-kcsan-userspace/initramfs/bin/j1939_kcsan_repro \
     tools/testing/selftests/net/can/j1939_kcsan_repro.c
   cp /usr/bin/busybox /tmp/j1939-kcsan-userspace/initramfs/bin/busybox
   for app in sh mount cat echo sync mkdir poweroff reboot; do
     ln -sf busybox /tmp/j1939-kcsan-userspace/initramfs/bin/$app
   done
   cp tools/testing/selftests/net/can/j1939_kcsan_init.sh \
     /tmp/j1939-kcsan-userspace/initramfs/init
   chmod +x /tmp/j1939-kcsan-userspace/initramfs/init
   (cd /tmp/j1939-kcsan-userspace/initramfs && \
     find . -print0 | cpio --null -o --format=newc | gzip -9 > ../initramfs.cpio.gz)

3. Boot the guest under QEMU:

   timeout 180 qemu-system-x86_64 \
     -accel tcg \
     -smp 4 \
     -m 2048 \
     -kernel out-j1939-userspace-kcsan/arch/x86/boot/bzImage \
     -initrd /tmp/j1939-kcsan-userspace/initramfs.cpio.gz \
     -append "console=ttyS0 rdinit=/init loglevel=7 ignore_loglevel panic=-1 kunit.enable=0 kcsan.early_enable=0" \
     -nographic \
     -no-reboot

4. The userspace helper then creates and brings up `vcan0`, runs one
   thread that repeatedly opens/binds/closes a J1939 socket on source
   address `0x80`, and runs two sender threads that bind to `0x81`,
   connect to destination `0x80`, and send 64-byte payloads so the
   kernel takes the TP path.

The exact trigger code is in
`tools/testing/selftests/net/can/j1939_kcsan_repro.c`. The core of it
is:

   static void *writer_thread(void *arg)
   {
           while (!stop_flag) {
                   int fd = open_j1939_socket();

                   bind_writer_socket(fd, WRITER_SRC_ADDR);
                   close(fd);
           }
   }

   static void *sender_thread(void *arg)
   {
           fd = open_j1939_socket();
           bind_sender_socket(fd, SENDER_SRC_ADDR);
           connect_sender_socket(fd, DEST_ADDR);

           while (!stop_flag)
                   send(fd, payload, sizeof(payload), 0);
   }

   int main(void)
   {
           pthread_create(&writer, NULL, writer_thread, NULL);
           pthread_create(&sender1, NULL, sender_thread, NULL);
           pthread_create(&sender2, NULL, sender_thread, NULL);
           nanosleep(&req, NULL);
   }

With that setup I reproduced:

   BUG: KCSAN: data-race in j1939_local_ecu_put / j1939_tp_send

In the actual rerun log I got locally, this was not a one-off hit. The
same warning appeared twice in one run, first at line 9430 and then
again at line 9625, both on the same 4-byte address
`0xffffa432c216c828`.

The first hit was:

   read-write to 0xffffa432c216c828 of 4 bytes by task 66 on cpu 2:
    j1939_local_ecu_put+0x50/0x2d0
    j1939_sk_release+0x2e2/0x450
    sock_close+0x57/0x120
    __fput+0x218/0x480
    fput_close_sync+0x9c/0x130
    __x64_sys_close+0x51/0x90

   read to 0xffffa432c216c828 of 4 bytes by task 68 on cpu 3:
    j1939_tp_send+0x1ae/0x3d0
    j1939_sk_sendmsg+0x57b/0x8a0
    __sys_sendto+0x274/0x280
    __x64_sys_sendto+0x71/0x90
    x64_sys_call+0x2d35/0x3020
    do_syscall_64+0xc7/0x300

The second hit in the same run reported the same pair again:

   read-write to 0xffffa432c216c828 of 4 bytes by task 66 on cpu 2:
    j1939_local_ecu_put+0x50/0x2d0
    j1939_sk_release+0x2e2/0x450
    sock_close+0x57/0x120
    __fput+0x218/0x480

   read to 0xffffa432c216c828 of 4 bytes by task 68 on cpu 3:
    j1939_tp_send+0x1ae/0x3d0
    j1939_sk_sendmsg+0x57b/0x8a0
    __sys_sendto+0x274/0x280
    __x64_sys_sendto+0x71/0x90

The outer `timeout` eventually kills QEMU, but these KCSAN reports are
emitted well before that.

Thanks,
Shuhao


On Sun, Apr 19, 2026 at 10:06:35PM +0800, Shuhao Fu wrote:
> j1939_priv.ents[].nusers is documented as protected by priv->lock, and
> its updates already happen under that lock. j1939_can_recv() also reads
> it under read_lock_bh(). However, j1939_session_skb_queue() and
> j1939_tp_send() still read priv->ents[da].nusers without taking the
> lock.
> 
> Those transport-side checks decide whether to set J1939_ECU_LOCAL_DST, so
> they can race with j1939_local_ecu_get() and j1939_local_ecu_put() while
> userspace is binding or releasing sockets concurrently with TP traffic.
> This can misclassify TP/ETP sessions as local or remote and take the wrong
> transport path.
> 
> Fix both transport paths by routing the destination-locality check through
> a helper that reads ents[].nusers under read_lock_bh(&priv->lock).
> 
> Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
> Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
> ---
>  net/can/j1939/transport.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c
> index df93d57907da7e..8a31cb23bc76d0 100644
> --- a/net/can/j1939/transport.c
> +++ b/net/can/j1939/transport.c
> @@ -351,6 +351,18 @@ static void j1939_session_skb_drop_old(struct j1939_session *session)
>  	}
>  }
>  
> +static bool j1939_address_is_local(struct j1939_priv *priv, u8 addr)
> +{
> +	bool local = false;
> +
> +	read_lock_bh(&priv->lock);
> +	if (j1939_address_is_unicast(addr) && priv->ents[addr].nusers)
> +		local = true;
> +	read_unlock_bh(&priv->lock);
> +
> +	return local;
> +}
> +
>  void j1939_session_skb_queue(struct j1939_session *session,
>  			     struct sk_buff *skb)
>  {
> @@ -359,8 +371,7 @@ void j1939_session_skb_queue(struct j1939_session *session,
>  
>  	j1939_ac_fixup(priv, skb);
>  
> -	if (j1939_address_is_unicast(skcb->addr.da) &&
> -	    priv->ents[skcb->addr.da].nusers)
> +	if (j1939_address_is_local(priv, skcb->addr.da))
>  		skcb->flags |= J1939_ECU_LOCAL_DST;
>  
>  	skcb->flags |= J1939_ECU_LOCAL_SRC;
> @@ -2038,8 +2049,7 @@ struct j1939_session *j1939_tp_send(struct j1939_priv *priv,
>  		return ERR_PTR(ret);
>  
>  	/* fix DST flags, it may be used there soon */
> -	if (j1939_address_is_unicast(skcb->addr.da) &&
> -	    priv->ents[skcb->addr.da].nusers)
> +	if (j1939_address_is_local(priv, skcb->addr.da))
>  		skcb->flags |= J1939_ECU_LOCAL_DST;
>  
>  	/* src is always local, I'm sending ... */
> -- 
> 2.25.1

  reply	other threads:[~2026-04-19 14:46 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-19 14:06 [PATCH net] can: j1939: fix lockless local-destination check Shuhao Fu
2026-04-19 14:46 ` Shuhao Fu [this message]
2026-04-20 12:18 ` Oleksij Rempel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260419144600.GA4091724@chcpu16 \
    --to=sfual@cse.ust.hk \
    --cc=kernel@pengutronix.de \
    --cc=linux-can@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mkl@pengutronix.de \
    --cc=o.rempel@pengutronix.de \
    --cc=robin@protonic.nl \
    --cc=socketcan@hartkopp.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox