All of lore.kernel.org
 help / color / mirror / Atom feed
From: Shuhao Fu <sfual@cse.ust.hk>
To: Robin van der Gracht <robin@protonic.nl>,
	Oleksij Rempel <o.rempel@pengutronix.de>,
	linux-can@vger.kernel.org
Cc: kernel@pengutronix.de, Oliver Hartkopp <socketcan@hartkopp.net>,
	Marc Kleine-Budde <mkl@pengutronix.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH net] can: j1939: fix lockless local-destination check
Date: Sun, 19 Apr 2026 22:46:00 +0800	[thread overview]
Message-ID: <20260419144600.GA4091724@chcpu16> (raw)
In-Reply-To: <20260419140614.GA4041240@chcpu16>

Hi,

Here is the userspace-triggered KCSAN setup I used locally to reproduce
the warning.

This is with a small KCSAN-only repro setup that makes the race easier
to observe, and the mechanism is explicit. On the writer side, right
after the real `priv->ents[0x80].nusers` update in
`j1939_local_ecu_get()` / `j1939_local_ecu_put()`, the test hook calls
`kcsan_check_write()` on that exact `nusers` slot, sets a
`writer_armed` flag, and spins for up to 20000 `cpu_relax()` iterations
waiting for a reader to show up. On the reader side, right before the
real `priv->ents[0x80].nusers` load in `j1939_tp_send()`, the matching
hook checks `writer_armed`, sets `reader_seen`, and calls
`kcsan_check_read()` on the same address. So the repro does not invent a
fake field or a fake access path; it keeps the real writer visible for
longer and asks KCSAN to watch the exact `nusers` slot, which makes the
existing race much easier to catch.

1. Build the kernel with the dedicated J1939 KCSAN config:

   ./tools/testing/kunit/kunit.py build \
     --arch=x86_64 \
     --kunitconfig=kernel/kcsan/j1939_userspace_race.kunitconfig \
     --build_dir=../out-j1939-userspace-kcsan \
     --make_options CC=clang-20 \
     --make_options LD=ld.bfd \
     --make_options AR=llvm-ar-20 \
     --make_options NM=llvm-nm-20 \
     --make_options OBJCOPY=llvm-objcopy-20 \
     --make_options READELF=llvm-readelf-20 \
     --make_options LLVM_IAS=1 \
     --jobs 8

2. Prepare a minimal initramfs with a static userspace helper and an
   `/init` script that enables KCSAN, narrows reporting to the J1939
   paths of interest, and runs the helper:

   mkdir -p /tmp/j1939-kcsan-userspace/initramfs/{bin,sbin,usr/bin,usr/sbin,proc,sys,dev,tmp}
   gcc -static -O2 -Wall -Wextra -pthread \
     -o /tmp/j1939-kcsan-userspace/initramfs/bin/j1939_kcsan_repro \
     tools/testing/selftests/net/can/j1939_kcsan_repro.c
   cp /usr/bin/busybox /tmp/j1939-kcsan-userspace/initramfs/bin/busybox
   for app in sh mount cat echo sync mkdir poweroff reboot; do
     ln -sf busybox /tmp/j1939-kcsan-userspace/initramfs/bin/$app
   done
   cp tools/testing/selftests/net/can/j1939_kcsan_init.sh \
     /tmp/j1939-kcsan-userspace/initramfs/init
   chmod +x /tmp/j1939-kcsan-userspace/initramfs/init
   (cd /tmp/j1939-kcsan-userspace/initramfs && \
     find . -print0 | cpio --null -o --format=newc | gzip -9 > ../initramfs.cpio.gz)

3. Boot the guest under QEMU:

   timeout 180 qemu-system-x86_64 \
     -accel tcg \
     -smp 4 \
     -m 2048 \
     -kernel out-j1939-userspace-kcsan/arch/x86/boot/bzImage \
     -initrd /tmp/j1939-kcsan-userspace/initramfs.cpio.gz \
     -append "console=ttyS0 rdinit=/init loglevel=7 ignore_loglevel panic=-1 kunit.enable=0 kcsan.early_enable=0" \
     -nographic \
     -no-reboot

4. The userspace helper then creates and brings up `vcan0`, runs one
   thread that repeatedly opens/binds/closes a J1939 socket on source
   address `0x80`, and runs two sender threads that bind to `0x81`,
   connect to destination `0x80`, and send 64-byte payloads so the
   kernel takes the TP path.

The exact trigger code is in
`tools/testing/selftests/net/can/j1939_kcsan_repro.c`. The core of it
is:

   static void *writer_thread(void *arg)
   {
           while (!stop_flag) {
                   int fd = open_j1939_socket();

                   bind_writer_socket(fd, WRITER_SRC_ADDR);
                   close(fd);
           }
   }

   static void *sender_thread(void *arg)
   {
           fd = open_j1939_socket();
           bind_sender_socket(fd, SENDER_SRC_ADDR);
           connect_sender_socket(fd, DEST_ADDR);

           while (!stop_flag)
                   send(fd, payload, sizeof(payload), 0);
   }

   int main(void)
   {
           pthread_create(&writer, NULL, writer_thread, NULL);
           pthread_create(&sender1, NULL, sender_thread, NULL);
           pthread_create(&sender2, NULL, sender_thread, NULL);
           nanosleep(&req, NULL);
   }

With that setup I reproduced:

   BUG: KCSAN: data-race in j1939_local_ecu_put / j1939_tp_send

In the actual rerun log I got locally, this was not a one-off hit. The
same warning appeared twice in one run, first at line 9430 and then
again at line 9625, both on the same 4-byte address
`0xffffa432c216c828`.

The first hit was:

   read-write to 0xffffa432c216c828 of 4 bytes by task 66 on cpu 2:
    j1939_local_ecu_put+0x50/0x2d0
    j1939_sk_release+0x2e2/0x450
    sock_close+0x57/0x120
    __fput+0x218/0x480
    fput_close_sync+0x9c/0x130
    __x64_sys_close+0x51/0x90

   read to 0xffffa432c216c828 of 4 bytes by task 68 on cpu 3:
    j1939_tp_send+0x1ae/0x3d0
    j1939_sk_sendmsg+0x57b/0x8a0
    __sys_sendto+0x274/0x280
    __x64_sys_sendto+0x71/0x90
    x64_sys_call+0x2d35/0x3020
    do_syscall_64+0xc7/0x300

The second hit in the same run reported the same pair again:

   read-write to 0xffffa432c216c828 of 4 bytes by task 66 on cpu 2:
    j1939_local_ecu_put+0x50/0x2d0
    j1939_sk_release+0x2e2/0x450
    sock_close+0x57/0x120
    __fput+0x218/0x480

   read to 0xffffa432c216c828 of 4 bytes by task 68 on cpu 3:
    j1939_tp_send+0x1ae/0x3d0
    j1939_sk_sendmsg+0x57b/0x8a0
    __sys_sendto+0x274/0x280
    __x64_sys_sendto+0x71/0x90

The outer `timeout` eventually kills QEMU, but these KCSAN reports are
emitted well before that.

Thanks,
Shuhao


On Sun, Apr 19, 2026 at 10:06:35PM +0800, Shuhao Fu wrote:
> j1939_priv.ents[].nusers is documented as protected by priv->lock, and
> its updates already happen under that lock. j1939_can_recv() also reads
> it under read_lock_bh(). However, j1939_session_skb_queue() and
> j1939_tp_send() still read priv->ents[da].nusers without taking the
> lock.
> 
> Those transport-side checks decide whether to set J1939_ECU_LOCAL_DST, so
> they can race with j1939_local_ecu_get() and j1939_local_ecu_put() while
> userspace is binding or releasing sockets concurrently with TP traffic.
> This can misclassify TP/ETP sessions as local or remote and take the wrong
> transport path.
> 
> Fix both transport paths by routing the destination-locality check through
> a helper that reads ents[].nusers under read_lock_bh(&priv->lock).
> 
> Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
> Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
> ---
>  net/can/j1939/transport.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c
> index df93d57907da7e..8a31cb23bc76d0 100644
> --- a/net/can/j1939/transport.c
> +++ b/net/can/j1939/transport.c
> @@ -351,6 +351,18 @@ static void j1939_session_skb_drop_old(struct j1939_session *session)
>  	}
>  }
>  
> +static bool j1939_address_is_local(struct j1939_priv *priv, u8 addr)
> +{
> +	bool local = false;
> +
> +	read_lock_bh(&priv->lock);
> +	if (j1939_address_is_unicast(addr) && priv->ents[addr].nusers)
> +		local = true;
> +	read_unlock_bh(&priv->lock);
> +
> +	return local;
> +}
> +
>  void j1939_session_skb_queue(struct j1939_session *session,
>  			     struct sk_buff *skb)
>  {
> @@ -359,8 +371,7 @@ void j1939_session_skb_queue(struct j1939_session *session,
>  
>  	j1939_ac_fixup(priv, skb);
>  
> -	if (j1939_address_is_unicast(skcb->addr.da) &&
> -	    priv->ents[skcb->addr.da].nusers)
> +	if (j1939_address_is_local(priv, skcb->addr.da))
>  		skcb->flags |= J1939_ECU_LOCAL_DST;
>  
>  	skcb->flags |= J1939_ECU_LOCAL_SRC;
> @@ -2038,8 +2049,7 @@ struct j1939_session *j1939_tp_send(struct j1939_priv *priv,
>  		return ERR_PTR(ret);
>  
>  	/* fix DST flags, it may be used there soon */
> -	if (j1939_address_is_unicast(skcb->addr.da) &&
> -	    priv->ents[skcb->addr.da].nusers)
> +	if (j1939_address_is_local(priv, skcb->addr.da))
>  		skcb->flags |= J1939_ECU_LOCAL_DST;
>  
>  	/* src is always local, I'm sending ... */
> -- 
> 2.25.1

  reply	other threads:[~2026-04-19 14:46 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-19 14:06 [PATCH net] can: j1939: fix lockless local-destination check Shuhao Fu
2026-04-19 14:46 ` Shuhao Fu [this message]
2026-04-20 12:18 ` Oleksij Rempel
2026-05-06 12:57 ` Marc Kleine-Budde

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260419144600.GA4091724@chcpu16 \
    --to=sfual@cse.ust.hk \
    --cc=kernel@pengutronix.de \
    --cc=linux-can@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mkl@pengutronix.de \
    --cc=o.rempel@pengutronix.de \
    --cc=robin@protonic.nl \
    --cc=socketcan@hartkopp.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.