All of lore.kernel.org
 help / color / mirror / Atom feed
From: Shuhao Fu <sfual@cse.ust.hk>
To: Mauro Carvalho Chehab <mchehab@kernel.org>, linux-media@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] media: rzg2l-cru: serialize state transitions with qlock
Date: Tue, 21 Apr 2026 14:27:08 +0800	[thread overview]
Message-ID: <20260421062708.GA2548544@chcpu16> (raw)
In-Reply-To: <20260421060307.GA2522920@chcpu16>

Hi,

Here is the best reproduction detail I could put together locally.

From source review, I think there are two windows where process-context
state updates can overlap the IRQ handler's reads of `cru->state`:

- on streamoff, `rzg2l_cru_stop_streaming()` stores `STOPPING` before
  it calls `rzg2l_cru_set_stream(cru, 0)`, while interrupts are only
  disabled later in `rzg2l_cru_stop_image_processing()`
- on streamon, `rzg2l_cru_start_image_processing()` enables interrupts
  before `rzg2l_cru_set_stream(cru, 1)` returns, while
  `rzg2l_cru_start_streaming_vq()` stores `STARTING` only after that

I do not have an RZ/G2L board or an arm64/QEMU model for this CRU
block, so I could not reproduce either path from a real userspace V4L2
stream on actual hardware. The setup below is only the best local
reference proof I could produce in this environment. It is not a claim
of a natural hardware-backed repro.

Locally, I targeted the streamoff-side `STOPPING` vs IRQ overlap.

1. Build a KCSAN/KUnit kernel with the dedicated config fragment:

   ./tools/testing/kunit/kunit.py build \
     --arch=x86_64 \
     --kunitconfig=kernel/kcsan/rzg2l_cru.kunitconfig \
     --build_dir=../out-rzg2l-kunit-red2 \
     --make_options CC=clang-20 \
     --make_options LD=ld.bfd \
     --make_options AR=llvm-ar-20 \
     --make_options NM=llvm-nm-20 \
     --make_options OBJCOPY=llvm-objcopy-20 \
     --make_options READELF=llvm-readelf-20 \
     --make_options LLVM_IAS=1 \
     --jobs 8

2. Boot that kernel under QEMU:

   timeout 90 qemu-system-x86_64 \
     -m 1024 \
     -kernel out-rzg2l-kunit-red2/arch/x86/boot/bzImage \
     -append 'kunit.filter_glob=kcsan.test_rzg2l_cru_state_stop_vs_irq* kunit.enable=1 console=ttyS0 kunit_shutdown=reboot' \
     -no-reboot \
     -nographic \
     -accel tcg \
     -smp 4

3. The KUnit/KCSAN test creates a fake `rzg2l_cru_dev`, records the
   address of `cru->state`, and then runs two worker sides concurrently:

   - writer side: `test_rzg2l_cru_kunit_stop()`, which just calls the
     real `rzg2l_cru_stop_streaming()`
   - reader side: `test_rzg2l_cru_kunit_irq()`, which seeds minimal
     fake MMIO state and then calls the real `rzg2l_cru_irq()`

So the harness does not invent a fake state variable or a fake reader.
It only provides enough fake object/MMIO state for the real driver code
to run on x86 and reproduce the stop-side overlap in a controlled way.

With that setup I got repeated KCSAN reports of:

   BUG: KCSAN: data-race in rzg2l_cru_irq / test_rzg2l_cru_kunit_stop

The first hit in my local log was:

   write to 0xffff9bd4c1c03cf4 of 4 bytes by task 54 on cpu 0:
    test_rzg2l_cru_kunit_stop+0x14/0x30
    test_kernel_rzg2l_cru_stop+0x20/0x30
    access_thread+0x93/0xe0

   read to 0xffff9bd4c1c03cf4 of 4 bytes by task 53 on cpu 3:
    rzg2l_cru_irq+0x110/0x2d0
    test_rzg2l_cru_kunit_irq+0x4d/0x60
    test_kernel_rzg2l_cru_irq+0x20/0x30

The same run then hit the same race again in the 3-thread and 4-thread
variants, still on the same 4-byte `state` address and still with the
same writer/reader pair.

Thanks,
Shuhao


On Tue, Apr 21, 2026 at 02:03:26PM +0800, Shuhao Fu wrote:
> struct rzg2l_cru_dev.state is documented as protected by qlock, and the
> IRQ path already reads and updates it under that lock. However,
> rzg2l_cru_stop_streaming() writes STOPPING and
> rzg2l_cru_start_streaming_vq() writes STARTING without taking qlock.
> 
> That lets process-context stream control race with rzg2l_cru_irq().
> If the IRQ handler misses a concurrent STOPPING update, it can continue
> normal frame completion and slot refill after streamoff has begun. A
> similar race around STARTING can make the IRQ path observe the wrong
> phase during startup synchronization.
> 
> Fix both state transitions by serializing the writes with qlock, while
> still keeping rzg2l_cru_set_stream() outside the locked region.

  reply	other threads:[~2026-04-21  6:27 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-21  6:03 [PATCH] media: rzg2l-cru: serialize state transitions with qlock Shuhao Fu
2026-04-21  6:27 ` Shuhao Fu [this message]
2026-04-22  6:46 ` Jacopo Mondi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260421062708.GA2548544@chcpu16 \
    --to=sfual@cse.ust.hk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=mchehab@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.