public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Shuhao Fu <sfual@cse.ust.hk>
To: Mauro Carvalho Chehab <mchehab@kernel.org>, linux-media@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] media: rzg2l-cru: serialize state transitions with qlock
Date: Tue, 21 Apr 2026 14:27:08 +0800	[thread overview]
Message-ID: <20260421062708.GA2548544@chcpu16> (raw)
In-Reply-To: <20260421060307.GA2522920@chcpu16>

Hi,

Here is the best reproduction detail I could put together locally.

From source review, I think there are two windows where process-context
state updates can overlap the IRQ handler's reads of `cru->state`:

- on streamoff, `rzg2l_cru_stop_streaming()` stores `STOPPING` before
  it calls `rzg2l_cru_set_stream(cru, 0)`, while interrupts are only
  disabled later in `rzg2l_cru_stop_image_processing()`
- on streamon, `rzg2l_cru_start_image_processing()` enables interrupts
  before `rzg2l_cru_set_stream(cru, 1)` returns, while
  `rzg2l_cru_start_streaming_vq()` stores `STARTING` only after that

I do not have an RZ/G2L board or an arm64/QEMU model for this CRU
block, so I could not reproduce either path from a real userspace V4L2
stream on actual hardware. The setup below is only the best local
reference proof I could produce in this environment. It is not a claim
of a natural hardware-backed repro.

Locally, I targeted the streamoff-side `STOPPING` vs IRQ overlap.

1. Build a KCSAN/KUnit kernel with the dedicated config fragment:

   ./tools/testing/kunit/kunit.py build \
     --arch=x86_64 \
     --kunitconfig=kernel/kcsan/rzg2l_cru.kunitconfig \
     --build_dir=../out-rzg2l-kunit-red2 \
     --make_options CC=clang-20 \
     --make_options LD=ld.bfd \
     --make_options AR=llvm-ar-20 \
     --make_options NM=llvm-nm-20 \
     --make_options OBJCOPY=llvm-objcopy-20 \
     --make_options READELF=llvm-readelf-20 \
     --make_options LLVM_IAS=1 \
     --jobs 8

2. Boot that kernel under QEMU:

   timeout 90 qemu-system-x86_64 \
     -m 1024 \
     -kernel out-rzg2l-kunit-red2/arch/x86/boot/bzImage \
     -append 'kunit.filter_glob=kcsan.test_rzg2l_cru_state_stop_vs_irq* kunit.enable=1 console=ttyS0 kunit_shutdown=reboot' \
     -no-reboot \
     -nographic \
     -accel tcg \
     -smp 4

3. The KUnit/KCSAN test creates a fake `rzg2l_cru_dev`, records the
   address of `cru->state`, and then runs two worker sides concurrently:

   - writer side: `test_rzg2l_cru_kunit_stop()`, which just calls the
     real `rzg2l_cru_stop_streaming()`
   - reader side: `test_rzg2l_cru_kunit_irq()`, which seeds minimal
     fake MMIO state and then calls the real `rzg2l_cru_irq()`

So the harness does not invent a fake state variable or a fake reader.
It only provides enough fake object/MMIO state for the real driver code
to run on x86 and reproduce the stop-side overlap in a controlled way.

With that setup I got repeated KCSAN reports of:

   BUG: KCSAN: data-race in rzg2l_cru_irq / test_rzg2l_cru_kunit_stop

The first hit in my local log was:

   write to 0xffff9bd4c1c03cf4 of 4 bytes by task 54 on cpu 0:
    test_rzg2l_cru_kunit_stop+0x14/0x30
    test_kernel_rzg2l_cru_stop+0x20/0x30
    access_thread+0x93/0xe0

   read to 0xffff9bd4c1c03cf4 of 4 bytes by task 53 on cpu 3:
    rzg2l_cru_irq+0x110/0x2d0
    test_rzg2l_cru_kunit_irq+0x4d/0x60
    test_kernel_rzg2l_cru_irq+0x20/0x30

The same run then hit the same race again in the 3-thread and 4-thread
variants, still on the same 4-byte `state` address and still with the
same writer/reader pair.

Thanks,
Shuhao


On Tue, Apr 21, 2026 at 02:03:26PM +0800, Shuhao Fu wrote:
> struct rzg2l_cru_dev.state is documented as protected by qlock, and the
> IRQ path already reads and updates it under that lock. However,
> rzg2l_cru_stop_streaming() writes STOPPING and
> rzg2l_cru_start_streaming_vq() writes STARTING without taking qlock.
> 
> That lets process-context stream control race with rzg2l_cru_irq().
> If the IRQ handler misses a concurrent STOPPING update, it can continue
> normal frame completion and slot refill after streamoff has begun. A
> similar race around STARTING can make the IRQ path observe the wrong
> phase during startup synchronization.
> 
> Fix both state transitions by serializing the writes with qlock, while
> still keeping rzg2l_cru_set_stream() outside the locked region.

      reply	other threads:[~2026-04-21  6:27 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-21  6:03 [PATCH] media: rzg2l-cru: serialize state transitions with qlock Shuhao Fu
2026-04-21  6:27 ` Shuhao Fu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260421062708.GA2548544@chcpu16 \
    --to=sfual@cse.ust.hk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=mchehab@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox