public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [BUG] usb: cdc-wdm: Missing barriers in ad-hoc lockless buffer
@ 2026-03-04  8:41 Gui-Dong Han
  2026-03-04  9:15 ` Greg KH
  2026-03-05 11:35 ` Oliver Neukum
  0 siblings, 2 replies; 8+ messages in thread
From: Gui-Dong Han @ 2026-03-04  8:41 UTC (permalink / raw)
  To: Greg KH, oneukum; +Cc: robert.hodaszi, kees, linux-usb, LKML, Jia-Ju Bai

Hello maintainers,

I would like to report a potential concurrency bug in
drivers/usb/class/cdc-wdm.c.

The driver implements an ad-hoc lockless buffer using desc->ubuf and
desc->length. In wdm_read(), the read side checks
READ_ONCE(desc->length) outside the spinlock. However, the write side
in wdm_in_callback() updates the buffer and length without WRITE_ONCE
and any memory barriers.

Due to compiler optimization or CPU out-of-order execution, the
desc->length update can be reordered before the memmove. If this
happens, wdm_read() can see the new length and call copy_to_user() on
uninitialized memory. This also violates LKMM data race rules [1].

Additionally, the driver relies heavily on set_bit() and test_bit() on
desc->flags for synchronization. These bit operations do not provide
implicit barriers, which might lead to similar ordering issues.

Proposed solutions:
1. Short-term: Add WRITE_ONCE() and smp_wmb() on the write side, and
smp_rmb() on the read side.
2. Long-term: Replace the ad-hoc buffer with kfifo. This is a classic
single-reader (holding desc->rlock) and single-writer (holding
desc->iuspin) scenario, making it a perfect fit for kfifo.

I discovered this issue while studying the driver's code. The presence
of a READ_ONCE() on the read side without a matching WRITE_ONCE() on
the write side caught my attention as a potential data race under the
LKMM. In my opinion, implementing ad-hoc lockless algorithms directly
within individual drivers is highly error-prone. To avoid these subtle
memory ordering and barrier bugs, drivers should rely on established,
well-tested kernel libraries like kfifo to handle this type of
concurrency.

I am currently trying to reproduce the issue via stress testing on
ARM64, though the race window is tight. I will also attempt a kfifo
refactoring. However, since I am not familiar with this specific
driver, I welcome anyone else to take over the kfifo conversion to
eliminate these potential bugs and simplify the code.

Thank you for your attention to this matter.

Thanks.

[1] https://elixir.bootlin.com/linux/v6.19-rc5/source/tools/memory-model/Documentation/explanation.txt#L2231

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] usb: cdc-wdm: Missing barriers in ad-hoc lockless buffer
  2026-03-04  8:41 [BUG] usb: cdc-wdm: Missing barriers in ad-hoc lockless buffer Gui-Dong Han
@ 2026-03-04  9:15 ` Greg KH
  2026-03-05 11:35 ` Oliver Neukum
  1 sibling, 0 replies; 8+ messages in thread
From: Greg KH @ 2026-03-04  9:15 UTC (permalink / raw)
  To: Gui-Dong Han; +Cc: oneukum, robert.hodaszi, kees, linux-usb, LKML, Jia-Ju Bai

On Wed, Mar 04, 2026 at 04:41:32PM +0800, Gui-Dong Han wrote:
> Hello maintainers,
> 
> I would like to report a potential concurrency bug in
> drivers/usb/class/cdc-wdm.c.
> 
> The driver implements an ad-hoc lockless buffer using desc->ubuf and
> desc->length. In wdm_read(), the read side checks
> READ_ONCE(desc->length) outside the spinlock. However, the write side
> in wdm_in_callback() updates the buffer and length without WRITE_ONCE
> and any memory barriers.
> 
> Due to compiler optimization or CPU out-of-order execution, the
> desc->length update can be reordered before the memmove. If this
> happens, wdm_read() can see the new length and call copy_to_user() on
> uninitialized memory. This also violates LKMM data race rules [1].
> 
> Additionally, the driver relies heavily on set_bit() and test_bit() on
> desc->flags for synchronization. These bit operations do not provide
> implicit barriers, which might lead to similar ordering issues.
> 
> Proposed solutions:
> 1. Short-term: Add WRITE_ONCE() and smp_wmb() on the write side, and
> smp_rmb() on the read side.
> 2. Long-term: Replace the ad-hoc buffer with kfifo. This is a classic
> single-reader (holding desc->rlock) and single-writer (holding
> desc->iuspin) scenario, making it a perfect fit for kfifo.
> 
> I discovered this issue while studying the driver's code. The presence
> of a READ_ONCE() on the read side without a matching WRITE_ONCE() on
> the write side caught my attention as a potential data race under the
> LKMM. In my opinion, implementing ad-hoc lockless algorithms directly
> within individual drivers is highly error-prone. To avoid these subtle
> memory ordering and barrier bugs, drivers should rely on established,
> well-tested kernel libraries like kfifo to handle this type of
> concurrency.
> 
> I am currently trying to reproduce the issue via stress testing on
> ARM64, though the race window is tight. I will also attempt a kfifo
> refactoring. However, since I am not familiar with this specific
> driver, I welcome anyone else to take over the kfifo conversion to
> eliminate these potential bugs and simplify the code.
> 
> Thank you for your attention to this matter.

Patches to help resolve this would be great, and yes, moving to kfifo
might make this work much better overall.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] usb: cdc-wdm: Missing barriers in ad-hoc lockless buffer
  2026-03-04  8:41 [BUG] usb: cdc-wdm: Missing barriers in ad-hoc lockless buffer Gui-Dong Han
  2026-03-04  9:15 ` Greg KH
@ 2026-03-05 11:35 ` Oliver Neukum
  2026-03-05 12:28   ` Gui-Dong Han
  1 sibling, 1 reply; 8+ messages in thread
From: Oliver Neukum @ 2026-03-05 11:35 UTC (permalink / raw)
  To: Gui-Dong Han, Greg KH, oneukum
  Cc: robert.hodaszi, kees, linux-usb, LKML, Jia-Ju Bai

On 04.03.26 09:41, Gui-Dong Han wrote:
> Additionally, the driver relies heavily on set_bit() and test_bit() on
> desc->flags for synchronization. These bit operations do not provide
> implicit barriers, which might lead to similar ordering issues.

Hi,

going through the driver it seems to me like all relevant cases
are covered by the iuspin spinlock. Do you have any concrete suspicions
or do you generally distrust this style?

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] usb: cdc-wdm: Missing barriers in ad-hoc lockless buffer
  2026-03-05 11:35 ` Oliver Neukum
@ 2026-03-05 12:28   ` Gui-Dong Han
  2026-03-05 12:44     ` Oliver Neukum
  0 siblings, 1 reply; 8+ messages in thread
From: Gui-Dong Han @ 2026-03-05 12:28 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Greg KH, robert.hodaszi, kees, linux-usb, LKML, Jia-Ju Bai

On Thu, Mar 5, 2026 at 7:35 PM Oliver Neukum <oneukum@suse.com> wrote:
>
> On 04.03.26 09:41, Gui-Dong Han wrote:
> > Additionally, the driver relies heavily on set_bit() and test_bit() on
> > desc->flags for synchronization. These bit operations do not provide
> > implicit barriers, which might lead to similar ordering issues.
>
> Hi,
>
> going through the driver it seems to me like all relevant cases
> are covered by the iuspin spinlock. Do you have any concrete suspicions
> or do you generally distrust this style?

Hi Oliver,

Thanks for taking a close look.

I did notice some potential race conditions regarding concurrency
visibility, but I admit I haven't constructed a concrete, harmful
example like the one for ->length and ->ubuf. Analyzing the flags is
less intuitive, and constructing a clear failure path is
time-consuming, so I eventually gave up on building a strict proof for
it.

Specifically, wdm_read() accesses ->length and then ->flags, often
without holding ->iuspin. Similarly, wdm_in_callback() involves
multiple accesses to both.

I am indeed nervous about this pattern. Without barriers, changes to
these associated fields made on one CPU might be observed in a
completely different order on another CPU.

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] usb: cdc-wdm: Missing barriers in ad-hoc lockless buffer
  2026-03-05 12:28   ` Gui-Dong Han
@ 2026-03-05 12:44     ` Oliver Neukum
  2026-03-05 13:26       ` Gui-Dong Han
  0 siblings, 1 reply; 8+ messages in thread
From: Oliver Neukum @ 2026-03-05 12:44 UTC (permalink / raw)
  To: Gui-Dong Han, Oliver Neukum
  Cc: Greg KH, robert.hodaszi, kees, linux-usb, LKML, Jia-Ju Bai



On 05.03.26 13:28, Gui-Dong Han wrote:

> Specifically, wdm_read() accesses ->length and then ->flags, often
> without holding ->iuspin. Similarly, wdm_in_callback() involves
> multiple accesses to both.
> 
> I am indeed nervous about this pattern. Without barriers, changes to
> these associated fields made on one CPU might be observed in a
> completely different order on another CPU.

OK, I rechecked. And indeed it seems to me like setting WDM_OVERFLOW
and WDM_READ could be reordered with respect to each other in
wdm_in_callback() and that must not happen.
What do you think?

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] usb: cdc-wdm: Missing barriers in ad-hoc lockless buffer
  2026-03-05 12:44     ` Oliver Neukum
@ 2026-03-05 13:26       ` Gui-Dong Han
  2026-03-06  9:25         ` Oliver Neukum
  0 siblings, 1 reply; 8+ messages in thread
From: Gui-Dong Han @ 2026-03-05 13:26 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Greg KH, robert.hodaszi, kees, linux-usb, LKML, Jia-Ju Bai

On Thu, Mar 5, 2026 at 8:44 PM Oliver Neukum <oneukum@suse.com> wrote:
>
>
>
> On 05.03.26 13:28, Gui-Dong Han wrote:
>
> > Specifically, wdm_read() accesses ->length and then ->flags, often
> > without holding ->iuspin. Similarly, wdm_in_callback() involves
> > multiple accesses to both.
> >
> > I am indeed nervous about this pattern. Without barriers, changes to
> > these associated fields made on one CPU might be observed in a
> > completely different order on another CPU.
>
> OK, I rechecked. And indeed it seems to me like setting WDM_OVERFLOW
> and WDM_READ could be reordered with respect to each other in
> wdm_in_callback() and that must not happen.
> What do you think?

Hi Oliver,

Based on my shallow understanding, reordering issues typically happen
between different memory addresses, not within the same one.

For a single variable, hardware cache coherence guarantees the
modification order. For example, if a CPU writes 1, 2, and 4 to int a,
other CPUs will definitely observe the changes in that exact 1-2-4
sequence. So set_bit() operations on the same desc->flags should be
safe from being observed out-of-order.

The real danger of weak memory architectures lies in accessing
associated variables. For instance, if we write 1 to int a and then 2
to int b, another CPU might observe b == 2 before a == 1. This is
exactly the situation I pointed out in my original report regarding
the lack of barriers between desc->ubuf and desc->length.

Honestly, lockless algorithm design is incredibly hard, which is why
drivers should probably just rely on well-tested libraries instead of
rolling their own. I am definitely no expert in this dark art, just
know enough to be dangerous :)

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] usb: cdc-wdm: Missing barriers in ad-hoc lockless buffer
  2026-03-05 13:26       ` Gui-Dong Han
@ 2026-03-06  9:25         ` Oliver Neukum
  2026-03-06 10:36           ` Gui-Dong Han
  0 siblings, 1 reply; 8+ messages in thread
From: Oliver Neukum @ 2026-03-06  9:25 UTC (permalink / raw)
  To: Gui-Dong Han; +Cc: Greg KH, robert.hodaszi, kees, linux-usb, LKML, Jia-Ju Bai



On 05.03.26 14:26, Gui-Dong Han wrote:

Hi,

> Based on my shallow understanding, reordering issues typically happen
> between different memory addresses, not within the same one.

Nevertheless, you've found the issue, hence I will ask you :-)

Is that something we can depend on or is that just how it works
on the architectures we are currently running on? If I go to the effort
of checking for reordering effects, I want to do it right in all cases.
   
> The real danger of weak memory architectures lies in accessing
> associated variables. For instance, if we write 1 to int a and then 2
> to int b, another CPU might observe b == 2 before a == 1. This is
> exactly the situation I pointed out in my original report regarding
> the lack of barriers between desc->ubuf and desc->length.

Yes. Hence I was looking. The results of a completed IO can be

a) data
b) an error
c) a buffer overflow

thus there must be ordering between recording any of these results
and changing WDM_READ, right?

> Honestly, lockless algorithm design is incredibly hard, which is why
> drivers should probably just rely on well-tested libraries instead of
> rolling their own. I am definitely no expert in this dark art, just
> know enough to be dangerous :)

I agree. The issue is that lockless IO is also error handling, not
just the buffer.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] usb: cdc-wdm: Missing barriers in ad-hoc lockless buffer
  2026-03-06  9:25         ` Oliver Neukum
@ 2026-03-06 10:36           ` Gui-Dong Han
  0 siblings, 0 replies; 8+ messages in thread
From: Gui-Dong Han @ 2026-03-06 10:36 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Greg KH, robert.hodaszi, kees, linux-usb, LKML, Jia-Ju Bai

On Fri, Mar 6, 2026 at 5:25 PM Oliver Neukum <oneukum@suse.com> wrote:
>
>
>
> On 05.03.26 14:26, Gui-Dong Han wrote:
>
> Hi,
>
> > Based on my shallow understanding, reordering issues typically happen
> > between different memory addresses, not within the same one.
>
> Nevertheless, you've found the issue, hence I will ask you :-)
>
> Is that something we can depend on or is that just how it works
> on the architectures we are currently running on? If I go to the effort
> of checking for reordering effects, I want to do it right in all cases.

Yes, we can depend on it. Both the LKMM [1] and the documentation for
ARM [2] (the most widespread weak memory architecture) explicitly
require cache coherence. This guarantees the historical consistency of
modifications to any single memory address across all CPUs.

If a hardware architecture were actually that weird, it wouldn't offer
any benefits to sell, and Linux wouldn't support it anyway :)

>
> > The real danger of weak memory architectures lies in accessing
> > associated variables. For instance, if we write 1 to int a and then 2
> > to int b, another CPU might observe b == 2 before a == 1. This is
> > exactly the situation I pointed out in my original report regarding
> > the lack of barriers between desc->ubuf and desc->length.
>
> Yes. Hence I was looking. The results of a completed IO can be
>
> a) data
> b) an error
> c) a buffer overflow
>
> thus there must be ordering between recording any of these results
> and changing WDM_READ, right?

Yes, that seems correct.

>
> > Honestly, lockless algorithm design is incredibly hard, which is why
> > drivers should probably just rely on well-tested libraries instead of
> > rolling their own. I am definitely no expert in this dark art, just
> > know enough to be dangerous :)
>
> I agree. The issue is that lockless IO is also error handling, not
> just the buffer.

Agreed. The lockless buffer logic is entangled with the error
handling, which complicates things further.

Thanks.

[1] https://elixir.bootlin.com/linux/v7.0-rc1/source/tools/memory-model/Documentation/explanation.txt#L660
[2] https://developer.arm.com/documentation/102336/0100/Memory-ordering

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-03-06 10:36 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-04  8:41 [BUG] usb: cdc-wdm: Missing barriers in ad-hoc lockless buffer Gui-Dong Han
2026-03-04  9:15 ` Greg KH
2026-03-05 11:35 ` Oliver Neukum
2026-03-05 12:28   ` Gui-Dong Han
2026-03-05 12:44     ` Oliver Neukum
2026-03-05 13:26       ` Gui-Dong Han
2026-03-06  9:25         ` Oliver Neukum
2026-03-06 10:36           ` Gui-Dong Han

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox