* dwc3_set_mode vs. __dwc3_set_mode race
@ 2022-11-26 9:10 Sven Peter
2022-11-27 23:12 ` Janne Grunau
2022-11-28 19:16 ` Thinh Nguyen
0 siblings, 2 replies; 3+ messages in thread
From: Sven Peter @ 2022-11-26 9:10 UTC (permalink / raw)
To: Thinh Nguyen; +Cc: Greg Kroah-Hartman, linux-usb, asahi
Hi Thinh,
I've run into a race between dwc3_set_mode and __dwc3_set_mode accessing
dwc->desired_dr_role: It's an incredibly tight race that's hard to hit since
role switch events need to come in just after each other. It's reproducible
with an Apple M1 connected to a device that very quickly switches
roles when shutting down (which happens to be another M1). This sometimes
triggers a device->host->device switch sequence which is fast enough to hit this
race:
CPU A
dwc3_set_mode(DWC3_GCTL_PRTCAP_HOST) // first role switch event
spin_lock_irqsave(&dwc->lock, flags);
dwc->desired_dr_role = mode; // DWC3_GCTL_PRTCAP_HOST
spin_unlock_irqrestore(&dwc->lock, flags);
queue_work(system_freezable_wq, &dwc->drd_work); // true, schedules __dwc3_set_mode
CPU B
__dwc3_set_mode
// ....
spin_lock_irqsave(&dwc->lock, flags);
dwc3_set_prtcap(dwc, dwc->desired_dr_role); // DWC3_GCTL_PRTCAP_HOST
spin_unlock_irqrestore(&dwc->lock, flags);
CPU A
dwc3_set_mode(DWC3_GCTL_PRTCAP_DEVICE) // second role switch event
spin_lock_irqsave(&dwc->lock, flags);
dwc->desired_dr_role = mode; // DWC3_GCTL_PRTCAP_DEVICE
spin_unlock_irqrestore(&dwc->lock, flags);
CPU B (continues running __dwc3_set_mode)
switch (dwc->desired_dr_role) { // DWC3_GCTL_PRTCAP_DEVICE
case DWC3_GCTL_PRTCAP_HOST:
// not executed since desired_dr_role is DWC3_GCTL_PRTCAP_DEVICE now
break;
CPU A (continues running dwc3_set_mode)
queue_work(system_freezable_wq, &dwc->drd_work); // __dwc3_set_mode is still running
CPU B (continues running __dwc3_set_mode)
case DWC3_GCTL_PRTCAP_DEVICE:
// ....
ret = dwc3_gadget_init(dwc);
We then have DWC3_GCTL.DWC3_GCTL_PRTCAPDIR = DWC3_GCTL_PRTCAP_HOST and
dwc->current_dr_role = DWC3_GCTL_PRTCAP_HOST but initialized the controller in
device mode when calling dwc3_gadget_init. This obviously doesn't work and is
not easy to recover from.
Unfortunately we can't just lock dwc3->mutex since dwc3_set_mode may be called
from an extcon interrupt in atomic context (which is probably the reason for
deferring the mode switch to a workqueue).
Otherwise I can only think of creating a single-threaded work queue and
allocating a new work_struct together with desired_role inside dwc3_set_mode
and putting that onto the queue, i.e. something like:
struct dwc3_set_mode_work {
struct dwc3 *dwc;
u32 desired_dr_role;
struct work_struct work;
};
void dwc3_set_mode(struct dwc3 *dwc, u32 mode)
{
struct dwc3_set_mode_work *work = kzalloc(sizeof(*work), GFP_ATOMIC);
INIT_WORK(&work->work, __dwc3_set_mode);
work->dwc = dwc;
work->desired_dr_role = mode;
queue_work(dwc->drd_work_queue, &work->work);
}
That way all role switch events will be executed in order and we can't race
desired_dr_role anymore.
I'm not very happy with that solution but can't think of anything else.
Any thoughts or ideas? I can otherwise prepare a patch.
Thanks,
Sven
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: dwc3_set_mode vs. __dwc3_set_mode race
2022-11-26 9:10 dwc3_set_mode vs. __dwc3_set_mode race Sven Peter
@ 2022-11-27 23:12 ` Janne Grunau
2022-11-28 19:16 ` Thinh Nguyen
1 sibling, 0 replies; 3+ messages in thread
From: Janne Grunau @ 2022-11-27 23:12 UTC (permalink / raw)
To: Sven Peter; +Cc: Thinh Nguyen, Greg Kroah-Hartman, linux-usb, asahi
Hej,
On 2022-11-26 10:10:48 +0100, Sven Peter wrote:
> Hi Thinh,
>
> I've run into a race between dwc3_set_mode and __dwc3_set_mode accessing
> dwc->desired_dr_role: It's an incredibly tight race that's hard to hit since
> role switch events need to come in just after each other. It's reproducible
> with an Apple M1 connected to a device that very quickly switches
> roles when shutting down (which happens to be another M1). This sometimes
> triggers a device->host->device switch sequence which is fast enough to hit this
> race:
>
> CPU A
> dwc3_set_mode(DWC3_GCTL_PRTCAP_HOST) // first role switch event
> spin_lock_irqsave(&dwc->lock, flags);
> dwc->desired_dr_role = mode; // DWC3_GCTL_PRTCAP_HOST
> spin_unlock_irqrestore(&dwc->lock, flags);
> queue_work(system_freezable_wq, &dwc->drd_work); // true, schedules __dwc3_set_mode
>
> CPU B
> __dwc3_set_mode
> // ....
> spin_lock_irqsave(&dwc->lock, flags);
> dwc3_set_prtcap(dwc, dwc->desired_dr_role); // DWC3_GCTL_PRTCAP_HOST
> spin_unlock_irqrestore(&dwc->lock, flags);
>
> CPU A
> dwc3_set_mode(DWC3_GCTL_PRTCAP_DEVICE) // second role switch event
> spin_lock_irqsave(&dwc->lock, flags);
> dwc->desired_dr_role = mode; // DWC3_GCTL_PRTCAP_DEVICE
> spin_unlock_irqrestore(&dwc->lock, flags);
>
> CPU B (continues running __dwc3_set_mode)
> switch (dwc->desired_dr_role) { // DWC3_GCTL_PRTCAP_DEVICE
> case DWC3_GCTL_PRTCAP_HOST:
> // not executed since desired_dr_role is DWC3_GCTL_PRTCAP_DEVICE now
> break;
>
> CPU A (continues running dwc3_set_mode)
> queue_work(system_freezable_wq, &dwc->drd_work); // __dwc3_set_mode is still running
>
> CPU B (continues running __dwc3_set_mode)
> case DWC3_GCTL_PRTCAP_DEVICE:
> // ....
> ret = dwc3_gadget_init(dwc);
>
>
> We then have DWC3_GCTL.DWC3_GCTL_PRTCAPDIR = DWC3_GCTL_PRTCAP_HOST and
> dwc->current_dr_role = DWC3_GCTL_PRTCAP_HOST but initialized the controller in
> device mode when calling dwc3_gadget_init. This obviously doesn't work and is
> not easy to recover from.
>
> Unfortunately we can't just lock dwc3->mutex since dwc3_set_mode may be called
> from an extcon interrupt in atomic context (which is probably the reason for
> deferring the mode switch to a workqueue).
>
> Otherwise I can only think of creating a single-threaded work queue and
> allocating a new work_struct together with desired_role inside dwc3_set_mode
> and putting that onto the queue, i.e. something like:
>
> struct dwc3_set_mode_work {
> struct dwc3 *dwc;
> u32 desired_dr_role;
> struct work_struct work;
> };
>
> void dwc3_set_mode(struct dwc3 *dwc, u32 mode)
> {
> struct dwc3_set_mode_work *work = kzalloc(sizeof(*work), GFP_ATOMIC);
>
> INIT_WORK(&work->work, __dwc3_set_mode);
> work->dwc = dwc;
> work->desired_dr_role = mode;
> queue_work(dwc->drd_work_queue, &work->work);
> }
>
> That way all role switch events will be executed in order and we can't race
> desired_dr_role anymore.
> I'm not very happy with that solution but can't think of anything else.
>
> Any thoughts or ideas? I can otherwise prepare a patch.
your alternate solution to operate on a stack copy of
dwc->desired_dr_role in __dwc3_set_mode() avoids this race and seems to
work fine. Tested with 2 Apple silicon devices. Trivial patch left to
the sender since I did just the trivial typing and testing.
thanks
Janne
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: dwc3_set_mode vs. __dwc3_set_mode race
2022-11-26 9:10 dwc3_set_mode vs. __dwc3_set_mode race Sven Peter
2022-11-27 23:12 ` Janne Grunau
@ 2022-11-28 19:16 ` Thinh Nguyen
1 sibling, 0 replies; 3+ messages in thread
From: Thinh Nguyen @ 2022-11-28 19:16 UTC (permalink / raw)
To: Sven Peter
Cc: Thinh Nguyen, Greg Kroah-Hartman, linux-usb@vger.kernel.org,
asahi@lists.linux.dev
On Sat, Nov 26, 2022, Sven Peter wrote:
> Hi Thinh,
>
> I've run into a race between dwc3_set_mode and __dwc3_set_mode accessing
> dwc->desired_dr_role: It's an incredibly tight race that's hard to hit since
> role switch events need to come in just after each other. It's reproducible
> with an Apple M1 connected to a device that very quickly switches
> roles when shutting down (which happens to be another M1). This sometimes
> triggers a device->host->device switch sequence which is fast enough to hit this
> race:
>
> CPU A
> dwc3_set_mode(DWC3_GCTL_PRTCAP_HOST) // first role switch event
> spin_lock_irqsave(&dwc->lock, flags);
> dwc->desired_dr_role = mode; // DWC3_GCTL_PRTCAP_HOST
> spin_unlock_irqrestore(&dwc->lock, flags);
> queue_work(system_freezable_wq, &dwc->drd_work); // true, schedules __dwc3_set_mode
>
> CPU B
> __dwc3_set_mode
> // ....
> spin_lock_irqsave(&dwc->lock, flags);
> dwc3_set_prtcap(dwc, dwc->desired_dr_role); // DWC3_GCTL_PRTCAP_HOST
> spin_unlock_irqrestore(&dwc->lock, flags);
>
> CPU A
> dwc3_set_mode(DWC3_GCTL_PRTCAP_DEVICE) // second role switch event
> spin_lock_irqsave(&dwc->lock, flags);
> dwc->desired_dr_role = mode; // DWC3_GCTL_PRTCAP_DEVICE
> spin_unlock_irqrestore(&dwc->lock, flags);
>
> CPU B (continues running __dwc3_set_mode)
> switch (dwc->desired_dr_role) { // DWC3_GCTL_PRTCAP_DEVICE
> case DWC3_GCTL_PRTCAP_HOST:
> // not executed since desired_dr_role is DWC3_GCTL_PRTCAP_DEVICE now
> break;
>
> CPU A (continues running dwc3_set_mode)
> queue_work(system_freezable_wq, &dwc->drd_work); // __dwc3_set_mode is still running
>
> CPU B (continues running __dwc3_set_mode)
> case DWC3_GCTL_PRTCAP_DEVICE:
> // ....
> ret = dwc3_gadget_init(dwc);
>
>
> We then have DWC3_GCTL.DWC3_GCTL_PRTCAPDIR = DWC3_GCTL_PRTCAP_HOST and
> dwc->current_dr_role = DWC3_GCTL_PRTCAP_HOST but initialized the controller in
> device mode when calling dwc3_gadget_init. This obviously doesn't work and is
> not easy to recover from.
>
> Unfortunately we can't just lock dwc3->mutex since dwc3_set_mode may be called
> from an extcon interrupt in atomic context (which is probably the reason for
> deferring the mode switch to a workqueue).
>
> Otherwise I can only think of creating a single-threaded work queue and
> allocating a new work_struct together with desired_role inside dwc3_set_mode
> and putting that onto the queue, i.e. something like:
>
> struct dwc3_set_mode_work {
> struct dwc3 *dwc;
> u32 desired_dr_role;
> struct work_struct work;
> };
>
> void dwc3_set_mode(struct dwc3 *dwc, u32 mode)
> {
> struct dwc3_set_mode_work *work = kzalloc(sizeof(*work), GFP_ATOMIC);
>
> INIT_WORK(&work->work, __dwc3_set_mode);
> work->dwc = dwc;
> work->desired_dr_role = mode;
> queue_work(dwc->drd_work_queue, &work->work);
> }
>
> That way all role switch events will be executed in order and we can't race
> desired_dr_role anymore.
> I'm not very happy with that solution but can't think of anything else.
>
> Any thoughts or ideas? I can otherwise prepare a patch.
>
>
> Thanks,
>
>
> Sven
Your revised change in the new patch is a better option.
Thanks,
Thinh
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-11-28 19:18 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-11-26 9:10 dwc3_set_mode vs. __dwc3_set_mode race Sven Peter
2022-11-27 23:12 ` Janne Grunau
2022-11-28 19:16 ` Thinh Nguyen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).