All of lore.kernel.org
 help / color / mirror / Atom feed
From: Felipe Balbi <balbi@kernel.org>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: USB mailing list <linux-usb@vger.kernel.org>
Subject: Re: Disconnect race in Gadget core
Date: Tue, 11 May 2021 11:22:51 +0300	[thread overview]
Message-ID: <87r1idfzms.fsf@kernel.org> (raw)
In-Reply-To: <20210510193849.GB873147@rowland.harvard.edu>

[-- Attachment #1: Type: text/plain, Size: 6849 bytes --]


Hi,

Alan Stern <stern@rowland.harvard.edu> writes:
>> Alan Stern <stern@rowland.harvard.edu> writes:
>> > I just noticed this potential race in the Gadget core, specifically, in 
>> > usb_gadget_remove_driver.  Here's the relevant code:
>> >
>> > 	usb_gadget_disconnect(udc->gadget);
>> > 	if (udc->gadget->irq)
>> > 		synchronize_irq(udc->gadget->irq);
>> > 	udc->driver->unbind(udc->gadget);
>> > 	usb_gadget_udc_stop(udc);
>> >
>> > usb_gadget_disconnect will turn off the D+ pull-up, telling the host (or 
>> > upstream hub) that the gadget is no longer connected to the bus.  The 
>> > udc->driver->unbind call then tells the gadget driver that it will no 
>> > longer receive any callbacks from the UDC driver or the gadget core.
>> >
>> > Now suppose that at just this moment, the user unplugs the USB cable.  
>> > The UDC driver will notice that the Vbus voltage has gone away and will 
>> > invoke the gadget driver's ->disconnect callback.  After all, it doesn't 
>> > realize it should avoid making these callbacks until usb_gadget_udc_stop 
>> > has run.
>> >
>> > As a result, the gadget driver's disconnect callback may be invoked 
>> > _after_ the driver has been unbound from the gadget.
>> 
>> argh, nice catch!
>> 
>> > How should we fix this potential problem?
>> 
>> I think we can just move usb_gadget_udc_stop(). Looking at a few udcs,
>> they're just either freeing or masking UDC's interrupts which should
>> prevent the race you describe while also making sure that no further
>> interrupts will trigger.
>> 
>> Perhaps we could move udc_stop() before synchronize_irq(). Do you
>> foresee any issues with that for net2272 or dummy?
>
> I don't think it will be that easy.  As you may remember, there have 
> been a number of commits over the years fiddling with the order of 
> operations during gadget driver unbinding (although I don't think any of 
> them moved udc_stop from its position at the end).  Still, it's a 
> delicate matter.
>
> The purpose of synchronize_irq is to wait until any currently running 
> IRQ handlers have finished.  Obviously this doesn't do any good unless 
> the software has arranged beforehand that no new interrupt requests will 
> be generated.  In this case, turning off the pull-up is currently not 
> sufficient.  But waiting until after udc_stop has returned isn't the 
> answer; it wouldn't prevent callbacks from being issued between the 
> unbind and the udc_stop.
>
> And I'm quite sure that calling udc_stop before unbind would be wrong.  
> The gadget driver would then get various errors (like requests 
> completing with -ESHUTDOWN status) it's not prepared to handle.
>
> So what we need is a way to tell the UDC driver to stop issuing 
> callbacks.  The ones defined in struct usb_gadget_driver are:
>
> 	bind and unbind,
> 	bus suspend and bus resume,
> 	setup,
> 	bus reset,
> 	disconnect.
>
> Bind and unbind aren't relevant for this discussion because they are 
> synchronous, not invoked in response to interrupts.
>
> In theory there shouldn't be any bus-resume, setup, or bus-reset 
> interrupts once the pull-up is turned off, but it would be good to 
> protect against bad hardware which may produce them.
>
> Bus suspend is a real possibility.  It occurs when the UDC hasn't 
> received any packets for a few milliseconds, which certainly will be the 
> case when the pull-up is off.  UDC hardware and drivers may or may not 
> be smart enough to realize that under this circumstance, lack of packets 
> shouldn't be reported as a bus suspend.
>
> And of course, a cable disconnect can occur at any time -- nothing we 
> can do about that.
>
> Putting it all together, we need to tell the UDC driver, somewhere 
> between turning the pull-up off and doing the synchronize_irq, to stop 
> issuing disconnect (and possibly also setup and suspend) callbacks.  I 
> don't think we can use the pull-up call for this purpose; a gadget 
> driver may want for some reason to disconnect logically from the bus 
> while still knowing whether Vbus is present.  (You may disagree about 
> that, but otherwise what's the point of having a disconnect callback in 
> the first place?)
>
> Which means in the end that struct usb_gadget_ops needs either to have a 
> new callback for this purpose or to have an existing callback augmented 
> (for example, the ->pullup callback could get an additional argument 
> saying whether to continue issuing callbacks).  Or another possibility 
> is to make UDC drivers call a core routine to do disconnect callbacks 
> instead of issuing those callbacks themselves, and have the core filter 
> out callbacks that come at the wrong time.  Either way, every UDC driver 
> would need to be modified.
>
> What do you think?  Is this reasoning accurate, or have I missed 
> something?

Right, I'm arguing that, perhaps, ->udc_stop() is the one that should
have said semantics. For starters, 'stop' has a very clear meaning and,
considering my quick review of 3 or 4 UDC drivers, they are just masking
or releasing interrupts which would prevent ->suspend() and
->disconnect() from being called. It could be, however, that if we
change the semantics of udc_stop to fit your description above,
->udc_start() may have to change accordingly. Using dwc3 as an example,
here are the relevant implementations:

> static int dwc3_gadget_start(struct usb_gadget *g,
> 		struct usb_gadget_driver *driver)
> {
> 	struct dwc3		*dwc = gadget_to_dwc(g);
> 	unsigned long		flags;
> 	int			ret;
> 	int			irq;
>
> 	irq = dwc->irq_gadget;
> 	ret = request_threaded_irq(irq, dwc3_interrupt, dwc3_thread_interrupt,
> 			IRQF_SHARED, "dwc3", dwc->ev_buf);

request interrupt line and enable it. Prepare the udc to call gadget ops.

> 	if (ret) {
> 		dev_err(dwc->dev, "failed to request irq #%d --> %d\n",
> 				irq, ret);
> 		return ret;
> 	}
>
> 	spin_lock_irqsave(&dwc->lock, flags);
> 	dwc->gadget_driver	= driver;

internal pointer cached for convenience

> 	spin_unlock_irqrestore(&dwc->lock, flags);
>
> 	return 0;
> }
>
> static int dwc3_gadget_stop(struct usb_gadget *g)
> {
> 	struct dwc3		*dwc = gadget_to_dwc(g);
> 	unsigned long		flags;
>
> 	spin_lock_irqsave(&dwc->lock, flags);
> 	dwc->gadget_driver	= NULL;
> 	spin_unlock_irqrestore(&dwc->lock, flags);
>
> 	free_irq(dwc->irq_gadget, dwc->ev_buf);

drop the interrupt line. This makes the synchronize_irq() call
irrelevant in usb_gadget_remove_driver().

I'm not against adding new udc methods to gadget ops, but it seems
fitting that udc_start/udc_stop would fit your description while some
new members could be given the task of priming the default control pipe
to receive the first SETUP packet.

What do you think?

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

  parent reply	other threads:[~2021-05-11  8:31 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-10 15:24 Disconnect race in Gadget core Alan Stern
2021-05-10 16:43 ` Felipe Balbi
2021-05-10 19:38   ` Alan Stern
2021-05-11  2:53     ` Peter Chen
2021-05-11 19:15       ` Alan Stern
2021-05-12  9:37         ` Peter Chen
2021-05-12  9:41           ` Felipe Balbi
2021-05-12 19:33           ` Alan Stern
2021-05-11  8:22     ` Felipe Balbi [this message]
2021-05-11 21:26       ` Alan Stern
2021-05-12  7:00         ` Felipe Balbi
2021-05-12 15:33           ` Alan Stern
2021-05-14  7:35             ` Felipe Balbi
2021-05-14 16:58               ` Alan Stern
2021-05-15  6:41                 ` Felipe Balbi
2021-05-15 15:31                   ` Alan Stern
2021-05-16  9:43                     ` Felipe Balbi
2021-05-16 14:51                       ` Alan Stern
2021-05-17  2:00                         ` Peter Chen
2021-05-17  5:33                           ` Felipe Balbi
2021-05-17  5:35                         ` Felipe Balbi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r1idfzms.fsf@kernel.org \
    --to=balbi@kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.