linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eli Billauer <eli.billauer@gmail.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-usb@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Hans de Goede <hdegoede@redhat.com>,
	Oliver Neukum <oliver@neukum.org>
Subject: Re: usb: core: URB completer callback possibly called after usb_kill_anchored_urbs() returns
Date: Sat, 25 Jul 2020 19:44:02 +0300	[thread overview]
Message-ID: <5F1C6152.8000500@gmail.com> (raw)
In-Reply-To: <20200724155133.GC1388675@rowland.harvard.edu>

Hello Alan & all,

Thanks for your response.

The thing is that I'm not alone assuming that it's fine to free 
resources after usb_kill_anchored_urbs() returns. Most notable is 
usb-skeleton.c, which does exactly that in skel_disconnect():

     usb_kill_anchored_urbs(&dev->submitted);

     /* decrement our usage count */
     kref_put(&dev->kref, skel_delete);

Needless to say, skel_delete() frees the struct that the URBs' contexts 
point at.

Keeping track of the number of uncompleted URBs, as you suggested, is 
indeed a solution. But as it seems that the only problem is the race 
condition between usb_kill_anchored_urbs() and __usb_hcd_giveback_urb(), 
I suppose it's enough to ensure that the resources are not freed while 
__usb_hcd_giveback_urb() is doing its unanchor-before-complete thing.

After taking a second look, I discovered that there's already a function 
that takes the race condition into consideration: 
usb_wait_anchor_empty_timeout().

Looking again at commit 6ec4147, which I mentioned before, it turns out 
that it added a counter to each anchor struct (atomic_t 
suspend_wakeups). It's incremented in __usb_hcd_giveback_urb() just 
before unanchoring a URB, and decremented after the completion has been 
called. This is made with calls to usb_anchor_suspend_wakeups() and 
usb_anchor_resume_wakeups(), but that's the essence of these calls.

And there's a wait queue in place, which gets a wakeup call by 
usb_anchor_resume_wakeups(), if the anchor's list is empty and the 
counter is zero after decrementing it. In the said commit, 
usb_wait_anchor_empty_timeout() was modified to check the counter as 
well, so when it returns, the anchor is genuinely idle.

So I would say that the safe way to go is

   usb_kill_anchored_urbs(&ep->anchor);
   if (!usb_wait_anchor_empty_timeout(&ep->anchor, 1000)) {
      /* This is really bad */
   }
   /* Release memory */

And if indeed so, I'm thinking about submitting a patch, which adds a 
usb_wait_anchor_empty_timeout() at the bottom of 
usb_kill_anchored_urbs(). So that the function does what people out 
there think it does.

Have I missed something?

Thanks,
    Eli

On 24/07/20 18:51, Alan Stern wrote:
> On Fri, Jul 24, 2020 at 03:46:40PM +0300, Eli Billauer wrote:
>    
>> Hello,
>>
>> My understanding is it should be OK to assume that no calls to completer
>> callbacks will be made after usb_kill_anchored_urbs() returns (for that
>> anchor, of course).
>>      
> As you have discovered, that is not a correct assumption.
>
>    
>>   However __usb_hcd_giveback_urb() in
>> drivers/usb/core/hcd.c doesn't seem to work that way. It unanchors first,
>> then calls the complete method:
>>
>>      usb_unanchor_urb(urb);
>>      if (likely(status == 0))
>>          usb_led_activity(USB_LED_EVENT_HOST);
>>
>>      /* pass ownership to the completion handler */
>>      urb->status = status;
>>      kcov_remote_start_usb((u64)urb->dev->bus->busnum);
>>      urb->complete(urb);
>>
>> So if usb_kill_anchored_urbs() is called while __usb_hcd_giveback_urb() is
>> in the middle of this code passage, it will miss the URB that is being
>> finished, and possibly return before the completer has been called.
>>
>> It might sound like a theoretic race condition, but I actually got a kernel
>> panic after yanking the USB plug in the middle of heavy traffic. The panic's
>> call trace indicated that the driver's completer callback function attempted
>> to access memory that had been freed previously. As this happened within an
>> IRQ, it was a fullblown computer freeze.
>>
>> The same driver's memory freeing mechanism indeed calls
>> usb_kill_anchored_urbs() first, then frees the URBs' related data structure.
>> So it seems like it freed the memory just before the completer callback was
>> invoked.
>>      
> Right.  There is a genuine race.  Althouogh usb_kill_anchored_urbs()
> does wait for the completion handlers of all the URBs it kills to
> finish, there is some ambiguity about what URBs are on the anchor.
>
>    
>> I would love to submit a patch that moves the usb_unanchor_urb() call a few
>> rows down, but something tells me it's not that simple.
>>      
> No, it isn't.
>
>    
>> As a side note, the comment along with commit 6ec4147, which added
>> usb_anchor_{suspend,resume}_wakeups calls, said among others: "But
>> __usb_hcd_giveback_urb() calls usb_unanchor_urb before calling the
>> completion handler. This is necessary as the completion handler may
>> re-submit and re-anchor the urb". Not sure I understood this part, though.
>>      
> Suppose the completion routine puts the URB onto a different anchor and
> then calls usb_submit_urb().  If __usb_hcd_giveback_urb() then called
> usb_unanchor_urb(), the URB would incorrectly be removed from the wrong
> anchor!
>
> Currently the only way to handle this situation properly is to keep
> track of whether each URB has completed.  For example, if the driver has
> successfully submitted 237 URBs but the completion routine has only been
> called 235 times, the driver will know that there are still two URBs
> pending.
>
> Alan Stern
>
>    


  reply	other threads:[~2020-07-25 16:44 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-24 12:46 usb: core: URB completer callback possibly called after usb_kill_anchored_urbs() returns Eli Billauer
2020-07-24 15:51 ` Alan Stern
2020-07-25 16:44   ` Eli Billauer [this message]
2020-07-25 19:53     ` Alan Stern

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5F1C6152.8000500@gmail.com \
    --to=eli.billauer@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hdegoede@redhat.com \
    --cc=linux-usb@vger.kernel.org \
    --cc=oliver@neukum.org \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).