[02/20] usb: host: xhci: check DYING state earlier

From: Mathias Nyman <mathias.nyman@linux.intel.com>
To: Felipe Balbi <felipe.balbi@linux.intel.com>
Cc: Linux USB <linux-usb@vger.kernel.org>,
	Alan Stern <stern@rowland.harvard.edu>
Subject: [02/20] usb: host: xhci: check DYING state earlier
Date: Wed, 2 May 2018 16:02:52 +0300
Message-ID: <71780ddd-d0f5-e580-af3f-401d33e36506@linux.intel.com>

On 02.05.2018 14:46, Felipe Balbi wrote:
> 
> Hi,
> 
> Mathias Nyman <mathias.nyman@linux.intel.com> writes:
>> On 17.04.2018 10:07, Felipe Balbi wrote:
>>>
>>> Hi,
>>>
>>> Mathias Nyman <mathias.nyman@linux.intel.com> writes:
>>>> On 16.04.2018 15:29, Felipe Balbi wrote:
>>>>> Instead of allocating urb priv and, maybe, bail out due to xhci being
>>>>> in DYING state, we can move the check earlier and avoid the memory
>>>>> allocation altogether.
>>>>
>>>> This also moves checking for DYING outside the lock.
>>>>
>>>> Most cases set DYING with lock held, so if we first get the lock before
>>>> checking DYING we have a better chance of not being in the process of dying.
>>>
>>> pretty sure that's atomic, though.
>>
>> That's not what I'm after, your fix is cleaning up code in the case where DYING flag is
>> set before xhci_urb_enqueue() is called. I'm worried about the case when setting DYING flag races
>> with xhci_urb_enqueue(). i.e. xhci_urb_enqueue() is spinning on the lock, waiting, while
>> some other part of the driver is desperately trying to access hw with lock held, failing,
>> finally setting DYING flag, and then releasing lock.
>>
>> If the check is done before taking the lock then the URB might be queued on dying device,
>> at a time when xhci_hc_died already started cancelling and giving back all queued URB
> 
> this can only happen if checking that bit isn't an atomic operation
> which, AFAICT, it is. IOW, it would be the same if you were to change:
> 
> 	if (a & b)
> 		return -1;
> 
> to:
> 
> 	if (test_bit(b, &a))
> 		return -1;
> 
> right? Now, if this isn't an atomic operation, I'm happy to be educated.

Again, it's not about being atomic.
As an example, let's take a get port status request racing with queueing a URB.
After this patch the following is possible:


CPU:0					CPU:1
get port status				queue URB

xhci_hub_control()			xhci_urb_enqueue()
spin_lock(lock), got it			XHCI_STATE_DYING? no, continue
temp = readl(port_array[wIndex])	spin_lock(lock), already taken, spin here
if (temp == ~(u32)0) {
xhci_hc_died(xhci)				
xhc_state |= XHCI_STATE_DYING
cleanup_command_queue()
kill_endpoint_urbs()
spin_unlock(lock) // at URB giveback	spin_lock(lock) got it, finally
					allocate urb_priv, plus other stuff
					xhci_queue_*_tx()		
					count_trbs_needed(urb)
					prepare_transfer()
					queue_trb() // for each trb

So it's more likely that we end up queuing URBs on a dead host, a host that the driver has
already started tearing down, freeing URBs. xhci_hub_control() was just one example;
you can replace it with almost any function that calls xhci_hc_died().
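
To make the ordering problem concrete, here is a minimal userspace sketch that replays the interleaving above in a single thread. It is not driver code: struct fake_hc, FAKE_STATE_DYING, fake_lock()/fake_unlock() and replay_race() are all hypothetical names made up for this illustration, and the "lock" is only a flag that asserts correct pairing. The point is only that a check done before taking the lock is stale by the time the lock is acquired, no matter how atomic the read itself was.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for XHCI_STATE_DYING; not the driver's definition. */
#define FAKE_STATE_DYING 0x1u

/* Hypothetical, heavily reduced model of the host controller state. */
struct fake_hc {
	bool lock_held;			/* models xhci->lock */
	unsigned int state;		/* models xhc_state */
	int urbs_queued_while_dying;	/* URBs queued after DYING was set */
};

static void fake_lock(struct fake_hc *hc)
{
	assert(!hc->lock_held);
	hc->lock_held = true;
}

static void fake_unlock(struct fake_hc *hc)
{
	assert(hc->lock_held);
	hc->lock_held = false;
}

/*
 * Replay the CPU0/CPU1 interleaving from the timeline in one thread.
 * check_under_lock == false models the patched ordering (check, then lock);
 * check_under_lock == true models the old ordering (lock, then check).
 * Returns the number of URBs queued on a host already marked dying.
 */
int replay_race(bool check_under_lock)
{
	struct fake_hc hc = { .lock_held = false, .state = 0,
			      .urbs_queued_while_dying = 0 };

	/* CPU1: early check outside the lock; host still looks alive. */
	if (!check_under_lock && (hc.state & FAKE_STATE_DYING))
		return 0;

	/* CPU0: holds the lock, hw access fails, host is marked dying. */
	fake_lock(&hc);
	hc.state |= FAKE_STATE_DYING;
	fake_unlock(&hc);

	/* CPU1: finally gets the lock it was spinning on. */
	fake_lock(&hc);
	if (check_under_lock && (hc.state & FAKE_STATE_DYING)) {
		fake_unlock(&hc);
		return 0;	/* bail out, nothing queued */
	}
	hc.urbs_queued_while_dying++;	/* TRBs written for a dying host */
	fake_unlock(&hc);

	return hc.urbs_queued_while_dying;
}
```

In this replay, replay_race(false) ends up queuing one URB on the dying host, while replay_race(true) queues none, because the second variant looks at the flag only after it owns the lock.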

> 
>>>> Small thing, but so is this cleanup, so not sure its worth the change
>>>
>>> Look at the result. With this change we don't need to take a lock,
>>> allocate memory, search for endpoint index, search for endpoint
>>> state. All of those are needed for proper operation of the function, but
>>> if the controller has already died, there's no point in going any
>>> further.
>>
>> But we might miss the fact that host died, and go even further, adding URB to list,
>> writing TRBs to ringbuffers etc.
>>
>> In code we save one line,
>> goto: free_priv
> 
> We're saving a lot more than that, actually. All of the following ends
> up being skipped. All of these are unnecessary work when xHC has already
> died:

In lines of code in the driver, it's just one line.

In extra code being run, it's a gamble.
Before the patch we ran the code below. After the patch we run either nothing, or the code
below plus all the URB/TRB queuing code.

> 
> 8<------------------------------------------------------------------------
> 
> slot_id = urb->dev->slot_id;
> ep_index = xhci_get_endpoint_index(&urb->ep->desc);
> ep_state = &xhci->devs[slot_id]->eps[ep_index].ep_state;
> 
> if (!HCD_HW_ACCESSIBLE(hcd)) {
>          if (!in_interrupt())
>                  xhci_dbg(xhci, "urb submitted during PCI suspend\n");
>          return -ESHUTDOWN;
> }
> 
> if (usb_endpoint_xfer_isoc(&urb->ep->desc))
>          num_tds = urb->number_of_packets;
> else if (usb_endpoint_is_bulk_out(&urb->ep->desc) &&
>      urb->transfer_buffer_length > 0 &&
>      urb->transfer_flags & URB_ZERO_PACKET &&
>      !(urb->transfer_buffer_length % usb_endpoint_maxp(&urb->ep->desc)))
>          num_tds = 2;
> else
>          num_tds = 1;
> 
> urb_priv = kzalloc(sizeof(struct urb_priv) +
>                     num_tds * sizeof(struct xhci_td), mem_flags);
> if (!urb_priv)
>          return -ENOMEM;
> 
> urb_priv->num_tds = num_tds;
> urb_priv->num_tds_done = 0;
> urb->hcpriv = urb_priv;
> 
> trace_xhci_urb_enqueue(urb);
> 
> if (usb_endpoint_xfer_control(&urb->ep->desc)) {
>          /* Check to see if the max packet size for the default control
>           * endpoint changed during FS device enumeration
>           */
>          if (urb->dev->speed == USB_SPEED_FULL) {
>                  ret = xhci_check_maxpacket(xhci, slot_id,
>                                  ep_index, urb);
>                  if (ret < 0) {
>                          xhci_urb_free_priv(urb_priv);
>                          urb->hcpriv = NULL;
>                          return ret;
>                  }
>          }
> }
> 
> spin_lock_irqsave(&xhci->lock, flags);
> 
> 8<------------------------------------------------------------------------
>