From: Mathias Nyman <mathias.nyman@linux.intel.com>
To: Udipto Goswami <udipto.goswami@oss.qualcomm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Mathias Nyman <mathias.nyman@intel.com>,
	linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	Alan Stern <stern@rowland.harvard.edu>
Subject: Re: [RFC PATCH] usb: xhci: Skip configure EP for disabled slots during teardown
Date: Wed, 7 Jan 2026 18:50:40 +0200
Message-ID: <7631bc7d-e3b2-45b2-9b85-f03ed1d6b3cd@linux.intel.com>
In-Reply-To: <CAMTwNXDFM=csMEJ1ZhiTOeQ-dDH4eu4ze9XRFbSj0d-4Fxsp=g@mail.gmail.com>

On 1/6/26 12:22, Udipto Goswami wrote:
> On Mon, Jan 5, 2026 at 4:32 PM Mathias Nyman
> <mathias.nyman@linux.intel.com> wrote:
>>
>> Hi
>>
>> On 1/5/26 10:48, Udipto Goswami wrote:
>>> Consider a scenario when a HS headset fails resume and the hub performs
>>> a logical disconnect, the USB core tears down endpoints and calls
>>> hcd->check_bandwidth() on the way out, which with xHCI translates to a
>>> drop-only Configure Endpoint command (add_flags == SLOT_FLAG, drop_flags
>>> != 0). If the slot is already disabled (slot_id == 0) or the virtual
>>> device has been freed, issuing this Configure Endpoint command is
>>> pointless and may appear stuck until event handling catches up,
>>> causing unnecessary delays during disconnect teardown.
>>>
>>> Fix this by adding a check in xhci_check_bandwidth(), return success
>>> immediately if slot_id == 0 or vdev is missing, preventing the
>>> Configure Endpoint command from being queued at all. Additionally,
>>> in xhci_configure_endpoint() for drop-only Configure Endpoint operations,
>>> return success early if slot_id == 0 or vdev is already freed,
>>> avoiding spurious command waits.
>>>
>>> Signed-off-by: Udipto Goswami <udipto.goswami@oss.qualcomm.com>
>>
>> Makes sense to prevent unnecessary 'configure endpoint' commands
>>
>> Could you share more details how we end up tearing down endpoints and
>> calling xhci_check_bandwidth() after vdev is freed and slot_id set to zero?
>>
>> Did the whole xHC controller fail to resume and was reinitialized in
>> xhci_resume() power_lost path?
>>
>> Or is this related to audio offload and xhci sideband usage?
>>
>> If we end up in this situation in normal headset resume failure then there
>> might be something else wrong.
>>
> 
> Apologies! My mailbox was configured with HTML.
> Re-sending in plain text.
> 
> Hi Mathias,
> 
> Yes, we are using offloaded audio in this case and xhci-sideband is involved.
> 
> Scenario:
> The headset is connected to the platform with no active playback, so
> it suspends. No physical disconnect occurs.
> 
> 1. Audio DSP sends a playback request while the USB headset (device
> 1-1) is suspended
> 2. Resume chain is triggered:
>     handle_uaudio_stream_req
>     → enable_audio_stream
>     → snd_usb_autoresume
>     → dwc3-parent_wrapper (Qualcomm) → xhci → roothub → USB headset (1-1)
> 3. Resume fails at device 1-1: the headset fails to resume from
> suspend. Note that the xHCI controller itself resumes
> successfully; only the headset device fails.
> 4. Hub performs logical disconnect as a recovery mechanism
> 5. Race condition occurs: The USB core begins teardown (calling
> 'check_bandwidth()'), but the xHCI driver may have already started
> freeing the slot due to the failed resume.
> 
> Two parallel paths:
> PATH1: (slower usb core teardown)
> 
> hub_port_connect_change()
> └─ Device resume fails
>     └─ hub_port_logical_disconnect()
>        └─ usb_disconnect()
>           └─ usb_disable_device()
>              ├─ usb_disable_endpoint() [for each endpoint]
>              │  └─ usb_hcd_disable_endpoint()
>              └─ usb_hcd_alloc_bandwidth()
>                 └─ usb_hcd_check_bandwidth()
>                    └─ xhci_check_bandwidth() ← POINT OF FAILURE
>                       └─ Tries to issue Configure Endpoint
>                          └─ But slot_id == 0 or virt_dev == NULL!
> 
> PATH2: (faster - xhci slot cleanup)
> hub_port_logical_disconnect()
> └─ usb_disconnect()
>     └─ usb_release_dev()
>        └─ usb_hcd_free_dev()
>           └─ xhci_free_dev()
>              └─ xhci_disable_slot()
>                 ├─ Issues TRB_DISABLE_SLOT command
>                 ├─ Waits for completion
>                 └─ xhci_free_virt_device()
>                    ├─ Sets udev->slot_id = 0
>                    ├─ Frees virt_dev
>                    └─ Sets xhci->devs[slot_id] = NULL
> 
> RACE TIMELINE:
> 
>     Path 2 (fast)                    Path 1 (slow)
> ──────────────────────────────────────────────────────────────
> T1: xhci_free_dev() starts
> T2: xhci_disable_slot() issued
> T3: slot_id = 0
> T4: virt_dev freed                   usb_disable_endpoint()
> T5: xhci->devs[slot_id] = NULL       (still processing...)
> T6:                                  xhci_check_bandwidth() ← RACE!
> T7:                                  Tries Configure Endpoint
> T8:                                  But slot is already freed!
> 
> Path 1 is slower because it must iterate through all endpoints,
> calling usb_disable_endpoint() for each one before reaching
> check_bandwidth().
> Path 2 completes faster with a single disable slot command. So if
> T3-T5 have already executed, meaning the slot has already been freed,
> then the Configure Endpoint commands (T6-T8) can be skipped.
> Please let me know if this makes sense.

Thanks, well explained and nicely laid out.

There is something still odd in this scenario.

There shouldn't be two racing paths as both cases should be handled by
the hub work 'thread' that only has one active work item.

If resume fails then hub_port_logical_disconnect() is called and marks the device
as "USB_STATE_NOTATTACHED", and adds a change_bit for the port.
hub work should take over from there.

hub work should then do:
hub_event()
   port_event(hub, i);    // because hub->change_bit is set for this port
     hub_port_connect_change()
       hub_port_connect()
         if (udev)
           usb_disconnect()
             usb_disable_device()  //children first
               usb_disable_device_endpoints()  // for each endpoint
                 usb_hcd_alloc_bandwidth(dev, NULL, NULL, NULL);
                   hcd->driver->check_bandwidth()  // does all the configure endpoint commands
             device_del(&udev->dev);
             hub_free_dev(udev)
               hcd->driver->free_dev(hcd, udev);  // clears virt_dev and slot_id here
             put_device(&udev->dev);

To me this looks like driver->check_bandwidth() is called before driver->free_dev().
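
As a rough illustration of that ordering, here is a minimal user-space C
model of the sequence above. It is only a sketch: every model_* name is
hypothetical and merely stands in for the hub and xHCI paths named in the
trace, and it assumes teardown runs only from the single hub work item.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's usb_device/xhci types. */
enum model_state { MODEL_ATTACHED, MODEL_NOTATTACHED };

struct model_udev {
	enum model_state state;
	int slot_id;        /* 0 means the slot has been released */
	bool has_virt_dev;  /* stands in for xhci->devs[slot_id] != NULL */
	int cfg_ep_cmds;    /* Configure Endpoint commands issued */
	int skipped;        /* check_bandwidth calls that found no slot */
};

struct model_port {
	bool change_bit;
	struct model_udev *child;
};

/* Resume failure: only mark the device and flag the port, no teardown here. */
void model_logical_disconnect(struct model_port *port)
{
	if (port->child)
		port->child->state = MODEL_NOTATTACHED;
	port->change_bit = true;
}

/* Models check_bandwidth(), including the early return the RFC proposes. */
void model_check_bandwidth(struct model_udev *udev)
{
	if (udev->slot_id == 0 || !udev->has_virt_dev) {
		udev->skipped++;
		return;
	}
	udev->cfg_ep_cmds++;
}

/* Models free_dev(): clears virt_dev and slot_id. */
void model_free_dev(struct model_udev *udev)
{
	udev->has_virt_dev = false;
	udev->slot_id = 0;
}

/* Models usb_disconnect(): bandwidth check first, slot release second. */
void model_disconnect(struct model_port *port)
{
	struct model_udev *udev = port->child;

	model_check_bandwidth(udev);
	model_free_dev(udev);
	port->child = NULL;
}

/* One pass of the hub work item; the only caller of model_disconnect(). */
void model_hub_event(struct model_port *port)
{
	if (!port->change_bit)
		return;
	port->change_bit = false;
	if (port->child)
		model_disconnect(port);
}
```

In this model, calling model_logical_disconnect() and then model_hub_event()
issues one Configure Endpoint command and never takes the skip branch, so
reaching that branch would need something outside this sequence to free the
slot first.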
  
Thanks
Mathias


Thread overview: 4+ messages
2026-01-05  8:48 [RFC PATCH] usb: xhci: Skip configure EP for disabled slots during teardown Udipto Goswami
2026-01-05 11:02 ` Mathias Nyman
2026-01-06 10:22   ` Udipto Goswami
2026-01-07 16:50     ` Mathias Nyman [this message]
