From: Lu Baolu <baolu.lu@linux.intel.com>
To: Mathias Nyman <mathias.nyman@linux.intel.com>,
Baolin Wang <baolin.wang@linaro.org>
Cc: Mathias Nyman <mathias.nyman@intel.com>,
Greg KH <gregkh@linuxfoundation.org>,
USB <linux-usb@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Mark Brown <broonie@kernel.org>, "Lu, Baolu" <baolu.lu@intel.com>
Subject: Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command
Date: Thu, 22 Dec 2016 09:39:22 +0800 [thread overview]
Message-ID: <585B2ECA.7020703@linux.intel.com> (raw)
In-Reply-To: <585A7C20.9060500@linux.intel.com>
Hi,
On 12/21/2016 08:57 PM, Mathias Nyman wrote:
> On 21.12.2016 08:57, Lu Baolu wrote:
>> Hi Mathias,
>>
>> I have some comments for the implementation of
>> xhci_handle_command_timeout() as well.
>>
>> On 12/20/2016 11:13 PM, Mathias Nyman wrote:
>>> On 20.12.2016 09:30, Baolin Wang wrote:
>>> ...
>>>
>>> Alright, I gathered all current work related to xhci races and timeouts
>>> and put them into a branch:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git timeout_race_fixes
>>>
>>> Its based on 4.9
>>> It includes a few other patches just to avoid conflicts and make my life easier
>>>
>>> Interesting patches are:
>>>
>>> ee4eb91 xhci: remove unnecessary check for pending timer
>>> 0cba67d xhci: detect stop endpoint race using pending timer instead of counter.
>>> 4f2535f xhci: Handle command completion and timeout race
>>> b9d00d7 usb: host: xhci: Fix possible wild pointer when handling abort command
>>> 529a5a0 usb: xhci: fix possible wild pointer
>>> 4766555 xhci: Fix race related to abort operation
>>> de834a3 xhci: Use delayed_work instead of timer for command timeout
>>> 69973b8 Linux 4.9
>>>
>>> The fixes for command queue races will go to usb-linus and stable, the
>>> reworks for stop ep watchdog timer will go to usb-next.
>>>
>>> Still completely untested, (well it compiles)
>>>
>>> Felipe gave instructions how to modify dwc3 driver to timeout on address
>>> devicecommands to test these, I'll try to set that up.
>>>
>>> All additional testing is welcome, especially if you can trigger timeouts
>>> and races
>>>
>>> -Mathias
>>>
>>>
>>
>> I post the code below and add my comments in line.
>>
>> 1276 void xhci_handle_command_timeout(struct work_struct *work)
>> 1277 {
>> 1278 struct xhci_hcd *xhci;
>> 1279 int ret;
>> 1280 unsigned long flags;
>> 1281 u64 hw_ring_state;
>> 1282
>> 1283 xhci = container_of(to_delayed_work(work), struct xhci_hcd, cmd_timer);
>> 1284
>> 1285 spin_lock_irqsave(&xhci->lock, flags);
>> 1286
>> 1287 /*
>> 1288 * If timeout work is pending, or current_cmd is NULL, it means we
>> 1289 * raced with command completion. Command is handled so just return.
>> 1290 */
>> 1291 if (!xhci->current_cmd || delayed_work_pending(&xhci->cmd_timer)) {
>> 1292 spin_unlock_irqrestore(&xhci->lock, flags);
>> 1293 return;
>> 1294 }
>> 1295 /* mark this command to be cancelled */
>> 1296 xhci->current_cmd->status = COMP_CMD_ABORT;
>> 1297
>> 1298 /* Make sure command ring is running before aborting it */
>> 1299 hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
>> 1300 if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
>> 1301 (hw_ring_state & CMD_RING_RUNNING)) {
>> 1302 /* Prevent new doorbell, and start command abort */
>> 1303 xhci->cmd_ring_state = CMD_RING_STATE_ABORTED;
>> 1304 spin_unlock_irqrestore(&xhci->lock, flags);
>> 1305 xhci_dbg(xhci, "Command timeout\n");
>> 1306 ret = xhci_abort_cmd_ring(xhci);
>> 1307 if (unlikely(ret == -ESHUTDOWN)) {
>> 1308 xhci_err(xhci, "Abort command ring failed\n");
>> 1309 xhci_cleanup_command_queue(xhci);
>> 1310 usb_hc_died(xhci_to_hcd(xhci)->primary_hcd);
>> 1311 xhci_dbg(xhci, "xHCI host controller is dead.\n");
>> 1312 }
>> 1313 return;
>> 1314 }
>> 1315
>> 1316 /* host removed. Bail out */
>> 1317 if (xhci->xhc_state & XHCI_STATE_REMOVING) {
>> 1318 spin_unlock_irqrestore(&xhci->lock, flags);
>> 1319 xhci_dbg(xhci, "host removed, ring start fail?\n");
>> 1320 xhci_cleanup_command_queue(xhci);
>> 1321 return;
>> 1322 }
>>
>> I think this part of code should be moved up to line 1295.
>
> The XHCI_STATE_REMOVING and XHCI_STATE_DYING needs a rework,
> I'm working on that.
>
> Basically we want XHCI_STATE_REMOVING to mean that all devices are going,
> away and driver will be removed. Don't bother with re-calculating available
> bandwidths after every device removal, but do use xhci hardware to disable
> devices cleanly etc.
>
> XHCI_STATE_DYING should mean hardware is not working/responding. Don't
> bother writing any registers or queuing anything. Just return all
> pending and cancelled URBs, notify core we died, and free all allocated memory.
Okay, thanks for the information.
>
>>
>> 1323
>> 1324 /* command timeout on stopped ring, ring can't be aborted */
>> 1325 xhci_dbg(xhci, "Command timeout on stopped ring\n");
>> 1326 xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
>> 1327 spin_unlock_irqrestore(&xhci->lock, flags);
>>
>> This part of code is tricky. I have no idea about in which case should this
>> code be executed? Anyway, we shouldn't call xhci_handle_stopped_cmd_ring()
>> here, right?
>>
>
> This isn't changed it these patches.
>
> It will remove the aborted commands and restart the ring. It's useful if we
> want to abort a command but command ring was not running. (if for some
> unkown reason it was stopped, or forgot to restart.
Make sense.
So how about put a warning (instead of a debug message which will normally
be ignored) here?
Best regards,
Lu Baolu
next prev parent reply other threads:[~2016-12-22 1:39 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-12-05 7:51 [PATCH 1/2] usb: host: xhci: Fix possible wild pointer when handling abort command Baolin Wang
2016-12-05 7:51 ` [PATCH 2/2] usb: host: xhci: Handle the right timeout command Baolin Wang
2016-12-12 15:52 ` Mathias Nyman
2016-12-13 3:21 ` Baolin Wang
2016-12-19 10:33 ` Mathias Nyman
2016-12-19 11:34 ` Baolin Wang
2016-12-19 12:13 ` Mathias Nyman
2016-12-20 3:23 ` Baolin Wang
2016-12-20 4:29 ` Lu Baolu
2016-12-20 6:06 ` Baolin Wang
2016-12-20 6:39 ` Lu Baolu
2016-12-20 6:46 ` Baolin Wang
2016-12-20 7:18 ` Lu Baolu
2016-12-20 7:30 ` Baolin Wang
2016-12-20 15:13 ` Mathias Nyman
2016-12-21 2:22 ` Baolin Wang
2016-12-21 13:00 ` Mathias Nyman
2016-12-27 3:07 ` Baolin Wang
2017-01-02 14:57 ` Mathias Nyman
2017-01-03 6:20 ` Baolin Wang
2016-12-21 6:17 ` Lu Baolu
2016-12-21 12:48 ` Mathias Nyman
2016-12-21 14:10 ` OGAWA Hirofumi
2016-12-21 15:04 ` Mathias Nyman
2016-12-21 15:18 ` OGAWA Hirofumi
2016-12-22 1:46 ` Lu Baolu
2016-12-23 12:54 ` Mathias Nyman
2016-12-22 1:43 ` Lu Baolu
2016-12-21 6:57 ` Lu Baolu
2016-12-21 12:57 ` Mathias Nyman
2016-12-22 1:39 ` Lu Baolu [this message]
2016-12-05 14:08 ` [PATCH 1/2] usb: host: xhci: Fix possible wild pointer when handling abort command Mathias Nyman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=585B2ECA.7020703@linux.intel.com \
--to=baolu.lu@linux.intel.com \
--cc=baolin.wang@linaro.org \
--cc=baolu.lu@intel.com \
--cc=broonie@kernel.org \
--cc=gregkh@linuxfoundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-usb@vger.kernel.org \
--cc=mathias.nyman@intel.com \
--cc=mathias.nyman@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).