* [RFT PATCH] xhci: Fix handling timeouted commands on hosts in weird states.
@ 2016-05-12 11:07 Mathias Nyman
2016-05-13 8:29 ` Greg KH
0 siblings, 1 reply; 7+ messages in thread
From: Mathias Nyman @ 2016-05-12 11:07 UTC (permalink / raw)
To: Mathias Nyman; +Cc: stable
If commands timeout we mark them for abortion, then stop the command
ring, and turn the commands to no-ops and finally restart the command
ring.
If the host is working properly the no-op commands will finish and
pending completions are called.
If we notice the host is failing driver clears the command ring and
completes, deletes and frees all pending commands.
There are two separate cases reported where host is believed to work
properly but is not. In the first case we successfully stop the ring
but no abort or stop commnand ring event is ever sent and host locks up.
The second case is if a host is removed, command times out and driver
believes the ring is stopped, and assumes it be restarted, but actually
ends up timing out on the same command forever.
If one of the pending commands has the xhci->mutex held it will block
xhci_stop() in the remove codepath which otherwise would cleanup pending
commands.
Add a check that clears all pending commands in case host is removed,
or we are stuck timeouting on the same command. Also restart the
command timeout timer when stopping the command ring to ensure we
recive an ring stop/abort event.
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++++++++-----
1 file changed, 22 insertions(+), 5 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 52deae4..c82570d 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -290,6 +290,14 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
xhci->cmd_ring_state = CMD_RING_STATE_ABORTED;
+
+ /*
+ * Writing the CMD_RING_ABORT bit should cause a cmd completion event,
+ * however on some host hw the CMD_RING_RUNNING bit is correctly cleared
+ * but the completion event in never sent. Use the cmd timeout timer to
+ * handle those cases. Use twice the time to cover the bit polling retry
+ */
+ mod_timer(&xhci->cmd_timer, jiffies + (2 * XHCI_CMD_DEFAULT_TIMEOUT));
xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
&xhci->op_regs->cmd_ring);
@@ -314,6 +322,7 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
xhci_err(xhci, "Stopped the command ring failed, "
"maybe the host is dead\n");
+ del_timer(&xhci->cmd_timer);
xhci->xhc_state |= XHCI_STATE_DYING;
xhci_quiesce(xhci);
xhci_halt(xhci);
@@ -1246,22 +1255,21 @@ void xhci_handle_command_timeout(unsigned long data)
int ret;
unsigned long flags;
u64 hw_ring_state;
- struct xhci_command *cur_cmd = NULL;
+ bool second_timeout = false;
xhci = (struct xhci_hcd *) data;
/* mark this command to be cancelled */
spin_lock_irqsave(&xhci->lock, flags);
if (xhci->current_cmd) {
- cur_cmd = xhci->current_cmd;
- cur_cmd->status = COMP_CMD_ABORT;
+ if (xhci->current_cmd->status == COMP_CMD_ABORT)
+ second_timeout = true;
+ xhci->current_cmd->status = COMP_CMD_ABORT;
}
-
/* Make sure command ring is running before aborting it */
hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
(hw_ring_state & CMD_RING_RUNNING)) {
-
spin_unlock_irqrestore(&xhci->lock, flags);
xhci_dbg(xhci, "Command timeout\n");
ret = xhci_abort_cmd_ring(xhci);
@@ -1273,6 +1281,15 @@ void xhci_handle_command_timeout(unsigned long data)
}
return;
}
+
+ /* command ring failed to restart, or host removed. Bail out */
+ if (second_timeout || xhci->xhc_state & XHCI_STATE_REMOVING) {
+ spin_unlock_irqrestore(&xhci->lock, flags);
+ xhci_dbg(xhci, "command timed out twice, ring start fail?\n");
+ xhci_cleanup_command_queue(xhci);
+ return;
+ }
+
/* command timeout on stopped ring, ring can't be aborted */
xhci_dbg(xhci, "Command timeout on stopped ring\n");
xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
--
1.9.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [RFT PATCH] xhci: Fix handling timeouted commands on hosts in weird states.
[not found] <573319F4.2020004@linux.intel.com>
@ 2016-05-12 11:57 ` Mathias Nyman
2016-05-13 14:01 ` Joe Lawrence
2016-05-25 15:33 ` Joe Lawrence
0 siblings, 2 replies; 7+ messages in thread
From: Mathias Nyman @ 2016-05-12 11:57 UTC (permalink / raw)
To: joe.lawrence; +Cc: linux-usb, derek.shute, rajesh.bhagat, Mathias Nyman, stable
If commands timeout we mark them for abortion, then stop the command
ring, and turn the commands to no-ops and finally restart the command
ring.
If the host is working properly the no-op commands will finish and
pending completions are called.
If we notice the host is failing driver clears the command ring and
completes, deletes and frees all pending commands.
There are two separate cases reported where host is believed to work
properly but is not. In the first case we successfully stop the ring
but no abort or stop commnand ring event is ever sent and host locks up.
The second case is if a host is removed, command times out and driver
believes the ring is stopped, and assumes it be restarted, but actually
ends up timing out on the same command forever.
If one of the pending commands has the xhci->mutex held it will block
xhci_stop() in the remove codepath which otherwise would cleanup pending
commands.
Add a check that clears all pending commands in case host is removed,
or we are stuck timeouting on the same command. Also restart the
command timeout timer when stopping the command ring to ensure we
recive an ring stop/abort event.
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++++++++-----
1 file changed, 22 insertions(+), 5 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 52deae4..c82570d 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -290,6 +290,14 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
xhci->cmd_ring_state = CMD_RING_STATE_ABORTED;
+
+ /*
+ * Writing the CMD_RING_ABORT bit should cause a cmd completion event,
+ * however on some host hw the CMD_RING_RUNNING bit is correctly cleared
+ * but the completion event in never sent. Use the cmd timeout timer to
+ * handle those cases. Use twice the time to cover the bit polling retry
+ */
+ mod_timer(&xhci->cmd_timer, jiffies + (2 * XHCI_CMD_DEFAULT_TIMEOUT));
xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
&xhci->op_regs->cmd_ring);
@@ -314,6 +322,7 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
xhci_err(xhci, "Stopped the command ring failed, "
"maybe the host is dead\n");
+ del_timer(&xhci->cmd_timer);
xhci->xhc_state |= XHCI_STATE_DYING;
xhci_quiesce(xhci);
xhci_halt(xhci);
@@ -1246,22 +1255,21 @@ void xhci_handle_command_timeout(unsigned long data)
int ret;
unsigned long flags;
u64 hw_ring_state;
- struct xhci_command *cur_cmd = NULL;
+ bool second_timeout = false;
xhci = (struct xhci_hcd *) data;
/* mark this command to be cancelled */
spin_lock_irqsave(&xhci->lock, flags);
if (xhci->current_cmd) {
- cur_cmd = xhci->current_cmd;
- cur_cmd->status = COMP_CMD_ABORT;
+ if (xhci->current_cmd->status == COMP_CMD_ABORT)
+ second_timeout = true;
+ xhci->current_cmd->status = COMP_CMD_ABORT;
}
-
/* Make sure command ring is running before aborting it */
hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
(hw_ring_state & CMD_RING_RUNNING)) {
-
spin_unlock_irqrestore(&xhci->lock, flags);
xhci_dbg(xhci, "Command timeout\n");
ret = xhci_abort_cmd_ring(xhci);
@@ -1273,6 +1281,15 @@ void xhci_handle_command_timeout(unsigned long data)
}
return;
}
+
+ /* command ring failed to restart, or host removed. Bail out */
+ if (second_timeout || xhci->xhc_state & XHCI_STATE_REMOVING) {
+ spin_unlock_irqrestore(&xhci->lock, flags);
+ xhci_dbg(xhci, "command timed out twice, ring start fail?\n");
+ xhci_cleanup_command_queue(xhci);
+ return;
+ }
+
/* command timeout on stopped ring, ring can't be aborted */
xhci_dbg(xhci, "Command timeout on stopped ring\n");
xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
--
1.9.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [RFT PATCH] xhci: Fix handling timeouted commands on hosts in weird states.
2016-05-12 11:07 [RFT PATCH] xhci: Fix handling timeouted commands on hosts in weird states Mathias Nyman
@ 2016-05-13 8:29 ` Greg KH
2016-05-13 9:49 ` Mathias Nyman
0 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2016-05-13 8:29 UTC (permalink / raw)
To: Mathias Nyman; +Cc: stable
On Thu, May 12, 2016 at 02:07:41PM +0300, Mathias Nyman wrote:
> If commands timeout we mark them for abortion, then stop the command
> ring, and turn the commands to no-ops and finally restart the command
> ring.
>
> If the host is working properly the no-op commands will finish and
> pending completions are called.
> If we notice the host is failing driver clears the command ring and
> completes, deletes and frees all pending commands.
>
> There are two separate cases reported where host is believed to work
> properly but is not. In the first case we successfully stop the ring
> but no abort or stop commnand ring event is ever sent and host locks up.
>
> The second case is if a host is removed, command times out and driver
> believes the ring is stopped, and assumes it be restarted, but actually
> ends up timing out on the same command forever.
> If one of the pending commands has the xhci->mutex held it will block
> xhci_stop() in the remove codepath which otherwise would cleanup pending
> commands.
>
> Add a check that clears all pending commands in case host is removed,
> or we are stuck timeouting on the same command. Also restart the
> command timeout timer when stopping the command ring to ensure we
> recive an ring stop/abort event.
>
> Cc: stable <stable@vger.kernel.org>
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Any reason why you just sent this to stable@ and not linux-usb@?
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFT PATCH] xhci: Fix handling timeouted commands on hosts in weird states.
2016-05-13 8:29 ` Greg KH
@ 2016-05-13 9:49 ` Mathias Nyman
0 siblings, 0 replies; 7+ messages in thread
From: Mathias Nyman @ 2016-05-13 9:49 UTC (permalink / raw)
To: Greg KH; +Cc: stable
On 13.05.2016 11:29, Greg KH wrote:
> On Thu, May 12, 2016 at 02:07:41PM +0300, Mathias Nyman wrote:
>> If commands timeout we mark them for abortion, then stop the command
>> ring, and turn the commands to no-ops and finally restart the command
>> ring.
>>
>> If the host is working properly the no-op commands will finish and
>> pending completions are called.
>> If we notice the host is failing driver clears the command ring and
>> completes, deletes and frees all pending commands.
>>
>> There are two separate cases reported where host is believed to work
>> properly but is not. In the first case we successfully stop the ring
>> but no abort or stop commnand ring event is ever sent and host locks up.
>>
>> The second case is if a host is removed, command times out and driver
>> believes the ring is stopped, and assumes it be restarted, but actually
>> ends up timing out on the same command forever.
>> If one of the pending commands has the xhci->mutex held it will block
>> xhci_stop() in the remove codepath which otherwise would cleanup pending
>> commands.
>>
>> Add a check that clears all pending commands in case host is removed,
>> or we are stuck timeouting on the same command. Also restart the
>> command timeout timer when stopping the command ring to ensure we
>> recive an ring stop/abort event.
>>
>> Cc: stable <stable@vger.kernel.org>
>> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
>
> Any reason why you just sent this to stable@ and not linux-usb@?
>
Missing --suppress-cc=all, was supposed to go only to me for a testapply-build round.
Same patch is now in linux-usb with correct people and lists in the to and cc fields.
Sorry about the extra fuss
-Mathias
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFT PATCH] xhci: Fix handling timeouted commands on hosts in weird states.
2016-05-12 11:57 ` Mathias Nyman
@ 2016-05-13 14:01 ` Joe Lawrence
2016-05-25 15:33 ` Joe Lawrence
1 sibling, 0 replies; 7+ messages in thread
From: Joe Lawrence @ 2016-05-13 14:01 UTC (permalink / raw)
To: Mathias Nyman; +Cc: linux-usb, derek.shute, rajesh.bhagat, stable
On 05/12/2016 07:57 AM, Mathias Nyman wrote:
> If commands timeout we mark them for abortion, then stop the command
> ring, and turn the commands to no-ops and finally restart the command
> ring.
>
> If the host is working properly the no-op commands will finish and
> pending completions are called.
> If we notice the host is failing driver clears the command ring and
> completes, deletes and frees all pending commands.
>
> There are two separate cases reported where host is believed to work
> properly but is not. In the first case we successfully stop the ring
> but no abort or stop commnand ring event is ever sent and host locks up.
s/commnand/command/
>
> The second case is if a host is removed, command times out and driver
> believes the ring is stopped, and assumes it be restarted, but actually
> ends up timing out on the same command forever.
> If one of the pending commands has the xhci->mutex held it will block
> xhci_stop() in the remove codepath which otherwise would cleanup pending
> commands.
>
> Add a check that clears all pending commands in case host is removed,
> or we are stuck timeouting on the same command. Also restart the
s/timeouting/timing out/
> command timeout timer when stopping the command ring to ensure we
> recive an ring stop/abort event.
s/recive/receive/
>
> Cc: stable <stable@vger.kernel.org>
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> ---
> drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++++++++-----
> 1 file changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 52deae4..c82570d 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -290,6 +290,14 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
>
> temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
> xhci->cmd_ring_state = CMD_RING_STATE_ABORTED;
> +
> + /*
> + * Writing the CMD_RING_ABORT bit should cause a cmd completion event,
> + * however on some host hw the CMD_RING_RUNNING bit is correctly cleared
> + * but the completion event in never sent. Use the cmd timeout timer to
> + * handle those cases. Use twice the time to cover the bit polling retry
> + */
> + mod_timer(&xhci->cmd_timer, jiffies + (2 * XHCI_CMD_DEFAULT_TIMEOUT));
> xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
> &xhci->op_regs->cmd_ring);
>
> @@ -314,6 +322,7 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
>
> xhci_err(xhci, "Stopped the command ring failed, "
> "maybe the host is dead\n");
> + del_timer(&xhci->cmd_timer);
> xhci->xhc_state |= XHCI_STATE_DYING;
> xhci_quiesce(xhci);
> xhci_halt(xhci);
> @@ -1246,22 +1255,21 @@ void xhci_handle_command_timeout(unsigned long data)
> int ret;
> unsigned long flags;
> u64 hw_ring_state;
> - struct xhci_command *cur_cmd = NULL;
> + bool second_timeout = false;
> xhci = (struct xhci_hcd *) data;
>
> /* mark this command to be cancelled */
> spin_lock_irqsave(&xhci->lock, flags);
> if (xhci->current_cmd) {
> - cur_cmd = xhci->current_cmd;
> - cur_cmd->status = COMP_CMD_ABORT;
> + if (xhci->current_cmd->status == COMP_CMD_ABORT)
> + second_timeout = true;
> + xhci->current_cmd->status = COMP_CMD_ABORT;
> }
>
> -
> /* Make sure command ring is running before aborting it */
> hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
> if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
> (hw_ring_state & CMD_RING_RUNNING)) {
> -
> spin_unlock_irqrestore(&xhci->lock, flags);
> xhci_dbg(xhci, "Command timeout\n");
> ret = xhci_abort_cmd_ring(xhci);
> @@ -1273,6 +1281,15 @@ void xhci_handle_command_timeout(unsigned long data)
> }
> return;
> }
> +
> + /* command ring failed to restart, or host removed. Bail out */
> + if (second_timeout || xhci->xhc_state & XHCI_STATE_REMOVING) {
> + spin_unlock_irqrestore(&xhci->lock, flags);
> + xhci_dbg(xhci, "command timed out twice, ring start fail?\n");
> + xhci_cleanup_command_queue(xhci);
> + return;
> + }
> +
> /* command timeout on stopped ring, ring can't be aborted */
> xhci_dbg(xhci, "Command timeout on stopped ring\n");
> xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
>
Hi Mathias,
At first glance, this patch looks good. As I mentioned earlier, after
applying "xhci_mem_cleanup after second hcd" a lot of issues we'd been
seeing on Stratus HW cleared up. That said, I'll add this patch to my
testing and report any problems.
Thanks!
-- Joe
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFT PATCH] xhci: Fix handling timeouted commands on hosts in weird states.
2016-05-12 11:57 ` Mathias Nyman
2016-05-13 14:01 ` Joe Lawrence
@ 2016-05-25 15:33 ` Joe Lawrence
2016-05-26 3:34 ` Rajesh Bhagat
1 sibling, 1 reply; 7+ messages in thread
From: Joe Lawrence @ 2016-05-25 15:33 UTC (permalink / raw)
To: Mathias Nyman; +Cc: linux-usb, derek.shute, rajesh.bhagat, stable
On 05/12/2016 07:57 AM, Mathias Nyman wrote:
> If commands timeout we mark them for abortion, then stop the command
> ring, and turn the commands to no-ops and finally restart the command
> ring.
>
> If the host is working properly the no-op commands will finish and
> pending completions are called.
> If we notice the host is failing driver clears the command ring and
> completes, deletes and frees all pending commands.
>
> There are two separate cases reported where host is believed to work
> properly but is not. In the first case we successfully stop the ring
> but no abort or stop commnand ring event is ever sent and host locks up.
>
> The second case is if a host is removed, command times out and driver
> believes the ring is stopped, and assumes it be restarted, but actually
> ends up timing out on the same command forever.
> If one of the pending commands has the xhci->mutex held it will block
> xhci_stop() in the remove codepath which otherwise would cleanup pending
> commands.
>
> Add a check that clears all pending commands in case host is removed,
> or we are stuck timeouting on the same command. Also restart the
> command timeout timer when stopping the command ring to ensure we
> recive an ring stop/abort event.
>
> Cc: stable <stable@vger.kernel.org>
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> ---
> drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++++++++-----
> 1 file changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 52deae4..c82570d 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -290,6 +290,14 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
>
> temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
> xhci->cmd_ring_state = CMD_RING_STATE_ABORTED;
> +
> + /*
> + * Writing the CMD_RING_ABORT bit should cause a cmd completion event,
> + * however on some host hw the CMD_RING_RUNNING bit is correctly cleared
> + * but the completion event in never sent. Use the cmd timeout timer to
> + * handle those cases. Use twice the time to cover the bit polling retry
> + */
> + mod_timer(&xhci->cmd_timer, jiffies + (2 * XHCI_CMD_DEFAULT_TIMEOUT));
> xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
> &xhci->op_regs->cmd_ring);
>
> @@ -314,6 +322,7 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
>
> xhci_err(xhci, "Stopped the command ring failed, "
> "maybe the host is dead\n");
> + del_timer(&xhci->cmd_timer);
> xhci->xhc_state |= XHCI_STATE_DYING;
> xhci_quiesce(xhci);
> xhci_halt(xhci);
> @@ -1246,22 +1255,21 @@ void xhci_handle_command_timeout(unsigned long data)
> int ret;
> unsigned long flags;
> u64 hw_ring_state;
> - struct xhci_command *cur_cmd = NULL;
> + bool second_timeout = false;
> xhci = (struct xhci_hcd *) data;
>
> /* mark this command to be cancelled */
> spin_lock_irqsave(&xhci->lock, flags);
> if (xhci->current_cmd) {
> - cur_cmd = xhci->current_cmd;
> - cur_cmd->status = COMP_CMD_ABORT;
> + if (xhci->current_cmd->status == COMP_CMD_ABORT)
> + second_timeout = true;
> + xhci->current_cmd->status = COMP_CMD_ABORT;
> }
>
> -
> /* Make sure command ring is running before aborting it */
> hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
> if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
> (hw_ring_state & CMD_RING_RUNNING)) {
> -
> spin_unlock_irqrestore(&xhci->lock, flags);
> xhci_dbg(xhci, "Command timeout\n");
> ret = xhci_abort_cmd_ring(xhci);
> @@ -1273,6 +1281,15 @@ void xhci_handle_command_timeout(unsigned long data)
> }
> return;
> }
> +
> + /* command ring failed to restart, or host removed. Bail out */
> + if (second_timeout || xhci->xhc_state & XHCI_STATE_REMOVING) {
> + spin_unlock_irqrestore(&xhci->lock, flags);
> + xhci_dbg(xhci, "command timed out twice, ring start fail?\n");
> + xhci_cleanup_command_queue(xhci);
> + return;
> + }
> +
> /* command timeout on stopped ring, ring can't be aborted */
> xhci_dbg(xhci, "Command timeout on stopped ring\n");
> xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
>
So far so good, feel free to add my Tested-by tag.
-- Joe
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [RFT PATCH] xhci: Fix handling timeouted commands on hosts in weird states.
2016-05-25 15:33 ` Joe Lawrence
@ 2016-05-26 3:34 ` Rajesh Bhagat
0 siblings, 0 replies; 7+ messages in thread
From: Rajesh Bhagat @ 2016-05-26 3:34 UTC (permalink / raw)
To: Joe Lawrence, Mathias Nyman
Cc: linux-usb@vger.kernel.org, derek.shute@stratus.com, stable
> -----Original Message-----
> From: Joe Lawrence [mailto:joe.lawrence@stratus.com]
> Sent: Wednesday, May 25, 2016 9:04 PM
> To: Mathias Nyman <mathias.nyman@linux.intel.com>
> Cc: linux-usb@vger.kernel.org; derek.shute@stratus.com; Rajesh Bhagat
> <rajesh.bhagat@nxp.com>; stable <stable@vger.kernel.org>
> Subject: Re: [RFT PATCH] xhci: Fix handling timeouted commands on hosts in weird
> states.
>
> On 05/12/2016 07:57 AM, Mathias Nyman wrote:
> > If commands timeout we mark them for abortion, then stop the command
> > ring, and turn the commands to no-ops and finally restart the command
> > ring.
> >
> > If the host is working properly the no-op commands will finish and
> > pending completions are called.
> > If we notice the host is failing driver clears the command ring and
> > completes, deletes and frees all pending commands.
> >
> > There are two separate cases reported where host is believed to work
> > properly but is not. In the first case we successfully stop the ring
> > but no abort or stop commnand ring event is ever sent and host locks up.
> >
> > The second case is if a host is removed, command times out and driver
> > believes the ring is stopped, and assumes it be restarted, but
> > actually ends up timing out on the same command forever.
> > If one of the pending commands has the xhci->mutex held it will block
> > xhci_stop() in the remove codepath which otherwise would cleanup
> > pending commands.
> >
> > Add a check that clears all pending commands in case host is removed,
> > or we are stuck timeouting on the same command. Also restart the
> > command timeout timer when stopping the command ring to ensure we
> > recive an ring stop/abort event.
> >
> > Cc: stable <stable@vger.kernel.org>
> > Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> > ---
> > drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++++++++-----
> > 1 file changed, 22 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/usb/host/xhci-ring.c
> > b/drivers/usb/host/xhci-ring.c index 52deae4..c82570d 100644
> > --- a/drivers/usb/host/xhci-ring.c
> > +++ b/drivers/usb/host/xhci-ring.c
> > @@ -290,6 +290,14 @@ static int xhci_abort_cmd_ring(struct xhci_hcd
> > *xhci)
> >
> > temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
> > xhci->cmd_ring_state = CMD_RING_STATE_ABORTED;
> > +
> > + /*
> > + * Writing the CMD_RING_ABORT bit should cause a cmd completion event,
> > + * however on some host hw the CMD_RING_RUNNING bit is correctly cleared
> > + * but the completion event in never sent. Use the cmd timeout timer to
> > + * handle those cases. Use twice the time to cover the bit polling retry
> > + */
> > + mod_timer(&xhci->cmd_timer, jiffies + (2 *
> > +XHCI_CMD_DEFAULT_TIMEOUT));
> > xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
> > &xhci->op_regs->cmd_ring);
> >
> > @@ -314,6 +322,7 @@ static int xhci_abort_cmd_ring(struct xhci_hcd
> > *xhci)
> >
> > xhci_err(xhci, "Stopped the command ring failed, "
> > "maybe the host is dead\n");
> > + del_timer(&xhci->cmd_timer);
> > xhci->xhc_state |= XHCI_STATE_DYING;
> > xhci_quiesce(xhci);
> > xhci_halt(xhci);
> > @@ -1246,22 +1255,21 @@ void xhci_handle_command_timeout(unsigned long
> data)
> > int ret;
> > unsigned long flags;
> > u64 hw_ring_state;
> > - struct xhci_command *cur_cmd = NULL;
> > + bool second_timeout = false;
> > xhci = (struct xhci_hcd *) data;
> >
> > /* mark this command to be cancelled */
> > spin_lock_irqsave(&xhci->lock, flags);
> > if (xhci->current_cmd) {
> > - cur_cmd = xhci->current_cmd;
> > - cur_cmd->status = COMP_CMD_ABORT;
> > + if (xhci->current_cmd->status == COMP_CMD_ABORT)
> > + second_timeout = true;
> > + xhci->current_cmd->status = COMP_CMD_ABORT;
> > }
> >
> > -
> > /* Make sure command ring is running before aborting it */
> > hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
> > if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
> > (hw_ring_state & CMD_RING_RUNNING)) {
> > -
> > spin_unlock_irqrestore(&xhci->lock, flags);
> > xhci_dbg(xhci, "Command timeout\n");
> > ret = xhci_abort_cmd_ring(xhci);
> > @@ -1273,6 +1281,15 @@ void xhci_handle_command_timeout(unsigned long
> data)
> > }
> > return;
> > }
> > +
> > + /* command ring failed to restart, or host removed. Bail out */
> > + if (second_timeout || xhci->xhc_state & XHCI_STATE_REMOVING) {
> > + spin_unlock_irqrestore(&xhci->lock, flags);
> > + xhci_dbg(xhci, "command timed out twice, ring start fail?\n");
> > + xhci_cleanup_command_queue(xhci);
> > + return;
> > + }
> > +
> > /* command timeout on stopped ring, ring can't be aborted */
> > xhci_dbg(xhci, "Command timeout on stopped ring\n");
> > xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
> >
>
Hello Mathias,
I'm using kernel version 4.1.8, and this patch applies OK but fails in compilation
due to missing definition of 'XHCI_STATE_REMOVING'.
Can you help me list the dependent patches that can be applied to 4.1.8 to
make this patch work.
Best Regards,
Rajesh Bhagat
> So far so good, feel free to add my Tested-by tag.
>
> -- Joe
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-05-26 3:49 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-05-12 11:07 [RFT PATCH] xhci: Fix handling timeouted commands on hosts in weird states Mathias Nyman
2016-05-13 8:29 ` Greg KH
2016-05-13 9:49 ` Mathias Nyman
[not found] <573319F4.2020004@linux.intel.com>
2016-05-12 11:57 ` Mathias Nyman
2016-05-13 14:01 ` Joe Lawrence
2016-05-25 15:33 ` Joe Lawrence
2016-05-26 3:34 ` Rajesh Bhagat
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).