stable.vger.kernel.org archive mirror
* [PATCH 1/2] xhci: fix off by one error in TRB DMA address boundary check
       [not found] <1438607269-8977-1-git-send-email-mathias.nyman@linux.intel.com>
@ 2015-08-03 13:07 ` Mathias Nyman
  2015-08-03 13:07 ` [PATCH 2/2] drivers/usb: Delete XHCI command timer if necessary Mathias Nyman
  1 sibling, 0 replies; 6+ messages in thread
From: Mathias Nyman @ 2015-08-03 13:07 UTC
  To: gregkh; +Cc: linux-usb, Mathias Nyman, stable

We need to check that a TRB is part of the current segment
before calculating its DMA address.

Previously a ring segment didn't use a full memory page, and every
new ring segment got a new memory page, so the off by one
error in checking the upper bound was never seen.

Now that a segment uses a full memory page, 256 TRBs (4096 bytes), the
off-by-one check fails to catch the case where a TRB is the first element
of the next segment.

This is triggered if the virtual memory pages for a ring segment are
next to each other in increasing order where the ring buffer wraps around,
and causes errors like:

[  106.398223] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not
 part of current TD ep_index 0 comp_code 1
[  106.398230] xhci_hcd 0000:00:14.0: Looking for event-dma fffd3000
 trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 seg-end fffd4ff0

The trb-end address is one TRB beyond the seg-end address.

Cc: <stable@vger.kernel.org>
Tested-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 6a8fc52..32f4d56 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
 		return 0;
 	/* offset in TRBs */
 	segment_offset = trb - seg->trbs;
-	if (segment_offset > TRBS_PER_SEGMENT)
+	if (segment_offset >= TRBS_PER_SEGMENT)
 		return 0;
 	return seg->dma + (segment_offset * sizeof(*trb));
 }
-- 
1.8.3.2


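The boundary condition fixed by this patch can be reproduced outside the
kernel. The sketch below is a simplified userspace model, not the driver
code: struct trb, struct segment, ring_pages, trb_virt_to_dma_old() and
check_boundary() are hypothetical stand-ins, and the DMA base is taken from
the seg-start value in the log above. It assumes two segments that sit back
to back in memory, which is the layout the commit message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRBS_PER_SEGMENT 256    /* one 4096-byte page of 16-byte TRBs */

/* Simplified stand-ins for the driver's types (not the real layout). */
struct trb { uint32_t field[4]; };

struct segment {
	struct trb *trbs;       /* virtual address of the first TRB */
	uint64_t dma;           /* bus address of the first TRB */
};

/* Two adjacent pages: segment 0 is [0..255], segment 1 starts at [256]. */
struct trb ring_pages[2 * TRBS_PER_SEGMENT];

/* Old check: '>' lets offset 256 (first TRB of the next page) through. */
uint64_t trb_virt_to_dma_old(const struct segment *seg, const struct trb *trb)
{
	ptrdiff_t segment_offset;

	if (!seg || !trb || trb < seg->trbs)
		return 0;
	segment_offset = trb - seg->trbs;
	if (segment_offset > TRBS_PER_SEGMENT)
		return 0;
	return seg->dma + (uint64_t)segment_offset * sizeof(*trb);
}

/* Fixed check: valid offsets are 0..TRBS_PER_SEGMENT-1, so use '>='. */
uint64_t trb_virt_to_dma(const struct segment *seg, const struct trb *trb)
{
	ptrdiff_t segment_offset;

	if (!seg || !trb || trb < seg->trbs)
		return 0;
	segment_offset = trb - seg->trbs;
	if (segment_offset >= TRBS_PER_SEGMENT)
		return 0;
	return seg->dma + (uint64_t)segment_offset * sizeof(*trb);
}

/* Exercise both versions with the addresses seen in the log. */
void check_boundary(void)
{
	struct segment seg = { ring_pages, 0xfffd4000u };

	/* Last TRB of the segment maps to seg-end in both versions. */
	assert(trb_virt_to_dma(&seg, &ring_pages[255]) == 0xfffd4ff0u);
	/* First TRB of the next page: old code returns the bogus trb-end. */
	assert(trb_virt_to_dma_old(&seg, &ring_pages[256]) == 0xfffd5000u);
	/* Fixed code rejects it. */
	assert(trb_virt_to_dma(&seg, &ring_pages[256]) == 0);
}
```

Calling check_boundary() from a small main() is enough to run the model.
The old comparison only misbehaves when the next segment's page follows
directly in memory, which matches the observation that the bug stayed hidden
while segments did not fill a whole page.
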

* [PATCH 2/2] drivers/usb: Delete XHCI command timer if necessary
       [not found] <1438607269-8977-1-git-send-email-mathias.nyman@linux.intel.com>
  2015-08-03 13:07 ` [PATCH 1/2] xhci: fix off by one error in TRB DMA address boundary check Mathias Nyman
@ 2015-08-03 13:07 ` Mathias Nyman
  2015-08-11  8:15   ` Oliver Neukum
  1 sibling, 1 reply; 6+ messages in thread
From: Mathias Nyman @ 2015-08-03 13:07 UTC
  To: gregkh; +Cc: linux-usb, Gavin Shan, stable, Mathias Nyman

From: Gavin Shan <gwshan@linux.vnet.ibm.com>

When xhci_mem_cleanup() is called, it's possible that the command
timer hasn't been initialized and scheduled. In those cases, deleting
the command timer causes a soft lockup, as the stack dump below shows.

The patch avoids deleting the command timer if it's not scheduled
with the help of timer_pending().

NMI watchdog: BUG: soft lockup - CPU#40 stuck for 23s! [kworker/40:1:8140]
      :
NIP [c000000000150b30] lock_timer_base.isra.34+0x90/0xa0
LR [c000000000150c24] try_to_del_timer_sync+0x34/0xa0
Call Trace:
[c000000f67c975e0] [c0000000015b84f8] mon_ops+0x0/0x8 (unreliable)
[c000000f67c97620] [c000000000150c24] try_to_del_timer_sync+0x34/0xa0
[c000000f67c97660] [c000000000150cf0] del_timer_sync+0x60/0x80
[c000000f67c97690] [c00000000070ac0c] xhci_mem_cleanup+0x5c/0x5e0
[c000000f67c97740] [c00000000070c2e8] xhci_mem_init+0x1158/0x13b0
[c000000f67c97860] [c000000000700978] xhci_init+0x88/0x110
[c000000f67c978e0] [c000000000701644] xhci_gen_setup+0x2b4/0x590
[c000000f67c97970] [c0000000006d4410] xhci_pci_setup+0x40/0x190
[c000000f67c979f0] [c0000000006b1af8] usb_add_hcd+0x418/0xba0
[c000000f67c97ab0] [c0000000006cb15c] usb_hcd_pci_probe+0x1dc/0x5c0
[c000000f67c97b50] [c0000000006d3ba4] xhci_pci_probe+0x64/0x1f0
[c000000f67c97ba0] [c0000000004fe9ac] local_pci_probe+0x6c/0x130
[c000000f67c97c30] [c0000000000e5ce8] work_for_cpu_fn+0x38/0x60
[c000000f67c97c60] [c0000000000eacb8] process_one_work+0x198/0x470
[c000000f67c97cf0] [c0000000000eb6ac] worker_thread+0x37c/0x5a0
[c000000f67c97d80] [c0000000000f2730] kthread+0x110/0x130
[c000000f67c97e30] [c000000000009660] ret_from_kernel_thread+0x5c/0x7c

Cc: <stable@vger.kernel.org>
Reported-by: Priya M. A <priyama2@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-mem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 3e442f7..9a8c936 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1792,7 +1792,8 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
 	int size;
 	int i, j, num_ports;
 
-	del_timer_sync(&xhci->cmd_timer);
+	if (timer_pending(&xhci->cmd_timer))
+		del_timer_sync(&xhci->cmd_timer);
 
 	/* Free the Event Ring Segment Table and the actual Event Ring */
 	size = sizeof(struct xhci_erst_entry)*(xhci->erst.num_entries);
-- 
1.8.3.2



* Re: [PATCH 2/2] drivers/usb: Delete XHCI command timer if necessary
  2015-08-03 13:07 ` [PATCH 2/2] drivers/usb: Delete XHCI command timer if necessary Mathias Nyman
@ 2015-08-11  8:15   ` Oliver Neukum
  2015-08-12 10:55     ` Mathias Nyman
  0 siblings, 1 reply; 6+ messages in thread
From: Oliver Neukum @ 2015-08-11  8:15 UTC
  To: Mathias Nyman; +Cc: gregkh, linux-usb, Gavin Shan, stable

On Mon, 2015-08-03 at 16:07 +0300, Mathias Nyman wrote:
> From: Gavin Shan <gwshan@linux.vnet.ibm.com>
> 
> When xhci_mem_cleanup() is called, it's possible that the command
> timer isn't initialized and scheduled. For those cases, to delete
> the command timer causes soft-lockup as below stack dump shows.
> 
> The patch avoids deleting the command timer if it's not scheduled
> with the help of timer_pending().

Are you sure this is safe? timer_pending() will not show you that
the timer function is running. It looks like you introduced a race
between timeout and cleanup.

	Regards
		Oliver
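
The race described above can be illustrated with a toy state machine. This
is a userspace sketch under simplifying assumptions, not kernel code: the
toy_* names and the three-state model are hypothetical, and waiting for a
running handler is represented by setting a flag. It models only the
documented behaviour that timer_pending() reports a queued timer, that
del_timer() does not wait for a running handler, and that del_timer_sync()
does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a timer's life cycle; not the real struct timer_list. */
enum toy_state { TOY_IDLE, TOY_PENDING, TOY_RUNNING };

struct toy_timer {
	enum toy_state state;
	bool waited_for_handler;  /* set when a delete waited for the handler */
};

/* Like timer_pending(): true only while queued, not while the handler runs. */
bool toy_timer_pending(const struct toy_timer *t)
{
	return t->state == TOY_PENDING;
}

/* Like del_timer(): dequeue if pending, never wait for a running handler. */
void toy_del_timer(struct toy_timer *t)
{
	if (t->state == TOY_PENDING)
		t->state = TOY_IDLE;
}

/* Like del_timer_sync(): dequeue, and wait out a running handler. */
void toy_del_timer_sync(struct toy_timer *t)
{
	if (t->state == TOY_RUNNING)
		t->waited_for_handler = true;  /* stands in for the wait */
	t->state = TOY_IDLE;
}

/* The guarded idiom from the patch under discussion. */
void toy_guarded_del(struct toy_timer *t)
{
	if (toy_timer_pending(t))
		toy_del_timer_sync(t);
}

/* Compare the three delete variants while the handler is running. */
void check_running_handler(void)
{
	struct toy_timer a = { TOY_RUNNING, false };
	struct toy_timer b = { TOY_RUNNING, false };
	struct toy_timer c = { TOY_RUNNING, false };

	toy_guarded_del(&a);     /* guard is false, nothing happens */
	toy_del_timer(&b);       /* plain delete also does nothing */
	toy_del_timer_sync(&c);  /* the only variant that waits */

	assert(a.state == TOY_RUNNING && !a.waited_for_handler);
	assert(b.state == TOY_RUNNING && !b.waited_for_handler);
	assert(c.state == TOY_IDLE && c.waited_for_handler);
}
```

In this model the guarded form and a plain delete leave a running handler
untouched, so cleanup could continue while the timeout handler is still
executing. Only the unguarded synchronous delete waits.
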




* Re: [PATCH 2/2] drivers/usb: Delete XHCI command timer if necessary
  2015-08-11  8:15   ` Oliver Neukum
@ 2015-08-12 10:55     ` Mathias Nyman
  2015-08-12 13:08       ` Oliver Neukum
  2015-08-12 16:18       ` Greg KH
  0 siblings, 2 replies; 6+ messages in thread
From: Mathias Nyman @ 2015-08-12 10:55 UTC
  To: Oliver Neukum, gregkh; +Cc: linux-usb, Gavin Shan, stable

On 11.08.2015 11:15, Oliver Neukum wrote:
> On Mon, 2015-08-03 at 16:07 +0300, Mathias Nyman wrote:
>> From: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>
>> When xhci_mem_cleanup() is called, it's possible that the command
>> timer isn't initialized and scheduled. For those cases, to delete
>> the command timer causes soft-lockup as below stack dump shows.
>>
>> The patch avoids deleting the command timer if it's not scheduled
>> with the help of timer_pending().
> 
> Are you sure this is safe? timer_pending() will not show you that
> the timer function is running. It looks like you introduced a race
> between timeout and cleanup.
> 

Looking at it in more detail you're right.

Fortunately this can only happen in cases where xhci is already hosed
(no command response for 5 seconds), and we are at the same time
anyway about to remove xhci.

Doesn't this mean that all cases with
if (timer_pending(&timer))
	del_timer_sync(&timer)

are basically just the same as a plain del_timer(&timer)?

Anyways, turns out that the error path in xhci initialization code can end up calling
del_timer_sync() before timer is initialized. This should be fixed by re-arranging
some code in xhci initialization instead.

Greg, should this be reverted in rc7?
I think that the possible side effect of this patch is still lesser than the
original issue.

Thanks for spotting this

-Mathias
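
One possible shape of the re-arrangement mentioned above is sketched below
as a userspace model. It is an illustration under stated assumptions, not
the eventual fix: the toy_* names, the fail_early parameter and the flags
are all hypothetical. The only detail taken from the thread is that the
init function calls the cleanup function on its error path, as the stack
dump in the patch shows.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy host controller state; not the real struct xhci_hcd. */
struct toy_hcd {
	bool timer_initialized;
	bool cleanup_hit_uninit_timer;  /* models the soft lockup */
};

/* Cleanup deletes the timer unconditionally, like the original code. */
void toy_mem_cleanup(struct toy_hcd *h)
{
	if (!h->timer_initialized)
		h->cleanup_hit_uninit_timer = true;
}

/* Old order: a failure can reach cleanup before the timer is set up. */
int toy_mem_init_old(struct toy_hcd *h, bool fail_early)
{
	if (fail_early) {               /* e.g. an allocation failure */
		toy_mem_cleanup(h);
		return -1;
	}
	h->timer_initialized = true;
	return 0;
}

/* Re-arranged order: set up the timer before anything can fail. */
int toy_mem_init_reordered(struct toy_hcd *h, bool fail_early)
{
	h->timer_initialized = true;
	if (fail_early) {
		toy_mem_cleanup(h);
		return -1;
	}
	return 0;
}

/* Both variants fail, but only the old order trips over the timer. */
void check_init_order(void)
{
	struct toy_hcd old_hcd = { false, false };
	struct toy_hcd new_hcd = { false, false };

	assert(toy_mem_init_old(&old_hcd, true) == -1);
	assert(old_hcd.cleanup_hit_uninit_timer);

	assert(toy_mem_init_reordered(&new_hcd, true) == -1);
	assert(!new_hcd.cleanup_hit_uninit_timer);
}
```

With the timer set up first, cleanup can keep an unconditional synchronous
delete, which avoids both the lockup and the race discussed earlier in the
thread.
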


* Re: [PATCH 2/2] drivers/usb: Delete XHCI command timer if necessary
  2015-08-12 10:55     ` Mathias Nyman
@ 2015-08-12 13:08       ` Oliver Neukum
  2015-08-12 16:18       ` Greg KH
  1 sibling, 0 replies; 6+ messages in thread
From: Oliver Neukum @ 2015-08-12 13:08 UTC
  To: Mathias Nyman; +Cc: gregkh, linux-usb, Gavin Shan, stable

On Wed, 2015-08-12 at 13:55 +0300, Mathias Nyman wrote:

> Fortunately this can only happen in cases where xhci is already hosed
> (no command response for 5 seconds), and we are at the same time
> anyway about to remove xhci.
> 
> Doesn't this mean that all cases with
> if (timer_pending(&timer))
> 	del_timer_sync(&timer)
> 
> is just basically the same as a plain del_timer(&timer)?

Yes. I never understood the idiom.

> Anyways, turns out that the error path in xhci initialization code can end up calling
> del_timer_sync() before timer is initialized. This should be fixed by re-arranging
> some code in xhci initialization instead.

Good.

> Greg, should this be reverted in rc7?
> I think that the possible side effect of this patch is still lesser the original
> issue.     

I agree.

	Regards
		Oliver




* Re: [PATCH 2/2] drivers/usb: Delete XHCI command timer if necessary
  2015-08-12 10:55     ` Mathias Nyman
  2015-08-12 13:08       ` Oliver Neukum
@ 2015-08-12 16:18       ` Greg KH
  1 sibling, 0 replies; 6+ messages in thread
From: Greg KH @ 2015-08-12 16:18 UTC
  To: Mathias Nyman; +Cc: Oliver Neukum, linux-usb, Gavin Shan, stable

On Wed, Aug 12, 2015 at 01:55:34PM +0300, Mathias Nyman wrote:
> On 11.08.2015 11:15, Oliver Neukum wrote:
> > On Mon, 2015-08-03 at 16:07 +0300, Mathias Nyman wrote:
> >> From: Gavin Shan <gwshan@linux.vnet.ibm.com>
> >>
> >> When xhci_mem_cleanup() is called, it's possible that the command
> >> timer isn't initialized and scheduled. For those cases, to delete
> >> the command timer causes soft-lockup as below stack dump shows.
> >>
> >> The patch avoids deleting the command timer if it's not scheduled
> >> with the help of timer_pending().
> > 
> > Are you sure this is safe? timer_pending() will not show you that
> > the timer function is running. It looks like you introduced a race
> > between timeout and cleanup.
> > 
> 
> Looking at it in more detail you're right.
> 
> Fortunately this can only happen in cases where xhci is already hosed
> (no command response for 5 seconds), and we are at the same time
> anyway about to remove xhci.
> 
> Doesn't this mean that all cases with
> if (timer_pending(&timer))
> 	del_timer_sync(&timer)
> 
> is just basically the same as a plain del_timer(&timer)?
> 
> Anyways, turns out that the error path in xhci initialization code can end up calling
> del_timer_sync() before timer is initialized. This should be fixed by re-arranging
> some code in xhci initialization instead.
> 
> Greg, should this be reverted in rc7?
> I think that the possible side effect of this patch is still lesser the original
> issue.     

Just fix it "right" in a new patch.

thanks,

greg k-h

