From: Greg KH <gregkh@linuxfoundation.org>
To: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: USB xhci crash under load on 5.14-rc3
Date: Thu, 5 Aug 2021 20:39:58 +0200
Message-ID: <YQwwfn2rQGvCrvXS@kroah.com>
In-Reply-To: <9bb1d58b-5c68-86b7-13df-2faa749880c5@linux.intel.com>

On Thu, Aug 05, 2021 at 05:59:00PM +0300, Mathias Nyman wrote:
> On 4.8.2021 11.00, Greg KH wrote:
> > Hi,
> >
> > I was doing some filesystem backups from one USB device to another one
> > this weekend and kept running into the problem of the xhci controller
> > shutting down after an hour or so of high volume traffic.
> >
> > I finally captured the problem in the kernel log as this would also take
> > out my keyboard, making it hard to recover from :)
> >
> > The log is below for when the problem happens, and then the devices are
> > disconnected from the bus (ignore the filesystem errors, those are
> > expected when i/o is in flight and we disconnect a device).
> >
> > Any hint as to what the IO_PAGE_FAULT error messages are?
> >
>
> No idea, unfortunately.
>
> > I'll go back to 5.13.y now and see if I can reproduce it there or not,
> > as my backups are not yet done...
> >
> > thanks,
> >
> > greg k-h
> >
> >
> > [Aug 4 09:48] xhci_hcd 0000:47:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0xfffffff00000 flags=0x0000]
> > [ +0.000012] xhci_hcd 0000:47:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0xfffffff00f80 flags=0x0000]
> > [ +0.000006] xhci_hcd 0000:47:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0xfffffff01000 flags=0x0000]
> > [ +0.000006] xhci_hcd 0000:47:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0xfffffff01f80 flags=0x0000]
> > [ +0.000005] xhci_hcd 0000:47:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0xfffffff02000 flags=0x0000]
> > [ +0.000006] xhci_hcd 0000:47:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0xfffffff02f80 flags=0x0000]
> > [ +0.000006] xhci_hcd 0000:47:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0xfffffff03000 flags=0x0000]
> > [ +0.000005] xhci_hcd 0000:47:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0xfffffff03f80 flags=0x0000]
> > [ +0.000006] xhci_hcd 0000:47:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0xfffffff04000 flags=0x0000]
> > [Aug 4 09:49] sd 3:0:0:0: [sdc] tag#21 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
> > [ +0.000011] sd 3:0:0:0: [sdc] tag#21 CDB: Read(16) 88 00 00 00 00 01 8a 44 08 b0 00 00 00 08 00 00
> > [ +5.106493] xhci_hcd 0000:47:00.1: xHCI host not responding to stop endpoint command.
> > [ +0.000010] xhci_hcd 0000:47:00.1: USBSTS: HCHalted HSE
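
For anyone reading the log above, the nine faulting addresses alternate between offset 0x000 and offset 0xf80 of consecutive 4 KiB pages, from 0xfffffff00000 up to 0xfffffff04000. A minimal sketch for pulling that out of a saved dmesg follows. The regular expression is written against the lines quoted in this mail only, the helper names parse_faults and page_offsets are made up for illustration, and the 4 KiB page size is an assumption.

```python
import re

# Pattern for the AMD-Vi event lines as they appear in this thread.
# The field layout may differ on other kernel versions (assumption).
FAULT_RE = re.compile(
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AMD-Vi: Event logged "
    r"\[IO_PAGE_FAULT domain=0x(?P<domain>[0-9a-f]+) "
    r"address=0x(?P<addr>[0-9a-f]+) flags=0x(?P<flags>[0-9a-f]+)\]"
)


def parse_faults(log_text):
    """Return one (device, domain, address, flags) tuple per IO_PAGE_FAULT line."""
    return [
        (m["dev"], int(m["domain"], 16), int(m["addr"], 16), int(m["flags"], 16))
        for m in FAULT_RE.finditer(log_text)
    ]


def page_offsets(faults, page_size=4096):
    """Split each faulting address into (page number, offset within page)."""
    return [(addr // page_size, addr % page_size) for _, _, addr, _ in faults]
```

Feeding it the first two fault lines from the log gives page 0xfffffff00 twice, with offsets 0x000 and 0xf80. This only describes the pattern of the faults; it says nothing about what caused them.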
>
>
> HSE "Host System Error" bit is set, meaning xHC hardware detected a serious error and stopped the host.
> HSE was probably set 5-10 seconds earlier, but only discovered here.
>
> Specs state:
>
> xHC sets this bit to ‘1’ when a serious error
> is detected, either internal to the xHC or during a host system access involving the xHC module.
> (In a PCI system, conditions that set this bit to ‘1’ include PCI Parity error, PCI Master Abort, and
> PCI Target Abort.)
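
To make the "USBSTS: HCHalted HSE" line easier to read, here is a rough sketch of decoding a raw USBSTS value into flag names. The bit positions follow the USBSTS register layout in the xHCI specification (HCHalted is bit 0, HSE is bit 2). The function names are hypothetical, and the raw value 0x5 used below is an assumption, since the log only shows the decoded names and not the register contents.

```python
# USBSTS bit positions per the xHCI specification. The short names are
# chosen to match the "HCHalted HSE" style seen in the log above.
USBSTS_BITS = {
    0: "HCHalted",  # host controller halted
    2: "HSE",       # host system error
    3: "EINT",      # event interrupt
    4: "PCD",       # port change detect
    8: "SSS",       # save state status
    9: "RSS",       # restore state status
    10: "SRE",      # save/restore error
    11: "CNR",      # controller not ready
    12: "HCE",      # host controller error
}


def decode_usbsts(value):
    """Return the names of all set USBSTS flags, lowest bit first."""
    return " ".join(
        name for bit, name in sorted(USBSTS_BITS.items()) if value & (1 << bit)
    )


def has_host_system_error(value):
    """True if the HSE bit (bit 2) is set in a raw USBSTS value."""
    return bool(value & (1 << 2))
```

With the assumed value, decode_usbsts(0x5) returns "HCHalted HSE", which matches what the driver printed here.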

Ok, I would believe in a PCI error here. Hammering an xhci controller
with read/write streams to two different storage devices on the same bus
for a few hours, as fast as the bus allows, is a good stress test.

I tried splitting this across PCI devices and cannot seem to duplicate
the failure in the xhci controllers. Now the devices fail with disk
errors after about a terabyte of traffic, but are recoverable after
unplugging them, plugging them back in, and running fsck.

Cheap USB storage, gotta love it...

If I come up with a reproducible failure, I'll let you know. Thanks for
the help,

greg k-h