From: "Jérôme Carretero" <cJ-ko@zougloub.eu>
To: James Bottomley <James.Bottomley@HansenPartnership.com>,
	Alan Stern <stern@rowland.harvard.edu>, Tejun Heo <tj@kernel.org>
Cc: Hans de Goede <hdegoede@redhat.com>,
	Jens <jens-bugzilla.kernel.org@spamfreemail.de>,
	Andrey Astafyev <1@246060.ru>, Oliver Neukum <oneukum@suse.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-usb@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: Seagate External SMR drive USB resets (XHCI transfer error, not timeout)
Date: Wed, 15 Nov 2017 23:21:29 -0500
Message-ID: <20171115232129.102a1122@Vantage.cJ>
In-Reply-To: <20171115181708.0a5d9288@Vantage.cJ>

[-- Attachment #1: Type: text/plain, Size: 3146 bytes --]

Hi,

On Wed, 15 Nov 2017 18:17:08 -0500
Jérôme Carretero <cJ-ko@zougloub.eu> wrote:

> Hi,
> 
> 
> On Thu, 16 Nov 2017 07:40:08 +0900
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> > On Wed, 2017-11-15 at 17:02 -0500, Alan Stern wrote:  
> > > On Wed, 15 Nov 2017, Jérôme Carretero wrote:    
> > > > >   Because with several of these drives / lots of activity /
> > > > >   occasional issues, it looks like it will be hard to catch
> > > > >   (yes I can use usbmon).
> > > > > 
> > > > > - It looks like there is no configurable timeout for USB MSC
> > > > >   requests. Perhaps the device is not responding in time and
> > > > >   this is why it's reset?
> > > 
> > > Timeouts are set by the SCSI layer.  I believe they are rather
> > > long (30 seconds, by default).  Presumably they are configurable,
> > > although I would have to do some digging to figure out how.    
> > 
> > They're in /sys/class/scsi_device/<id>/device/timeout  
> 
> 
> I'll use Wireshark to check the cause: the drives are definitely not
> "timing out" after 30 seconds (indeed the value reported
> in /sys/class/scsi_device/.../timeout or /sys/block/*/device/timeout),
> because I see (in dstat) that a disk is busy right up until the
> moment its reset message appears.
> 
> Is it possible that the SCSI timeout doesn't get propagated to a USB URB
> timeout (I'll check by myself, but asking doesn't hurt)?
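
For reference, a minimal Python sketch of reading and changing that per-device SCSI timeout. It assumes the /sys/class/scsi_device/<id>/device/timeout layout quoted above and a value in seconds; the helper names and the `sysfs_root` parameter are made up for illustration, and writing needs root on a real system.

```python
import glob
import os


def read_scsi_timeouts(sysfs_root="/sys"):
    """Return {scsi_device_id: timeout_seconds} for every SCSI device.

    Assumes the layout <sysfs_root>/class/scsi_device/<id>/device/timeout.
    """
    pattern = os.path.join(sysfs_root, "class", "scsi_device", "*", "device", "timeout")
    timeouts = {}
    for path in sorted(glob.glob(pattern)):
        # <id> (H:C:T:L) is the directory two levels above the "timeout" file
        dev_id = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            timeouts[dev_id] = int(f.read().strip())
    return timeouts


def set_scsi_timeout(dev_id, seconds, sysfs_root="/sys"):
    """Write a new command timeout, in seconds, for one SCSI device."""
    path = os.path.join(sysfs_root, "class", "scsi_device", dev_id, "device", "timeout")
    with open(path, "w") as f:
        f.write(f"{int(seconds)}\n")
```

Given that the resets show up well before the 30 s mark, this is mostly useful for ruling the SCSI timeout out rather than as a fix.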

I took a usbmon capture excerpt centered around the event (it took a
few hundred MB of writes for this to happen):

 Nov 15 22:16:33 Bidule kernel: usb 6-4.3.2.1: reset SuperSpeed USB device number 8 using xhci_hcd

I can see the host sending a write request and getting -EPROTO back
(capture attached), so scratch the timeout theory. (Thinking about it,
this matches what UAS was reporting, except that UAS took ages to
recover.)
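
If the capture is taken through the usbmon text interface instead of Wireshark, the failed URBs can be picked out with a filter like the one below. This is a sketch assuming the usbmon "1u" text format (URB tag, timestamp, event type, address word, status) and Linux's EPROTO value of 71; the function name is hypothetical.

```python
# EPROTO is 71 on Linux; usbmon prints a completed URB's status as a negative errno.
LINUX_EPROTO = 71


def find_eproto_completions(lines):
    """Yield (urb_tag, timestamp_us, address) for callbacks that failed with -EPROTO.

    Assumed line shape: <tag> <timestamp> <S|C|E> <type+dir:bus:dev:ep> <status> ...
    """
    for line in lines:
        fields = line.split()
        # Only callback ("C") events carry the final URB status
        if len(fields) < 5 or fields[2] != "C":
            continue
        try:
            status = int(fields[4])
        except ValueError:
            continue  # status word is not a plain integer (e.g. isochronous)
        if status == -LINUX_EPROTO:
            yield fields[0], int(fields[1]), fields[3]
```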

I looked for EPROTO in the USB code and found a dynamic-debug message in
xHCI; after enabling it:

 Nov 15 22:45:03 Bidule kernel: xhci_hcd 0000:07:00.0: Transfer error for slot 13 ep 3 on endpoint
 Nov 15 22:45:03 Bidule kernel: xhci_hcd 0000:07:00.0: Transfer error for slot 12 ep 3 on endpoint
 Nov 15 22:45:03 Bidule kernel: usb 6-4.3.3.1: reset SuperSpeed USB device number 9 using xhci_hcd
 Nov 15 22:45:03 Bidule kernel: usb 6-4.3.2.1: reset SuperSpeed USB device number 8 using xhci_hcd
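
For anyone wanting to reproduce this, the messages above come from enabling the xhci_hcd dynamic-debug call sites. A minimal Python sketch, assuming debugfs is mounted at /sys/kernel/debug and that the xHCI core lives in the module named xhci_hcd (helper names are hypothetical; writing the control file requires root):

```python
# Assumed default location of the dynamic debug control file
DYNDBG_CONTROL = "/sys/kernel/debug/dynamic_debug/control"


def enable_module_debug(module, control_path=DYNDBG_CONTROL):
    """Enable every dynamic-debug call site in `module`.

    Same effect as: echo 'module xhci_hcd +p' > /sys/kernel/debug/dynamic_debug/control
    """
    with open(control_path, "w") as f:
        f.write(f"module {module} +p\n")


def enabled_sites(control_lines, module):
    """Return (file:line, function) for call sites of `module` whose flags include p.

    Assumes control-file lines shaped like:
      <file>:<line> [<module>]<function> =<flags> "<format>"
    """
    tag = f"[{module}]"
    sites = []
    for line in control_lines:
        parts = line.split(None, 3)
        if len(parts) < 3 or not parts[1].startswith(tag):
            continue
        if parts[2].startswith("=") and "p" in parts[2]:
            sites.append((parts[0], parts[1][len(tag):]))
    return sites
```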

First, I understand that a misbehaving USB device could flood the kernel
log, but shouldn't that xhci_dbg() (and others, e.g. the babble one) be
at least an xhci_info() (I saw commit 2a9227a5)?

Then... I don't know enough to attribute the issue to the upstream USB
hub(s), to the drive endpoint misbehaving, or to the kernel... What
should I do with these messages?
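
One low-tech way to keep track of these messages over a long fill is to tally the transfer errors and resets from the kernel log. The sketch below is hypothetical and keys only on the message texts quoted above; the xHCI slot ID and the USB device number are assigned by different layers (host controller vs. usbcore), so it does not try to map one onto the other.

```python
import re

# Patterns match the xhci_hcd and usbcore messages quoted above
TRANSFER_ERR = re.compile(r"xhci_hcd \S+: Transfer error for slot (\d+) ep (\d+)")
RESET = re.compile(r"usb (\S+): reset \S+ USB device number (\d+)")


def summarize(lines):
    """Return ([(slot, ep), ...], [(usb_path, devnum), ...]) in log order."""
    errors, resets = [], []
    for line in lines:
        m = TRANSFER_ERR.search(line)
        if m:
            errors.append((int(m.group(1)), int(m.group(2))))
            continue
        m = RESET.search(line)
        if m:
            resets.append((m.group(1), int(m.group(2))))
    return errors, resets
```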

I'm still filling the drives; I will run a scrub afterwards to see
whether the issue causes data loss...


-- 
Jérôme

PS: BTW, thanks a lot for the help so far.
PPS: It would be so nice if someone from Seagate was reading this.

[-- Attachment #2: smr-reset-excerpt.pcapng.gz --]
[-- Type: application/gzip, Size: 10144 bytes --]

