From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andreas Reis
Subject: Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling
Date: Thu, 10 Apr 2014 14:26:34 +0200
Message-ID: <53468DFA.2080903@gmail.com>
References: <53467950.3010403@gmail.com> <53468297.1040909@suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
Received: from mail-bk0-f45.google.com ([209.85.214.45]:57791 "EHLO mail-bk0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1030405AbaDJM0h (ORCPT ); Thu, 10 Apr 2014 08:26:37 -0400
In-Reply-To: <53468297.1040909@suse.de>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: Hannes Reinecke, Alan Stern
Cc: James Bottomley, SCSI development list, USB list

Only your 0/3 patch to which Alan linked, along with two other patches
by Mathias Nyman ("disable usb3 on intel hosts" and "disable all lpm
related control transfers", one of which is the source of the "do
nothing"s).

I'll revert the latter two and apply the rest of the set. Which I'm
guessing currently consists of said 0/3 patch -
http://www.spinics.net/lists/linux-scsi/msg73502.html
- plus 2/3 and 3/3?

Or should I just omit 0/3 and try whichever of the two in 1/3 "works
best"? Rather confusing ATM.

Anyway, for whatever reason the bug is happening rather frequently now.
I've spotted the following occurring after the "Device offlined" line
two times now:

[ 206.901385] sd 11:0:0:0: [sdg] Unhandled error code
[ 206.901394] sd 11:0:0:0: [sdg]
[ 206.901397] Result: hostbyte=0x01 driverbyte=0x00
[ 206.901400] sd 11:0:0:0: [sdg] CDB:
[ 206.901403] cdb[0]=0x2a: 2a 00 02 25 1b 50 00 00 08 00
[ 206.901419] end_request: I/O error, dev sdg, sector 35986256

The second time had "sd 12:0:0:0", "cdb[0]=0x28: 28 00 03 94 77 20 00 00
08 00" and a different sector.
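
For anyone reading the log lines above: the two CDBs can be decoded by hand, and the sketch below does so as a sanity check. It assumes the standard 10-byte READ(10)/WRITE(10) layout (opcode, flags, 32-bit big-endian LBA, group number, 16-bit big-endian transfer length, control). The helper name decode_cdb10 is hypothetical and not part of any kernel or userspace tool. As far as I know, hostbyte=0x01 corresponds to DID_NO_CONNECT in the kernel's host byte codes, which would fit a device that has dropped off the bus.

```python
import struct

# Opcode names from the SCSI block command set; only the two seen in this thread.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_cdb10(hex_bytes):
    """Decode a 10-byte READ(10)/WRITE(10) CDB given as a hex string.

    Hypothetical helper for illustration only.
    Returns (opcode name, logical block address, transfer length in blocks).
    """
    cdb = bytes.fromhex(hex_bytes)
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    # Layout: opcode, flags, 32-bit BE LBA, group number, 16-bit BE length, control
    opcode, _flags, lba, _group, length, _control = struct.unpack(">BBIBHB", cdb)
    return OPCODES.get(opcode, hex(opcode)), lba, length


# First failure from the log: the decoded LBA equals the sector in the
# end_request line (35986256), with a transfer length of 8 blocks.
print(decode_cdb10("2a 00 02 25 1b 50 00 00 08 00"))
# Second failure: a READ(10) of 8 blocks.
print(decode_cdb10("28 00 03 94 77 20 00 00 08 00"))
```

So the first failure is an 8-block write and the second an 8-block read; both appear to be ordinary I/O that was rejected after the device went away, rather than anything unusual in the commands themselves.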

Andreas Reis

On 10.04.2014 13:37, Hannes Reinecke wrote:
> On 04/10/2014 12:58 PM, Andreas Reis wrote:
>> That patch appears to work in preventing the crashes, judged on one
>> repeated appearance of the bug.
>>
>> dmesg had the usual
>> [ 215.229903] usb 4-2: usb_disable_lpm called, do nothing
>> [ 215.336941] usb 4-2: reset SuperSpeed USB device number 3 using xhci_hcd
>> [ 215.350296] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880427b829c0
>> [ 215.350305] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880427b82a08
>> [ 215.350621] usb 4-2: usb_enable_lpm called, do nothing
>>
>> repeated five times, followed by one
>> [ 282.795801] sd 8:0:0:0: Device offlined - not ready after error recovery
>>
>> and then as often as something tried to read from it:
>> [ 295.585472] sd 8:0:0:0: rejecting I/O to offline device
>>
>> The stick could then be properly un- and remounted (the latter if it
>> had been physically replugged) without issue - for the bug to
>> reoccur after one to three minutes. I tried this three times, no
>> dmesg difference except the ep addresses varied on two of that.
>>
> Was this just that patch you've tested with or the entire patch series?
>
> If the latter, Alan, is this the expected outcome?
> I would've thought the error recover should _not_ run into
> offlining devices here, but rather the device should be recovered
> eventually.
>
> Andreas, can you test with the entire patch series and enable
> 'scsi_logging_level -s -E 5' prior to running the tests?
>
> THX.
>
> Cheers,
>
> Hannes