* USB HD: No Sense / Info fld=0x0 and read corruption
From: Ludovico Cavedon @ 2008-12-24 13:39 UTC
To: linux-usb-u79uwXL29TY76Z2rM5mHXA, linux-scsi-u79uwXL29TY76Z2rM5mHXA

Hi,
I have a problem with a USB hard drive.

The problem starts after some random time of operation, anywhere
from 1 minute up to one hour.

The first symptom is these messages in the kernel log:
---------------
sd 5:0:0:0: [sdb] Sense Key : No Sense [current]
Info fld=0x0
5:0:0:0: [sdb] Add. Sense: No additional sense information
---------------
They may appear only once or multiple times.

Sometimes they are not associated with other errors; sometimes I start to
see filesystem corruption:
---------------
attempt to access beyond end of device
sdb8: rw=0, want=15514452856, limit=207045657
---------------

If I unmount the device and replug it, I am able to read the same data
without any problem.
I suspect that some read corruption is happening. However, I have *never*
seen an I/O error reported!

Here is the usbmon log:
http://pastebin.com/f4e1afeb1

There are some successful read operations, then the read operation
31 = 55534243 8f150000 00100100 80000a28 0022ebd0 a8000088 00000000 000000

whose command completion status ends with "01". The subsequent REQUEST
SENSE, however, is empty.

What may be happening here?
My situation looks different from
http://thread.gmane.org/gmane.linux.kernel/747753
-there is an additional "Info fld=0x0" line in the log
-the log messages are not always looping
-it is happening also with 2.6.28-rc8 (which should have the patch, right?)

I am not able to understand if this is a HD problem or a kernel problem. I
am able to replicate it:
-2.6.28-rc8 vanilla kernel
-2.6.27 ubuntu (intrepid) kernel
-2.6.25 ubuntu (hardy) kernel
-on two different computers
-with different USB cables (so it is not a cable problem)

However, I have never got errors reported by Windows (dual boot on the
same machine).

The HD drive is a Western Digital 320GB (WD3200):
T: Bus=07 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 5 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1058 ProdID=0704 Rev= 1.05
S: Manufacturer=Western Digital
S: Product=External HDD
S: SerialNumber=5758453730384E5036333734
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 2mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms

The USB controller is
00:1a.7 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family)
USB2 EHCI Controller #2 [8086:283a] (rev 03)

Please tell me if I can provide other useful information.

Thank you,
Ludovico Cavedon
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
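
The 31-byte command quoted in the message above follows the USB
mass-storage Bulk-Only Transport layout, so it can be decoded by hand.
The sketch below is a minimal illustration in Python, assuming the
standard Command Block Wrapper (CBW) and Command Status Wrapper (CSW)
field layouts. The names decode_cbw and decode_csw are hypothetical
helpers, not part of usbmon or the kernel, and the CSW sample is
constructed for illustration because the thread does not quote the
actual status bytes.

```python
import struct

# Status byte values defined for the Command Status Wrapper.
CSW_STATUS = {0: "Command Passed", 1: "Command Failed", 2: "Phase Error"}


def decode_cbw(hex_dump: str) -> dict:
    """Decode a 31-byte CBW given as usbmon-style hex words."""
    raw = bytes.fromhex(hex_dump.replace(" ", ""))
    if len(raw) != 31:
        raise ValueError(f"a CBW is 31 bytes, got {len(raw)}")
    # Header fields are little-endian:
    # signature, tag, data transfer length, flags, LUN, CDB length.
    signature, tag, data_length, flags, lun, cdb_length = struct.unpack_from(
        "<4sIIBBB", raw
    )
    if signature != b"USBC":
        raise ValueError("missing USBC signature")
    cdb = raw[15:15 + (cdb_length & 0x1F)]
    info = {
        "tag": tag,
        "data_length": data_length,
        "direction": "in" if flags & 0x80 else "out",
        "lun": lun & 0x0F,
        "cdb": cdb.hex(),
    }
    if len(cdb) >= 10 and cdb[0] == 0x28:
        # READ(10): big-endian LBA in bytes 2-5, block count in bytes 7-8.
        info["opcode"] = "READ(10)"
        info["lba"] = int.from_bytes(cdb[2:6], "big")
        info["blocks"] = int.from_bytes(cdb[7:9], "big")
    return info


def decode_csw(raw: bytes) -> dict:
    """Decode a 13-byte CSW: signature, tag, data residue, status."""
    if len(raw) != 13:
        raise ValueError(f"a CSW is 13 bytes, got {len(raw)}")
    signature, tag, residue, status = struct.unpack("<4sIIB", raw)
    if signature != b"USBS":
        raise ValueError("missing USBS signature")
    return {
        "tag": tag,
        "residue": residue,
        "status": CSW_STATUS.get(status, f"reserved ({status})"),
    }


# The CBW exactly as quoted in the message above.
cbw = decode_cbw(
    "55534243 8f150000 00100100 80000a28 0022ebd0 a8000088 00000000 000000"
)

# Hypothetical CSW for the same tag: a residue of 0 is an assumption,
# only the trailing status byte 01 is taken from the report.
csw = decode_csw(struct.pack("<4sIIB", b"USBS", cbw["tag"], 0, 1))

print(cbw)
print(csw)
```

Under those assumptions the command is a READ(10) of 136 blocks
(69632 bytes, i.e. 512 bytes per block) starting at LBA 0x22ebd0a8, and
a CSW status byte of 01 means Command Failed, which is what prompts the
host to issue REQUEST SENSE.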
* Re: USB HD: No Sense / Info fld=0x0 and read corruption
From: Alan Stern @ 2008-12-24 18:02 UTC
To: Ludovico Cavedon; +Cc: linux-usb, linux-scsi

On Wed, 24 Dec 2008, Ludovico Cavedon wrote:

> Hi,
> I have a problem with a USB hard drive.
>
> The problem starts after some random time of operation, anywhere
> from 1 minute up to one hour.
>
> The first symptom is these messages in the kernel log:
> ---------------
> sd 5:0:0:0: [sdb] Sense Key : No Sense [current]
> Info fld=0x0
> 5:0:0:0: [sdb] Add. Sense: No additional sense information
> ---------------
> They may appear only once or multiple times.
>
> Sometimes they are not associated with other errors; sometimes I start to
> see filesystem corruption:
> ---------------
> attempt to access beyond end of device
> sdb8: rw=0, want=15514452856, limit=207045657
> ---------------
>
> If I unmount the device and replug it, I am able to read the same data
> without any problem.
> I suspect that some read corruption is happening. However, I have *never*
> seen an I/O error reported!
>
> Here is the usbmon log:
> http://pastebin.com/f4e1afeb1
>
> There are some successful read operations, then the read operation
> 31 = 55534243 8f150000 00100100 80000a28 0022ebd0 a8000088 00000000 000000
>
> whose command completion status ends with "01". The subsequent REQUEST
> SENSE, however, is empty.
>
> What may be happening here?
> My situation looks different from
> http://thread.gmane.org/gmane.linux.kernel/747753
> -there is an additional "Info fld=0x0" line in the log

That's because your "empty" sense information has the Valid flag set.

> -the log messages are not always looping
> -it is happening also with 2.6.28-rc8 (which should have the patch, right?)

What patch? Do you mean the patch at the end of that email thread? It
affects only Argosy USB drives, not your Western Digital.

> I am not able to understand if this is a HD problem or a kernel problem. I

Partly both. The HD (or more likely, its USB interface) is responsible
for sending those unnecessary empty sense records. The kernel is
responsible for not reporting an I/O error (assuming an error actually
did take place).

> am able to replicate it:
> -2.6.28-rc8 vanilla kernel
> -2.6.27 ubuntu (intrepid) kernel

2.6.27 doesn't go into an endless loop? This may indicate that
eventually the drive stops sending the Check Condition status.

> -2.6.25 ubuntu (hardy) kernel
> -on two different computers
> -with different USB cables (so it is not a cable problem)
>
> However, I have never got errors reported by Windows (dual boot on the
> same machine).
>
> The HD drive is a Western Digital 320GB (WD3200):
> T: Bus=07 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 5 Spd=480 MxCh= 0
> D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
> P: Vendor=1058 ProdID=0704 Rev= 1.05
> S: Manufacturer=Western Digital
> S: Product=External HDD
> S: SerialNumber=5758453730384E5036333734
> C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 2mA
> I:* If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
> E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
>
> The USB controller is
> 00:1a.7 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family)
> USB2 EHCI Controller #2 [8086:283a] (rev 03)
>
>
> Please tell me if I can provide other useful information.

It would help to see the dmesg log for when one of these errors occurs.

It would also help to know what happens under Windows. Do the same
"empty sense" errors occur? If they do, how does Windows handle them?

Alan Stern
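
The point about the Valid flag can be made concrete by looking at how
fixed-format SCSI sense data is laid out. The following is a minimal
Python sketch, assuming the standard fixed-format layout (response code
0x70 or 0x71). The helper name parse_fixed_sense and the 18-byte sample
buffer are hypothetical: the buffer is constructed to match the
description in this thread (Valid bit set, everything else zero) and is
not copied from the usbmon log.

```python
# Sense key names for the values most relevant here; others are left numeric.
SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
}


def parse_fixed_sense(buf: bytes) -> dict:
    """Parse fixed-format sense data as returned by REQUEST SENSE."""
    if len(buf) < 14:
        raise ValueError("need at least 14 bytes of sense data")
    response_code = buf[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = buf[2] & 0x0F
    return {
        # Bit 7 of byte 0: the Information field is declared meaningful.
        "valid": bool(buf[0] & 0x80),
        "current": response_code == 0x70,
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        # Bytes 3-6, big-endian: reported as "Info fld" when Valid is set.
        "information": int.from_bytes(buf[3:7], "big"),
        "asc": buf[12],   # additional sense code
        "ascq": buf[13],  # additional sense code qualifier
    }


def is_empty_sense(sense: dict) -> bool:
    """True when the sense data carries no error information at all."""
    return (
        sense["sense_key"] == "No Sense"
        and sense["asc"] == 0
        and sense["ascq"] == 0
    )


# Hypothetical sample: 0xf0 = Valid bit + response code 0x70,
# additional length 0x0a, all remaining fields zero.
sample = bytes.fromhex("f0000000000000" + "0a" + "00" * 10)
sense = parse_fixed_sense(sample)

print(sense)
print("empty sense:", is_empty_sense(sense))
```

A buffer like this reports sense key No Sense with an Information field
of 0 that is nevertheless flagged as valid, which would account for the
extra "Info fld=0x0" line while still telling the host nothing about
why the command failed.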
* Re: USB HD: No Sense / Info fld=0x0 and read corruption
From: Ludovico Cavedon @ 2008-12-24 20:09 UTC
To: Alan Stern
Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA, linux-scsi-u79uwXL29TY76Z2rM5mHXA

Alan Stern wrote:
> On Wed, 24 Dec 2008, Ludovico Cavedon wrote:
>> -it is happening also with 2.6.28-rc8 (which should have the patch, right?)
>
> What patch? Do you mean the patch at the end of that email thread? It
> affects only Argosy USB drives, not your Western Digital.

I thought the patches at
http://marc.info/?l=linux-scsi&m=122443015406309&w=2
were also included, but later I realized I was wrong.
I have patched my kernel with them, and now I/O errors get reported!

>> I am not able to understand if this is a HD problem or a kernel problem. I
>
> Partly both. The HD (or more likely, its USB interface) is responsible
> for sending those unnecessary empty sense records. The kernel is
> responsible for not reporting an I/O error (assuming an error actually
> did take place).

I think I found out what is happening on the HD side. The SMART self
test fails with a read error. The SMART log reports uncorrectable read
errors. However, Reallocated_Event_Count is 0. Searching on the web,
it looks like these sectors have bad ECC, so they cause a read error,
but they are not bad sectors. Is this correct?

My question is: how can this happen? And not just one sector, but at
least a dozen!
Bad HD? (It's new! 93 hrs of activity so far!)

>> am able to replicate it:
>> -2.6.28-rc8 vanilla kernel
>> -2.6.27 ubuntu (intrepid) kernel
>
> 2.6.27 doesn't go into an endless loop? This may indicate that
> eventually the drive stops sending the Check Condition status.

You are right. Most of the read attempts succeed after a while.
However, I found one sector that causes an endless loop on 2.6.27 and
unpatched 2.6.28-rc8.

> It would help to see the dmesg log for when one of these errors occurs.

There are no additional messages. However, here it is:
http://pastebin.com/mcfd54a3

> It would also help to know what happens under Windows. Do the same
> "empty sense" errors occur? If they do, how does Windows handle them?

I can try to use usb snoopy to log USB traffic under Windows.
Do you know how I can ask Windows to "read sector X"?

Thank you for your help,
Merry Christmas!
Ludovico
* Re: USB HD: No Sense / Info fld=0x0 and read corruption
From: Alan Stern @ 2008-12-24 23:02 UTC
To: Ludovico Cavedon
Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA, linux-scsi-u79uwXL29TY76Z2rM5mHXA

On Wed, 24 Dec 2008, Ludovico Cavedon wrote:

> Alan Stern wrote:
> > On Wed, 24 Dec 2008, Ludovico Cavedon wrote:
> >> -it is happening also with 2.6.28-rc8 (which should have the patch, right?)
> >
> > What patch? Do you mean the patch at the end of that email thread? It
> > affects only Argosy USB drives, not your Western Digital.
>
> I thought the patches at
> http://marc.info/?l=linux-scsi&m=122443015406309&w=2
> were also included, but later I realized I was wrong.

A revised version of the first patch in that message is queued for
2.6.29. The second patch has not yet been merged in any form.

> I have patched my kernel with them, and now I/O errors get reported!
>
> >> I am not able to understand if this is a HD problem or a kernel problem. I
> >
> > Partly both. The HD (or more likely, its USB interface) is responsible
> > for sending those unnecessary empty sense records. The kernel is
> > responsible for not reporting an I/O error (assuming an error actually
> > did take place).
>
> I think I found out what is happening on the HD side. The SMART self
> test fails with a read error. The SMART log reports uncorrectable read
> errors. However, Reallocated_Event_Count is 0. Searching on the web,
> it looks like these sectors have bad ECC, so they cause a read error,
> but they are not bad sectors. Is this correct?

I don't know. It sounds reasonable. The real issue is: Why doesn't
the drive send back appropriate sense information to let the host know
about the bad ECC?

> My question is: how can this happen? And not just one sector, but at
> least a dozen!
> Bad HD? (It's new! 93 hrs of activity so far!)

Maybe you can exchange it...

> > 2.6.27 doesn't go into an endless loop? This may indicate that
> > eventually the drive stops sending the Check Condition status.
>
> You are right. Most of the read attempts succeed after a while.
> However, I found one sector that causes an endless loop on 2.6.27 and
> unpatched 2.6.28-rc8.
>
> > It would help to see the dmesg log for when one of these errors occurs.
>
> There are no additional messages. However, here it is:
> http://pastebin.com/mcfd54a3

Yeah, that's not very useful.

> > It would also help to know what happens under Windows. Do the same
> > "empty sense" errors occur? If they do, how does Windows handle them?
>
> I can try to use usb snoopy to log USB traffic under Windows.
> Do you know how I can ask Windows to "read sector X"?

I wish I knew! Perhaps Microsoft's KnowledgeBase site can tell you
how.

Alan Stern
* Re: USB HD: No Sense / Info fld=0x0 and read corruption
From: Ludovico Cavedon @ 2008-12-26 10:38 UTC
To: Alan Stern
Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA, linux-scsi-u79uwXL29TY76Z2rM5mHXA

Alan Stern wrote:
> On Wed, 24 Dec 2008, Ludovico Cavedon wrote:
>> My question is: how can this happen? And not just one sector, but at
>> least a dozen!
>> Bad HD? (It's new! 93 hrs of activity so far!)
>
> Maybe you can exchange it...

Definitely!

>>> It would also help to know what happens under Windows. Do the same
>>> "empty sense" errors occur? If they do, how does Windows handle them?
>> I can try to use usb snoopy to log USB traffic under Windows.
>> Do you know how I can ask Windows to "read sector X"?
>
> I wish I knew! Perhaps Microsoft's KnowledgeBase site can tell you
> how.

I found a free tool to read a raw sector of a partition (NT Disk Viewer).
Unfortunately, USB Snoopy kept stopping its packet capture after a few
seconds, so I was not able to see what was happening.
Anyway, I got a read error after the program had seemed frozen for a
while. So I guess Windows handles these errors by retrying the read and
finally giving up.

A question:
If I rewrite these sectors I can fix these errors (at least
temporarily). I also noticed that when the "check condition" bit is set
some data is also transferred. Is there a way to get this partial data
(e.g. with dd)?

Thanks,
Ludovico
* Re: USB HD: No Sense / Info fld=0x0 and read corruption
From: Alan Stern @ 2008-12-26 16:37 UTC
To: Ludovico Cavedon
Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA, linux-scsi-u79uwXL29TY76Z2rM5mHXA

On Fri, 26 Dec 2008, Ludovico Cavedon wrote:

> A question:
> If I rewrite these sectors I can fix these errors (at least
> temporarily). I also noticed that when the "check condition" bit is set
> some data is also transferred. Is there a way to get this partial data
> (e.g. with dd)?

Not the way you're thinking. I believe wireshark is able to monitor
USB packets, so you could see the raw data that way. Also there are
tools like the sg-utils package or plscsi, which provide a way for you
to send specific SCSI commands to a device and see exactly what the
results are.

Alan Stern
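
On the Linux side, the "read sector X" question from earlier in the
thread comes down to reading one block at a byte offset of
LBA * sector size. The following is a minimal Python sketch of that
addressing, assuming 512-byte sectors; read_sector is a hypothetical
helper, and the demonstration runs against a scratch file rather than a
real device. It goes through the ordinary block layer, so on a failing
sector it would be expected to raise OSError instead of returning the
partial data asked about above.

```python
import os
import tempfile


def read_sector(path: str, lba: int, sector_size: int = 512) -> bytes:
    """Read one sector from a block device or image file by LBA."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # pread reads at an absolute byte offset without moving the
        # file position: offset = LBA * sector size.
        return os.pread(fd, sector_size, lba * sector_size)
    finally:
        os.close(fd)


# Demonstration on a three-sector scratch file so the sketch can run
# without touching a real disk. A real call would pass a device node
# such as /dev/sdb and normally needs root privileges.
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(b"A" * 512 + b"B" * 512 + b"C" * 512)
    scratch_path = scratch.name

try:
    demo_sector = read_sector(scratch_path, 1)
finally:
    os.unlink(scratch_path)

print(len(demo_sector), demo_sector[:4])
```

The same addressing is what dd expresses with its bs=, skip= and count=
operands. Seeing the bytes the drive actually put on the wire still
needs a bus-level capture or a SCSI pass-through tool, as suggested in
the reply above.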