USB HD: No Sense / Info fld=0x0 and read corruption


* USB HD: No Sense / Info fld=0x0 and read corruption
@ 2008-12-24 13:39 Ludovico Cavedon
  2008-12-24 18:02 ` Alan Stern
  0 siblings, 1 reply; 6+ messages in thread
From: Ludovico Cavedon @ 2008-12-24 13:39 UTC (permalink / raw)
  To: linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

Hi,
I have a problem with a USB hard drive.

The problem starts to happen after some random time of operation; the
time may range from 1 minute up to one hour...

The first symptoms are these messages in the kernel log:
---------------
sd 5:0:0:0: [sdb] Sense Key : No Sense [current]
Info fld=0x0
5:0:0:0: [sdb] Add. Sense: No additional sense information
---------------
They may appear only once or multiple times.

Sometimes they are not associated with other errors; sometimes I start to
see filesystem corruption:
---------------
attempt to access beyond end of device
sdb8: rw=0, want=15514452856, limit=207045657
---------------
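
A quick sanity check of the numbers in that message, as a minimal sketch. It
assumes that "want" and "limit" are counted in 512-byte sectors, which is the
usual block-layer convention but is not stated in the log itself.

```python
# Values copied from the "attempt to access beyond end of device" message.
SECTOR_BYTES = 512          # assumption: want/limit are 512-byte sectors
want = 15514452856
limit = 207045657


def sectors_to_gb(sectors):
    """Convert a sector count to decimal gigabytes."""
    return sectors * SECTOR_BYTES / 1e9


want_gb = sectors_to_gb(want)
limit_gb = sectors_to_gb(limit)
print(f"requested offset: {want_gb:.1f} GB, partition size: {limit_gb:.1f} GB")
```

Under that assumption the request lands at roughly 7.9 TB on a partition of
about 106 GB, far beyond even the whole 320GB drive. That would be consistent
with a garbage block number read from corrupted filesystem metadata.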

If I unmount the device and replug it, I am able to read the same data
without any problem.
I suspect that some read corruption is happening. However I have *never*
seen I/O error reported!

Here is the usbmon log:
http://pastebin.com/f4e1afeb1

There are some successful read operations, then the read operation
31 = 55534243 8f150000 00100100 80000a28 0022ebd0 a8000088 00000000 000000

whose command completion status ends with "01". The subsequent REQUEST
SENSE, however, is empty.
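
For reference, the 31 bytes quoted above can be decoded as a USB mass-storage
Command Block Wrapper (CBW). The sketch below follows the Bulk-Only Transport
layout as I understand it (little-endian header, then the SCSI CDB); the
function names are made up for illustration, and the 512-byte block size is
an assumption.

```python
import struct

# CBW bytes copied from the usbmon line quoted above.
CBW_HEX = "55534243 8f150000 00100100 80000a28 0022ebd0 a8000088 00000000 000000"


def parse_cbw(hex_text):
    """Split a 31-byte CBW into its header fields and the embedded CDB."""
    raw = bytes.fromhex(hex_text.replace(" ", ""))
    if len(raw) != 31:
        raise ValueError("a CBW is exactly 31 bytes")
    # Little-endian header: signature, tag, data length, flags, LUN, CDB length.
    sig, tag, data_len, flags, lun, cdb_len = struct.unpack_from("<4sIIBBB", raw, 0)
    if sig != b"USBC":
        raise ValueError("bad CBW signature")
    cdb_len &= 0x1F
    return {
        "tag": tag,
        "data_len": data_len,
        "direction": "in" if flags & 0x80 else "out",
        "lun": lun & 0x0F,
        "cdb": raw[15:15 + cdb_len],
    }


def parse_read10(cdb):
    """Decode a READ(10) CDB; its fields are big-endian."""
    opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", cdb)
    if opcode != 0x28:
        raise ValueError("not a READ(10) command")
    return {"lba": lba, "blocks": blocks}


cbw = parse_cbw(CBW_HEX)
read10 = parse_read10(cbw["cdb"])
print(cbw["direction"], hex(cbw["tag"]), cbw["data_len"], hex(read10["lba"]), read10["blocks"])
```

Decoded this way, the command is a READ(10) of 0x88 blocks starting at LBA
0x22ebd0a8, which matches the CBW data length of 0x11000 bytes if the blocks
are 512 bytes each. The trailing "01" would then be the status byte of the
Command Status Wrapper, meaning "command failed".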

What may be happening here?
My situation looks different from
http://thread.gmane.org/gmane.linux.kernel/747753
-there is an additional "Info fld=0x0" line in the log
-the log messages are not always looping
-it is happening also with 2.6.28-rc8 (which should have the patch, right?)

I am not able to understand if this is HD problem or a kernel problem. I
am able to replicate it:
-2.6.28-rc8 vanilla kernel
-2.6.27 ubuntu (intrepid) kernel
-2.6.25 ubuntu (hardy) kernel
-on two different computers
-with different USB cables (so it is not a cable problem)

However, I have never got errors reported by Windows (dual boot on the
same machine).

The HD is a Western Digital 320GB (WD3200):
T:  Bus=07 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  5 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=1058 ProdID=0704 Rev= 1.05
S:  Manufacturer=Western Digital
S:  Product=External HDD
S:  SerialNumber=5758453730384E5036333734
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr=  2mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms

The USB controller is
00:1a.7 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family)
USB2 EHCI Controller #2 [8086:283a] (rev 03)


Please tell me if I can provide other useful information.

Thank you,
Ludovico Cavedon
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: USB HD: No Sense / Info fld=0x0 and read corruption
  2008-12-24 13:39 USB HD: No Sense / Info fld=0x0 and read corruption Ludovico Cavedon
@ 2008-12-24 18:02 ` Alan Stern
       [not found]   ` <Pine.LNX.4.44L0.0812241253300.27059-100000-pYrvlCTfrz9XsRXLowluHWD2FQJk+8+b@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Alan Stern @ 2008-12-24 18:02 UTC (permalink / raw)
  To: Ludovico Cavedon; +Cc: linux-usb, linux-scsi

On Wed, 24 Dec 2008, Ludovico Cavedon wrote:

> Hi,
> I have a problem with a USB hard drive.
> 
> The problem starts to happen after some random time of operation; the
> time may range from 1 minute up to one hour...
> 
> The first symptoms are these messages in the kernel log:
> ---------------
> sd 5:0:0:0: [sdb] Sense Key : No Sense [current]
> Info fld=0x0
> 5:0:0:0: [sdb] Add. Sense: No additional sense information
> ---------------
> They may appear only once or multiple times.
> 
> Sometimes they are not associated with other errors; sometimes I start to
> see filesystem corruption:
> ---------------
> attempt to access beyond end of device
> sdb8: rw=0, want=15514452856, limit=207045657
> ---------------
> 
> If I unmount the device and replug it, I am able to read the same data
> without any problem.
> I suspect that some read corruption is happening. However I have *never*
> seen I/O error reported!
> 
> Here is the usbmon log:
> http://pastebin.com/f4e1afeb1
> 
> There are some successful read operations, then the read operation
> 31 = 55534243 8f150000 00100100 80000a28 0022ebd0 a8000088 00000000 000000
> 
> whose command completion status ends with "01". The subsequent REQUEST
> SENSE, however, is empty.
> 
> What may be happening here?
> My situation looks different from
> http://thread.gmane.org/gmane.linux.kernel/747753
> -there is an additional "Info fld=0x0" line in the log

That's because your "empty" sense information has the Valid flag set.
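
To illustrate the Valid flag: in fixed-format SCSI sense data, bit 7 of byte 0
is VALID, the low nibble of byte 2 is the sense key, bytes 3-6 hold the
Information field, and bytes 12-13 hold ASC/ASCQ. The sketch below decodes
such a buffer. The sample bytes are hypothetical (the actual sense buffer is
not shown in this thread), chosen only to reproduce an "empty" record with
VALID set.

```python
def decode_fixed_sense(buf):
    """Decode the main fields of fixed-format sense data (response code 0x70/0x71)."""
    if len(buf) < 14:
        raise ValueError("need at least 14 bytes of sense data")
    response_code = buf[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return {
        "valid": bool(buf[0] & 0x80),               # Information field is meaningful
        "sense_key": buf[2] & 0x0F,                 # 0x0 means NO SENSE
        "information": int.from_bytes(buf[3:7], "big"),
        "asc": buf[12],                             # additional sense code
        "ascq": buf[13],                            # additional sense code qualifier
    }


# Hypothetical buffer: response code 0x70 with VALID set, everything else zero.
sample = bytes([0xF0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0A] + [0x00] * 10)
sense = decode_fixed_sense(sample)
print(sense)
```

A record like this carries sense key 0 (No Sense), ASC/ASCQ 0/0 (no additional
sense information) and an Information field of 0 that is nevertheless marked
valid, which would account for the extra "Info fld=0x0" line.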

> -the log messages are not always looping
> -it is happening also with 2.6.28-rc8 (which should have the patch, right?)

What patch?  Do you mean the patch at the end of that email thread?  It 
affects only Argosy USB drives, not your Western Digital.

> I am not able to understand if this is HD problem or a kernel problem. I

Partly both.  The HD (or more likely, its USB interface) is responsible 
for sending those unnecessary empty sense records.  The kernel is 
responsible for not reporting an I/O error (assuming an error actually 
did take place).

> am able to replicate it:
> -2.6.28-rc8 vanilla kernel
> -2.6.27 ubuntu (intrepid) kernel

2.6.27 doesn't go into an endless loop?  This may indicate that 
eventually the drive stops sending the Check Condition status.

> -2.6.25 ubuntu (hardy) kernel
> -on two different computers
> -with different USB cables (so it is not a cable problem)
> 
> However, I have never got errors reported by Windows (dual boot on the
> same machine).
> 
> The HD is a Western Digital 320GB (WD3200):
> T:  Bus=07 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  5 Spd=480 MxCh= 0
> D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
> P:  Vendor=1058 ProdID=0704 Rev= 1.05
> S:  Manufacturer=Western Digital
> S:  Product=External HDD
> S:  SerialNumber=5758453730384E5036333734
> C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr=  2mA
> I:* If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
> E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> 
> The USB controller is
> 00:1a.7 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family)
> USB2 EHCI Controller #2 [8086:283a] (rev 03)
> 
> 
> Please tell me if I can provide other useful information.

It would help to see the dmesg log for when one of these errors occurs.

It would also help to know what happens under Windows.  Do the same 
"empty sense" errors occur?  If they do, how does Windows handle them?

Alan Stern



* Re: USB HD: No Sense / Info fld=0x0 and read corruption
       [not found]   ` <Pine.LNX.4.44L0.0812241253300.27059-100000-pYrvlCTfrz9XsRXLowluHWD2FQJk+8+b@public.gmane.org>
@ 2008-12-24 20:09     ` Ludovico Cavedon
       [not found]       ` <495296DF.4090900-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Ludovico Cavedon @ 2008-12-24 20:09 UTC (permalink / raw)
  To: Alan Stern
  Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

Alan Stern wrote:
> On Wed, 24 Dec 2008, Ludovico Cavedon wrote:
>> -it is happening also with 2.6.28-rc8 (which should have the patch, right?)
> 
> What patch?  Do you mean the patch at the end of that email thread?  It 
> affects only Argosy USB drives, not your Western Digital.

I thought the patches at
http://marc.info/?l=linux-scsi&m=122443015406309&w=2
were also included, but later I realized I was wrong.
I have patched my kernel with them, and now I/O errors get reported!

>> I am not able to understand if this is HD problem or a kernel problem. I
> 
> Partly both.  The HD (or more likely, its USB interface) is responsible 
> for sending those unnecessary empty sense records.  The kernel is 
> responsible for not reporting an I/O error (assuming an error actually 
> did take place).

I think I found out what is happening on the HD side. The SMART self
test fails with a read error. The SMART log reports uncorrectable read
errors. However, Reallocated_Event_Count is 0. Searching on the web,
it looks like these sectors have bad ECC, so they cause a read error,
but they are not bad sectors. Is this correct?

My question is: how can this happen? and not just one sector, but at
least a dozen!
Bad HD? (it's new! 93 hrs of activity so far!)

>> am able to replicate it:
>> -2.6.28-rc8 vanilla kernel
>> -2.6.27 ubuntu (intrepid) kernel
> 
> 2.6.27 doesn't go into an endless loop?  This may indicate that 
> eventually the drive stops sending the Check Condition status.

You are right. Most of the read attempts succeed after a while.
However I found one sector that is causing an endless loop on 2.6.27 and
unpatched 2.6.28-rc8

> It would help to see the dmesg log for when one of these errors occurs.

There are no additional messages. However here it is:
http://pastebin.com/mcfd54a3

> It would also help to know what happens under Windows.  Do the same 
> "empty sense" errors occur?  If they do, how does Windows handle them?

I can try to use usb snoopy to log usb traffic under windows.
Do you know how I can ask Windows "read sector X"?

Thank you for your help,
Merry Christmas!
Ludovico

* Re: USB HD: No Sense / Info fld=0x0 and read corruption
       [not found]       ` <495296DF.4090900-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2008-12-24 23:02         ` Alan Stern
       [not found]           ` <Pine.LNX.4.44L0.0812241756020.917-100000-pYrvlCTfrz9XsRXLowluHWD2FQJk+8+b@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Alan Stern @ 2008-12-24 23:02 UTC (permalink / raw)
  To: Ludovico Cavedon
  Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On Wed, 24 Dec 2008, Ludovico Cavedon wrote:

> Alan Stern wrote:
> > On Wed, 24 Dec 2008, Ludovico Cavedon wrote:
> >> -it is happening also with 2.6.28-rc8 (which should have the patch, right?)
> > 
> > What patch?  Do you mean the patch at the end of that email thread?  It 
> > affects only Argosy USB drives, not your Western Digital.
> 
> I thought the patches at
> http://marc.info/?l=linux-scsi&m=122443015406309&w=2
> were also included, but later I realized I was wrong.

A revised version of the first patch in that message is queued for 
2.6.29.  The second patch has not yet been merged in any form.

> I have patched my kernel with them, and now I/O errors get reported!
> 
> >> I am not able to understand if this is HD problem or a kernel problem. I
> > 
> > Partly both.  The HD (or more likely, its USB interface) is responsible 
> > for sending those unnecessary empty sense records.  The kernel is 
> > responsible for not reporting an I/O error (assuming an error actually 
> > did take place).
> 
> I think I found out what is happening on the HD side. The SMART self
> test fails with a read error. The SMART log reports uncorrectable read
> errors. However, Reallocated_Event_Count is 0. Searching on the web,
> it looks like these sectors have bad ECC, so they cause a read error,
> but they are not bad sectors. Is this correct?

I don't know.  It sounds reasonable.  The real issue is: Why doesn't
the drive send back appropriate sense information to let the host know
about the bad ECC?

> My question is: how can this happen? and not just one sector, but at
> least a dozen!
> Bad HD? (it's new! 93 hrs of activity so far!)

Maybe you can exchange it...

> > 2.6.27 doesn't go into an endless loop?  This may indicate that 
> > eventually the drive stops sending the Check Condition status.
> 
> You are right. Most of the read attempts succeed after a while.
> However I found one sector that is causing an endless loop on 2.6.27 and
> unpatched 2.6.28-rc8
> 
> > It would help to see the dmesg log for when one of these errors occurs.
> 
> There are no additional messages. However here it is:
> http://pastebin.com/mcfd54a3

Yeah, that's not very useful.

> > It would also help to know what happens under Windows.  Do the same 
> > "empty sense" errors occur?  If they do, how does Windows handle them?
> 
> I can try to use usb snoopy to log usb traffic under windows.
> Do you know how I can ask Windows "read sector X"?

I wish I knew!  Perhaps Microsoft's KnowledgeBase site can tell you 
how.

Alan Stern


* Re: USB HD: No Sense / Info fld=0x0 and read corruption
       [not found]           ` <Pine.LNX.4.44L0.0812241756020.917-100000-pYrvlCTfrz9XsRXLowluHWD2FQJk+8+b@public.gmane.org>
@ 2008-12-26 10:38             ` Ludovico Cavedon
       [not found]               ` <4954B40E.80403-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Ludovico Cavedon @ 2008-12-26 10:38 UTC (permalink / raw)
  To: Alan Stern
  Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

Alan Stern wrote:
> On Wed, 24 Dec 2008, Ludovico Cavedon wrote:
>> My question is: how can this happen? and not just one sector, but at
>> least a dozen!
>> Bad HD? (it's new! 93 hrs of activity so far!)
> 
> Maybe you can exchange it...

Definitely!

>>> It would also help to know what happens under Windows.  Do the same 
>>> "empty sense" errors occur?  If they do, how does Windows handle them?
>> I can try to use usb snoopy to log usb traffic under windows.
>> Do you know how I can ask Windows "read sector X"?
> 
> I wish I knew!  Perhaps Microsoft's KnowledgeBase site can tell you 
> how.

I found a free tool to read a raw sector of a partition (NT Disk
Viewer). Unfortunately USB Snoopy kept stopping capturing packets after
a few seconds, so I was not able to see what was happening.
Anyway, I got a read error after a while, during which the program seemed
frozen. So I guess Windows handles them by retrying the read and finally
giving up.

A question:
If I rewrite these sectors I can fix these errors (at least
temporarily). I also noticed that when the "check condition" bit is set
some data is also transferred. Is there a way to get this partial data
(e.g. with dd)?

Thanks,
Ludovico


* Re: USB HD: No Sense / Info fld=0x0 and read corruption
       [not found]               ` <4954B40E.80403-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2008-12-26 16:37                 ` Alan Stern
  0 siblings, 0 replies; 6+ messages in thread
From: Alan Stern @ 2008-12-26 16:37 UTC (permalink / raw)
  To: Ludovico Cavedon
  Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On Fri, 26 Dec 2008, Ludovico Cavedon wrote:

> A question:
> If I rewrite these sectors I can fix these errors (at least
> temporarily). I also noticed that when the "check condition" bit is set
> some data is also transferred. Is there a way to get this partial data
> (e.g. with dd)?

Not the way you're thinking.  I believe wireshark is able to monitor 
USB packets, so you could see the raw data that way.

Also there are tools like the sg-utils package or plscsi, which
provide a way for you to send specific SCSI commands to a device and
see exactly what the results are.
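
As a rough sketch of what "sending a specific SCSI command" involves, the
snippet below builds a READ(10) CDB for a given LBA and block count and prints
it as hex bytes. It then formats a command line for sg_raw (part of
sg3_utils), which takes the CDB bytes as arguments. The helper name, the
/dev/sdX device node and the 512-byte block size are placeholders or
assumptions, not details taken from this thread.

```python
import struct

BLOCK_BYTES = 512  # assumption: logical block size of the drive


def build_read10(lba, blocks):
    """Build a 10-byte READ(10) CDB (opcode 0x28, big-endian LBA and length)."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("LBA does not fit in READ(10)")
    if not 0 < blocks <= 0xFFFF:
        raise ValueError("block count does not fit in READ(10)")
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)


# Example values: a single block at the start LBA of the failing read seen earlier.
cdb = build_read10(0x22EBD0A8, 1)
cdb_args = " ".join(f"{b:02x}" for b in cdb)

# /dev/sdX is a placeholder; -r sets the number of bytes to read back.
command = f"sg_raw -r {1 * BLOCK_BYTES} /dev/sdX {cdb_args}"
print(command)
```

Reading one block at a time like this makes it possible to find out exactly
which LBAs fail, and the tool reports the SCSI status and any sense data it
receives for each command.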

Alan Stern

