public inbox for linux-kernel@vger.kernel.org
* Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage
@ 2006-12-10  8:44 linux
  2006-12-10 15:37 ` Clemens Koller
  0 siblings, 1 reply; 22+ messages in thread
From: linux @ 2006-12-10  8:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux

How the wires can cause single-bit errors is a bit beyond me;
USB protects every bit on the wire well enough that communication
errors should be detected.

Every packet starts with an identifier byte; this contains a 4-bit packet
identifier followed by its bitwise complement as a check.
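
As a rough illustration of that check: per the USB 2.0 spec, the second
nibble of the identifier byte is the one's complement of the 4-bit PID,
so a receiver can validate it roughly as below.  The function names are
made up for this sketch.

```python
def make_pid_byte(pid: int) -> int:
    """Build the identifier byte: PID in the low nibble, its complement in the high nibble."""
    pid &= 0x0F
    return ((pid ^ 0x0F) << 4) | pid


def pid_is_valid(pid_byte: int) -> bool:
    """Return True if the high nibble is the bitwise complement of the low nibble."""
    pid = pid_byte & 0x0F
    check = (pid_byte >> 4) & 0x0F
    return check == (pid ^ 0x0F)


# 0x2D is the SETUP identifier byte (PID 0xD, check nibble 0x2).
print(pid_is_valid(0x2D))
# The same byte with one bit flipped no longer passes the check.
print(pid_is_valid(0x2D ^ 0x08))
```

Any single-bit flip in this byte breaks the complement relation, so it
cannot go unnoticed by a receiver that performs the check.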

Some small "token" packets have an 11-bit payload (7 address and 4
endpoint bits) and a 5-bit CRC.

Any corruption of those would result in USB state machine confusion and
at least large data gaps.
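
A minimal sketch of the token CRC, assuming my reading of the USB 2.0
spec is right: generator x^5 + x^2 + 1, register preset to all ones,
bits fed in wire order (least significant bit first), result inverted.
The helper names are hypothetical.

```python
def token_bits(addr: int, endp: int) -> list:
    """The 11 token bits in wire order: 7 address bits, then 4 endpoint bits, LSB first."""
    bits = [(addr >> i) & 1 for i in range(7)]
    bits += [(endp >> i) & 1 for i in range(4)]
    return bits


def usb_crc5(bits) -> int:
    """Bit-serial CRC-5 over the given bits; returns the inverted 5-bit remainder."""
    reg = 0x1F                          # preset to all ones
    for b in bits:
        top = (reg >> 4) & 1
        reg = (reg << 1) & 0x1F
        if top ^ b:
            reg ^= 0x05                 # x^5 + x^2 + 1 without the x^5 term
    return reg ^ 0x1F


def crc5_changes(addr: int, endp: int, bit_index: int) -> bool:
    """True if flipping one of the 11 token bits changes the CRC."""
    bits = token_bits(addr, endp)
    damaged = list(bits)
    damaged[bit_index] ^= 1
    return usb_crc5(bits) != usb_crc5(damaged)


# Every single-bit flip in an (arbitrary) address/endpoint pair alters the CRC.
print(all(crc5_changes(0x15, 0xE, i) for i in range(11)))
```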

Packets with an actual data payload are protected with a CRC-16.
Not quite as strong as Ethernet, but sufficient to detect all errors of
three bits or less, and all burst errors of 16 consecutive bits or less.

A single-bit flip can't get past a CRC-16 unless you flip at least
three bits in the CRC as well.  The actual pattern depends on the bit
position and averages 8 bits; given the documented bit error positions
and a better knowledge of the ATA-over-USB encapsulation protocol,
the actual CRC changes could be computed.
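
A minimal Python sketch of that computation: a bit-serial CRC-16 with
the USB generator x^16 + x^15 + x^2 + 1, register preset to all ones and
the result inverted, as I understand the USB 2.0 spec.  The helper
names, the 512-byte payload and the choice of byte 5 are illustrative
assumptions; where the damaged byte really sits inside a packet depends
on the encapsulation.

```python
def usb_crc16(data: bytes) -> int:
    """Bit-serial CRC-16, polynomial 0x8005, preset 0xFFFF, inverted result."""
    reg = 0xFFFF
    for byte in data:
        for i in range(8):              # each byte goes out LSB first
            bit = (byte >> i) & 1
            top = (reg >> 15) & 1
            reg = (reg << 1) & 0xFFFF
            if top ^ bit:
                reg ^= 0x8005
    return reg ^ 0xFFFF


def crc_bits_changed(data: bytes, byte_index: int, bit: int) -> int:
    """Count the CRC bits that differ after flipping one payload bit."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bin(usb_crc16(data) ^ usb_crc16(bytes(damaged))).count("1")


# Hypothetical 512-byte payload of 0xFF; flip bit 3 of byte 5 (0xFF -> 0xF7).
payload = bytes([0xFF] * 512)
print(crc_bits_changed(payload, 5, 3))
```

For any single-bit flip in a payload of this size the count should come
out at three or more, which is the "at least three bits" above.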


Now, I can imagine a USB slave controller so cheap and/or buggy that it
doesn't check the CRC, but I'd think that most would.  Checking a CRC
is hardly a novel challenge.

* Re: [usb-storage]  single bit errors on files stored on USB-HDDs via USB2/usb_storage
  2006-12-06 22:01 Matthias Schniedermeyer
@ 2006-12-07 18:08 Alan Stern
  2006-12-07 19:41 ` Matthias Schniedermeyer
  0 siblings, 1 reply; 22+ messages in thread
From: Alan Stern @ 2006-12-07 18:08 UTC (permalink / raw)
  To: Matthias Schniedermeyer; +Cc: linux-kernel, usb-storage

On Wed, 6 Dec 2006, Matthias Schniedermeyer wrote:

> Hi
> 
> 
> I'm using a bunch of HDDs in USB enclosures for storing files.
> (currently 38 HDDs, with a total capacity of 9.5 TB, of which 8.5 TB is used)
>
> After I realised about a year(!) ago that the files copied to the HDDs
> sometimes aren't identical to the original files, I changed my
> procedure so that each file is MD5-summed before and after copying, and
> deleted and copied again if an error is detected.
>
> My average file size is about 1 GB, with files ranging from about
> 400 MB to 5000 MB. I estimate the average error rate at about one
> damaged file in about 10 GB of data.
>
> I'm not sure, and haven't checked, whether the files are wrongly
> written or "only" wrongly read back, as I delete the defective files
> and copy them again.
>
> Today I copied a few files back and checked them against the stored MD5
> sums, and 5 files of 86 (each about 700 MB) had errors. So I copied the 5
> files again. 4 of the files were OK after that, and copying the last file
> a third time also resulted in the correct MD5.
>
> This time I kept the defective files and used "vbindiff" to show me the
> difference. Strangely, in EVERY case the difference is a single bit in a
> sequence of 0xff bytes, inside a block of varying bit values, that
> changed a 0xff into a 0xf7.
> Also interesting is that each error is at a 0xXXXXXXX5 position.
>
> Attached is a file with 5 of the 6 differences, named 1-5. Each of the
> 5 blocks has 2x3 lines: the first 3 lines are the original, and the
> following 3 lines contain the error in the 6th value of the middle row.
>
> I NEVER saw any messages in syslog regarding errors, or a program
> aborting due to errors passed down from the kernel, or anything like that.

This was almost certainly caused by hardware flaws in the USB interface 
chips of the enclosures.  There's nothing the kernel can do about it 
because the errors aren't reported; all that happens is that incorrect 
data is sent to or from the drive.

Alan Stern


[parent not found: <fa./xvi+/Ji/HqNkvnGjUt4pIS9goM@ifi.uio.no>]
* single bit errors on files stored on USB-HDDs via USB2/usb_storage
@ 2006-12-06 22:01 Matthias Schniedermeyer
  2006-12-07 22:10 ` DervishD
  0 siblings, 1 reply; 22+ messages in thread
From: Matthias Schniedermeyer @ 2006-12-06 22:01 UTC (permalink / raw)
  To: linux-kernel, usb-storage

[-- Attachment #1: Type: text/plain, Size: 2608 bytes --]

Hi


I'm using a bunch of HDDs in USB enclosures for storing files.
(currently 38 HDDs, with a total capacity of 9.5 TB, of which 8.5 TB is used)

After I realised about a year(!) ago that the files copied to the HDDs
sometimes aren't identical to the original files, I changed my
procedure so that each file is MD5-summed before and after copying, and
deleted and copied again if an error is detected.
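
A copy-and-verify procedure of this kind could look roughly like the
sketch below.  This is not the actual script used here; the function
names and the retry limit are assumptions.  Note that reading the copy
back immediately may be served from the page cache instead of the
drive, which this sketch does not address.

```python
import hashlib
import os
import shutil


def md5_of(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src, dst, max_attempts=3):
    """Copy src to dst, delete and retry on an MD5 mismatch.

    Returns the number of attempts needed; raises OSError if every
    attempt produced a damaged copy.
    """
    expected = md5_of(src)
    for attempt in range(1, max_attempts + 1):
        shutil.copyfile(src, dst)
        if md5_of(dst) == expected:
            return attempt
        os.remove(dst)                  # damaged copy: delete and try again
    raise OSError("%s: still damaged after %d attempts" % (dst, max_attempts))
```

A call such as copy_verified("/data/a.mpg", "/mnt/usb/a.mpg") (both
paths hypothetical) returns 1 when the first copy is already good.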

My average file size is about 1 GB, with files ranging from about
400 MB to 5000 MB. I estimate the average error rate at about one
damaged file in about 10 GB of data.

I'm not sure, and haven't checked, whether the files are wrongly
written or "only" wrongly read back, as I delete the defective files
and copy them again.

Today I copied a few files back and checked them against the stored MD5
sums, and 5 files of 86 (each about 700 MB) had errors. So I copied the 5
files again. 4 of the files were OK after that, and copying the last file
a third time also resulted in the correct MD5.
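
The figures from this run can be turned into a rough error rate.  The
sketch assumes 1 GB = 10^9 bytes, exactly 700 MB per file, and one
flipped bit per damaged copy; the variable names are illustrative.

```python
FILE_SIZE_GB = 0.7                      # "each about 700 MB"

copies = 86 + 5 + 1                     # first pass, second pass, third pass
damaged = 5 + 1                         # 5 bad on the first pass, 1 on the second

gb_per_error = copies * FILE_SIZE_GB / damaged
bit_error_rate = 1 / (gb_per_error * 1e9 * 8)

print("about %.1f GB copied per damaged file" % gb_per_error)
print("bit error rate of roughly %.1e" % bit_error_rate)
```

That works out to a little under 11 GB per damaged copy, in line with
the estimate of one damaged file per 10 GB.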

This time I kept the defective files and used "vbindiff" to show me the
difference. Strangely, in EVERY case the difference is a single bit in a
sequence of 0xff bytes, inside a block of varying bit values, that
changed a 0xff into a 0xf7.
Also interesting is that each error is at a 0xXXXXXXX5 position.
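
The same comparison can be sketched in a few lines of Python (the
function name is made up; the sample row is the one at 245A E300 from
the attachment).  XORing the two bytes gives 0xff ^ 0xf7 = 0x08, so it
is bit 3 that drops from 1 to 0.

```python
def bit_diffs(good: bytes, bad: bytes, base: int = 0):
    """List (offset, good_byte, bad_byte, flipped_bit_numbers) for each differing byte."""
    result = []
    for i, (g, b) in enumerate(zip(good, bad)):
        delta = g ^ b
        if delta:
            flipped = [n for n in range(8) if (delta >> n) & 1]
            result.append((base + i, g, b, flipped))
    return result


good_row = bytes.fromhex("0DFFFFFFFFFFFFFFFFFFFFFFFFFFDFFC")
bad_row = bytes.fromhex("0DFFFFFFFFF7FFFFFFFFFFFFFFFFDFFC")

for offset, g, b, flipped in bit_diffs(good_row, bad_row, base=0x245AE300):
    print("0x%08X: 0x%02X -> 0x%02X, bits %s" % (offset, g, b, flipped))
# prints: 0x245AE305: 0xFF -> 0xF7, bits [3]
```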

Attached is a file with 5 of the 6 differences, named 1-5. Each of the
5 blocks has 2x3 lines: the first 3 lines are the original, and the
following 3 lines contain the error in the 6th value of the middle row.

I NEVER saw any messages in syslog regarding errors, or a program
aborting due to errors passed down from the kernel, or anything like that.

Data for the computer/software:
Hardware:
Computer is a dual P3-933 MHz with 3 GB (ECC) SDRAM, ServerWorks HE-SL chipset.
Source HDD is a 200 GB S-ATA device connected to a Promise TX-4 using libata.
Destination HDDs: several different models in several different
enclosures with different chipsets (mostly Genesys Logic).
USB controller: currently I use an EHCI/OHCI NEC-chipset add-on card, but
until about 4-5 months ago I used an EHCI/UHCI VIA-chipset add-on card
with the same results.
Software:
Kernel: <What was current 1 year ago> up to 2.6.18, self compiled
vanilla kernels.
I haven't tried 2.6.19 and I don't expect any changes from it.
Distribution: Debian SID


If you need any other information, I will provide it as best I can.




See you

-- 
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.


[-- Attachment #2: errors.txt --]
[-- Type: text/plain, Size: 2395 bytes --]

1:
245A E2F0: 0E D9 35 01 00 F4 7B F8  00 00 01 E0 09 00 80 00  ..5...{. ........
245A E300: 0D FF FF FF FF FF FF FF  FF FF FF FF FF FF DF FC  ........ ........
245A E310: 20 92 50 90 DC F4 0C 1A  1A 18 DB 80 4E 61 25 80   .P..... ....Na%.

245A E2F0: 0E D9 35 01 00 F4 7B F8  00 00 01 E0 09 00 80 00  ..5...{. ........
245A E300: 0D FF FF FF FF F7 FF FF  FF FF FF FF FF FF DF FC  ........ ........
245A E310: 20 92 50 90 DC F4 0C 1A  1A 18 DB 80 4E 61 25 80   .P..... ....Na%.

2:
24F9 F770: 00 00 01 E0 09 00 80 00  0D FF FF FF FF FF FF FF  ........ ........
24F9 F780: FF FF FF FF FF FF FC 13  64 0B 38 68 EA A2 11 86  ........ d.8h....
24F9 F790: 61 7A EE EC ED 1D 6F 31  32 6E 4D D9 B5 31 37 66  az....o1 2nM..17f

24F9 F770: 00 00 01 E0 09 00 80 00  0D FF FF FF FF FF FF FF  ........ ........
24F9 F780: FF FF FF FF FF F7 FC 13  64 0B 38 68 EA A2 11 86  ........ d.8h....
24F9 F790: 61 7A EE EC ED 1D 6F 31  32 6E 4D D9 B5 31 37 66  az....o1 2nM..17f

3:
20CB C6B0: 00 FB 3F F8 00 00 01 E0  09 00 80 80 0D 21 2A 1B  ..?..... .....!*.
20CB C6C0: 65 F1 FF FF FF FF FF FF  FF FF 3E C4 BC 2B 39 A4  e....... ..>..+9.
20CB C6D0: 8E 85 50 EB 7B 02 7B 93  79 77 50 EF 60 32 8C 03  ..P.{.{. ywP.`2..

20CB C6B0: 00 FB 3F F8 00 00 01 E0  09 00 80 80 0D 21 2A 1B  ..?..... .....!*.
20CB C6C0: 65 F1 FF FF FF F7 FF FF  FF FF 3E C4 BC 2B 39 A4  e....... ..>..+9.
20CB C6D0: 8E 85 50 EB 7B 02 7B 93  79 77 50 EF 60 32 8C 03  ..P.{.{. ywP.`2..

4:
1F13 06B0: 00 00 01 E0 09 00 80 00  0D FF FF FF FF FF FF FF  ........ ........
1F13 06C0: FF FF FF FF FF FF 7F 5C  14 05 F2 9E 90 0F 6F A4  .......\ ......o.
1F13 06D0: B8 10 BF E9 6A 78 A3 00  13 00 FD 9C 00 A5 5B EB  ....jx.. ......[.

1F13 06B0: 00 00 01 E0 09 00 80 00  0D FF FF FF FF FF FF FF  ........ ........
1F13 06C0: FF FF FF FF FF F7 7F 5C  14 05 F2 9E 90 0F 6F A4  .......\ ......o.
1F13 06D0: B8 10 BF E9 6A 78 A3 00  13 00 FD 9C 00 A5 5B EB  ....jx.. ......[.

5:
1F13 06B0: 00 00 01 E0 09 00 80 00  0D FF FF FF FF FF FF FF  ........ ........
1F13 06C0: FF FF FF FF FF FF 7F 5C  14 05 F2 9E 90 0F 6F A4  .......\ ......o.
1F13 06D0: B8 10 BF E9 6A 78 A3 00  13 00 FD 9C 00 A5 5B EB  ....jx.. ......[.

1F13 06B0: 00 00 01 E0 09 00 80 00  0D FF FF FF FF FF FF FF  ........ ........
1F13 06C0: FF FF FF FF FF F7 7F 5C  14 05 F2 9E 90 0F 6F A4  .......\ ......o.
1F13 06D0: B8 10 BF E9 6A 78 A3 00  13 00 FD 9C 00 A5 5B EB  ....jx.. ......[.



end of thread, other threads:[~2006-12-10 15:37 UTC | newest]

Thread overview: 22+ messages
2006-12-10  8:44 single bit errors on files stored on USB-HDDs via USB2/usb_storage linux
2006-12-10 15:37 ` Clemens Koller
  -- strict thread matches above, loose matches on Subject: below --
2006-12-07 18:08 [usb-storage] " Alan Stern
2006-12-07 19:41 ` Matthias Schniedermeyer
2006-12-07 23:45   ` Pete Zaitcev
2006-12-08  9:16     ` Matthias Schniedermeyer
2006-12-08 12:21       ` Stefan Richter
2006-12-08 16:18         ` John Stoffel
     [not found] <fa./xvi+/Ji/HqNkvnGjUt4pIS9goM@ifi.uio.no>
2006-12-07  0:02 ` Robert Hancock
2006-12-07  9:03   ` Matthias Schniedermeyer
     [not found] ` <fa.nPT9ZJ5poT8fZx3aWy0MqRK/gto@ifi.uio.no>
     [not found]   ` <fa.aML3aAeWqfac08XNpQa7Zu0AC8w@ifi.uio.no>
2006-12-08  3:18     ` Robert Hancock
2006-12-08  9:07       ` Matthias Schniedermeyer
2006-12-08 10:25         ` Stefan Richter
2006-12-08 10:39           ` Matthias Schniedermeyer
2006-12-08 11:01             ` Oliver Neukum
2006-12-08 12:27               ` Stefan Richter
2006-12-09  6:11               ` Ben Nizette
2006-12-09  8:18                 ` Oliver Neukum
2006-12-09 10:16                   ` Ben Nizette
2006-12-06 22:01 Matthias Schniedermeyer
2006-12-07 22:10 ` DervishD
2006-12-07 22:57   ` Matthias Schniedermeyer
2006-12-07 23:05     ` Jan Engelhardt
2006-12-08  9:32     ` DervishD
