linuxppc-dev.lists.ozlabs.org archive mirror
* st corruption
@ 2001-03-22 19:41 Geert Uytterhoeven
  2001-03-22 23:14 ` Tony Mantler
  0 siblings, 1 reply; 8+ messages in thread
From: Geert Uytterhoeven @ 2001-03-22 19:41 UTC (permalink / raw)
  To: Linux/PPC Development


(cf. my posting on linux-kernel)

I'm seeing data corruption when writing to tape. Not when reading, not when
copying between disks.

The corruption affects 32 bytes on a 32-byte boundary. The corrupted data are
always a copy of the data exactly 10240 bytes before. Note that 32 bytes is the
cache line size of a 604e, while 10240 is the default block size for tar.
Perhaps a missing sync before PCI busmastering?
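
A minimal sketch of how the pattern described above could be checked for,
assuming the original file and the copy restored from tape are both available
as byte strings. The name find_stale_lines and the two constants are
illustrative only; they do not come from any tool mentioned in this thread.

```python
LINE = 32      # cache line size of a 604e, as stated in the report
LAG = 10240    # default tar block size, as stated in the report


def find_stale_lines(original, restored, line=LINE, lag=LAG):
    """Return (offset, is_stale) for every aligned line that differs.

    is_stale is True when the restored line equals the original data
    found exactly `lag` bytes earlier.
    """
    hits = []
    size = min(len(original), len(restored))
    for off in range(0, size, line):
        end = min(off + line, size)
        got = restored[off:end]
        if got == original[off:end]:
            continue
        width = end - off
        stale = off >= lag and got == original[off - lag:off - lag + width]
        hits.append((off, stale))
    return hits


if __name__ == "__main__":
    # Synthetic data: inject one stale 32-byte line at offset 20480.
    original = bytes((i * 7 + i // 256) % 256 for i in range(40960))
    restored = bytearray(original)
    restored[20480:20512] = original[10240:10272]
    for off, stale in find_stale_lines(original, bytes(restored)):
        print(f"offset {off}: differs, stale copy from -{LAG}: {stale}")
```

If every reported line came back with is_stale set, that would be consistent
with the "copy of the data 10240 bytes before" observation; a mix of stale and
non-stale lines would point to some other failure mode.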

My hardware: CHRP LongTrail, HP C1536 DDS1 tape drive connected to Sym53c875.
The problem happens with 2.4.3-pre4, but also with the good old
2.4.0-test1-ac10. This means all backups I have may be corrupted :-(

Anybody out there with a SCSI tape drive who's willing to do some tests?
Someone already tried with a Pentium, but no corruption, so it may be a
PPC-specific problem. Just create some large files, make md5sums, tar them to tape,
untar them from tape, and verify the md5sums. I see approx. 7 blocks of
corrupted data for 256 MB of data.
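
The test procedure above could be scripted roughly as follows. This is only a
sketch under assumptions: the helper names md5_of and round_trip_ok are made
up for illustration, Python's tarfile module stands in for the tar command, and
the code has only been reasoned about with a regular file as the archive, not
with a real tape device.

```python
import hashlib
import os
import shutil
import tarfile


def md5_of(path, chunk=1 << 20):
    """MD5 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def round_trip_ok(src, archive, restored):
    """Write src to archive, read it back into restored, compare MD5 sums."""
    before = md5_of(src)
    # Stream mode ("w|" / "r|") never seeks, which is what a tape needs.
    with tarfile.open(archive, "w|") as tar:
        tar.add(src, arcname=os.path.basename(src))
    with tarfile.open(archive, "r|") as tar:
        member = tar.next()
        with tar.extractfile(member) as fin, open(restored, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    return md5_of(restored) == before
```

On a setup like the one described, archive would presumably be the tape device
node rather than a regular file. tarfile's default bufsize of 10240 bytes
matches the tar block size mentioned earlier, but whether the resulting I/O
pattern on a tape matches that of the tar command is an assumption.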

Many thanks in advance!

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/


* Re: st corruption
  2001-03-22 19:41 st corruption Geert Uytterhoeven
@ 2001-03-22 23:14 ` Tony Mantler
  2001-03-23  7:20   ` Geert Uytterhoeven
  2001-03-25 15:08   ` Guillaume Laures
  0 siblings, 2 replies; 8+ messages in thread
From: Tony Mantler @ 2001-03-22 23:14 UTC (permalink / raw)
  To: Geert Uytterhoeven, Linux/PPC Development


At 1:41 PM -0600 3/22/2001, Geert Uytterhoeven wrote:
[...]
>Just create some large files, make md5sums, tar them to tape,
>untar them from tape, and verify the md5sums. I see approx. 7 blocks of
>corrupted data for 256 MB of data.

merida:/home/nicoya# modprobe mesh
merida:/home/nicoya# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: QUANTUM  Model: FIREBALL ST4300S Rev: 0F0D
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 03 Lun: 00
  Vendor: MATSHITA Model: CD-ROM CR-8012   Rev: 1.0f
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 06 Lun: 00
  Vendor: ARCHIVE  Model: Python 25501-XXX Rev: 2.96
  Type:   Sequential-Access                ANSI SCSI revision: 02
merida:/home/nicoya# mt status
drive type = Generic SCSI-2 tape
drive status = 318767616
sense key error = 0
residue count = 0
file number = 0
block number = 0
Tape block size 512 bytes. Density code 0x13 (DDS (61000 bpi)).
Soft error count since last status=0
General status bits on (41010000):
 BOT ONLINE IM_REP_EN
merida:/home/nicoya# dd if=/dev/cdrom of=testfile bs=1024k count=256
256+0 records in
256+0 records out
merida:/home/nicoya# md5sum testfile
118c94df7aae2df0fb26dce3b13312f9  testfile
merida:/home/nicoya# tar -c testfile >/dev/st0
merida:/home/nicoya# mv testfile testfile.1
merida:/home/nicoya# tar -x </dev/st0
merida:/home/nicoya# md5sum testfile
118c94df7aae2df0fb26dce3b13312f9  testfile
merida:/home/nicoya# uname -a
Linux merida 2.4.1 #1 SMP Mon Feb 5 17:32:52 CST 2001 ppc unknown


This is with my 9600/200mp. (For the curious, the CD in the drive was my
copy of the Marathon trilogy)


Cheers - Tony 'Nicoya' Mantler :)


--
Tony "Nicoya" Mantler - Renaissance Nerd Extraordinaire - nicoya@apia.dhs.org
Winnipeg, Manitoba, Canada           --           http://nicoya.feline.pp.se/


* Re: st corruption
  2001-03-22 23:14 ` Tony Mantler
@ 2001-03-23  7:20   ` Geert Uytterhoeven
  2001-03-23 13:22     ` Tony Mantler
  2001-03-25 15:08   ` Guillaume Laures
  1 sibling, 1 reply; 8+ messages in thread
From: Geert Uytterhoeven @ 2001-03-23  7:20 UTC (permalink / raw)
  To: Tony Mantler; +Cc: Linux/PPC Development


On Thu, 22 Mar 2001, Tony Mantler wrote:
> At 1:41 PM -0600 3/22/2001, Geert Uytterhoeven wrote:
> [...]
> >Just create some large files, make md5sums, tar them to tape,
> >untar them from tape, and verify the md5sums. I see approx. 7 blocks of
> >corrupted data for 256 MB of data.
>
> merida:/home/nicoya# modprobe mesh
> merida:/home/nicoya# cat /proc/scsi/scsi
> Attached devices:
> Host: scsi0 Channel: 00 Id: 00 Lun: 00
>   Vendor: QUANTUM  Model: FIREBALL ST4300S Rev: 0F0D
>   Type:   Direct-Access                    ANSI SCSI revision: 02
> Host: scsi0 Channel: 00 Id: 03 Lun: 00
>   Vendor: MATSHITA Model: CD-ROM CR-8012   Rev: 1.0f
>   Type:   CD-ROM                           ANSI SCSI revision: 02
> Host: scsi0 Channel: 00 Id: 06 Lun: 00
>   Vendor: ARCHIVE  Model: Python 25501-XXX Rev: 2.96
>   Type:   Sequential-Access                ANSI SCSI revision: 02

Ugh... I don't dare to connect my DDS to the MESH. Before I had the '875, I did
it, but from time to time I got lost arbitrations corrupting data.

> 118c94df7aae2df0fb26dce3b13312f9  testfile
> merida:/home/nicoya# uname -a
> Linux merida 2.4.1 #1 SMP Mon Feb 5 17:32:52 CST 2001 ppc unknown

Hmmm... Perhaps I should retry on the MESH, just to see whether it's a MESH or
Sym53c875 problem.

Thanks for testing!

Gr{oetje,eeting}s,

						Geert


* Re: st corruption
  2001-03-23  7:20   ` Geert Uytterhoeven
@ 2001-03-23 13:22     ` Tony Mantler
  0 siblings, 0 replies; 8+ messages in thread
From: Tony Mantler @ 2001-03-23 13:22 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Linux/PPC Development


At 1:20 AM -0600 3/23/2001, Geert Uytterhoeven wrote:
>On Thu, 22 Mar 2001, Tony Mantler wrote:
>> At 1:41 PM -0600 3/22/2001, Geert Uytterhoeven wrote:
>> [...]
>> >Just create some large files, make md5sums, tar them to tape,
>> >untar them from tape, and verify the md5sums. I see approx. 7 blocks of
>> >corrupted data for 256 MB of data.
>>
>> merida:/home/nicoya# modprobe mesh
>> merida:/home/nicoya# cat /proc/scsi/scsi
>> Attached devices:
>> Host: scsi0 Channel: 00 Id: 00 Lun: 00
>>   Vendor: QUANTUM  Model: FIREBALL ST4300S Rev: 0F0D
>>   Type:   Direct-Access                    ANSI SCSI revision: 02
>> Host: scsi0 Channel: 00 Id: 03 Lun: 00
>>   Vendor: MATSHITA Model: CD-ROM CR-8012   Rev: 1.0f
>>   Type:   CD-ROM                           ANSI SCSI revision: 02
>> Host: scsi0 Channel: 00 Id: 06 Lun: 00
>>   Vendor: ARCHIVE  Model: Python 25501-XXX Rev: 2.96
>>   Type:   Sequential-Access                ANSI SCSI revision: 02
>
>Ugh... I don't dare to connect my DDS to the MESH. Before I had the '875, I did
>it, but from time to time I got lost arbitrations corrupting data.

Well, the tape drive itself is actually the assembled parts of 2 broken
tape drives, so I wouldn't exactly trust it with my life anyways. ;)

It's really just sitting in my 9600 because I didn't have anywhere else
interesting to stick it.


>> 118c94df7aae2df0fb26dce3b13312f9  testfile
>> merida:/home/nicoya# uname -a
>> Linux merida 2.4.1 #1 SMP Mon Feb 5 17:32:52 CST 2001 ppc unknown
>
>Hmmm... Perhaps I should retry on the MESH, just to see whether it's a MESH or
>Sym53c875 problem.

Finding someone with Sym53c875 SCSI in a non-pmac non-x86 might help too.

It could also be that my SMP machine has a different cache flushing
profile, since both tar and the st driver would've likely been bouncing
from CPU to CPU a bit. (Am I the only one who would like to see stronger
CPU binding in SMP linux? Especially on platforms with larger caches)


Cheers - Tony 'Nicoya' Mantler :)



* Re: st corruption
  2001-03-22 23:14 ` Tony Mantler
  2001-03-23  7:20   ` Geert Uytterhoeven
@ 2001-03-25 15:08   ` Guillaume Laures
  2001-03-25 19:21     ` Geert Uytterhoeven
  1 sibling, 1 reply; 8+ messages in thread
From: Guillaume Laures @ 2001-03-25 15:08 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Tony Mantler, Linux/PPC Development


Time to try out the DDS-2 that came with my ANS 700 :-)


On 22 Mar 2001 17:14:01 -0600, Tony Mantler wrote:
>
> At 1:41 PM -0600 3/22/2001, Geert Uytterhoeven wrote:
> [...]
> >Just create some large files, make md5sums, tar them to tape,
> >untar them from tape, and verify the md5sums. I see approx. 7 blocks of
> >corrupted data for 256 MB of data.
>

> merida:/home/nicoya# cat /proc/scsi/scsi

Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: MATSHITA Model: CD-ROM CR-8005A  Rev: 4.0i
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: HP       Model: C1533A           Rev: 9503
  Type:   Sequential-Access                ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: SEAGATE  Model: ST15150W_APL     Rev: 9503
  Type:   Direct-Access                    ANSI SCSI revision: 02

scsi0 is:
sym53c8xx: at PCI bus 0, device 17, function 0
sym53c8xx: setting PCI_COMMAND_IO...
sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up)
sym53c8xx: 53c825a detected
sym53c8xx: at PCI bus 0, device 18, function 0
sym53c8xx: setting PCI_COMMAND_IO...
sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up)
sym53c8xx: 53c825a detected
sym53c825a-0: rev 0x11 on pci bus 0 device 17 function 0 irq 22
sym53c825a-0: ID 7, Fast-10, Parity Checking
sym53c825a-1: rev 0x11 on pci bus 0 device 18 function 0 irq 26
sym53c825a-1: ID 7, Fast-10, Parity Checking
scsi0 : sym53c8xx-1.7.1-20000726
scsi1 : sym53c8xx-1.7.1-20000726
scsi2 : 53C94
scsi : 3 hosts.

> merida:/home/nicoya# mt status

SCSI 2 tape drive:
File number=0, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x13 (DDS (61000 bpi)).
Soft error count since last status=0
General status bits on (41010000):
 BOT ONLINE IM_REP_EN

> merida:/home/nicoya# dd if=/dev/cdrom of=testfile bs=1024k count=256

256+0 records in
256+0 records out

> merida:/home/nicoya# md5sum testfile

13b37214355ea84d906a54bb14c1c0be  testfile

> merida:/home/nicoya# tar -c testfile >/dev/st0
> merida:/home/nicoya# mv testfile testfile.1
> merida:/home/nicoya# tar -x </dev/st0
> merida:/home/nicoya# md5sum testfile

13b37214355ea84d906a54bb14c1c0be  testfile

uname -a:
Linux shiner.gom.net 2.2.18-snd-bkport #26 Wed Feb 28 23:19:57 CET 2001 ppc unknown

So no problem here either, with similar hardware.

Cheers



--
GoM



* Re: st corruption
  2001-03-25 15:08   ` Guillaume Laures
@ 2001-03-25 19:21     ` Geert Uytterhoeven
  2001-03-25 20:08       ` Guillaume Laures
  0 siblings, 1 reply; 8+ messages in thread
From: Geert Uytterhoeven @ 2001-03-25 19:21 UTC (permalink / raw)
  To: Guillaume Laures; +Cc: Tony Mantler, Linux/PPC Development


On 25 Mar 2001, Guillaume Laures wrote:
> > merida:/home/nicoya# md5sum testfile
>
> 13b37214355ea84d906a54bb14c1c0be  testfile
>
> > merida:/home/nicoya# tar -c testfile >/dev/st0
> > merida:/home/nicoya# mv testfile testfile.1
> > merida:/home/nicoya# tar -x </dev/st0
> > merida:/home/nicoya# md5sum testfile
>
> 13b37214355ea84d906a54bb14c1c0be  testfile
>
> uname -a :
> Linux shiner.gom.net 2.2.18-snd-bkport #26 Wed Feb 28 23:19:57 CET 2001
> ppc unknown
>
> So no prob here either with close hardware.

Thanks! Do you also have a 2.4.x kernel around?

Or perhaps I should try 2.2.18... No idea whether 2.2.18 works on LongTrail,
though. I switched to 2.3.x a long time ago...

Gr{oetje,eeting}s,

						Geert


* Re: st corruption
  2001-03-25 19:21     ` Geert Uytterhoeven
@ 2001-03-25 20:08       ` Guillaume Laures
  2001-03-27 18:26         ` Geert Uytterhoeven
  0 siblings, 1 reply; 8+ messages in thread
From: Guillaume Laures @ 2001-03-25 20:08 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Tony Mantler, Linux/PPC Development


On 25 Mar 2001 21:21:20 +0200, Geert Uytterhoeven wrote:
> On 25 Mar 2001, Guillaume Laures wrote:
> > > merida:/home/nicoya# md5sum testfile
> >
> > 13b37214355ea84d906a54bb14c1c0be  testfile
> >
> > > merida:/home/nicoya# tar -c testfile >/dev/st0
> > > merida:/home/nicoya# mv testfile testfile.1
> > > merida:/home/nicoya# tar -x </dev/st0
> > > merida:/home/nicoya# md5sum testfile
> >
> > 13b37214355ea84d906a54bb14c1c0be  testfile
> >
> > uname -a :
> > Linux shiner.gom.net 2.2.18-snd-bkport #26 Wed Feb 28 23:19:57 CET 2001
> > ppc unknown
> >
> > So no prob here either with close hardware.
>
> Thanks! Do you also have a 2.4.x kernel around?

Not yet for this machine, but soon. I'll keep you posted.

I may do a test on my G4 with an external enclosure (2.4.3-pre3, scsi =
sym53c1010-33-0 through sym53c8xx-1.7.3a-20010304)

Later,


--
GoM



* Re: st corruption
  2001-03-25 20:08       ` Guillaume Laures
@ 2001-03-27 18:26         ` Geert Uytterhoeven
  0 siblings, 0 replies; 8+ messages in thread
From: Geert Uytterhoeven @ 2001-03-27 18:26 UTC (permalink / raw)
  To: Linux/PPC Development


Status update:
  - When I connect my DDS1 to the MESH, I see no corruption (as long as I get
    no `lost arbitration' messages from the MESH driver; I never get those with
    the disk BTW). So the tape drive seems to be fine.
  - I wanted to try different tape drives, but all retired DDS drives I found
    at work seem to be in a non-functional state. I tried 3 of them, without
    any luck.
  - I wanted to try a 2.2.x kernel, but linuxppc_2_2 (2.2.19-pre3) just says
    `illegal instruction' and returns me to the OF prompt.

My next steps:
  - Try to understand the sym53c8xx and st drivers.
  - Look for missing sync()s and cache flushes in PCI and SCSI busmastering
    code.

Anyone else with a clue?

Gr{oetje,eeting}s,

						Geert

