public inbox for linux-kernel@vger.kernel.org
* ext3 corruption on 2.4.18 (LVM, vt82c586b, no DMA)
@ 2002-09-04 10:26 Marius Gedminas
  2002-09-06 17:34 ` Stephen C. Tweedie
  0 siblings, 1 reply; 4+ messages in thread
From: Marius Gedminas @ 2002-09-04 10:26 UTC (permalink / raw)
  To: linux-kernel


There's an old Compaq Deskpro 2000 (Pentium MMX 166 MHz, 384M RAM)
that's being used as an Internet gateway (NAT) and FTP server for about
200 users.  It was previously running that other operating system, and I
helped convert it to Linux (Debian 3.0).

There are three disk drives.  Two and a half are used for a single ext3
filesystem using LVM.  That 140GB partition is only used for FTP.

About 20 hours after mke2fs the first errors started cropping up:

  kernel: EXT3-fs error (device lvm(58,0)): ext3_add_entry: bad entry in directory #8568833: rec_len %% 4 != 0 - offset=0, inode=1104134607, rec_len=16847, name_len=207
  kernel: EXT3-fs error (device lvm(58,0)): ext3_add_entry: bad entry in directory #8568833: rec_len %% 4 != 0 - offset=0, inode=1104134607, rec_len=16847, name_len=207
  kernel: EXT3-fs error (device lvm(58,0)): ext3_add_entry: bad entry in directory #8650753: rec_len %% 4 != 0 - offset=0, inode=1054162645, rec_len=16093, name_len=213
  kernel: EXT3-fs error (device lvm(58,0)): ext3_add_entry: bad entry in directory #8650753: rec_len %% 4 != 0 - offset=0, inode=1054162645, rec_len=16093, name_len=213
  kernel: EXT3-fs error (device lvm(58,0)): ext3_add_entry: bad entry in directory #8650753: rec_len %% 4 != 0 - offset=0, inode=1503025555, rec_len=22942, name_len=147
  [...]
  kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: bad entry in directory #8847362: rec_len %% 4 != 0 - offset=0, inode=1861709558, rec_len=28414, name_len=247
  [...]
  kernel: EXT3-fs warning (device lvm(58,0)): empty_dir: bad directory (dir #8830978) - no `.' or `..'
  kernel: EXT3-fs warning (device lvm(58,0)): ext3_rmdir: empty directory has nlink!=2 (3)
  [...]
  kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: bad entry in directory #8683521: rec_len is too small for name_len - offset=0, inode=11305005, rec_len=44, name_len=45
  kernel: attempt to access beyond end of device
  kernel: 3a:00: rw=0, want=1123713500, limit=154509312
  kernel: attempt to access beyond end of device
  kernel: 3a:00: rw=0, want=586860000, limit=154509312
  [...]
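
The "bad entry in directory" messages above come from the sanity checks ext3 applies to each directory entry it reads. A minimal Python sketch of those checks follows, modelled loosely on the kernel's entry validation rather than copied from it. The function names and the 4096-byte block size are assumptions made for illustration.

```python
from typing import Optional

# inode (4 bytes) + rec_len (2) + name_len (1) + file_type (1)
DIRENT_HEADER = 8


def min_rec_len(name_len: int) -> int:
    """Smallest legal rec_len for a name: header plus name, rounded up to 4."""
    return (name_len + DIRENT_HEADER + 3) & ~3


def check_dirent(rec_len: int, name_len: int, offset: int = 0,
                 block_size: int = 4096) -> Optional[str]:
    """Return a description of the first failed check, or None if the entry looks sane."""
    if rec_len < min_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < min_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > block_size:  # block_size is an assumed value
        return "directory entry across blocks"
    return None


# Values taken from the log lines above.
print(check_dirent(rec_len=16847, name_len=207))  # rec_len % 4 != 0
print(check_dirent(rec_len=44, name_len=45))      # rec_len is too small for name_len
```

Every logged entry fails at offset=0 with an implausible inode number, which is what these checks would report if the whole directory block held unrelated data rather than one damaged field.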

Unfortunately I noticed this only two days later.  e2fsck found *lots*
of errors, and it keeps restarting from the beginning for some reason.
I'm starting to have doubts if it will ever finish.

Is this an unfortunate interaction between ext3 and LVM, or should I
suspect flaky hardware?  RAM, disks, IDE cable?  There were problems
with /dev/hdd earlier that hinted at a broken cable (borken model name in
hdparm -i), and the cable was replaced with a new one.

I gather from Configure.help that DMA is broken on Via VP2, but it is
turned off here.  I see only one series of hardware-related messages in
syslog:

  kernel: hda: status timeout: status=0xd0 { Busy }
  kernel: hda: no DRQ after issuing WRITE
  kernel: ide0: reset timed-out, status=0x80

But these appeared about 4 hours after the first EXT3 error messages.

The kernel comes from a Debian package (2.4.18-586tsc).  So does LVM
(lvm10-1:1.0.4-4).

Excerpt from dmesg:

Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller on PCI bus 00 dev 39
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt82c586b (rev 31) IDE UDMA33 controller on pci00:07.1
    ide0: BM-DMA at 0x1400-0x1407, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0x1408-0x140f, BIOS settings: hdc:DMA, hdd:DMA
hda: IC35L040AVVN07-0, ATA DISK drive
hdc: MAXTOR 6L060J3, ATA DISK drive
hdd: MAXTOR 6L080J4, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: 80418240 sectors (41174 MB) w/1863KiB Cache, CHS=79780/16/63
hdc: 117266688 sectors (60041 MB) w/1819KiB Cache, CHS=116336/16/63
hdd: 156355584 sectors (80054 MB) w/1819KiB Cache, CHS=155114/16/63
Partition check:
 /dev/ide/host0/bus0/target0/lun0: [PTBL] [5005/255/63] p1 p2 p3 p4
 /dev/ide/host0/bus1/target0/lun0: [PTBL] [7299/255/63] p1 < p5 >
 /dev/ide/host0/bus1/target1/lun0: [PTBL] [9732/255/63] p1

/proc/ide/via:

----------VIA BusMastering IDE Configuration----------------
Driver Version:                     3.29
South Bridge:                       VIA vt82c586b
Revision:                           ISA 0x31 IDE 0x6
Highest DMA rate:                   UDMA33
BM-DMA base:                        0x1400
PCI clock:                          33MHz
Master Read  Cycle IRDY:            1ws
Master Write Cycle IRDY:            1ws
BM IDE Status Register Read Retry:  yes
Max DRDY Pulse Width:               No limit
-----------------------Primary IDE-------Secondary IDE------
Read DMA FIFO flush:          yes                 yes
End Sector FIFO flush:         no                  no
Prefetch Buffer:              yes                 yes
Post Write Buffer:            yes                 yes
Enabled:                      yes                 yes
Simplex only:                  no                  no
Cable Type:                   40w                 40w
-------------------drive0----drive1----drive2----drive3-----
Transfer Mode:        PIO       PIO       PIO       PIO
Address Setup:       30ns     120ns      30ns      30ns
Cmd Active:          90ns      90ns      90ns      90ns
Cmd Recovery:        30ns      30ns      30ns      30ns
Data Active:         90ns     330ns      90ns      90ns
Data Recovery:       30ns     270ns      30ns      30ns
Cycle Time:         120ns     600ns     120ns     120ns
Transfer Rate:   16.5MB/s   3.3MB/s  16.5MB/s  16.5MB/s

lspci -v

00:00.0 Host bridge: VIA Technologies, Inc. VT82C595/97 [Apollo VP2/97] (rev 03)
        Flags: bus master, 66Mhz, medium devsel, latency 8

00:02.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
        Subsystem: Compex: Unknown device 8139
        Flags: bus master, medium devsel, latency 66, IRQ 11
        I/O ports at 1800 [size=256]
        Memory at 44000000 (32-bit, non-prefetchable) [size=256]
        Expansion ROM at <unassigned> [disabled] [size=64K]
        Capabilities: [50] Power Management version 2

00:03.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
        Subsystem: Compex: Unknown device 8139
        Flags: bus master, medium devsel, latency 66, IRQ 11
        I/O ports at 2000 [size=256]
        Memory at 44200000 (32-bit, non-prefetchable) [size=256]
        Expansion ROM at <unassigned> [disabled] [size=64K]
        Capabilities: [50] Power Management version 2

00:04.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
        Subsystem: Compex: Unknown device 8139
        Flags: bus master, medium devsel, latency 66, IRQ 11
        I/O ports at 1c00 [size=256]
        Memory at 44100000 (32-bit, non-prefetchable) [size=256]
        Expansion ROM at <unassigned> [disabled] [size=64K]
        Capabilities: [50] Power Management version 2

00:07.0 ISA bridge: VIA Technologies, Inc. VT82C586/A/B PCI-to-ISA [Apollo VP] (rev 31)
        Flags: bus master, medium devsel, latency 0

00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
        Flags: bus master, medium devsel, latency 66
        I/O ports at 1400 [size=16]

00:07.2 USB Controller: VIA Technologies, Inc. UHCI USB (rev 02) (prog-if 00 [UHCI])
        Subsystem: Unknown device 0925:1234
        Flags: bus master, medium devsel, latency 66, IRQ 11
        I/O ports at 1420 [size=32]

00:07.3 Non-VGA unclassified device: VIA Technologies, Inc. VT82C586B ACPI
        Flags: medium devsel
        I/O ports at 1000 [size=256]

00:0f.0 VGA compatible controller: S3 Inc. Trio 64V2/DX or /GX (rev 04) (prog-if 00 [VGA])
        Subsystem: Compaq Computer Corporation: Unknown device b031
        Flags: medium devsel, IRQ 11
        Memory at 40000000 (32-bit, non-prefetchable) [size=64M]
        Expansion ROM at <unassigned> [disabled] [size=64K]


hdparm -i /dev/hda

/dev/hda:

 Model=IC35L040AVVN07-0, FwRev=VA2OAG0A, SerialNo=VNP210B2R7MVGB
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=52
 BuffType=DualPortCache, BuffSize=1863kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=80418240
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4 
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 
 AdvancedPM=yes: disabled (255) WriteCache=enabled
 Drive Supports : ATA/ATAPI-5 T13 1321D revision 1 : ATA-2 ATA-3 ATA-4 ATA-5 

hdparm -i /dev/hdc

/dev/hdc:

 Model=MAXTOR 6L060J3, FwRev=A93.0500, SerialNo=663202014354
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=32256, SectSize=21298, ECCbytes=4
 BuffType=DualPortCache, BuffSize=1819kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=117266688
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4 
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6 
 AdvancedPM=no WriteCache=enabled
 Drive Supports : ATA/ATAPI-5 T13 1321D revision 1 : ATA-1 ATA-2 ATA-3 ATA-4 ATA-5 

hdparm -i /dev/hdd

/dev/hdd:

 Model=MAXTOR 6L080J4, FwRev=A93.0500, SerialNo=664203751576
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=32256, SectSize=21298, ECCbytes=4
 BuffType=DualPortCache, BuffSize=1819kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=156355584
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4 
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6 
 AdvancedPM=no WriteCache=enabled
 Drive Supports : ATA/ATAPI-5 T13 1321D revision 1 : ATA-1 ATA-2 ATA-3 ATA-4 ATA-5 


Marius Gedminas
-- 
This host is a black hole at HTTP wavelengths. GETs go in, and nothing
comes out, not even Hawking radiation.
		-- Graaagh the Mighty on rec.games.roguelike.angband



* Re: ext3 corruption on 2.4.18 (LVM, vt82c586b, no DMA)
  2002-09-04 10:26 ext3 corruption on 2.4.18 (LVM, vt82c586b, no DMA) Marius Gedminas
@ 2002-09-06 17:34 ` Stephen C. Tweedie
  2002-09-06 19:02   ` Marius Gedminas
  2002-09-06 21:04   ` Alan Cox
  0 siblings, 2 replies; 4+ messages in thread
From: Stephen C. Tweedie @ 2002-09-06 17:34 UTC (permalink / raw)
  To: linux-kernel, Marius Gedminas; +Cc: ext2-devel, Stephen Tweedie

Hi,

On Wed, Sep 04, 2002 at 12:26:06PM +0200, Marius Gedminas wrote:
> There's an old Compaq Deskpro 2000 (Pentium MMX 166 MHz, 384M RAM)
> that's being used as an Internet gateway (NAT) and FTP server for about
> 200 users.  It was previously running that other operating system, and I
> helped convert it to Linux (Debian 3.0).

> About 20 hours after mke2fs the first errors started cropping up:
> 
>   kernel: EXT3-fs error (device lvm(58,0)): ext3_add_entry: bad entry in directory #8568833: rec_len %% 4 != 0 - offset=0, inode=1104134607, rec_len=16847, name_len=207

Well, there are a couple of ext3 fixes that have just been merged into
Marcelo's bk tree, so you could try that and see if it helps.
However, I suspect it won't, because:

> Unfortunately I noticed this only two days later.  e2fsck found *lots*
> of errors, and it keeps restarting from the beginning for some reason.
> I'm starting to have doubts if it will ever finish.

This suggests that e2fsck is finding new corruption each time it
scans the disk.  That sounds as if a hardware or driver-level
problem is more likely.  What sorts of errors are you getting from the
fsck passes?

> Is this an unfortunate interaction between ext3 and LVM, or should I
> suspect flaky hardware?  RAM, disks, IDE cable?  There were problems
> with /dev/hdd earlier that hinted at a broken cable (borken model name in
> hdparm -i), and the cable was replaced with a new one.

One thing that would help would be to try a surface scan which writes
stuff to the disk and verifies it.  The "badblocks" code from e2fsck
can do that, but the most effective form of "badblocks" for such a
case is highly destructive to your data, so it's only useful if you
don't need to preserve the data already on the filesystem.
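
A write-and-verify scan of the kind described above can be sketched in a few lines of Python. This is an illustration only, not the badblocks implementation: the function name `write_verify` and the chunk size are made up, and the demo runs against a scratch file rather than a disk. The four patterns are the ones the badblocks man page lists for its destructive `-w` mode.

```python
import os
import tempfile

PATTERNS = (0xAA, 0x55, 0xFF, 0x00)  # patterns documented for badblocks -w
CHUNK = 64 * 1024                    # arbitrary chunk size for this sketch


def write_verify(path: str, size: int) -> list:
    """Overwrite the first `size` bytes of `path` with each pattern and read them back.

    Returns (pattern, byte_offset) for every chunk that reads back wrong.
    DESTRUCTIVE: whatever was stored in those bytes is lost.
    """
    bad = []
    for pattern in PATTERNS:
        with open(path, "r+b") as f:
            for off in range(0, size, CHUNK):
                f.write(bytes([pattern]) * min(CHUNK, size - off))
            f.flush()
            os.fsync(f.fileno())  # push the data out before reading it back
        with open(path, "rb") as f:
            for off in range(0, size, CHUNK):
                n = min(CHUNK, size - off)
                if f.read(n) != bytes([pattern]) * n:
                    bad.append((pattern, off))
    return bad


# Demo on a 1 MiB scratch file; an empty list means every chunk verified.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.truncate(1 << 20)
    scratch = tmp.name
try:
    print(write_verify(scratch, 1 << 20))
finally:
    os.unlink(scratch)
```

One caveat on the design: the read-back here may be served from the page cache instead of the medium, so a real test has to bypass the cache or use far more data than fits in RAM.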

> I gather from Configure.help that DMA is broken on Via VP2, but it is
> turned off here.

Unfortunately, if you disable UDMA mode, you also lose the checksums
between drive and controller which can detect cable data corruption.
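
To illustrate why a per-transfer checksum catches cable errors that PIO lets through silently, here is a generic CRC-16 sketch in Python. The generator polynomial x^16 + x^12 + x^5 + 1 is the one usually cited for Ultra DMA, but the seed and bit ordering below are the common CRC-16/CCITT-FALSE conventions and are assumptions, not the exact ATA procedure.

```python
def crc16(data: bytes, crc: int = 0xFFFF, poly: int = 0x1021) -> int:
    """Bitwise CRC-16 over `data`, most significant bit first."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


# A 512-byte "sector" and a copy with a single bit flipped in transit.
sector = bytes(range(256)) * 2
damaged = bytearray(sector)
damaged[100] ^= 0x04

sent = crc16(sector)
received = crc16(bytes(damaged))
print(sent != received)  # True: the receiver would reject the transfer
```

A CRC of this form detects every single-bit error, so a marginal cable shows up as failed transfers instead of as corrupted directory blocks.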

Cheers,
 Stephen


* Re: ext3 corruption on 2.4.18 (LVM, vt82c586b, no DMA)
  2002-09-06 17:34 ` Stephen C. Tweedie
@ 2002-09-06 19:02   ` Marius Gedminas
  2002-09-06 21:04   ` Alan Cox
  1 sibling, 0 replies; 4+ messages in thread
From: Marius Gedminas @ 2002-09-06 19:02 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-kernel, ext2-devel


On Fri, Sep 06, 2002 at 06:34:15PM +0100, Stephen C. Tweedie wrote:
> On Wed, Sep 04, 2002 at 12:26:06PM +0200, Marius Gedminas wrote:
> > There's an old Compaq Deskpro 2000 (Pentium MMX 166 MHz, 384M RAM)
> > that's being used as an Internet gateway (NAT) and FTP server for about
> > 200 users.  It was previously running that other operating system, and I
> > helped convert it to Linux (Debian 3.0).
> 
> > About 20 hours after mke2fs the first errors started cropping up:
> > 
> >   kernel: EXT3-fs error (device lvm(58,0)): ext3_add_entry: bad entry in directory #8568833: rec_len %% 4 != 0 - offset=0, inode=1104134607, rec_len=16847, name_len=207
> 
> Well, there are a couple of ext3 fixes that have just been merged into
> Marcelo's bk tree, so you could try that and see if it helps.
> However, I suspect it won't, because:
> 
> > Unfortunately I noticed this only two days later.

(Now all partitions are mounted with errors=remount-ro.  I didn't know
that wasn't the default.)

> > e2fsck found *lots*
> > of errors, and it keeps restarting from the beginning for some reason.
> > I'm starting to have doubts if it will ever finish.
> 
> This suggests that e2fsck is finding new corruption each time it is
> scanning the disk.  That sounds as if a hardware or driver-level
> problem is more likely.  What sorts of errors are you getting from the
> fsck passes?

Mostly "Inode XXX is in use, but has dtime set", some "Inode XXX has
compression flag set on filesystem without compression support", a
couple of "Inode XXX has illegal block(s)", which are followed by "Too
many illegal blocks in inode XXX" (AFAIU that forces the restart of pass
1).

I have a full typescript (1.4M) if you're interested.  The different
iterations of pass 1 are very similar but there are also differences.
I got tired after five iterations and just mkfs'ed it again.
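
One way to compare pass 1 iterations in such a typescript is to split it at each restart and look at which inodes are newly flagged. The Python sketch below assumes the message shapes quoted above plus e2fsck's "Restarting e2fsck from the beginning" line. The sample lines and inode numbers are invented for illustration and are not taken from the real typescript.

```python
import re

KINDS = {
    "dtime set": re.compile(r"^Inode (\d+) is in use, but has dtime set"),
    "compression flag": re.compile(r"^Inode (\d+) has compression flag set"),
    "illegal blocks": re.compile(r"^Inode (\d+) has illegal block\(s\)"),
    "too many illegal": re.compile(r"^Too many illegal blocks in inode (\d+)"),
}
RESTART = re.compile(r"^Restarting e2fsck from the beginning")


def split_iterations(lines):
    """Return one {kind: set_of_inodes} dict per pass 1 iteration."""
    iterations = [{k: set() for k in KINDS}]
    for line in lines:
        if RESTART.match(line):
            iterations.append({k: set() for k in KINDS})
            continue
        for kind, rx in KINDS.items():
            m = rx.match(line)
            if m:
                iterations[-1][kind].add(int(m.group(1)))
                break
    return iterations


def new_inodes(iterations):
    """For each restart, the inodes flagged that the previous iteration did not flag."""
    seen = [set().union(*it.values()) for it in iterations]
    return [cur - prev for prev, cur in zip(seen, seen[1:])]


# Hypothetical sample input.
SAMPLE = """\
Inode 1001 is in use, but has dtime set.  Fix? yes
Inode 1002 has illegal block(s).  Clear? yes
Too many illegal blocks in inode 1002.
Restarting e2fsck from the beginning...
Inode 1001 is in use, but has dtime set.  Fix? yes
Inode 1003 is in use, but has dtime set.  Fix? yes
""".splitlines()

print(new_inodes(split_iterations(SAMPLE)))  # [{1003}]
```

Non-empty sets after a restart would mean inodes went bad between scans, which points away from a one-off filesystem bug and towards ongoing corruption.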

This is starting to look more and more like a hardware problem.  The
next day I suddenly noticed that / and /var (both on /dev/hda) were
remounted read-only, probably because of errors.  That 140G LVM
partition was still rw.  An ls or grep would fail in any directory, but
some of the files could be accessed by their full path.  Trying to
access other files (even files in /proc) would result in I/O or Bus
error.  Actually, I'm only 90% sure about /proc -- could be it was
/usr/bin/less that was unreadable.  I was a bit excited at the time.
But I usually cat things in /proc, and I clearly remember that I could
successfully cat some files in /etc.

A reboot helped, but the syslog contained nothing interesting.  I ran
memtest for at least 20 hours -- it found no errors.  I think I already
mentioned that a read-only badblocks scan also found nothing.

BTW /, /var and about 20GB of the LVM volume are all on /dev/hda, which
is a 40G IBM drive.  It might be pure superstition, but the words "IBM
hard drive" do not inspire much confidence in me.

> > Is this an unfortunate interaction between ext3 and LVM, or should I
> > suspect flaky hardware?  RAM, disks, IDE cable?  There were problems
> > with /dev/hdd earlier that hinted at a broken cable (borken model name in
> > hdparm -i), and the cable was replaced with a new one.
> 
> One thing that would help would be to try a surface scan which writes
> stuff to the disk and verifies it.  The "badblocks" code from e2fsck
> can do that, but the most effective form of "badblocks" for such a
> case is highly destructive to your data, so it's only useful if you
> don't need to preserve the data already on the filesystem.

It might be difficult to arrange at the moment, but I'll keep it in mind.

> > I gather from Configure.help that DMA is broken on Via VP2, but it is
> > turned off here.
> 
> Unfortunately, if you disable UDMA mode, you also lose the checksums
> between drive and controller which can detect cable data corruption.

Ah, that's good to know.  Is there a kernel tree that supports DMA on
VP2?  I think I heard somewhere that the new IDE code did that, but I
wasn't paying much attention to its fate lately.

Thanks,
Marius Gedminas
-- 
It's not illegal to disagree with my opinions (*).
[...]
(*) Although it obviously _should_ be. Mwhaahahahahaaa... You unbelievers
will all be shot when the revolution comes!
		-- Linus Torvalds



* Re: ext3 corruption on 2.4.18 (LVM, vt82c586b, no DMA)
  2002-09-06 17:34 ` Stephen C. Tweedie
  2002-09-06 19:02   ` Marius Gedminas
@ 2002-09-06 21:04   ` Alan Cox
  1 sibling, 0 replies; 4+ messages in thread
From: Alan Cox @ 2002-09-06 21:04 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-kernel, Marius Gedminas, ext2-devel

On Fri, 2002-09-06 at 18:34, Stephen C. Tweedie wrote:
> > I gather from Configure.help that DMA is broken on Via VP2, but it is
> > turned off here.
> 
> Unfortunately, if you disable UDMA mode, you also lose the checksums
> between drive and controller which can detect cable data corruption.

I'd have to look it up to be sure, but I believe the VIA VP2 goes to DMA,
not UDMA.


