public inbox for linux-kernel@vger.kernel.org
* fs corruption with 2.4.20 IDE+md+LVM
@ 2003-01-05  4:45 Carl Wilhelm Soderstrom
  2003-01-05  5:48 ` Carl Wilhelm Soderstrom
  0 siblings, 1 reply; 6+ messages in thread
From: Carl Wilhelm Soderstrom @ 2003-01-05  4:45 UTC
  To: linux-kernel

I observed filesystem corruption on my home workstation recently. I was
running kernel 2.4.20 (built myself with gcc 2.95.4), and ext3 with the
default journaling mode (ordered?).

I was downloading files, and noticed that they weren't being saved. I
immediately did a 'df -h', and it reported my home partition as having 7.3T
used, -64Z free.
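
Numbers like these can be flagged mechanically. The following is a minimal Python sketch, not something from the original report: it reads the block counters that df relies on through os.statvfs and checks that they are internally consistent. The function names fs_usage and looks_sane are illustrative assumptions.

```python
import os

def fs_usage(path):
    """Return (total, used, free) in bytes from the statvfs block counters."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    free = st.f_bfree * st.f_frsize
    return total, total - free, free

def looks_sane(total, used, free):
    """True if no counter is negative or larger than the filesystem itself."""
    if total < 0:
        return False
    if not (0 <= used <= total and 0 <= free <= total):
        return False
    # used and free should add up to the reported size
    return used + free == total
```

Calling looks_sane(*fs_usage("/home")) on a healthy filesystem should return True; a report such as 7.3T used with a negative free count on an 80GB drive would fail the check. The path is an assumption.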

I (foolishly) immediately did a 'du -sch ~/*' to see what might be taking up
all the space. after realizing what was going on (du reported permission
errors on files that should have been readable), I shut down all programs and
dropped to runlevel 1.

I unmounted my LVM'ed partitions (/var /usr /home), and tried to fsck
/dev/sys/home (the /home partition). it couldn't find a valid primary
superblock, and fell back to using a backup superblock. fsck reported that
the journal was corrupt, and discarded it. many of the low-numbered inodes
had wrong refcounts, or wrong modes.
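
For context on the backup superblocks mentioned above, here is a rough Python sketch of where ext2/ext3 normally keeps them. It assumes the sparse_super feature (backups in block group 1 and in groups that are powers of 3, 5 and 7) and the default of 8 * block_size blocks per group. The thread does not give the actual parameters of /dev/sys/home, so the function names and inputs are illustrative only.

```python
def is_power_of(n, base):
    """True if n == base**k for some k >= 0 (so n == 1 qualifies)."""
    if n < 1:
        return False
    while n % base == 0:
        n //= base
    return n == 1

def backup_superblock_blocks(total_blocks, block_size=4096):
    """List the block numbers of backup superblocks under sparse_super."""
    # One bitmap block tracks 8 * block_size blocks, which is the default group size.
    blocks_per_group = 8 * block_size
    # With 1 KiB blocks the first data block is 1; otherwise it is 0.
    first_data_block = 1 if block_size == 1024 else 0
    groups = (total_blocks - first_data_block + blocks_per_group - 1) // blocks_per_group
    return [
        first_data_block + g * blocks_per_group
        for g in range(1, groups)
        if any(is_power_of(g, b) for b in (3, 5, 7))  # g == 1 is base**0
    ]
```

A value from such a list is the kind of number one would hand to e2fsck with its -b option. On a real system, dumpe2fs or mke2fs -n reports the actual locations, which is safer than computing them.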

eventually it fixed the filesystem; but everything ended up in many files &
directories under lost+found. (I had to piece each home dir back together
from one or more dirs under lost+found).

after fixing the filesystem, I gratuitously fsck -f'ed all my other
partitions; they came up clean.

fortunately, it looks like the only stuff I really lost was some chunks of my
XFree86 source tree, and some linux kernel sources. easily replaceable
stuff.

here's my system architecture:
2x Western Digital 80GB Special Edition IDE drives (hde, hdf)
- / is an ext3 RAID1 /dev/md0 made of hde1 and hdf1
- /dev/md1 is LVM-formatted RAID1, made of hde2 and hdf2. this partition
contains /var, /usr, and /home. 

/home is the only place that I saw this corruption.

I have since reverted to kernel 2.4.18.

I'm thinking that my reaction *should* have been to power-cycle the box
immediately upon notice of the problem, to prevent further fs corruption,
and bring it back up in single-user read-only mode. shutting down programs
nicely would have written more stuff to disk, worsening the corruption.

I will also point out that kernel 2.4.20-ac1 and 2.4.21-pre6 will not boot
on my machine; they kernel panic when detecting my IDE devices. I have not
tried 2.4.20-ac2 nor 2.4.21-pre2 yet. 2.4.20 and 2.4.18 boot quite happily
tho. I suppose I ought to try the latest versions and set up a serial
console to capture the oops, before reporting a bug on this.

Carl Soderstrom.
-- 
Systems Administrator
Real-Time Enterprises
www.real-time.com


* Re: fs corruption with 2.4.20 IDE+md+LVM
  2003-01-05  4:45 Carl Wilhelm Soderstrom
@ 2003-01-05  5:48 ` Carl Wilhelm Soderstrom
  0 siblings, 0 replies; 6+ messages in thread
From: Carl Wilhelm Soderstrom @ 2003-01-05  5:48 UTC
  To: linux-kernel

On Sat, Jan 04, 2003 at 10:45:00PM -0600, Carl Wilhelm Soderstrom wrote:
> I observed filesystem corruption on my home workstation recently. I was
> running kernel 2.4.20 (built myself with gcc 2.95.4), and ext3 with the
> default journaling mode (ordered?).

I should probably include some details about my IDE devices.

here's the controller for the devices in question. it's the second
controller on the mobo. (first controller is only ATA-66)

00:11.0 Unknown mass storage controller: Promise Technology, Inc. 20265 (rev 02)
        Subsystem: Promise Technology, Inc. Ultra100
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32
        Interrupt: pin A routed to IRQ 9
        Region 0: I/O ports at 9400 [size=8]
        Region 1: I/O ports at 9000 [size=4]
        Region 2: I/O ports at 8800 [size=8]
        Region 3: I/O ports at 8400 [size=4]
        Region 4: I/O ports at 8000 [size=64]
        Region 5: Memory at de800000 (32-bit, non-prefetchable) [size=128K]
        Expansion ROM at <unassigned> [disabled] [size=64K]
        Capabilities: <available only to root>

and here's the output of hdparm. (yes, I know it could probably get tweaked
a bit for performance. this is a brand-new drive arrangement, and I was
trying to run it in a 'safe' setting for a while to see if anything would go
wrong. well, it did).

~# hdparm /dev/hde

/dev/hde:
 multcount    =  0 (off)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 155061/16/63, sectors = 156301488, start = 0
~# hdparm /dev/hdf

/dev/hdf:
 multcount    =  0 (off)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 155061/16/63, sectors = 156301488, start = 0
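
As a sanity check on the hdparm figures above, the geometry and the sector count can be compared with a few lines of Python. This is only a sketch: it assumes 512-byte sectors (the traditional ATA sector size), and the helper names are made up for illustration.

```python
SECTOR_BYTES = 512  # assumption: traditional 512-byte ATA sectors

def chs_sectors(cylinders, heads, sectors_per_track):
    """Total sectors implied by a cylinders/heads/sectors geometry."""
    return cylinders * heads * sectors_per_track

def capacity_bytes(sectors, sector_bytes=SECTOR_BYTES):
    """Raw capacity in bytes for a given sector count."""
    return sectors * sector_bytes

# Values copied from the hdparm output for /dev/hde and /dev/hdf.
total = chs_sectors(155061, 16, 63)   # 156301488, matching "sectors ="
size = capacity_bytes(total)          # 80026361856 bytes, about 80.0 GB
```

The product of the geometry fields equals the reported sector count, and the capacity works out to roughly 80 GB (decimal), consistent with the drives described in the first mail.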

Carl Soderstrom.
-- 
Systems Administrator
Real-Time Enterprises
www.real-time.com


* Re: fs corruption with 2.4.20 IDE+md+LVM
@ 2003-01-06  2:14 Dmitry Volkoff
  2003-01-06  4:49 ` Carl Wilhelm Soderstrom
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Volkoff @ 2003-01-06  2:14 UTC
  To: linux-kernel

> I observed filesystem corruption on my home workstation recently. I was
> running kernel 2.4.20 (built myself with gcc 2.95.4), and ext3 with the
> default journaling mode (ordered?).

Hello, 

Same problem here. I have software raid-1 on 2 IDE Seagate 80G, kernel 
2.4.20aa1 built with gcc-3.2, all filesystems are ext2, no LVM. 
FS corruption after running Cerberus test for about 8 hours. 

> I will also point out that kernel 2.4.20-ac1 and 2.4.21-pre6 will not 
> boot on my machine; they kernel panic when detecting my IDE devices. 

I can confirm. Kernel 2.4.21-pre2 does not boot from a RAID device 
(/dev/md0). 

-- 

    D.V.


* Re: fs corruption with 2.4.20 IDE+md+LVM
  2003-01-06  2:14 fs corruption with 2.4.20 IDE+md+LVM Dmitry Volkoff
@ 2003-01-06  4:49 ` Carl Wilhelm Soderstrom
  2003-01-06 15:02   ` Alan Cox
  0 siblings, 1 reply; 6+ messages in thread
From: Carl Wilhelm Soderstrom @ 2003-01-06  4:49 UTC
  To: Dmitry Volkoff; +Cc: linux-kernel

On Mon, Jan 06, 2003 at 05:14:12AM +0300, Dmitry Volkoff wrote:
> > I observed filesystem corruption on my home workstation recently. I was
> > running kernel 2.4.20 (built myself with gcc 2.95.4), and ext3 with the
> > default journaling mode (ordered?).
> 
> Hello, 
> 
> Same problem here. I have software raid-1 on 2 IDE Seagate 80G, kernel 
> 2.4.20aa1 built with gcc-3.2, all filesystems are ext2, no LVM. 
> FS corruption after running Cerberus test for about 8 hours. 

glad to know I'm not the only one.

someone pointed out to me in a private e-mail, that the corruption may be
related to my VIA KT133 chipset. (they had a similar problem).

> > I will also point out that kernel 2.4.20-ac1 and 2.4.21-pre6 will not 
> > boot on my machine; they kernel panic when detecting my IDE devices. 
> 
> I can confirm. Kernel 2.4.21-pre2 does not boot from a RAID device 
> (/dev/md0). 

sorry about the thinko in my mail. I meant 2.4.21-pre1. Glad to know I'm not
crazy, but hopefully confirmation means it'll get fixed before 2.4.21-final.

<flamebait>
maybe I just missed the arguments since I wasn't reading LKML at the time;
but *why* is IDE being revamped in the middle of a "stable" kernel series?
however better it may be, I don't regard the existing situation as being bad
enough to justify the risk.
</flamebait>

Carl Soderstrom.
-- 
Systems Administrator
Real-Time Enterprises
www.real-time.com


* Re: fs corruption with 2.4.20 IDE+md+LVM
  2003-01-06  4:49 ` Carl Wilhelm Soderstrom
@ 2003-01-06 15:02   ` Alan Cox
  2003-01-06 16:21     ` Carl Wilhelm Soderstrom
  0 siblings, 1 reply; 6+ messages in thread
From: Alan Cox @ 2003-01-06 15:02 UTC
  To: Carl Wilhelm Soderstrom; +Cc: Dmitry Volkoff, Linux Kernel Mailing List

On Mon, 2003-01-06 at 04:49, Carl Wilhelm Soderstrom wrote:
> <flamebait>
> maybe I just missed the arguments since I wasn't reading LKML at the time;
> but *why* is IDE being revamped in the middle of a "stable" kernel series?
> however better it may be, I don't regard the existing situation as being bad
> enough to justify the risk.
> </flamebait>

You are reporting problems in 2.4.20. 2.4.20 doesn't have the revamped IDE...

The IDE is getting updated because

- Lots of new controllers don't work with the old code
- Lots of LBA48 problems exist with the older code
- SATA is right out with the older code
- Several existing controllers have weird bugs with the older code

I'd much prefer we didn't have to update the IDE too 8)
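
For readers unfamiliar with the LBA48 item in the list above: 28-bit LBA can address about 2**28 sectors, which with 512-byte sectors is roughly 137 GB. A small Python sketch of that arithmetic follows. The constant and function names are illustrative, 512-byte sectors are assumed, and nothing here says anything about the cause of the corruption reported in this thread.

```python
LBA28_MAX_SECTORS = 2 ** 28   # 28-bit sector addresses
SECTOR_BYTES = 512            # assumption: 512-byte sectors

def lba28_limit_bytes():
    """Largest capacity reachable with 28-bit addressing."""
    return LBA28_MAX_SECTORS * SECTOR_BYTES

def needs_lba48(sectors):
    """True if a drive has more sectors than 28-bit LBA can address."""
    return sectors > LBA28_MAX_SECTORS

# Sector count reported by hdparm earlier in the thread.
thread_drive_needs_lba48 = needs_lba48(156301488)
```

By this arithmetic the 80GB drives discussed in this thread (156301488 sectors) fit within 28-bit addressing, so the LBA48 point concerns larger drives.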



* Re: fs corruption with 2.4.20 IDE+md+LVM
  2003-01-06 15:02   ` Alan Cox
@ 2003-01-06 16:21     ` Carl Wilhelm Soderstrom
  0 siblings, 0 replies; 6+ messages in thread
From: Carl Wilhelm Soderstrom @ 2003-01-06 16:21 UTC
  To: Alan Cox; +Cc: Dmitry Volkoff, Linux Kernel Mailing List

On Mon, Jan 06, 2003 at 03:02:02PM +0000, Alan Cox wrote:
> You are reporting problems in 2.4.20. 2.4.20 doesn't have the revamped IDE...

I know, which is why I put that comment after the section of my mail
regarding md bugs in 2.4.21.

> The IDE is getting updated because
> 
> - Lots of new controllers don't work with the old code
> - Lots of LBA48 problems exist with the older code
> - SATA is right out with the older code
> - Several existing controllers have weird bugs with the older code
> 
> I'd much prefer we didn't have to update the IDE too 8)

ok. I didn't see an extensive discussion of this on any of the kernel-digest
forums (kerneltrap, kernel traffic). 
I'll trust that you're doing the right thing, and try to avoid stepping in
any other flamebait. ;)

Carl Soderstrom.
-- 
Systems Administrator
Real-Time Enterprises
www.real-time.com

