public inbox for linux-kernel@vger.kernel.org
* strange ext3 corruption problem on 2.6.x
From: Marc Lehmann @ 2004-03-13  0:47 UTC
  To: linux-kernel

I use lvm-over-raid5 and get these messages once a day (requiring a reboot
afterwards):

   EXT3-fs error (device dm-0): ext3_readdir: bad entry in directory #4804801: directory entry across blocks - offset=0, inode=0, rec_len=50000, name_len=152
   Aborting journal on device dm-0.
   add_dirent_to_buf: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs error (device dm-0) in add_dirent_to_buf: Journal has aborted
   EXT3-fs error (device dm-0) in ext3_writeback_writepage: IO failure
   EXT3-fs error (device dm-0) in ext3_writeback_writepage: IO failure
   ext3_abort called.
   EXT3-fs abort (device dm-0): ext3_journal_start: Detected aborted journal
   Remounting filesystem read-only
   EXT3-fs error (device dm-0) in start_transaction: Journal has aborted
   EXT3-fs error (device dm-0) in ext3_delete_inode: Journal has aborted
   EXT3-fs error (device dm-0) in ext3_create: Journal has aborted
   EXT3-fs error (device dm-0): ext3_readdir: bad entry in directory #4804801: directory entry across blocks - offset=0, inode=0, rec_len=50000, name_len=152
   EXT3-fs error (device dm-0): ext3_readdir: bad entry in directory #4804801: directory entry across blocks - offset=0, inode=0, rec_len=50000, name_len=152
   EXT3-fs error (device dm-0): ext3_readdir: bad entry in directory #4804801: directory entry across blocks - offset=0, inode=0, rec_len=50000, name_len=152

e2fsck after rebooting shows no errors and nothing to fix. In fact, in
this very incident my home directory was missing, and after rebooting it
was there again, so this doesn't look like on-disk data corruption so far.
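
For reference, messages such as "directory entry across blocks" and
"rec_len % 4 != 0" come from the sanity checks ext3 runs on each directory
entry it reads. Below is a minimal Python sketch of those checks, in the
order I believe the 2.6-era fs/ext3/dir.c applies them. The function names,
the 4096-byte block size and the default inode count are assumptions for
illustration, not values taken from the report.

```python
def dir_rec_len(name_len):
    """Space a directory entry needs: 8-byte header plus name, rounded up to 4."""
    return (name_len + 8 + 3) & ~3


def check_dir_entry(offset, inode, rec_len, name_len,
                    block_size=4096, inodes_count=2**32 - 1):
    """Return the first failing check's message, or None if the entry looks sane.

    block_size and inodes_count are assumed defaults (hypothetical values).
    """
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > block_size:
        return "directory entry across blocks"
    if inode > inodes_count:
        return "inode out of bounds"
    return None


# Values from the log above: offset=0, inode=0, rec_len=50000, name_len=152
print(check_dir_entry(0, 0, 50000, 152))
```

A 50000-byte record starting at offset 0 cannot fit in a 4096-byte block, so
the "across blocks" check is the one that fires for the entry logged above.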

About my configuration:

5 IDE disks were combined into one raid5, with lvm on top. There are
two lvs on the raid, one formatted with ext3 and one with reiserfs. The
array was not degraded and not rebuilding. Data throughput under 2.6 is
much lower than under 2.4, though (and 2.6 takes enormous amounts of cpu
for reading from the raid5 array), but this issue is probably a separate
problem.

Both partitions currently undergo heavy filesystem activity, mainly
untar'ing big tars with lots of medium-sized files (e.g. 10gb of jpeg
files, or cvs directories).

Reiserfs has never given a problem so far, and neither have ext3
filesystems on normal harddisk partitions (although the latter were never
under the same write stress as the filesystems on the lvs).

There are no other kernel messages between mounting the volume and the
problem.

I can use this machine for many hours under no stress without any
problems.

I had these problems on 2.6.3 and 2.6.4; other 2.6 kernels have not been
tested.

Using 2.4 on the same machine (lvm1) doesn't show any problems (the
machine is a dual P-III 1ghz).

Summary: the ext3 partition regularly gives me these problems (about once
per day), while reiserfs on the same device does not.  Neither of them
causes problems under 2.4.

Hope that helps,

-- 
      -----==-                                             |
      ----==-- _                                           |
      ---==---(_)__  __ ____  __       Marc Lehmann      +--
      --==---/ / _ \/ // /\ \/ /       pcg@goof.com      |e|
      -=====/_/_//_/\_,_/ /_/\_\       XX11-RIPE         --+
    The choice of a GNU generation                       |
                                                         |

* Re: strange ext3 corruption problem on 2.6.x
From: John Pearson @ 2004-03-23  7:33 UTC
  To: linux-kernel

OK,

I've seen this one now, too; here's my datapoint:

First, under vanilla 2.6.3:

EXT3-fs error (device dm-0): ext3_readdir: bad entry in directory #917711: rec_len % 4 != 0 - offset=0, inode=1182746341, rec_len=16861, name_len=185
Aborting journal on device dm-0.
ext3_abort called.
EXT3-fs abort (device dm-0): ext3_journal_start: Detected aborted journal
Remounting filesystem read-only
 


Then, under 2.6.4+skas3:
 

EXT3-fs error (device dm-3): ext3_readdir: bad entry in directory #510327: directory entry across blocks - offset=0, inode=0, rec_len=5044, name_len=113
Aborting journal on device dm-3.
ext3_abort called.
EXT3-fs abort (device dm-3): ext3_journal_start: Detected aborted journal
Remounting filesystem read-only



I'm running ext3 over raid5. In both cases, fsck spotted the aborted 
journal and checked the FS, which came up clean.
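
To compare the reported entries side by side, the numeric fields can be
pulled out of the log lines. This is a rough Python sketch only: the regular
expression is a guess at the message layout based solely on the lines quoted
in this thread, and parse_ext3_error is a hypothetical helper name.

```python
import re

# Pattern guessed from the messages quoted in this thread (an assumption,
# not a documented format).
PATTERN = re.compile(
    r"EXT3-fs error \(device (?P<dev>[^)]+)\): ext3_readdir: "
    r"bad entry in directory #(?P<dir>\d+): (?P<reason>.+?) - "
    r"offset=(?P<offset>\d+), inode=(?P<inode>\d+), "
    r"rec_len=(?P<rec_len>\d+),\s*name_len=(?P<name_len>\d+)"
)


def parse_ext3_error(text):
    """Return a dict of fields from one readdir error, or None if no match."""
    # Mail clients wrap long lines, so collapse whitespace runs first.
    flat = " ".join(text.split())
    m = PATTERN.search(flat)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("dir", "offset", "inode", "rec_len", "name_len"):
        fields[key] = int(fields[key])
    return fields


sample = """EXT3-fs error (device dm-3): ext3_readdir: bad entry in directory
#510327: directory entry across blocks - offset=0, inode=0,
rec_len=5044, name_len=113"""
print(parse_ext3_error(sample))
```

Collapsing whitespace before matching is a deliberate choice here, since the
same message shows up both wrapped and unwrapped in mail.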

No other issues in many days of uptime, including kernel compiles, etc., 
so I'm reasonably confident of the RAM and hardware generally.

I wouldn't describe either volume as seeing heavy use - there's rarely 
more than one reader, and almost never more than one writer.

dm-3 has had no writes since last boot (it serves images to diskless 
clients, including NFS roots mounted ro); dm-0 had seen a few writes 
(it's a read-mostly FTP server containing mirrors of debian-security and 
a few other things, synced about once a month).

'directory #510327' on dm-3 is a manpage directory, which shows a size 
of 20480 and contains 751 files; 'directory #917711' on dm-0 has a size 
of 8192 and contains 101 files.

The box is a UMP Athlon XP with 512MB DDR RAM on a VIA VT8237-based
board, using on-board IDE + a Promise 20268 controller (but as the RAID 
layer works, I doubt it's the hardware).
