linux-kernel.vger.kernel.org archive mirror
* Re: Htree ate my hard drive, was: post-halloween 0.2
@ 2002-10-31 11:19 Petr Vandrovec
  2002-10-31 21:42 ` [Ext2-devel] " chrisl
  2002-10-31 22:03 ` Theodore Ts'o
  0 siblings, 2 replies; 6+ messages in thread
From: Petr Vandrovec @ 2002-10-31 11:19 UTC
  To: Duncan Sands; +Cc: Linux Kernel, ext2-devel, adilger

On 31 Oct 02 at 9:20, Duncan Sands wrote:
> > I wonder if there is still a bug in the e2fsck code for re-hashing
> > directories?  It shouldn't be possible to have e2fsck complete and
> > there still be an error in the filesystem (ok, sometimes it happens,
> > but in those cases it spews a lot of warnings about the filesystem
> > not being fixed yet and to run manually).
> 
> It is possible that the filesystem was fine when fsck completed, but
> was damaged afterwards, i.e. in the time between fsck completing
> and the reboot.

Just a stupid idea. Two or three months ago I complained that if
my box crashes shortly after boot, the following things happen:

(1) system for some reason reads the /var/run directory into the page cache
(2) fsck finds that /var/run/* entries point to invalid inodes, and
    removes them (through block device access)
(4) / is remounted read-write
(5) because the page caches for the block device and for the directory
    are not coherent (or whatever), the system still sees /var/run/* populated
(6) rm /var/run/* is run. The FS is remounted read-only due to
    freeing an already freed inode...
(7) Reboot, run fsck again, reboot, fine...
    
Nobody answered at the time, and it happened to me at least 5 more
times, until I modified the initscripts to do an unconditional
reboot if "fsck /" made ANY modifications to the filesystem.

Maybe the kernel still uses the old directory index structure after
fsck has created a new one?
                                            Best regards,
                                                Petr Vandrovec
                                                vandrove@vc.cvut.cz
                                                


* Re: [Ext2-devel] Re: Htree ate my hard drive, was: post-halloween 0.2
  2002-10-31 11:19 Htree ate my hard drive, was: post-halloween 0.2 Petr Vandrovec
@ 2002-10-31 21:42 ` chrisl
  2002-10-31 22:03 ` Theodore Ts'o
  1 sibling, 0 replies; 6+ messages in thread
From: chrisl @ 2002-10-31 21:42 UTC
  To: Petr Vandrovec; +Cc: Duncan Sands, Linux Kernel, ext2-devel, adilger

On Thu, Oct 31, 2002 at 01:19:23PM +0200, Petr Vandrovec wrote:
> On 31 Oct 02 at 9:20, Duncan Sands wrote:
> > > I wonder if there is still a bug in the e2fsck code for re-hashing
> > > directories?  It shouldn't be possible to have e2fsck complete and
> > > there still be an error in the filesystem (ok, sometimes it happens,
> > > but in those cases it spews a lot of warnings about the filesystem
> > > not being fixed yet and to run manually).
> > 
> > It is possible that the filesystem was fine when fsck completed, but
> > was damaged afterwards, i.e. in the time between fsck completing
> > and the reboot.
> 
> Just a stupid idea. Two or three months ago I complained that if
> my box crashes shortly after boot, the following things happen:
> 
> (1) system for some reason reads the /var/run directory into the page cache
> (2) fsck finds that /var/run/* entries point to invalid inodes, and
>     removes them (through block device access)
> (4) / is remounted read-write
> (5) because the page caches for the block device and for the directory
>     are not coherent (or whatever), the system still sees /var/run/* populated
> (6) rm /var/run/* is run. The FS is remounted read-only due to
>     freeing an already freed inode...
> (7) Reboot, run fsck again, reboot, fine...
>     
> Nobody answered at the time, and it happened to me at least 5 more
> times, until I modified the initscripts to do an unconditional
> reboot if "fsck /" made ANY modifications to the filesystem.
> 
> Maybe the kernel still uses the old directory index structure after
> fsck has created a new one?

The file system needs to be unmounted and remounted after e2fsck has
packed the directory index. The kernel needs to discard all of its dentry
cache for that file system. If you pack "/", you had better reboot.
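
For a filesystem other than "/", the sequence described above could be sketched roughly as follows. This is an illustration under assumptions, not a tested procedure: reindex_plan is a hypothetical helper name, the device and mount point in the usage note are placeholders, and the function only prints the commands rather than running them. The -f (force check) and -D (optimize directories) options are standard e2fsck options.

```sh
#!/bin/sh
# Sketch only: print the offline sequence for re-indexing directories on a
# NON-root ext2/ext3 filesystem. reindex_plan is a hypothetical helper name;
# it prints the commands instead of executing them.
reindex_plan() {
    dev=$1   # block device, e.g. a placeholder such as /dev/sdX1
    mnt=$2   # mount point of that device
    echo "umount $mnt"        # unmount so the kernel drops its cached dentries
    echo "e2fsck -fD $dev"    # -f forces a check, -D optimizes directories
    echo "mount $dev $mnt"    # remount so the kernel reads the new index
}
```

Calling `reindex_plan /dev/sdX1 /mnt/data` prints the three commands in order. The root filesystem cannot be unmounted this way, which is why a reboot is the safe option there.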

Chris






* Re: [Ext2-devel] Re: Htree ate my hard drive, was: post-halloween 0.2
  2002-10-31 11:19 Htree ate my hard drive, was: post-halloween 0.2 Petr Vandrovec
  2002-10-31 21:42 ` [Ext2-devel] " chrisl
@ 2002-10-31 22:03 ` Theodore Ts'o
  1 sibling, 0 replies; 6+ messages in thread
From: Theodore Ts'o @ 2002-10-31 22:03 UTC
  To: Petr Vandrovec; +Cc: Duncan Sands, Linux Kernel, ext2-devel, adilger

On Thu, Oct 31, 2002 at 01:19:23PM +0200, Petr Vandrovec wrote:
>     
> Nobody answered at the time, and it happened to me at least 5 more
> times, until I modified the initscripts to do an unconditional
> reboot if "fsck /" made ANY modifications to the filesystem.
> 

In fact, e2fsck should return an exit code which indicates that the
system should be rebooted if an fsck of the root filesystem makes any
changes to the filesystem.  See the fsck(8) man page for a
definition of fsck's exit codes, but if (exit_status & 2) is non-zero,
the init scripts **should** reboot.
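
A minimal sketch of that check, assuming an init script that has already captured fsck's exit status for the root filesystem. The helper name fsck_root_action is hypothetical, and treating any status above 3 as a failure is an assumption drawn from the fsck(8) exit codes (4 and higher mean errors were left uncorrected or fsck itself failed), not something stated in this thread:

```sh
#!/bin/sh
# Sketch only: classify an fsck(8) exit status for the root filesystem.
# fsck_root_action is a hypothetical helper, not part of any distribution.
# fsck(8) exit bits: 1 = errors corrected, 2 = system should be rebooted,
# 4 = errors left uncorrected, 8 = operational error.
fsck_root_action() {
    code=$1
    if [ "$code" -gt 3 ]; then
        echo fail        # assumption: stop and let an administrator look
    elif [ $(( code & 2 )) -ne 0 ]; then
        echo reboot      # root fs was changed on disk; cached data is stale
    else
        echo continue    # 0 or 1: safe to go on and remount read-write
    fi
}
```

An init script could run fsck on /, pass `$?` to this helper, and reboot before remounting / read-write whenever it prints "reboot".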

Unfortunately, not all distributions get this right.  However, your
analysis is right.  If fsck needs to make any modifications to the
root filesystem, which is mounted read-only, it is possible for the
corrupted filesystem elements to still be cached in memory, and then
written back out to disk when the filesystem is remounted read/write.  

This is one reason why I normally recommend that / be a small
filesystem of approximately 128 megs, with separate partitions for
/usr, and either using a separate partition for /var, or using a
symlink from /usr/var to /var.  (And doing something similar for /home
and /opt, as necessary.)  It minimizes the chances that the root
filesystem will get corrupted, and makes running fsck on the root
filesystem take much less time (obviously, since the root filesystem
becomes quite small.)

						- Ted


* Re: [Ext2-devel] Re: Htree ate my hard drive, was: post-halloween 0.2
  2002-10-31  8:07   ` Andreas Dilger
@ 2002-11-04 22:42     ` Stephen C. Tweedie
  2002-11-04 22:59       ` Duncan Sands
  2002-11-04 23:22       ` Udo A. Steinberg
  0 siblings, 2 replies; 6+ messages in thread
From: Stephen C. Tweedie @ 2002-11-04 22:42 UTC
  To: Duncan Sands, Dave Jones, Linux Kernel, ext2-devel

Hi,

On Thu, Oct 31, 2002 at 01:07:17AM -0700, Andreas Dilger wrote:
> On Oct 31, 2002  07:27 +0100, Duncan Sands wrote:

> > After a bit of switching back and forth between 2.4.19 and 2.5.44,
> > fsck was run while booting 2.4.19 (the usual check because of >30
> > mounts).  There was a message about optimizing directories.  Booting
> > continued but (big surprise) X refused to run.  It turned out that some
> > device files had vanished.

> > tune2fs -O ^dir_index /dev/hdXXX
> > to remove htree support.  No problems since then.

> I wonder if there is still a bug in the e2fsck code for re-hashing
> directories?

Possibly, but I'm more worried about why the fsck did a directory
optimise on reboot, especially on the root filesystem (where /dev is
usually stored).  Doing major fs surgery on a mounted, readonly
filesystem is sort-of safe, but only if you reboot afterwards.
Continuing and remounting read-write can cause all sorts of damage as
the cached fs data no longer matches what's on disk.

Duncan, did you have fsck set up to do a forced htree rebuild on
reboot?

Cheers,
 Stephen


* Re: [Ext2-devel] Re: Htree ate my hard drive, was: post-halloween 0.2
  2002-11-04 22:42     ` [Ext2-devel] " Stephen C. Tweedie
@ 2002-11-04 22:59       ` Duncan Sands
  2002-11-04 23:22       ` Udo A. Steinberg
  1 sibling, 0 replies; 6+ messages in thread
From: Duncan Sands @ 2002-11-04 22:59 UTC
  To: Stephen C. Tweedie, Dave Jones, Linux Kernel, ext2-devel

On Monday 04 November 2002 23:42, Stephen C. Tweedie wrote:
> Hi,
>
> On Thu, Oct 31, 2002 at 01:07:17AM -0700, Andreas Dilger wrote:
> > On Oct 31, 2002  07:27 +0100, Duncan Sands wrote:
> > > After a bit of switching back and forth between 2.4.19 and 2.5.44,
> > > fsck was run while booting 2.4.19 (the usual check because of >30
> > > mounts).  There was a message about optimizing directories.  Booting
> > > continued but (big surprise) X refused to run.  It turned out that some
> > > device files had vanished.
> > >
> > > tune2fs -O ^dir_index /dev/hdXXX
> > > to remove htree support.  No problems since then.
> >
> > I wonder if there is still a bug in the e2fsck code for re-hashing
> > directories?
>
> Possibly, but I'm more worried about why the fsck did a directory
> optimise on reboot, especially on the root filesystem (where /dev is
> usually stored).  Doing major fs surgery on a mounted, readonly
> filesystem is sort-of safe, but only if you reboot afterwards.
> Continuing and remounting read-write can cause all sorts of damage as
> the cached fs data no longer matches what's on disk.
>
> Duncan, did you have fsck set up to do a forced htree rebuild on
> reboot?

Hmmm, fsck is called from the debian checkroot init script which does
fsck -a -C
So I guess the answer is "no".

Duncan.

PS: I am using version 1.30-WIP of e2fsck.


* Re: [Ext2-devel] Re: Htree ate my hard drive, was: post-halloween 0.2
  2002-11-04 22:42     ` [Ext2-devel] " Stephen C. Tweedie
  2002-11-04 22:59       ` Duncan Sands
@ 2002-11-04 23:22       ` Udo A. Steinberg
  1 sibling, 0 replies; 6+ messages in thread
From: Udo A. Steinberg @ 2002-11-04 23:22 UTC
  To: Stephen C. Tweedie; +Cc: linux-kernel, ext2-devel


On Mon, 4 Nov 2002 22:42:13 +0000 Stephen C. Tweedie (SCT) wrote:

SCT> On Thu, Oct 31, 2002 at 01:07:17AM -0700, Andreas Dilger wrote:
SCT> > I wonder if there is still a bug in the e2fsck code for re-hashing
SCT> > directories?
SCT> 
SCT> Possibly, but I'm more worried about why the fsck did a directory
SCT> optimise on reboot, especially on the root filesystem (where /dev is
SCT> usually stored).  Doing major fs surgery on a mounted, readonly
SCT> filesystem is sort-of safe, but only if you reboot afterwards.
SCT> Continuing and remounting read-write can cause all sorts of damage as
SCT> the cached fs data no longer matches what's on disk.

Just a "me too". I've used htree with 2.5.44 and 2.4.20rc1. The next
fs check on the root filesystem found corruption in /dev. After repairing
the damage and recreating the lost devices the machine ran ok for 2 days.
Then I had some ext3-fs errors and the partition got remounted read-only.
The following fsck revealed two inodes sharing the same block. I don't
have any logs of that incident anymore though :/

I'm running Slackware 9.0-beta and e2fsprogs-1.30-WIP.

Regards,
-Udo.




Thread overview: 6+ messages
2002-10-31 11:19 Htree ate my hard drive, was: post-halloween 0.2 Petr Vandrovec
2002-10-31 21:42 ` [Ext2-devel] " chrisl
2002-10-31 22:03 ` Theodore Ts'o
  -- strict thread matches above, loose matches on Subject: below --
2002-10-30 17:11 Dave Jones
2002-10-31  6:27 ` Htree ate my hard drive, was: " Duncan Sands
2002-10-31  8:07   ` Andreas Dilger
2002-11-04 22:42     ` [Ext2-devel] " Stephen C. Tweedie
2002-11-04 22:59       ` Duncan Sands
2002-11-04 23:22       ` Udo A. Steinberg
