* Kernel 2.6.6: Removing the last large file does not reset filesystem properties
From: John McGowan @ 2004-05-11 0:20 UTC
To: linux-kernel
Bug: Removing the last large file does not reset filesystem properties
SYSTEM: Pentium III, Fedora Core1 (gcc 3.3.2), Kernel 2.6.6
Symptom: [you can skip this]
----------------------------
1: For the first week, at least, after installing a new kernel, I set the
system to force fsck on boot (Fedora Core1, rc.local script to "touch"
"/forcefsck").
2: Home system, single user. Turn off at night. Turn off when I go to eat
lunch. Etc. (reboot at least once each day - silly, but it is a single
user system and I don't want to waste electricity and it gives better
protection against storms and line spikes).
3: Was using Gimp 2.0 and used a tool. Got a 6 Gig swap file in /tmp/gimp2
(there must be a problem with that tool). Closed gimp, got rid of the
swap file. Upon the next boot I got:
FAILED!!
Dropping to root command line for system maintenance
(such fun ... entering the root password got more error messages about
missing programmes such as "id" and "test" - well, I have "/usr" on
another partition and it was not mounted).
4: Typed reboot, everything was fine.
---------------------
CREATING/DUPLICATING THE PROBLEM:
---------------------------------
So ... I created a 3Gig file, deleted it and booted to my "recovery"
partition (I haven't removed RH72 yet, so I can boot to that to work
on the Fedora system partitions without running e2fsck on a mounted,
active, partition).
I ran:
PREFORCE: dumpe2fs on the Fedora Core1 partition
e2fsck -n -f -v on the Fedora Core1 partition
NOFORCE: e2fsck -v on the Fedora Core1 partition
FORCE: e2fsck -f -v on the Fedora Core1 partition
POSTFORCE: dumpe2fs on the Fedora Core1 partition
e2fsck -n -f -v on the Fedora Core1 partition
I was happy to see that there is no difference between the "e2fsck -n -f -v"
run before and after the force of FSCK.
Running "e2fsck -v" (no force, but not a read-only mount) did nothing
("/1: clean, 35963/1154176 files, 197142/2303910 blocks"
- by the way, since I still have RH72 on the system, the label of its
root partition being "/", the root partition for Fedora is "/1")
since it was unmounted cleanly.
I was happy to see only one difference between the "dumpe2fs"
run before and after the force of FSCK (besides the time and mount count)
AFTER DELETING THE LARGE FILE AND BEFORE FORCING FSCK:
(dumpe2fs)
Filesystem features: has_journal filetype sparse_super large_file
(e2fsck -n -f -v)
0 large files
AFTER DELETING THE LARGE FILE AND FORCING FSCK:
(dumpe2fs)
Filesystem features: has_journal filetype sparse_super
(e2fsck -n -f -v)
0 large files
It looks like the only problem is that removing the last large file left the
filesystem thinking it had a large file. The error message given by Fedora
Core1 (if someone forces fsck and seldom has large files, for example, only
created by working on multimedia and they are later deleted) can be somewhat
unnerving.
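The before/after comparison above can be sketched as a small shell check. This is only an illustration and not part of the original report: the function name has_large_file_feature is made up, and it assumes the "Filesystem features:" line has the format shown in the dumpe2fs output quoted above.

```shell
#!/bin/sh
# has_large_file_feature (hypothetical helper): reads dumpe2fs-style
# superblock output on stdin and succeeds (exit status 0) only if the
# "Filesystem features:" line lists large_file.
has_large_file_feature() {
    awk '/^Filesystem features:/ {
             # Fields 1 and 2 are "Filesystem" and "features:";
             # the feature names start at field 3.
             for (i = 3; i <= NF; i++)
                 if ($i == "large_file") found = 1
         }
         END { exit(found ? 0 : 1) }'
}

# Possible use against an unmounted partition (the device name is a placeholder):
#   dumpe2fs -h /dev/XXX 2>/dev/null | has_large_file_feature \
#       && echo "large_file flag is still set"
```

Comparing this result with the "large files" count reported by a read-only e2fsck run would show the mismatch described above: the flag is set while the count is 0.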
* Re: Kernel 2.6.6: Removing the last large file does not reset filesystem properties
From: Andrew Morton @ 2004-05-11 7:49 UTC
To: John McGowan; +Cc: linux-kernel, ext2-devel
John McGowan <jmcgowan@inch.com> wrote:
>
> 1: For the first week, at least, after installing a new kernel, I set the
> system to force fsck on boot (Fedora Core1, rc.local script to "touch"
> "/forcefsck").
>
> 2: Home system, single user. Turn off at night. Turn off when I go to eat
> lunch. Etc. (reboot at least once each day - silly, but it is a single
> user system and I don't want to waste electricity and it gives better
> protection against storms and line spikes).
>
> 3: Was using Gimp 2.0 and used a tool. Got a 6 Gig swap file in /tmp/gimp2
> (there must be a problem with that tool). Closed gimp, got rid of the
> swap file. Upon the next boot I got:
> FAILED!!
> Dropping to root command line for system maintenance
> (such fun ... entering the root password got more error messages about
> missing programmes such as "id" and "test" - well, I have "/usr" on
> another partition and it was not mounted).
I think this is really an e2fsck/initscript problem.
fsck saw that there were no large files on the fs, then fixed up the
superblock to say that then returned an exit code which says "I modified
the fs".
The initscripts see that exit code and have a heart attack.
What should happen is that fsck returns an exit code which says "I modified
the fs, but everything is OK". And the initscripts should say "oh, cool"
and keep booting.
I don't know whether the problem lies with fsck or initscripts.
* Re: Kernel 2.6.6: Removing the last large file does not reset filesystem properties
From: Oliver Feiler @ 2004-05-11 9:53 UTC
To: Andrew Morton, John McGowan; +Cc: linux-kernel, ext2-devel
On Tuesday 11 May 2004 09:49, Andrew Morton wrote:
> > 3: Was using Gimp 2.0 and used a tool. Got a 6 Gig swap file in
> > /tmp/gimp2 (there must be a problem with that tool). Closed gimp, got rid
> > of the swap file. Upon the next boot I got:
> > FAILED!!
> > Dropping to root command line for system maintenance
> > (such fun ... entering the root password got more error messages
> > about missing programmes such as "id" and "test" - well, I have "/usr" on
> > another partition and it was not mounted).
>
> I think this is really an e2fsck/initscript problem.
>
> fsck saw that there were no large files on the fs, then fixed up the
> superblock to say that then returned an exit code which says "I modified
> the fs".
>
> The initscripts see that exit code and have a heart attack.
>
> What should happen is that fsck returns an exit code which says "I modified
> the fs, but everything is OK". And the initscripts should say "oh, cool"
> and keep booting.
>
> I don't know whether the problem lies with fsck or initscripts.
Yes, it's an issue with the initscripts (I'd say). I stumbled over this
problem as well when upgrading e2fsprogs on a fairly old Slackware install.
From the manpage of fsck:
The exit code returned by fsck is the sum of the following
conditions:
0 - No errors
1 - File system errors corrected
2 - System should be rebooted
[...]
The old Slackware init scripts (from 7.0 days I think) checked
if [ $EXITCODE -gt 1 ] ; then
panic!
Newer fscks however also seem to return exit code 2 for "some errors
corrected, please reboot". In Slack 9's initscripts this was changed to auto
reboot in this case. I think this behaviour was changed in some version of
fsck, but I'm not sure.
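Because the man page describes the exit code as a sum of conditions, it can be read as a bitmask. A minimal sketch of decoding it, assuming the values listed in fsck(8): the conditions for 4 and above come from that man page rather than from this message, and the function name describe_fsck_status is hypothetical.

```shell
#!/bin/sh
# describe_fsck_status (hypothetical helper): print one label per condition
# bit that is set in an fsck exit status.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no-errors"
        return 0
    fi
    out=""
    # "bit:label" pairs following the exit code list in fsck(8).
    for pair in 1:errors-corrected 2:reboot-needed 4:errors-uncorrected \
                8:operational-error 16:usage-error 32:cancelled \
                128:shared-library-error; do
        bit=${pair%%:*}
        label=${pair#*:}
        if [ $((status & bit)) -ne 0 ]; then
            out="$out $label"
        fi
    done
    # Strip the leading space left by the first append.
    echo "${out# }"
}
```

Under this reading, a plain "greater than 1" test lumps together "corrected, please reboot" (2 or 3) and "errors left uncorrected" (4 and up), which is the confusion described above.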
But admittedly I also got a slight heart attack when our server stopped
booting with an error from fsck. ;)
Oliver
--
Oliver Feiler - http://kiza.kcore.de/
* Re: Kernel 2.6.6: Removing the last large file does not reset filesystem properties
From: John McGowan @ 2004-05-11 13:00 UTC
To: Andrew Morton; +Cc: linux-kernel, ext2-devel
On Tue, 11 May 2004, Andrew Morton wrote:
> John McGowan <jmcgowan@inch.com> wrote:
>
> I think this is really an e2fsck/initscript problem.
>
> fsck saw that there were no large files on the fs, then fixed up the
> superblock to say that then returned an exit code which says "I modified
> the fs".
>
> The initscripts see that exit code and have a heart attack.
Yes. But why did it have to modify the file system/superblock/properties?
Should the file system have had to be modified (relying upon
fsck to fix the "largefile" property when next it is run)?
> What should happen is that fsck returns an exit code which says "I modified
> the fs, but everything is OK". And the initscripts should say "oh, cool"
> and keep booting.
Actually, they do, if it isn't the root partition (if I create/delete the
large file from another partition it gives a message and continues - but
for the root partition, the initscript, with an exit code greater than 1
drops one to a root prompt for "maintenance" - and with my /usr on a
different partition and seeing a bunch of "id not found"
"test not found" messages ... for a few minutes I was a bit flustered.
It is easy enough to modify the init script to do a reboot on exit
code 2).
(Fedora Core1 initscript on mounting the root partition:
# A return of 2 or higher means there were serious problems.
echo $"*** An error occurred during the file system check."
echo $"*** Dropping you to a shell; the system will reboot"
echo $"*** when you leave the shell."
str=$"(Repair filesystem)"
PS1="$str \# # "; export PS1
sulogin
(the sulogin login message is:
"Give root password for maintenance")
> I don't know whether the problem lies with fsck or initscripts.
fsck does fix it. Or should the removal of the last large file have
resulted in the change without the mismatch between the "largefile"
property being set with no large files?
It's a small annoyance (no damage to the file system itself), no more.
I know what's happening and how to patch the initscript to get an
automatic reboot on exit code 2. Is that the proper way to handle it?
Regards from:
John McGowan | jmcgowan@inch.com [Internet Channel]
| jmcgowan@coin.org [COIN]
--------------+-----------------------------------------------------
* Re: Kernel 2.6.6: Removing the last large file does not reset filesystem properties
From: Valdis.Kletnieks @ 2004-05-11 15:32 UTC
To: John McGowan; +Cc: Andrew Morton, linux-kernel, ext2-devel
On Tue, 11 May 2004 09:00:33 EDT, John McGowan said:
> fsck does fix it. Or should the removal of the last large file have
> resulted in the change without the mismatch between the "largefile"
> property being set with no large files?
Then fsck should exit RC=1. At least the Fedora Core 2 initscripts think
that's OK and specifically check for that case (a few lines later it remounts /
r/w, which *should* refresh all the important in-core blocks that might have
gotten changed out from under it - I think. If that's not true, somebody squawk
so we can fix that assumption in the initscripts. ;)
> I know what's happening and how to patch the initscript to get an
> automatic reboot on exit code 2. Is that the proper way to handle it?
NO.
Consider - if you *do* scrog your filesystem, you'll get hung in a loop
of fsck/reboot/fsck/reboot/fsck/reboot. You really *do* want the system
to yell for help from a human at that point....
* Re: Kernel 2.6.6: Removing the last large file does not reset filesystem properties
From: Bill Davidsen @ 2004-05-11 23:27 UTC
To: linux-kernel
John McGowan wrote:
> On Tue, 11 May 2004, Andrew Morton wrote:
>
>
>>John McGowan <jmcgowan@inch.com> wrote:
>>
>>I think this is really an e2fsck/initscript problem.
>>
>>fsck saw that there were no large files on the fs, then fixed up the
>>superblock to say that then returned an exit code which says "I modified
>>the fs".
>>
>>The initscripts see that exit code and have a heart attack.
>
>
> Yes. But why did it have to modify the file system/superblock/properties?
> Should the file system have had to be modified (relying upon
> fsck to fix the "largefile" property when next it is run)?
>
>
>>What should happen is that fsck returns an exit code which says "I modified
>>the fs, but everything is OK". And the initscripts should say "oh, cool"
>>and keep booting.
>
>
> Actually, they do, if it isn't the root partition (if I create/delete the
> large file from another partition it gives a message and continues - but
> for the root partition, the initscript, with an exit code greater than 1
> drops one to a root prompt for "maintenance" - and with my /usr on a
> different partition and seeing a bunch of "id not found"
> "test not found" messages ... for a few minutes I was a bit flustered.
> It is easy enough to modify the init script to do a reboot on exit
> code 2).
>
> (Fedora Core1 initscript on mounting the root partition:
>
> # A return of 2 or higher means there were serious problems.
> echo $"*** An error occurred during the file system check."
> echo $"*** Dropping you to a shell; the system will reboot"
> echo $"*** when you leave the shell."
> str=$"(Repair filesystem)"
> PS1="$str \# # "; export PS1
> sulogin
>
> (the sulogin login message is:
> "Give root password for maintenance")
>
>
>>I don't know whether the problem lies with fsck or initscripts.
I would say the problem is in the interface. There should be one more
state in the exit codes, and initscripts should handle that:
0 - okay
1 - fixes but okay to continue
2 - fixes and reboot to update in-core info
3 - help! Uncorrected errors.
>
>
> fsck does fix it. Or should the removal of the last large file have
> resulted in the change without the mismatch between the "largefile"
> property being set with no large files?
If the system was shut down cleanly, then there should not have been this
problem in the first place. Of course if you are testing recovery by
doing sync and hitting the switch, well actually that should still work,
the metadata should be correct, right?
>
> It's a small annoyance (no damage to the file system itself), no more.
>
> I know what's happening and how to patch the initscript to get an
> automatic reboot on exit code 2. Is that the proper way to handle it?
Absolutely not! It will leave you in an endless loop of rebooting!
--
-bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me
* Re: [Ext2-devel] Re: Kernel 2.6.6: Removing the last large file does not reset filesystem properties
From: Theodore Ts'o @ 2004-05-12 3:09 UTC
To: Valdis.Kletnieks; +Cc: John McGowan, Andrew Morton, linux-kernel, ext2-devel
On Tue, May 11, 2004 at 11:32:26AM -0400, Valdis.Kletnieks@vt.edu wrote:
> > I know what's happening and how to patch the initscript to get an
> > automatic reboot on exit code 2. Is that the proper way to handle it?
>
> NO.
>
YES.
Well, actually, the initscripts should reboot if ((status & 2) != 0).
Or more simply, if the exit status is 2 or 3.
> Consider - if you *do* scrog your filesystem, you'll get hung in a loop
> of fsck/reboot/fsck/reboot/fsck/reboot. You really *do* want the system
> to yell for help from a human at that point....
This is not a problem. E2fsck only returns an exit status of 2 or 3
when it was able to clean up the filesystem on its own. The "reboot
requested" bit just means that the root filesystem was modified, and
there is no guarantee that the kernel has not cached information
which might get flushed to disk when the filesystem is remounted
read-write. Hence, the need for a reboot.
Yes, this might seem a little Windows-esque, but for people who are
careful to keep their root partitions small (say, around 128 megs or
so), and use separate partitions for /usr and /var, this isn't a
problem. For people who do use a single massive root filesystem, then
the need to reboot after e2fsck does a "preen" operation becomes more
common (although with ext3, this will rarely come up).
The need to reboot could be removed if the remount of the read-only
filesystem to be read-write also caused the kernel to flush all state
and force everything to be reread from disk. But this means flushing
all dentries, including possibly dentries in use, and this was
ultimately decided to be Too Hard, and the idea was shot down by the
Grand Penguin himself.
As far as your concern about infinite loops of
fsck/reboot/fsck/reboot, it isn't an issue. If the filesystem is
actually massively scrod, to the point where human assistance is
necessary, then e2fsck will return an exit code of 4:
4 - File system errors left uncorrected
Which should cause the initscripts to call for help. An exit code of
2 or 3 merely means that the fsck is doing just fine, thank you very
much, and just needs a reboot in order to flush the disk caches.
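The policy described above could be sketched in shell roughly as follows. This is an illustration only, not code from any distribution's initscripts: the function name fsck_action and the words it prints are hypothetical, and treating every status of 4 or higher as a call for help (even when the reboot bit is also set) is an assumption.

```shell
#!/bin/sh
# fsck_action (hypothetical helper): map an e2fsck exit status to the action
# an initscript would take under the policy described above.
fsck_action() {
    status=$1
    if [ "$status" -ge 4 ]; then
        # Errors left uncorrected, or fsck itself failed: ask a human.
        echo "help"
    elif [ $((status & 2)) -ne 0 ]; then
        # Status 2 or 3: the filesystem was fixed, but a reboot is requested.
        echo "reboot"
    else
        # Status 0 or 1: clean, or errors corrected; keep booting.
        echo "continue"
    fi
}
```

Because uncorrected errors are tested first, a badly damaged filesystem ends in the "help" branch rather than in a reboot loop.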
- Ted
P.S. The real right answer is that the fsck of the root partition
should take place *before* the root partition is mounted, in the
initial ramdisk. This gets rid of the whole need to flush the system
caches, since the (real) root filesystem isn't mounted at all during
the time of the initial check.
* Re: [Ext2-devel] Re: Kernel 2.6.6: Removing the last large file does not reset filesystem properties
From: Valdis.Kletnieks @ 2004-05-12 4:25 UTC
To: Theodore Ts'o; +Cc: John McGowan, Andrew Morton, linux-kernel, ext2-devel
On Tue, 11 May 2004 23:09:15 EDT, "Theodore Ts'o" said:
> Well, actually, the initscripts should reboot if ((status & 2) != 0).
> Or more simply, if the exit status is 2 or 3.
...
> 4 - File system errors left uncorrected
>
> Which should cause the initscripts to call for help. An exit code of
> 2 or 3 merely means that the fsck is doing just fine, thank you very
> much, and just needs a reboot in order to flush the disk caches.
Man, a quarter of a century in this business, and I *still* haven't learned
that you can't trust your vendor to have a clue. That will teach me to go back
and check the upstream copy of the manpages, rather than trusting to memory and
a look at the rc.sysinit script. ;) It's been too long a week already...
It certainly looks like somebody needs to file a bug report against RedHat/
Fedora's initscripts, as it seems even more confused about the proper behavior
than I am ;) rc.sysinit interprets rc=0 as OK, rc=1 as "passed, keep on going"
(ignoring the remount issue, it just remounts r/w and goes on), and anything
greater than 1 as "yell for help". Oh, and to add to the pain - I just noticed
that it fsck's *all* the file systems, drops you into a shell if *any* of them
return a $? higher than 1, and when you exit that shell, it does:
umount -a
mount -n -o remount,ro /
reboot -f
(Yes, if a /userdata filesystem gets an rc2 and hasn't been mounted yet, we reboot anyhow.)
I've convinced myself that a crack pipe was involved in this code....
> P.S. The real right answer is that the fsck of the root partition
> should take place *before* the root partition is mounted, in the
> initial ramdisk. This gets rid of the whole need to flush the system
> caches, since the (real) root filesystem isn't mounted at all during
> the time of the initial check.
Right. Fortunately, RedHat appears to have put down the crack pipe long enough
to ship statically linked fsck.ext[23] and /sbin/lvm, so there's hope of having
enough of the pieces to do that....
Unfortunately, they then pick the pipe up again with mkinitrd - that uses the
'nash' shell wannabe, which does know how to invoke an external command, but
lacks an 'if' statement to test the return code (and yes, this time I checked
the actual nash.c from the src.rpm) - the various builtins and external commands
will set a return code, but you can't reference it. The one exception is the last
command's value, which could conceivably be checked back in init/do_mounts_initrd.c -
but that seems to discard the exit value of linuxrc....
Argh... :)