From: Eric Sandeen <sandeen@redhat.com>
To: Andreas Dilger <andreas.dilger@oracle.com>
Cc: Ric Wheeler <rwheeler@redhat.com>,
Bernd Schubert <bschubert@ddn.com>, "Ted Ts'o" <tytso@mit.edu>,
Amir Goldstein <amir73il@gmail.com>,
Bernd Schubert <bs_lists@aakef.fastmail.fm>,
Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: ext4_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Date: Mon, 25 Oct 2010 14:43:01 -0500
Message-ID: <4CC5DDC5.7080003@redhat.com>
In-Reply-To: <2D4557FB-DE12-43C3-A277-EE4DD82F0BFF@oracle.com>
Andreas Dilger wrote:
> On 2010-10-25, at 00:43, Ric Wheeler wrote:
>> On 10/24/2010 12:16 PM, Bernd Schubert wrote:
>>> ... sometimes the error state is only set *after* mounting the
>>> filesystem, so difficult to script it. And as I also wrote,
>>> running e2fsck from that script and to do a complete fs check is
>>> not appropriate, as that might simply time out. Again not Lustre
>>> specific. So after some discussion, the proposed solution is to
>>> add a "journal recovery only" option to e2fsck and to do that
>>> before the mount. I will add that to the 'lustre_server' agent
>>> (which is part of Lustre now), but leave it to someone else to
>>> do that for the 'Filesystem' agent script (I'm not using that
>>> script myself and IMHO it is already too complex, as it tries to
>>> support all filesystems - shell code isn't ideal anymore then).
>> Why not simply have your script attempt to mount the file system?
>> If it succeeds, it will replay the journal. If it fails, you will
>> need to fall back to the long fsck which is unavoidable.
>
> I don't really agree with this. The whole reason for having the
> error flag in the superblock and ALWAYS running e2fsck at mount time
> to replay the journal is that e2fsck should be done before mounting
> the filesystem.
Wait, why? Why did we run with a journal if an IO error causes us
to require a fsck prior to next mount?
> I really dislike the reiserfs/XFS model where a filesystem is mounted
> and fsck is not run in advance, and then if there is a serious error
> in the filesystem this needs to be detected by the kernel, the
> filesystem unmounted, e2fsck started, and the filesystem remounted...
> That's just backward.
I must be missing something. We run with a proper, carefully designed
journal on properly configured storage so that the journal + filesystem
is always consistent.
fsck is needed when that carefully configured storage munges something
on disk, or when there's a bug in the code that corrupted the filesystem,
but certainly not just because you happened to unmount a while back and
now wish to remount...
Now, extN has this feature of recording fs errors in the superblock,
but I'm not sure we distinguish between "errors which require a fsck"
and others?
Anyway your characterization of xfs is wrong, IMHO, it's:
Mount (possibly replaying the journal) because all should be well,
we have faith in our hardware and our software.
If during runtime the fs encounters a severe metadata error, it will
shut down, and this is your cue to unmount and run xfs_repair, then
remount. Doesn't seem backwards to me. ;) Requiring that fsck
prior to the first mount makes no sense for a journaling fs.
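
As a rough illustration of that mount-first flow (also what Ric suggests
above for ext*), here is a minimal sketch. The function name bring_up, the
result strings, and the choice of "e2fsck -f -y" as the fallback are
assumptions made for this example, not something proposed in the thread.
The runner is injectable so the logic can be exercised without touching a
real device.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def bring_up(device: str, mountpoint: str, runner: Runner = run) -> str:
    """Try to mount first; fall back to a full fsck only if mount fails.

    Hypothetical helper. A successful mount replays the journal as a
    side effect, so no separate check is run in that case.
    """
    if runner(["mount", device, mountpoint]) == 0:
        return "mounted"

    # Assumption: a forced, non-interactive check is acceptable here.
    rc = runner(["e2fsck", "-f", "-y", device])
    # e2fsck exit status is a bitmask: 1 and 2 mean errors were
    # corrected; 4 and above mean uncorrected errors or a failure.
    if rc & ~3:
        return "fsck-failed"

    if runner(["mount", device, mountpoint]) == 0:
        return "mounted-after-fsck"
    return "mount-failed"
```

Note that this only covers the case where the mount itself fails. It does
not address the case of an error flag that is set only after the journal
has been replayed.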
However, Bernd's issue is probably an issue in general with XFS
as well (which doesn't record error state on-disk) - how to quickly
know whether the filesystem you're about to mount in a cluster has
a -known- integrity issue from a previous mount and really does
require a fsck.
For XFS, you have to have monitored the previous mount, I guess,
and watched for any errors the kernel threw when it encountered them.
For extN we record it in the SB, but that record may only be
in the as-yet-unplayed journal, where the tools can't see it until
it's replayed by a mount or by a full fsck.
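
To make that last point concrete, here is a small sketch of how a script
might read the superblock header as printed by "dumpe2fs -h" and decide
whether the recorded state can be trusted yet. The function names and the
three result strings are made up for this example, and the sample text in
the usage note is illustrative, not captured from a real device. The
sketch assumes the "Filesystem state" and "Filesystem features" fields and
the needs_recovery feature flag as e2fsprogs prints them.

```python
def parse_dumpe2fs_header(text: str) -> dict:
    """Split 'Key:   value' lines from dumpe2fs -h output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def assess(text: str) -> str:
    """Classify a filesystem from its superblock header (hypothetical).

    If the journal still needs recovery, the on-disk superblock may not
    yet carry an error flag that sits in the unreplayed journal, so the
    state field cannot be trusted until the journal is replayed.
    """
    fields = parse_dumpe2fs_header(text)
    features = fields.get("Filesystem features", "").split()
    state = fields.get("Filesystem state", "")

    if "needs_recovery" in features:
        return "replay-journal-first"
    if "with errors" in state:
        return "needs-fsck"
    return "clean"
```

For example, a header with "Filesystem state: clean" but needs_recovery in
the feature list is classified as "replay-journal-first" rather than
"clean", which is exactly the case Bernd ran into.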
-Eric
>> We spend a lot of time and testing to make sure that ext* can be
>> shot at any point and come back after a storage outage and still
>> mount.
>
> Sure, it can still mount, but the only thing it might be able to do
> is detect the error and remount the filesystem read-only or panic...
> That's why e2fsck should ALWAYS be run BEFORE the filesystem is
> mounted.
>
> Bernd's issue (the part that I agree with) is that the error may only
> be recorded in the journal, not in the ext3 superblock, and there is
> no easy way to detect this from userspace. Allowing e2fsck to only
> replay the journal is useful for this problem. Another similar issue is
> that if tune2fs is run on an unmounted filesystem that hasn't had a
> journal replay, then it may modify the superblock, but journal replay
> will clobber this. There are other similar issues.
>
> Cheers, Andreas -- Andreas Dilger Lustre Technical Lead Oracle
> Corporation Canada Inc.