From: Ric Wheeler <rwheeler@redhat.com>
To: Andreas Dilger <andreas.dilger@oracle.com>
Cc: Bernd Schubert <bschubert@ddn.com>, "Ted Ts'o" <tytso@mit.edu>,
	Amir Goldstein <amir73il@gmail.com>,
	Bernd Schubert <bs_lists@aakef.fastmail.fm>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: ext4_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Date: Mon, 25 Oct 2010 15:49:21 -0400
Message-ID: <4CC5DF41.1080402@redhat.com>
In-Reply-To: <5ED9AA37-357B-49E9-95E1-3E5A42B6245E@oracle.com>

On 10/25/2010 10:57 AM, Andreas Dilger wrote:
> On 2010-10-25, at 20:54, Ric Wheeler wrote:
>> On 10/25/2010 07:45 AM, Ric Wheeler wrote:
>>> On 10/25/2010 06:14 AM, Andreas Dilger wrote:
>>>> I don't really agree with this.  The whole reason for having the error flag in the superblock and ALWAYS running e2fsck at mount time to replay the journal is that e2fsck should be done before mounting the filesystem.
>>>>
>>>> I really dislike the reiserfs/XFS model where a filesystem is mounted and fsck is not run in advance, and then if there is a serious error in the filesystem this needs to be detected by the kernel, the filesystem unmounted, e2fsck started, and the filesystem remounted...  That's just backward.
>>>>
>>>> Bernd's issue (the part that I agree with) is that the error may only be recorded in the journal, not in the ext3 superblock, and there is no easy way to detect this from userspace.  Allowing e2fsck to only replay the journal is useful for this problem.  Another similar issue is that if tune2fs is run on an unmounted filesystem that hasn't had a journal replay, then it may modify the superblock, but journal replay will clobber this.  There are other similar issues.
>> One more thought here is that effectively the xfs model of mount before fsck is basically just doing the journal replay - if you need to repair the file system, it will fail to mount. If not, you are done.
> This won't happen with ext3 today - if you mount the filesystem, it will succeed regardless of whether the filesystem is in error.  I did like Bernd's suggestion that the "errors=" mount option should be used to detect if a filesystem with errors tries to mount in a read-write state, but I think that is only a safety measure.
>
>> For HA fail over, what Bernd is proposing is effectively equivalent:
>>
>> (1) Replay the journal without doing a full fsck which is the same as the mount for XFS
> Does XFS fail the mount if there was an error from a previous mount on it?
>

It does not have an "in error" state bit, but it does have sanity checks at mount time.

>> (2) See if the journal replay failed (i.e., set the error flag) which is the same as seeing if the mount succeeded
> I assume you mean for XFS here, since ext3/4 will happily mount the filesystem today without returning an error.
>

I checked with Eric on IRC: XFS will also mount happily after many types of errors.


>> (3) If error, you need to do a full, time-consuming fsck for either file system
>>
>> (4) If no error in (2), you need to mount the file system for ext4 (xfs is already done at this stage)
>>
>> Aside from putting the journal replay into a magic fsck flag, I really do not see that you are saving any complexity.  In fact, for this case, you add step (4).
> In comparison, the normal ext2/3/4 model is:
>
> 1) Run e2fsck against the filesystem before accessing it (without the -f flag that forces a full check).  e2fsck will replay the journal, and if there is no error recorded it will only check the superblock validity before exiting.  If there is an error, it will run a full e2fsck.

One thing that prevents this from being useful in a cluster fail-over context is 
that it is really hard to script responses for the full fsck for ext*.  Feeding 
it a "-y" should work, but it is still a bit scary in practice.
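
To make the scripting side concrete, here is a minimal sketch of how a
fail-over script might interpret the result of an unattended run. The bit
values are the exit codes documented in the e2fsck(8) man page; the function
names and the "mount only on 0 or 1" policy are my own assumptions, not
anything an existing HA agent does.

```python
# Exit-status bits as documented in the e2fsck(8) man page.
E2FSCK_BITS = {
    1: "errors corrected",
    2: "errors corrected, system should be rebooted",
    4: "errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "canceled by user request",
    128: "shared library error",
}


def decode_e2fsck_status(status):
    """Return the conditions encoded in an e2fsck exit status (a bitmask)."""
    return [text for bit, text in sorted(E2FSCK_BITS.items()) if status & bit]


def safe_to_mount(status):
    """Hypothetical policy: mount only if e2fsck found nothing (0) or
    fixed everything it found (1). Anything else goes to an operator."""
    return status in (0, 1)
```

A wrapper would run e2fsck with "-y", pass its return code to
decode_e2fsck_status() for the log, and refuse to mount unless
safe_to_mount() is true.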

> 2) mount the filesystem
>
> This is the simplest model, and IMHO the most correct one.  Using "mount" as a proxy for "is my filesystem broken" seems unusual to me, and unsafe for most filesystems.
>
> For Bernd, I guess he needs to split step #1 into:
>
> 1a) replay the journal so the superblock is up-to-date
> 1b) check if the filesystem has an error and report it to the HA agent, so that it doesn't have a fit because the mount is taking so long
> 1c) run the actual e2fsck (which may take a few hours on a 16TB filesystem)
>

I suppose that makes some sense, but it seems you could already do (1a) and
(1b) today with a mount and unmount, and then check for file system errors?
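
A rough sketch of what I mean, assuming the mount replays the journal and
leaves any recorded error visible in the superblock afterwards. It reads the
"Filesystem state:" line from "dumpe2fs -h"; the function names, the device
and mount point arguments, and treating a missing state line as an error are
all illustrative assumptions.

```python
import subprocess


def parse_fs_state(dumpe2fs_header):
    """Extract the value of the 'Filesystem state:' line from the text
    printed by 'dumpe2fs -h'. Returns None if the line is absent."""
    for line in dumpe2fs_header.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem state":
            return value.strip()
    return None


def has_recorded_error(state):
    """Assumption: an unknown state is treated as an error, to be safe."""
    return state is None or "error" in state


def state_after_replay(device, mountpoint, run=subprocess.run):
    """Steps (1a) and (1b): mount and unmount to replay the journal,
    then read back the superblock state. 'run' is injectable for testing."""
    run(["mount", "-t", "ext4", device, mountpoint], check=True)
    run(["umount", mountpoint], check=True)
    result = run(["dumpe2fs", "-h", device],
                 check=True, capture_output=True, text=True)
    return parse_fs_state(result.stdout)
```

The HA agent would call state_after_replay(), report has_recorded_error()
right away, and only then decide whether to start the long (1c) check.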

Ric

