linux-ext4.vger.kernel.org archive mirror
From: Ted Ts'o <tytso@mit.edu>
To: Bernd Schubert <bs_lists@aakef.fastmail.fm>
Cc: Amir Goldstein <amir73il@gmail.com>,
	linux-ext4@vger.kernel.org, Bernd Schubert <bschubert@ddn.com>
Subject: Re: ext4_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Date: Sat, 23 Oct 2010 18:26:05 -0400
Message-ID: <20101023222605.GC24650@thunk.org>
In-Reply-To: <201010231946.56794.bs_lists@aakef.fastmail.fm>

On Sat, Oct 23, 2010 at 07:46:56PM +0200, Bernd Schubert wrote:
> I'm really looking for something to abort the mount if an error comes up. 
> However, I just have an idea to do that without an additional mount flag:
> 
> Let e2fsck play back the journal only. That way e2fsck could set the
> error flag, if it detects a problem in the journal and our pacemaker
> script would refuse to mount. That option also would be quite useful
> for our other scripts, as we usually first run a read-only fsck,
> check the log files (presently by size, as e2fsck always returns an
> error code even for journal recoveries...)  and only if we don't see
> serious corruption we run e2fsck. Otherwise we sometimes create
> device or e2image backups.  Would a patch introducing "-J recover
> journal only" be accepted?

So I'm confused, and partially it's because I don't know the
capabilities of pacemaker.

If you have a pacemaker script, why aren't you willing to just run
e2fsck on the journal and be done with it?  Earlier you talked about
"man months of effort" to rewrite pacemaker.  Huh?  If the file system
is fine, it will recover the journal, and then see that the file
system is clean, and then exit.

As far as the exit codes go, it sounds like you haven't read the man
page.  The exit codes are documented in both the fsck and e2fsck man
pages, and are standardized across all file systems:

            0    - No errors
            1    - File system errors corrected
            2    - System should be rebooted
            4    - File system errors left uncorrected
            8    - Operational error
            16   - Usage or syntax error
            32   - Fsck canceled by user request
            128  - Shared library error

(These status codes are boolean OR'ed together.)
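Because the codes are OR'ed together, a script should test individual
bits rather than compare the whole status against a single value.  A
minimal Python sketch, using only the bit values from the table above
(the names FSCK_STATUS_BITS and decode_fsck_status are illustrative,
not part of any e2fsprogs interface):

```python
# Exit-status bits as listed in the fsck(8) / e2fsck(8) man pages.
FSCK_STATUS_BITS = {
    1: "file system errors corrected",
    2: "system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "fsck canceled by user request",
    128: "shared library error",
}


def decode_fsck_status(status: int) -> list[str]:
    """Split an OR'ed fsck exit status into its individual meanings."""
    if status == 0:
        return ["no errors"]
    # Dicts keep insertion order, so results come back in ascending bit order.
    return [meaning for bit, meaning in FSCK_STATUS_BITS.items() if status & bit]


if __name__ == "__main__":
    print(decode_fsck_status(5))
```

For example, a status of 5 decodes to both "file system errors
corrected" and "file system errors left uncorrected".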

If an exit code has the '1' bit set, that means that the file system had
some errors, but they have since been fixed.  An exit code with the
'2' bit set will only occur in the case of a mounted read-only file
system, and instructs the init script to reboot before continuing:
while the file system may have had errors fixed, there may be
invalid information cached in memory because the root file system is
mounted, so the only safe way to make sure that invalid information
won't be written back to disk is to reboot.  If you are not checking
the root file system, you will never see the '2' bit set.

So if you are looking at the size of the fsck log files, I'm guessing
it's because no one has bothered to read and understand how the exit
codes for fsck work.
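As a sketch of what a cluster script could do instead of measuring log
size: run e2fsck (which replays the journal and then checks the file
system), and refuse to mount unless the status is 0 or 1.  This assumes
a non-root, unmounted device and sufficient privileges; the policy in
safe_to_mount and the helper check_then_decide are hypothetical, not an
existing pacemaker or e2fsprogs interface.  Only the -p (preen) flag is
taken from e2fsck itself.

```python
import subprocess

# Bit from the fsck(8) exit-status table: errors were found and fixed.
ERRORS_CORRECTED = 1


def safe_to_mount(status: int) -> bool:
    """Hypothetical policy: mount only if fsck found nothing or fixed everything.

    Any other bit (uncorrected errors, operational or usage error,
    cancellation, ...) refuses the mount.  The reboot bit (2) should not
    appear for a non-root file system, so it also refuses.
    """
    return (status & ~ERRORS_CORRECTED) == 0


def check_then_decide(device: str) -> bool:
    """Run e2fsck in preen mode on an unmounted device and apply the policy."""
    # -p: automatically repair problems that are safe to fix without asking.
    result = subprocess.run(["e2fsck", "-p", device])
    return safe_to_mount(result.returncode)
```

The decision rests entirely on the exit status, so it stays correct
regardless of how verbose e2fsck's output happens to be.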

And I really don't understand why you need or want to do a read-only
fsck first....

						- Ted
