From: Bernd Schubert <bschubert@ddn.com>
To: Ric Wheeler <rwheeler@redhat.com>
Cc: Ted Ts'o <tytso@mit.edu>, Amir Goldstein <amir73il@gmail.com>,
Bernd Schubert <bs_lists@aakef.fastmail.fm>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
Andreas Dilger <adilger@sun.com>
Subject: Re: ext4_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Date: Sun, 24 Oct 2010 18:16:59 +0200
Message-ID: <4CC45BFB.4010403@ddn.com>
In-Reply-To: <4CC45590.80608@redhat.com>
On 10/24/2010 05:49 PM, Ric Wheeler wrote:
> On 10/24/2010 11:39 AM, Bernd Schubert wrote:
>> On 10/24/2010 05:20 PM, Ric Wheeler wrote:
>>> This still sounds more like a Lustre issue than an ext4 one; Andreas can fill in
>>> the technical details.
>> The underlying device handling is unrelated to Lustre. In that sense it
>> is just a local filesystem.
>>
>>> Whatever shared storage sits under ext4 is irrelevant to the failover case.
>>>
>>> Unless Lustre does other magic, they still need to obey the basic cluster rules
>>> - one mount per cluster.
>> Yes, one mount per cluster.
>>
>>> If Lustre is doing the same trick you would do with active/passive failover
>>> clusters that export ext4 via NFS, you would still need to clean up the file
>>> system before being able to re-export it from a failover node.
>> What exactly is your question here? We use pacemaker/stonith to do the
>> fencing job.
>> What exactly do you want to clean up? The device is recovered via its
>> journal, Lustre goes into recovery mode, clients reconnect, locks are
>> updated and incomplete transactions are resent.
>>
>>
>> Cheers,
>> Bernd
>>
>
> What I don't get (certainly might just be me) is why this is a unique issue when
> used by Lustre. Normally, any similar type of failover will clean up the local
> file system before trying to re-export it from the second node.
Of course this is not a Lustre-specific issue, which is why I did not
open a Lustre bugzilla entry, but started the thread here instead.
>
> Why exactly can't you use the same type of recovery here? Is it the fencing
> agent killing nodes on detection of the file system errors?
But I'm using the same type of recovery! I just rewrote Pacemaker's
default "Filesystem" agent into a lustre_server agent that includes more
Lustre-specific checks. When I added a check for the dumpe2fs
"Filesystem state" field last week, I noticed that the error state is
sometimes only set *after* mounting the filesystem, which makes it
difficult to script.
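For illustration, a minimal sketch of such a pre-mount check; the
$DEVICE variable and the exact grep pattern are my placeholders, not
the actual agent code:

    # Hypothetical pre-mount check; unreliable on its own, since the
    # error flag is sometimes only set after the mount, as described.
    if dumpe2fs -h "$DEVICE" 2>/dev/null | \
            grep -q '^Filesystem state:.*error'; then
        echo "$DEVICE recorded errors on a previous mount" >&2
    fi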
And as I also wrote, running e2fsck from that script to do a complete
fs check is not appropriate, as it might simply time out. Again, this is
not Lustre-specific. So after some discussion, the proposed solution is
to add a "journal recovery only" option to e2fsck and to run that before
the mount. I will add that to the 'lustre_server' agent (which is part
of Lustre now), but leave it to someone else to do the same for the
'Filesystem' agent script (I'm not using that script myself, and IMHO
it is already too complex, as it tries to support all filesystems;
shell code is not ideal for that anymore).
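A rough sketch of what the agent's start action could then do, assuming
the proposed option takes the shape of an extended option, say
"-E journal_only"; that spelling and the device/mount point variables
are assumptions on my side:

    # Replay only the journal, without a full filesystem check, then
    # mount.  e2fsck exit codes below 4 mean no errors, or corrected.
    e2fsck -p -E journal_only "$DEVICE"
    if [ $? -ge 4 ]; then
        exit 1    # uncorrected errors; leave the resource stopped
    fi
    mount -t ldiskfs "$DEVICE" "$MOUNTPOINT"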
Really, the only Lustre-specific part here is the feature of having a
proc file that shows whether filesystem errors came up on a node. That
feature is missing in extX and all other Linux filesystems I have
worked with. And Lustre server nodes simply mean dozens to hundreds of
ext3/ext4/ldiskfs devices in use, so bugs are more likely to be exposed
by that high number.
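With such an error counter exported per device, the agent's monitor
action could stay trivial; a sketch, where the sysfs path is purely an
assumption about how such an interface might look:

    # Poll a hypothetical per-device error counter; non-zero means the
    # kernel hit filesystem errors since the mount.
    dev=$(basename "$DEVICE")
    count=$(cat "/sys/fs/ext4/$dev/errors_count" 2>/dev/null || echo 0)
    if [ "$count" -gt 0 ]; then
        echo "$DEVICE: $count filesystem errors recorded" >&2
    fi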
Cheers,
Bernd