From: Haotian Li <lihaotian9@huawei.com>
To: harshad shirwadkar <harshadshirwadkar@gmail.com>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>,
"Theodore Y. Ts'o" <tytso@mit.edu>,
"liuzhiqiang (I)" <liuzhiqiang26@huawei.com>,
linfeilong <linfeilong@huawei.com>, <tytso@alum.mit.edu>,
<liangyun2@huawei.com>
Subject: Re: [PATCH] e2fsck: Avoid changes on recovery flags when jbd2_journal_recover() failed
Date: Mon, 14 Dec 2020 09:27:05 +0800
Message-ID: <3e3c18f6-9f45-da04-9e81-ebf1ae16747e@huawei.com>
In-Reply-To: <CAD+ocbxAyyFqoD6AYQVjQyqFzZde3+QOnUhC-VikAq4A3_t8JA@mail.gmail.com>
Hi Harshad,
Thanks for your review. I think you are right, so I tried to find
all the recoverable error codes in journal recovery. However, I do
not see a way to classify every error code. The only three that I
think may be recoverable are -ENOMEM, EXT2_ET_NO_MEMORY, and -EIO.
In these cases, I think we probably don't need to ask the user
whether they want to continue; we should only tell them why journal
recovery failed and then exit. Because these failures may not be
caused by disk errors, we should try to avoid changes on the disk.
What do you think?
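
A minimal sketch of that classification, only to make the proposal
concrete. The helper name recovery_error_is_transient() is hypothetical
and not part of e2fsprogs. It assumes retval has already been negated,
as in recover_ext3_journal(), so the errno values are positive. The
EXT2_ET_NO_MEMORY value below is a placeholder so the sketch builds on
its own; the real constant comes from <ext2fs/ext2_err.h>.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Placeholder so the sketch compiles without the e2fsprogs headers.
 * The real definition lives in <ext2fs/ext2_err.h>.
 */
#ifndef EXT2_ET_NO_MEMORY
#define EXT2_ET_NO_MEMORY 2133571398L
#endif

/*
 * Hypothetical helper (not an e2fsprogs function).
 * Returns true for the three error codes proposed above as possibly
 * recoverable on a later run; everything else is treated as a
 * non-transient failure. `retval` is assumed to be the already
 * negated return value of jbd2_journal_recover().
 */
static bool recovery_error_is_transient(long retval)
{
	switch (retval) {
	case ENOMEM:
	case EIO:
	case EXT2_ET_NO_MEMORY:
		return true;
	default:
		return false;
	}
}
```

Under this assumption, the new branch in the patch would be taken only
when recovery_error_is_transient(retval) is true, and every other
failure would still go to errout so that e2fsck can continue.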
Thanks,
Haotian
On 2020/12/12 6:07, harshad shirwadkar wrote:
> Hi Haotian,
>
> Thanks for your patch. I noticed that the following test fails:
>
> $ make -j 64
> ...
> 365 tests succeeded 1 tests failed
> Tests failed: j_corrupt_revoke_rcount
> make: *** [Makefile:397: test_post] Error 1
>
> This test fails because the test expects e2fsck to continue even if
> the journal superblock is corrupt and with your patch e2fsck exits
> immediately. This brings up a higher level question - if we abort on
> errors when recovery fails during fsck, how would that problem get
> fixed if we don't run fsck? In this particular example, the journal
> superblock is corrupt and that is an unrecoverable error. I wonder if
> instead we should check for certain specific transient errors such as
> -ENOMEM and only then exit? I suspect even in those cases we probably
> should ask the user if they would like to continue or not. What do you
> think?
>
> Thanks,
> Harshad
>
>
> On Fri, Dec 11, 2020 at 4:19 AM Haotian Li <lihaotian9@huawei.com> wrote:
>>
>> jbd2_journal_recover() may fail when an error such as ENOMEM
>> occurs. However, jsb->s_start is still cleared by
>> e2fsck_journal_release(). This may break consistency between
>> metadata and data on disk. Sometimes the failure in
>> jbd2_journal_recover() is temporary, but a retried e2fsck
>> will skip journal recovery even after the temporary
>> problem is fixed.
>>
>> To fix this, we call fatal_error() instead of "goto errout"
>> when journal recovery fails. We think that if journal recovery
>> fails, we need to send an error message to the user and preserve
>> the recovery flags so the journal is recovered when e2fsck runs again.
>>
>> Reported-by: Liangyun <liangyun2@huawei.com>
>> Signed-off-by: Haotian Li <lihaotian9@huawei.com>
>> Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
>> ---
>> e2fsck/journal.c | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/e2fsck/journal.c b/e2fsck/journal.c
>> index 7d9f1b40..546beafd 100644
>> --- a/e2fsck/journal.c
>> +++ b/e2fsck/journal.c
>> @@ -952,8 +952,13 @@ static errcode_t recover_ext3_journal(e2fsck_t ctx)
>>  		goto errout;
>>
>>  	retval = -jbd2_journal_recover(journal);
>> -	if (retval)
>> -		goto errout;
>> +	if (retval && retval != EFSBADCRC && retval != EFSCORRUPTED) {
>> +		ctx->fs->flags &= ~EXT2_FLAG_VALID;
>> +		com_err(ctx->program_name, 0,
>> +			_("Journal recovery failed "
>> +			  "on %s\n"), ctx->device_name);
>> +		fatal_error(ctx, 0);
>> +	}
>>
>>  	if (journal->j_failed_commit) {
>>  		pctx.ino = journal->j_failed_commit;
>> --
>> 2.19.1
>>
> .
>
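
The failure mode described in the quoted commit message can be modeled
with a toy sketch. Everything here is hypothetical (the toy_ names are
not e2fsprogs code); only the s_start field name mirrors the journal
superblock, and the model assumes a non-zero s_start is what makes a
later run replay the journal.

```c
#include <stdbool.h>

/* Toy stand-in for the journal superblock. */
struct toy_jsb {
	unsigned int s_start;	/* non-zero: the log still has transactions */
};

/*
 * Models the reset done on the errout path: s_start is cleared
 * whether or not the replay actually succeeded.
 */
static void toy_release(struct toy_jsb *jsb)
{
	jsb->s_start = 0;
}

/* Models the check a later run uses to decide whether to replay. */
static bool toy_needs_replay(const struct toy_jsb *jsb)
{
	return jsb->s_start != 0;
}

/*
 * One run in which replay fails with a transient error. With
 * exit_before_release == false the superblock is reset anyway (old
 * behaviour); with true the run exits first and leaves it untouched
 * (behaviour proposed by the patch). Returns whether a retry would
 * still replay the journal.
 */
static bool toy_retry_would_replay(bool exit_before_release)
{
	struct toy_jsb jsb = { .s_start = 1 };

	if (!exit_before_release)
		toy_release(&jsb);
	return toy_needs_replay(&jsb);
}
```

In this model the old path loses the pending replay after a transient
failure, while exiting before the release keeps it available for the
next run.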