* [PATCH] e2fsck: Avoid changes on recovery flags when jbd2_journal_recover() failed
From: Haotian Li @ 2020-12-11 8:04 UTC
To: linux-ext4; +Cc: tytso, liuzhiqiang (I), linfeilong

jbd2_journal_recover() may fail when an error such as ENOMEM occurs.
However, jsb->s_start is still cleared by e2fsck_journal_release().
This may break consistency between metadata and data on disk.
Sometimes the failure in jbd2_journal_recover() is temporary, but a
retried e2fsck will skip journal recovery even after the temporary
problem has been fixed.

To fix this case, we use fatal_error() instead of "goto errout" when
journal recovery fails. We think that if journal recovery fails, we
need to send an error message to the user and preserve the recovery
flags, so that the journal can be recovered when e2fsck is run again.

Reported-by: Liangyun <liangyun2@huawei.com>
Signed-off-by: Haotian Li <lihaotian9@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
---
 e2fsck/journal.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/e2fsck/journal.c b/e2fsck/journal.c
index 7d9f1b40..546beafd 100644
--- a/e2fsck/journal.c
+++ b/e2fsck/journal.c
@@ -952,8 +952,13 @@ static errcode_t recover_ext3_journal(e2fsck_t ctx)
 		goto errout;
 
 	retval = -jbd2_journal_recover(journal);
-	if (retval)
-		goto errout;
+	if (retval && retval != EFSBADCRC && retval != EFSCORRUPTED) {
+		ctx->fs->flags &= ~EXT2_FLAG_VALID;
+		com_err(ctx->program_name, 0,
+			_("Journal recovery failed "
+			  "on %s\n"), ctx->device_name);
+		fatal_error(ctx, 0);
+	}
 
 	if (journal->j_failed_commit) {
 		pctx.ino = journal->j_failed_commit;
--
2.19.1
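For context on the failure mode the patch describes: in JBD2, a nonzero s_start in the journal superblock marks a dirty journal that must be replayed, and zero marks a clean one, so zeroing s_start after a *failed* recovery is what makes a second e2fsck run believe there is nothing to replay. A minimal sketch of that decision (the struct and helper names here are simplified stand-ins, not the real jbd2 definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the journal superblock: in the real
 * journal_superblock_t, s_start is the block number of the first
 * transaction to replay; zero means the journal is clean. */
struct journal_sb {
	uint32_t s_start;
};

/* A journal needs recovery exactly when s_start is nonzero, which is
 * why clearing s_start after a failed recovery loses the "dirty"
 * state and makes a retried e2fsck skip replay. */
static int journal_needs_recovery(const struct journal_sb *jsb)
{
	return jsb->s_start != 0;
}
```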
* Re: [PATCH] e2fsck: Avoid changes on recovery flags when jbd2_journal_recover() failed
From: harshad shirwadkar @ 2020-12-11 22:07
To: Haotian Li
Cc: Ext4 Developers List, Theodore Y. Ts'o, liuzhiqiang (I), linfeilong, tytso

Hi Haotian,

Thanks for your patch. I noticed that the following test fails:

$ make -j 64
...
365 tests succeeded	1 tests failed
Tests failed: j_corrupt_revoke_rcount
make: *** [Makefile:397: test_post] Error 1

This test fails because the test expects e2fsck to continue even if
the journal superblock is corrupt, and with your patch e2fsck exits
immediately. This brings up a higher-level question - if we abort on
errors when recovery fails during fsck, how would that problem get
fixed if we don't run fsck? In this particular example, the journal
superblock is corrupt and that is an unrecoverable error. I wonder if
instead we should check for certain specific transient errors such as
-ENOMEM and only then exit? I suspect even in those cases we probably
should ask the user if they would like to continue or not. What do you
think?

Thanks,
Harshad

On Fri, Dec 11, 2020 at 4:19 AM Haotian Li <lihaotian9@huawei.com> wrote:
>
> jbd2_journal_recover() may fail when an error such as ENOMEM occurs.
> However, jsb->s_start is still cleared by e2fsck_journal_release().
> This may break consistency between metadata and data on disk.
> Sometimes the failure in jbd2_journal_recover() is temporary, but a
> retried e2fsck will skip journal recovery even after the temporary
> problem has been fixed.
>
> To fix this case, we use fatal_error() instead of "goto errout" when
> journal recovery fails. We think that if journal recovery fails, we
> need to send an error message to the user and preserve the recovery
> flags, so that the journal can be recovered when e2fsck is run again.
>
> [...]
* Re: [PATCH] e2fsck: Avoid changes on recovery flags when jbd2_journal_recover() failed
From: Haotian Li @ 2020-12-14 1:27
To: harshad shirwadkar
Cc: Ext4 Developers List, Theodore Y. Ts'o, liuzhiqiang (I), linfeilong, tytso, liangyun2

Hi Harshad,

Thanks for your review. I think you are right, so I tried to find all
the recoverable error codes in journal recovery, but I have no way to
distinguish all of them. Only the following three error codes seem
recoverable to me: -ENOMEM, EXT2_ET_NO_MEMORY and -EIO. In these
cases, I think we probably don't need to ask the user whether they
want to continue or not; we should only tell them why journal recovery
failed and then exit. Because the cause in these cases may not be a
disk error, we need to try to avoid changes on the disk. What do you
think?

Thanks,
Haotian

On 2020/12/12 6:07, harshad shirwadkar wrote:
> Hi Haotian,
>
> Thanks for your patch. I noticed that the following test fails:
> [...]
> This test fails because the test expects e2fsck to continue even if
> the journal superblock is corrupt, and with your patch e2fsck exits
> immediately. [...] I wonder if instead we should check for certain
> specific transient errors such as -ENOMEM and only then exit? I
> suspect even in those cases we probably should ask the user if they
> would like to continue or not. What do you think?
>
> Thanks,
> Harshad
> [...]
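The three candidate codes named above could be grouped behind a small predicate. This is only an illustration of the proposed classification, not an existing e2fsck helper; EXT2_ET_NO_MEMORY's real value comes from e2fsprogs' ext2_err.h, and a stand-in constant is used here so the sketch compiles on its own:

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for the com_err code from ext2_err.h (value illustrative,
 * not the real one). */
#define EXT2_ET_NO_MEMORY 0x7ea00001L

/* Hypothetical classifier for the errors this thread considers
 * possibly transient - out-of-memory and I/O failures - as opposed to
 * on-disk corruption codes like EFSBADCRC/EFSCORRUPTED. */
static int is_transient_recovery_error(long err)
{
	return err == ENOMEM || err == EIO || err == EXT2_ET_NO_MEMORY;
}
```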
* Re: [PATCH] e2fsck: Avoid changes on recovery flags when jbd2_journal_recover() failed
From: harshad shirwadkar @ 2020-12-14 18:44
To: Haotian Li
Cc: Ext4 Developers List, Theodore Y. Ts'o, liuzhiqiang (I), linfeilong, tytso, liangyun2

Hi Haotian,

Yeah, perhaps these are the only recoverable errors. I also think that
we can't say for sure that these errors are always recoverable,
because in some setups they may still be unrecoverable (for example,
if the machine is running low on memory). I still feel that we should
ask the user whether they want to continue or not. The reason is that,
firstly, if we don't allow running e2fsck in these cases, I wonder
what the user would do with their file system - they can't mount it
and can't run fsck, right? Secondly, not doing that would be a
regression. I wonder if some setups would have chosen to ignore
journal recovery if there are errors during journal recovery, and with
this fix they may start seeing that their file systems aren't getting
repaired.

I'm wondering if you saw a situation in your setup where exiting
e2fsck helped? If possible, could you share what kind of errors were
seen in journal recovery and what the expected behavior was? Maybe
that would help us decide on the right behavior.

Thanks,
Harshad

On Sun, Dec 13, 2020 at 5:27 PM Haotian Li <lihaotian9@huawei.com> wrote:
>
> Hi Harshad,
>
> Thanks for your review. I think you are right, so I tried to find all
> the recoverable error codes in journal recovery, but I have no way to
> distinguish all of them. Only the following three error codes seem
> recoverable to me: -ENOMEM, EXT2_ET_NO_MEMORY and -EIO. In these
> cases, I think we probably don't need to ask the user whether they
> want to continue or not; we should only tell them why journal
> recovery failed and then exit. Because the cause in these cases may
> not be a disk error, we need to try to avoid changes on the disk.
> What do you think?
>
> [...]
* Re: [PATCH] e2fsck: Avoid changes on recovery flags when jbd2_journal_recover() failed
From: Theodore Y. Ts'o @ 2020-12-14 20:27
To: harshad shirwadkar
Cc: Haotian Li, Ext4 Developers List, liuzhiqiang (I), linfeilong, liangyun2

On Mon, Dec 14, 2020 at 10:44:29AM -0800, harshad shirwadkar wrote:
> Hi Haotian,
>
> Yeah, perhaps these are the only recoverable errors. [...] I'm
> wondering if you saw a situation in your setup where exiting e2fsck
> helped? If possible, could you share what kind of errors were seen in
> journal recovery and what the expected behavior was? Maybe that would
> help us decide on the right behavior.

It may very well be that there are corrupted file system structures
that could lead to ENOMEM. If so, I'd consider that something we
should be explicitly checking for in e2fsck, and it's actually
relatively unlikely in the jbd2 recovery code, since that's fairly
straightforward --- except I'd be concerned about potential cases in
your Fast Commit code, since there's quite a bit more complexity when
parsing the fast commit journal.

This isn't a new concern; we've already talked about the fact that
fast commit needs to have a lot more sanity checks to look for
maliciously --- or syzbot generated, which may be the same thing :-)
--- inconsistent fields causing the e2fsck replay code to behave in
unexpected ways, which might include trying to allocate insane amounts
of memory, array buffer overruns, etc.

But assuming that ENOMEM is always due to operational concerns, as
opposed to file system corruption, may not always be a safe
assumption.

Something else to consider, from the perspective of a naive system
administrator: if there is a bad media sector in the journal, always
aborting the e2fsck run may not allow them an easy way to recover.
Simply ignoring the journal and allowing the next write to occur, at
which point the HDD or SSD will redirect the write to a bad sector
spare pool, will allow for an automatic recovery. Always causing
e2fsck to fail would actually result in a worse outcome in this
particular case.

(This is especially true for a mobile device, where the owner is not
likely to have access to the serial console to manually run e2fsck,
and where, if they can't automatically recover, they will have to take
their phone to the local cell phone carrier store for repairs ---
which is *not* something that a cellular provider will enjoy, and they
will tend to choose other cell phone models to feature as
supported/featured devices. So an increased number of failures which
can't be automatically recovered will cause the carrier to choose to
feature, say, a Xiaomi phone over a ZTE phone.)

> I'm wondering if you saw a situation in your setup where exiting
> e2fsck helped? [...]

Seconded; I think we should try to understand why it is that e2fsck is
failing with these sorts of errors. It may be that there are better
ways of solving the high-level problem.

For example, the new libext2fs bitmap backends were something that I
added because running a large number of e2fsck processes in parallel
on a server machine with dozens of HDD spindles was causing e2fsck
processes to run slowly due to memory contention. We fixed it by
making e2fsck more memory efficient, by improving the bitmap
implementations --- but if that hadn't been sufficient, I had also
considered adding support to make /sbin/fsck "smarter" by limiting the
number of fsck.XXX processes that would get started simultaneously,
since that could actually cause the file system check to run faster by
reducing memory thrashing. (The trick would have been how to make
fsck smart enough to automatically tune the number of parallel fsck
processes to allow, since asking the system administrator to manually
tune the max number of processes would be annoying to the sysadmin,
and would mean that the feature would never get used outside of $WORK
in practice.)

So is the actual underlying problem that e2fsck is running out of
memory? If so, is it because there simply isn't enough physical
memory available? Is it being run in a cgroup container which is too
small? Or is it because too many file systems are being checked in
parallel at the same time?

Or is it I/O errors that you are concerned with? And how do you know
that they are not permanent errors; is this caused by something like
flaky fibre channel connections?

Or is this a hypothetical worry, as opposed to something which is
causing operational problems right now?

Cheers,

	- Ted
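The "limit the number of fsck.XXX processes" idea above can be sketched as a bounded fork/exec loop. Everything here (the function name, the fixed cap, returning a success count) is illustrative only, not how /sbin/fsck is actually structured:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative only: run `cmd dev` for each device, keeping at most
 * max_procs children alive at once; returns how many exited with 0. */
static int run_checks(const char *cmd, char *const devs[], int ndevs,
		      int max_procs)
{
	int running = 0, ok = 0, status;

	for (int i = 0; i < ndevs; i++) {
		if (running == max_procs && wait(&status) > 0) {
			running--;	/* reap one before forking more */
			if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
				ok++;
		}
		pid_t pid = fork();
		if (pid == 0) {
			execlp(cmd, cmd, devs[i], (char *)NULL);
			_exit(127);	/* exec failed */
		}
		if (pid > 0)
			running++;
	}
	while (running-- > 0)		/* drain remaining children */
		if (wait(&status) > 0 &&
		    WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	return ok;
}
```

The hard part Ted identifies - picking max_procs automatically from available memory rather than asking the sysadmin - is exactly what this sketch leaves out.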
* Re: [PATCH] e2fsck: Avoid changes on recovery flags when jbd2_journal_recover() failed
From: Haotian Li @ 2020-12-15 7:43
To: Theodore Y. Ts'o, harshad shirwadkar
Cc: Ext4 Developers List, liuzhiqiang (I), linfeilong, liangyun2

Thanks for your review. I agree with you that it's more important to
understand the errors found by e2fsck. We'll describe the case below.

The problem we found is actually in a remote-storage case, which means
e2fsck's reads or writes may fail because of network packet loss. The
first time, some packet-loss errors happened during e2fsck's journal
recovery (using fsck -a), and recovery failed. The second time, we
fixed the network problem and ran e2fsck again, but there were still
errors when we tried to mount. Then we set the jsb->s_start journal
flag and retried e2fsck, and the problem was fixed. So we suspect
something is wrong in e2fsck's journal recovery, probably the bug
we've described in the patch.

Certainly, directly exiting is not a good way to fix this problem.
Just as Harshad said, we need to tell the user what happened and
listen to the user's decision, to continue e2fsck or not. If we want
to safely use e2fsck without human intervention (using fsck -a), I
wonder if we need to provide a safe mechanism to complete the fast
check while avoiding changes to the journal or anything else that may
be fixed in the future (such as the jsb->s_start flag)?

Thanks
Haotian

On 2020/12/15 4:27, Theodore Y. Ts'o wrote:
> On Mon, Dec 14, 2020 at 10:44:29AM -0800, harshad shirwadkar wrote:
>> Hi Haotian,
>> [...]
>
> It may very well be that there are corrupted file system structures
> that could lead to ENOMEM. If so, I'd consider that something we
> should be explicitly checking for in e2fsck [...]
>
> Seconded; I think we should try to understand why it is that e2fsck
> is failing with these sorts of errors. It may be that there are
> better ways of solving the high-level problem.
> [...]
>
> Or is this a hypothetical worry, as opposed to something which is
> causing operational problems right now?
>
> Cheers,
>
> 	- Ted
* Re: [PATCH] e2fsck: Avoid changes on recovery flags when jbd2_journal_recover() failed
From: Zhiqiang Liu @ 2020-12-25 1:49
To: Haotian Li, Theodore Y. Ts'o, harshad shirwadkar
Cc: Ext4 Developers List, linfeilong, liangyun2

friendly ping...

On 2020/12/15 15:43, Haotian Li wrote:
> Thanks for your review. I agree with you that it's more important to
> understand the errors found by e2fsck. We'll describe the case below.
>
> The problem we found is actually in a remote-storage case, which
> means e2fsck's reads or writes may fail because of network packet
> loss. [...]
>
> Certainly, directly exiting is not a good way to fix this problem.
> Just as Harshad said, we need to tell the user what happened and
> listen to the user's decision, to continue e2fsck or not. [...]
>
> [...]
* Re: [PATCH] e2fsck: Avoid changes on recovery flags when jbd2_journal_recover() failed
  2020-12-25  1:49 ` Zhiqiang Liu
@ 2021-01-05 23:06 ` harshad shirwadkar
  2021-03-05  9:48 ` Haotian Li
  0 siblings, 1 reply; 9+ messages in thread

From: harshad shirwadkar @ 2021-01-05 23:06 UTC (permalink / raw)
To: Zhiqiang Liu
Cc: Haotian Li, Theodore Y. Ts'o, Ext4 Developers List, linfeilong, liangyun2

Sorry for the delay. Thanks for providing more information, Haotian.
So this is happening due to I/O errors experienced over a flaky
network connection. I can imagine that this is perhaps a situation
which is recoverable, but I guess when running on physical hardware
it's less likely for such I/O errors to be recoverable. I wonder if
this means we need an e2fsck.conf option - something like
"recovery_error_behavior" with a default value of "continue". For
use cases such as this, we could set it to "exit" or perhaps "retry"?

On Thu, Dec 24, 2020 at 5:49 PM Zhiqiang Liu <liuzhiqiang26@huawei.com> wrote:
>
> friendly ping...
>
> On 2020/12/15 15:43, Haotian Li wrote:
> > Thanks for your review. I agree with you that it's more important
> > to understand the errors found by e2fsck. We'll describe the case
> > below.
> >
> > The problem we found is actually in a remote storage case, meaning
> > e2fsck's reads or writes may fail because of network packet loss.
> > The first time, some packet loss errors happened during e2fsck's
> > journal recovery (using fsck -a), so the recovery failed. The second
> > time, we fixed the network problem and ran e2fsck again, but there
> > were still errors when we tried to mount. Then we set the
> > jsb->s_start journal flag and retried e2fsck, and the problem was
> > fixed. So we suspect something is wrong in e2fsck's journal
> > recovery, probably the bug we've described in the patch.
> >
> > Certainly, directly exiting is not a good way to fix this problem;
> > just like what Harshad said, we need to tell the user what happened
> > and respect the user's decision to continue e2fsck or not. If we want
> > to safely use e2fsck without human intervention (using fsck -a), I
> > wonder if we need to provide a safe mechanism to complete the fast
> > check but avoid changes to the journal or anything else which may be
> > fixed in future (such as the jsb->s_start flag)?
> >
> > Thanks
> > Haotian
> >
> > On 2020/12/15 4:27, Theodore Y. Ts'o wrote:
> >> On Mon, Dec 14, 2020 at 10:44:29AM -0800, harshad shirwadkar wrote:
> >>> Hi Haotian,
> >>>
> >>> Yeah perhaps these are the only recoverable errors. I also think that
> >>> we can't surely say that these errors are always recoverable. That's
> >>> because in some setups, these errors may still be unrecoverable (for
> >>> example, if the machine is running under low memory). I still feel
> >>> that we should ask the user about whether they want to continue or
> >>> not. The reason is that firstly, if we don't allow running e2fsck in
> >>> these cases, I wonder what the user would do with their file system -
> >>> they can't mount / can't run fsck, right? Secondly, not doing that
> >>> would be a regression. I wonder if some setups would have chosen to
> >>> ignore journal recovery if there are errors during journal recovery,
> >>> and with this fix they may start seeing that their file systems aren't
> >>> getting repaired.
> >>
> >> It may very well be that there are corrupted file system structures
> >> that could lead to ENOMEM. If so, I'd consider that something we
> >> should be explicitly checking for in e2fsck, and it's actually
> >> relatively unlikely in the jbd2 recovery code, since that's fairly
> >> straightforward --- except I'd be concerned about potential cases in
> >> your Fast Commit code, since there's quite a bit more complexity when
> >> parsing the fast commit journal.
> >>
> >> This isn't a new concern; we've already talked about the fact that
> >> the fast commit code needs to have a lot more sanity checks to look
> >> for maliciously --- or syzbot-generated, which may be the same thing
> >> :-) --- inconsistent fields causing the e2fsck replay code to behave
> >> in unexpected ways, which might include trying to allocate insane
> >> amounts of memory, array buffer overruns, etc.
> >>
> >> But assuming that ENOMEM is always due to operational concerns, as
> >> opposed to file system corruption, may not always be a safe
> >> assumption.
> >>
> >> Something else to consider, from the perspective of a naive system
> >> administrator: if there is a bad media sector in the journal, simply
> >> always aborting the e2fsck run may not allow them an easy way to
> >> recover. Simply ignoring the journal and allowing the next write to
> >> occur, at which point the HDD or SSD will redirect the write to a bad
> >> sector spare pool, will allow for an automatic recovery. Simply
> >> always causing e2fsck to fail would actually result in a worse
> >> outcome in this particular case.
> >>
> >> (This is especially true for a mobile device, where the owner is not
> >> likely to have access to the serial console to manually run e2fsck,
> >> and where if they can't automatically recover, they will have to take
> >> their phone to the local cell phone carrier store for repairs ---
> >> which is *not* something that a cellular provider will enjoy, and they
> >> will tend to choose other cell phone models to feature as
> >> supported/featured devices. So an increased number of failures which
> >> can't be automatically recovered could cause the carrier to choose to
> >> feature, say, a Xiaomi phone over a ZTE phone.)
> >>
> >>> I'm wondering if you saw a situation in your setup where exiting
> >>> e2fsck helped? If possible, could you share what kind of errors were
> >>> seen in journal recovery and what was the expected behavior?
> >>> Maybe
> >>> that would help us decide on the right behavior.
> >>
> >> Seconded; I think we should try to understand why it is that e2fsck is
> >> failing with these sorts of errors. It may be that there are better
> >> ways of solving the high-level problem.
> >>
> >> For example, the new libext2fs bitmap backends were something that I
> >> added because running a large number of e2fsck processes in
> >> parallel on a server machine with dozens of HDD spindles was causing
> >> e2fsck processes to run slowly due to memory contention. We fixed it
> >> by making e2fsck more memory efficient, by improving the bitmap
> >> implementations --- but if that hadn't been sufficient, I had also
> >> considered adding support to make /sbin/fsck "smarter" by limiting the
> >> number of fsck.XXX processes that would get started simultaneously,
> >> since that could actually cause the file system check to run faster by
> >> reducing memory thrashing. (The trick would have been how to make
> >> fsck smart enough to automatically tune the number of parallel fsck
> >> processes to allow, since asking the system administrator to manually
> >> tune the max number of processes would be annoying to the sysadmin,
> >> and would mean that the feature would never get used outside of $WORK
> >> in practice.)
> >>
> >> So is the actual underlying problem that e2fsck is running out of
> >> memory? If so, is it because there simply isn't enough physical
> >> memory available? Is it being run in a cgroup container which is too
> >> small? Or is it because too many file systems are being checked in
> >> parallel at the same time?
> >>
> >> Or is it I/O errors that you are concerned with? And how do you know
> >> that they are not permanent errors; is this caused by something like
> >> fibre channel connections being flaky?
> >>
> >> Or is this a hypothetical worry, as opposed to something which is
> >> causing operational problems right now?
> >>
> >> Cheers,
> >>
> >> - Ted
* Re: [PATCH] e2fsck: Avoid changes on recovery flags when jbd2_journal_recover() failed
  2021-01-05 23:06 ` harshad shirwadkar
@ 2021-03-05  9:48 ` Haotian Li
  0 siblings, 0 replies; 9+ messages in thread

From: Haotian Li @ 2021-03-05 9:48 UTC (permalink / raw)
To: harshad shirwadkar
Cc: Theodore Y. Ts'o, Ext4 Developers List, linfeilong, liangyun2, Zhiqiang Liu

I'm very sorry for the delay. Thanks for your suggestion. Just as you
said, we use an e2fsck.conf option "recovery_error_behavior" to help
the user adopt a different behavior in this situation. The new v2
patch will be resent.

On 2021/1/6 7:06, harshad shirwadkar wrote:
> Sorry for the delay. Thanks for providing more information, Haotian.
> So this is happening due to I/O errors experienced over a flaky
> network connection. I can imagine that this is perhaps a situation
> which is recoverable, but I guess when running on physical hardware
> it's less likely for such I/O errors to be recoverable. I wonder if
> this means we need an e2fsck.conf option - something like
> "recovery_error_behavior" with a default value of "continue". For
> use cases such as this, we could set it to "exit" or perhaps "retry"?
>
> On Thu, Dec 24, 2020 at 5:49 PM Zhiqiang Liu <liuzhiqiang26@huawei.com> wrote:
>>
>> friendly ping...
>>
>> On 2020/12/15 15:43, Haotian Li wrote:
>>> Thanks for your review. I agree with you that it's more important
>>> to understand the errors found by e2fsck. We'll describe the case
>>> below.
>>>
>>> The problem we found is actually in a remote storage case, meaning
>>> e2fsck's reads or writes may fail because of network packet loss.
>>> The first time, some packet loss errors happened during e2fsck's
>>> journal recovery (using fsck -a), so the recovery failed. The second
>>> time, we fixed the network problem and ran e2fsck again, but there
>>> were still errors when we tried to mount. Then we set the
>>> jsb->s_start journal flag and retried e2fsck, and the problem was
>>> fixed.
>>> So we suspect something is wrong in e2fsck's
>>> journal recovery, probably the bug we've described in the patch.
>>>
>>> Certainly, directly exiting is not a good way to fix this problem;
>>> just like what Harshad said, we need to tell the user what happened
>>> and respect the user's decision to continue e2fsck or not. If we want
>>> to safely use e2fsck without human intervention (using fsck -a), I
>>> wonder if we need to provide a safe mechanism to complete the fast
>>> check but avoid changes to the journal or anything else which may be
>>> fixed in future (such as the jsb->s_start flag)?
>>>
>>> Thanks
>>> Haotian
>>>
>>> On 2020/12/15 4:27, Theodore Y. Ts'o wrote:
>>>> On Mon, Dec 14, 2020 at 10:44:29AM -0800, harshad shirwadkar wrote:
>>>>> Hi Haotian,
>>>>>
>>>>> Yeah perhaps these are the only recoverable errors. I also think that
>>>>> we can't surely say that these errors are always recoverable. That's
>>>>> because in some setups, these errors may still be unrecoverable (for
>>>>> example, if the machine is running under low memory). I still feel
>>>>> that we should ask the user about whether they want to continue or
>>>>> not. The reason is that firstly, if we don't allow running e2fsck in
>>>>> these cases, I wonder what the user would do with their file system -
>>>>> they can't mount / can't run fsck, right? Secondly, not doing that
>>>>> would be a regression. I wonder if some setups would have chosen to
>>>>> ignore journal recovery if there are errors during journal recovery,
>>>>> and with this fix they may start seeing that their file systems aren't
>>>>> getting repaired.
>>>>
>>>> It may very well be that there are corrupted file system structures
>>>> that could lead to ENOMEM.
>>>> If so, I'd consider that something we
>>>> should be explicitly checking for in e2fsck, and it's actually
>>>> relatively unlikely in the jbd2 recovery code, since that's fairly
>>>> straightforward --- except I'd be concerned about potential cases in
>>>> your Fast Commit code, since there's quite a bit more complexity when
>>>> parsing the fast commit journal.
>>>>
>>>> This isn't a new concern; we've already talked about the fact that
>>>> the fast commit code needs to have a lot more sanity checks to look
>>>> for maliciously --- or syzbot-generated, which may be the same thing
>>>> :-) --- inconsistent fields causing the e2fsck replay code to behave
>>>> in unexpected ways, which might include trying to allocate insane
>>>> amounts of memory, array buffer overruns, etc.
>>>>
>>>> But assuming that ENOMEM is always due to operational concerns, as
>>>> opposed to file system corruption, may not always be a safe
>>>> assumption.
>>>>
>>>> Something else to consider, from the perspective of a naive system
>>>> administrator: if there is a bad media sector in the journal, simply
>>>> always aborting the e2fsck run may not allow them an easy way to
>>>> recover. Simply ignoring the journal and allowing the next write to
>>>> occur, at which point the HDD or SSD will redirect the write to a bad
>>>> sector spare pool, will allow for an automatic recovery. Simply
>>>> always causing e2fsck to fail would actually result in a worse
>>>> outcome in this particular case.
>>>>
>>>> (This is especially true for a mobile device, where the owner is not
>>>> likely to have access to the serial console to manually run e2fsck,
>>>> and where if they can't automatically recover, they will have to take
>>>> their phone to the local cell phone carrier store for repairs ---
>>>> which is *not* something that a cellular provider will enjoy, and they
>>>> will tend to choose other cell phone models to feature as
>>>> supported/featured devices.
>>>> So an increased number of failures which
>>>> can't be automatically recovered could cause the carrier to choose to
>>>> feature, say, a Xiaomi phone over a ZTE phone.)
>>>>
>>>>> I'm wondering if you saw a situation in your setup where exiting
>>>>> e2fsck helped? If possible, could you share what kind of errors were
>>>>> seen in journal recovery and what was the expected behavior? Maybe
>>>>> that would help us decide on the right behavior.
>>>>
>>>> Seconded; I think we should try to understand why it is that e2fsck is
>>>> failing with these sorts of errors. It may be that there are better
>>>> ways of solving the high-level problem.
>>>>
>>>> For example, the new libext2fs bitmap backends were something that I
>>>> added because running a large number of e2fsck processes in
>>>> parallel on a server machine with dozens of HDD spindles was causing
>>>> e2fsck processes to run slowly due to memory contention. We fixed it
>>>> by making e2fsck more memory efficient, by improving the bitmap
>>>> implementations --- but if that hadn't been sufficient, I had also
>>>> considered adding support to make /sbin/fsck "smarter" by limiting the
>>>> number of fsck.XXX processes that would get started simultaneously,
>>>> since that could actually cause the file system check to run faster by
>>>> reducing memory thrashing. (The trick would have been how to make
>>>> fsck smart enough to automatically tune the number of parallel fsck
>>>> processes to allow, since asking the system administrator to manually
>>>> tune the max number of processes would be annoying to the sysadmin,
>>>> and would mean that the feature would never get used outside of $WORK
>>>> in practice.)
>>>>
>>>> So is the actual underlying problem that e2fsck is running out of
>>>> memory? If so, is it because there simply isn't enough physical
>>>> memory available? Is it being run in a cgroup container which is too
>>>> small?
>>>> Or is it because too many file systems are being checked in
>>>> parallel at the same time?
>>>>
>>>> Or is it I/O errors that you are concerned with? And how do you know
>>>> that they are not permanent errors; is this caused by something like
>>>> fibre channel connections being flaky?
>>>>
>>>> Or is this a hypothetical worry, as opposed to something which is
>>>> causing operational problems right now?
>>>>
>>>> Cheers,
>>>>
>>>> - Ted
end of thread, other threads: [~2021-03-05 9:49 UTC | newest]

Thread overview: 9+ messages
2020-12-11  8:04 [PATCH] e2fsck: Avoid changes on recovery flags when jbd2_journal_recover() failed Haotian Li
2020-12-11 22:07 ` harshad shirwadkar
2020-12-14  1:27 ` Haotian Li
2020-12-14 18:44 ` harshad shirwadkar
2020-12-14 20:27 ` Theodore Y. Ts'o
2020-12-15  7:43 ` Haotian Li
2020-12-25  1:49 ` Zhiqiang Liu
2021-01-05 23:06 ` harshad shirwadkar
2021-03-05  9:48 ` Haotian Li