* [PATCH] e2fsck: delay quotas loading in release_orphan_inodes()
@ 2023-08-17 8:18 Baokun Li
2023-08-23 17:05 ` Jan Kara
0 siblings, 1 reply; 11+ messages in thread
From: Baokun Li @ 2023-08-17 8:18 UTC (permalink / raw)
To: linux-ext4
Cc: tytso, adilger.kernel, jack, darrick.wong, yi.zhang, yangerkun,
yukuai3, libaokun1
After 7d79b40b ("e2fsck: adjust quota counters when clearing orphaned
inodes"), we load all the quotas before we process the orphaned inodes,
and when we load the quotas, we check the checsum of the bbitmap for each
group. If one of the bbitmap checksums is wrong, the following error will
be reported:
“Error initializing quota context in support library:
Block bitmap checksum does not match bitmap”
But loading quotas comes before checking the current superblock for the
EXT2_ERROR_FS flag, which makes it impossible to use e2fsck to repair any
image that contains orphan inodes and has the wrong bbitmap checksum.
So delaying quota loading until after the EXT2_ERROR_FS judgment avoids
the above problem.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
e2fsck/super.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/e2fsck/super.c b/e2fsck/super.c
index 9495e029..b1aaaed6 100644
--- a/e2fsck/super.c
+++ b/e2fsck/super.c
@@ -503,15 +503,6 @@ static int release_orphan_inodes(e2fsck_t ctx)
!ext2fs_has_feature_orphan_present(fs->super))
return 0;
- clear_problem_context(&pctx);
- ino = fs->super->s_last_orphan;
- pctx.ino = ino;
- pctx.errcode = e2fsck_read_all_quotas(ctx);
- if (pctx.errcode) {
- fix_problem(ctx, PR_0_QUOTA_INIT_CTX, &pctx);
- return 1;
- }
-
/*
* Win or lose, we won't be using the head of the orphan inode
* list again.
@@ -525,10 +516,16 @@ static int release_orphan_inodes(e2fsck_t ctx)
* be running a full e2fsck run anyway... We clear orphan file contents
* after filesystem is checked to avoid clearing someone else's data.
*/
- if (fs->super->s_state & EXT2_ERROR_FS) {
- if (ctx->qctx)
- quota_release_context(&ctx->qctx);
+ if (fs->super->s_state & EXT2_ERROR_FS)
return 0;
+
+ clear_problem_context(&pctx);
+ ino = fs->super->s_last_orphan;
+ pctx.ino = ino;
+ pctx.errcode = e2fsck_read_all_quotas(ctx);
+ if (pctx.errcode) {
+ fix_problem(ctx, PR_0_QUOTA_INIT_CTX, &pctx);
+ return 1;
}
if (ino && ((ino < EXT2_FIRST_INODE(fs->super)) ||
--
2.31.1
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH] e2fsck: delay quotas loading in release_orphan_inodes() 2023-08-17 8:18 [PATCH] e2fsck: delay quotas loading in release_orphan_inodes() Baokun Li @ 2023-08-23 17:05 ` Jan Kara 2023-08-24 2:27 ` Baokun Li 0 siblings, 1 reply; 11+ messages in thread From: Jan Kara @ 2023-08-23 17:05 UTC (permalink / raw) To: Baokun Li Cc: linux-ext4, tytso, adilger.kernel, jack, darrick.wong, yi.zhang, yangerkun, yukuai3 On Thu 17-08-23 16:18:28, Baokun Li wrote: > After 7d79b40b ("e2fsck: adjust quota counters when clearing orphaned > inodes"), we load all the quotas before we process the orphaned inodes, > and when we load the quotas, we check the checsum of the bbitmap for each > group. If one of the bbitmap checksums is wrong, the following error will > be reported: > > “Error initializing quota context in support library: > Block bitmap checksum does not match bitmap” > > But loading quotas comes before checking the current superblock for the > EXT2_ERROR_FS flag, which makes it impossible to use e2fsck to repair any > image that contains orphan inodes and has the wrong bbitmap checksum. > So delaying quota loading until after the EXT2_ERROR_FS judgment avoids > the above problem. > > Signed-off-by: Baokun Li <libaokun1@huawei.com> This certainly looks better but I wonder if there still isn't a problem if the bitmap checksums are wrong but EXT2_ERROR_FS is not set. Shouldn't we rather move the initialization of the quota files after the call to e2fsck_read_bitmaps()? Honza > --- > e2fsck/super.c | 21 +++++++++------------ > 1 file changed, 9 insertions(+), 12 deletions(-) > > diff --git a/e2fsck/super.c b/e2fsck/super.c > index 9495e029..b1aaaed6 100644 > --- a/e2fsck/super.c > +++ b/e2fsck/super.c > @@ -503,15 +503,6 @@ static int release_orphan_inodes(e2fsck_t ctx) > !ext2fs_has_feature_orphan_present(fs->super)) > return 0; > > - clear_problem_context(&pctx); > - ino = fs->super->s_last_orphan; > - pctx.ino = ino; > - pctx.errcode = e2fsck_read_all_quotas(ctx); > - if (pctx.errcode) { > - fix_problem(ctx, PR_0_QUOTA_INIT_CTX, &pctx); > - return 1; > - } > - > /* > * Win or lose, we won't be using the head of the orphan inode > * list again. > @@ -525,10 +516,16 @@ static int release_orphan_inodes(e2fsck_t ctx) > * be running a full e2fsck run anyway... We clear orphan file contents > * after filesystem is checked to avoid clearing someone else's data. > */ > - if (fs->super->s_state & EXT2_ERROR_FS) { > - if (ctx->qctx) > - quota_release_context(&ctx->qctx); > + if (fs->super->s_state & EXT2_ERROR_FS) > return 0; > + > + clear_problem_context(&pctx); > + ino = fs->super->s_last_orphan; > + pctx.ino = ino; > + pctx.errcode = e2fsck_read_all_quotas(ctx); > + if (pctx.errcode) { > + fix_problem(ctx, PR_0_QUOTA_INIT_CTX, &pctx); > + return 1; > } > > if (ino && ((ino < EXT2_FIRST_INODE(fs->super)) || > -- > 2.31.1 > -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] e2fsck: delay quotas loading in release_orphan_inodes() 2023-08-23 17:05 ` Jan Kara @ 2023-08-24 2:27 ` Baokun Li 2023-08-24 10:08 ` Jan Kara 2023-08-24 19:49 ` Andreas Dilger 0 siblings, 2 replies; 11+ messages in thread From: Baokun Li @ 2023-08-24 2:27 UTC (permalink / raw) To: Jan Kara Cc: linux-ext4, tytso, adilger.kernel, darrick.wong, yi.zhang, yangerkun, yukuai3, Baokun Li Hello, Jan! On 2023/8/24 1:05, Jan Kara wrote: > On Thu 17-08-23 16:18:28, Baokun Li wrote: >> After 7d79b40b ("e2fsck: adjust quota counters when clearing orphaned >> inodes"), we load all the quotas before we process the orphaned inodes, >> and when we load the quotas, we check the checsum of the bbitmap for each >> group. If one of the bbitmap checksums is wrong, the following error will >> be reported: >> >> “Error initializing quota context in support library: >> Block bitmap checksum does not match bitmap” >> >> But loading quotas comes before checking the current superblock for the >> EXT2_ERROR_FS flag, which makes it impossible to use e2fsck to repair any >> image that contains orphan inodes and has the wrong bbitmap checksum. >> So delaying quota loading until after the EXT2_ERROR_FS judgment avoids >> the above problem. >> >> Signed-off-by: Baokun Li <libaokun1@huawei.com> > This certainly looks better but I wonder if there still isn't a problem if > the bitmap checksums are wrong but EXT2_ERROR_FS is not set. Shouldn't we > rather move the initialization of the quota files after the call to > e2fsck_read_bitmaps()? > > Honza When the bitmap checksums are wrong but EXT2_ERROR_FS is not set, we must have lost some data (error flag or group descriptor or bitmap), so there is something wrong with the kernel at this time, so I don't think we should fix the image directly, but rather let the user realize that something is wrong with the filesystem logic. Moreover, if we don't care how this happened, but just want to fix the image, we only need to run "e2fsck -a" twice. After merging in the current patch, we always empty the orphan list before loading the quotas, and EXT2_ERROR_FS is set when loading the quotas fails, so this will be fixed the second time you run e2fsck. It will not happen that every e2fsck will fail like it did before. Thanks! -- With Best Regards, Baokun Li . ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] e2fsck: delay quotas loading in release_orphan_inodes() 2023-08-24 2:27 ` Baokun Li @ 2023-08-24 10:08 ` Jan Kara 2023-08-24 12:56 ` Baokun Li 2023-08-24 19:49 ` Andreas Dilger 1 sibling, 1 reply; 11+ messages in thread From: Jan Kara @ 2023-08-24 10:08 UTC (permalink / raw) To: Baokun Li Cc: Jan Kara, linux-ext4, tytso, adilger.kernel, darrick.wong, yi.zhang, yangerkun, yukuai3 On Thu 24-08-23 10:27:46, Baokun Li wrote: > Hello, Jan! > > On 2023/8/24 1:05, Jan Kara wrote: > > On Thu 17-08-23 16:18:28, Baokun Li wrote: > > > After 7d79b40b ("e2fsck: adjust quota counters when clearing orphaned > > > inodes"), we load all the quotas before we process the orphaned inodes, > > > and when we load the quotas, we check the checsum of the bbitmap for each > > > group. If one of the bbitmap checksums is wrong, the following error will > > > be reported: > > > > > > “Error initializing quota context in support library: > > > Block bitmap checksum does not match bitmap” > > > > > > But loading quotas comes before checking the current superblock for the > > > EXT2_ERROR_FS flag, which makes it impossible to use e2fsck to repair any > > > image that contains orphan inodes and has the wrong bbitmap checksum. > > > So delaying quota loading until after the EXT2_ERROR_FS judgment avoids > > > the above problem. > > > > > > Signed-off-by: Baokun Li <libaokun1@huawei.com> > > This certainly looks better but I wonder if there still isn't a problem if > > the bitmap checksums are wrong but EXT2_ERROR_FS is not set. Shouldn't we > > rather move the initialization of the quota files after the call to > > e2fsck_read_bitmaps()? > > > > Honza > When the bitmap checksums are wrong but EXT2_ERROR_FS is not set, we must > have lost some data (error flag or group descriptor or bitmap), so there > is something wrong with the kernel at this time, so I don't think we > should fix the image directly, but rather let the user realize that > something is wrong with the filesystem logic. I agree it means there is a problem somewhere (the storage, the kernel, or similar). But just ignoring bitmap checksums in release_orphan_inodes() is exactly how e2fsck behaves on filesystems without quota feature so I see no reason for quota feature to change that because the inconsistency has nothing to do with quotas... > Moreover, if we don't care how this happened, but just want to fix the > image, we only need to run "e2fsck -a" twice. After merging in the > current patch, we always empty the orphan list before loading the quotas, > and EXT2_ERROR_FS is set when loading the quotas fails, so this will be > fixed the second time you run e2fsck. It will not happen that every > e2fsck will fail like it did before. I see, you're right so it isn't as bad as I originally thought but still my argument above holds - IMO e2fsck should treat wrong bitmap checksums the same way with and without the quota feature. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] e2fsck: delay quotas loading in release_orphan_inodes() 2023-08-24 10:08 ` Jan Kara @ 2023-08-24 12:56 ` Baokun Li 2023-08-24 19:37 ` Jan Kara 0 siblings, 1 reply; 11+ messages in thread From: Baokun Li @ 2023-08-24 12:56 UTC (permalink / raw) To: Jan Kara Cc: linux-ext4, tytso, adilger.kernel, darrick.wong, yi.zhang, yangerkun, yukuai3, Baokun Li On 2023/8/24 18:08, Jan Kara wrote: > On Thu 24-08-23 10:27:46, Baokun Li wrote: >> Hello, Jan! >> >> On 2023/8/24 1:05, Jan Kara wrote: >>> On Thu 17-08-23 16:18:28, Baokun Li wrote: >>>> After 7d79b40b ("e2fsck: adjust quota counters when clearing orphaned >>>> inodes"), we load all the quotas before we process the orphaned inodes, >>>> and when we load the quotas, we check the checsum of the bbitmap for each >>>> group. If one of the bbitmap checksums is wrong, the following error will >>>> be reported: >>>> >>>> “Error initializing quota context in support library: >>>> Block bitmap checksum does not match bitmap” >>>> >>>> But loading quotas comes before checking the current superblock for the >>>> EXT2_ERROR_FS flag, which makes it impossible to use e2fsck to repair any >>>> image that contains orphan inodes and has the wrong bbitmap checksum. >>>> So delaying quota loading until after the EXT2_ERROR_FS judgment avoids >>>> the above problem. >>>> >>>> Signed-off-by: Baokun Li <libaokun1@huawei.com> >>> This certainly looks better but I wonder if there still isn't a problem if >>> the bitmap checksums are wrong but EXT2_ERROR_FS is not set. Shouldn't we >>> rather move the initialization of the quota files after the call to >>> e2fsck_read_bitmaps()? >>> >>> Honza >> When the bitmap checksums are wrong but EXT2_ERROR_FS is not set, we must >> have lost some data (error flag or group descriptor or bitmap), so there >> is something wrong with the kernel at this time, so I don't think we >> should fix the image directly, but rather let the user realize that >> something is wrong with the filesystem logic. > I agree it means there is a problem somewhere (the storage, the kernel, or > similar). But just ignoring bitmap checksums in release_orphan_inodes() is > exactly how e2fsck behaves on filesystems without quota feature so I see no > reason for quota feature to change that because the inconsistency has > nothing to do with quotas... > >> Moreover, if we don't care how this happened, but just want to fix the >> image, we only need to run "e2fsck -a" twice. After merging in the >> current patch, we always empty the orphan list before loading the quotas, >> and EXT2_ERROR_FS is set when loading the quotas fails, so this will be >> fixed the second time you run e2fsck. It will not happen that every >> e2fsck will fail like it did before. > I see, you're right so it isn't as bad as I originally thought but still my > argument above holds - IMO e2fsck should treat wrong bitmap checksums the > same way with and without the quota feature. > > Honza The original flow that went wrong here is as follows: e2fsck e2fsck_run_ext3_journal check_super_block release_orphan_inodes e2fsck_read_all_quotas quota_read_all_dquots quota_file_open ext2fs_read_bitmaps ext2fs_rw_bitmaps read_bitmaps_range read_bitmaps_range_start ext2fs_block_bitmap_csum_verify !!! error e2fsck_run Yes, the inconsistency has nothing to do with quota, but quota is loaded here to keep track of space changes during the normal processing of orphan list. If quota was not loaded, we would not have read and check bitmaps until Pass5, and we had already done a lot of checking and tweaking of inodes, blocks, and dirs before Pass5, and the bitmaps inconsistency may have been fixed during that time. So even though the inconsistency has nothing to do with the quota, it is the loading of the quota that reveals the inconsistency and causes the program to exit. It is also unnecessary to read the bitmaps ahead of time alone, since the bitmaps inconsistency could be caused by orphan inodes in the orphan list not being handled. So in my opinion, there has to be a distinction between the two cases of enabling and disabling quota. Unless we don't read the quotas on disk just log the quota changes in memory and then update the quota in memory to disk when all the checking and fixing is done (like the quota handling before and after e2fsck_run()). Thanks! -- With Best Regards, Baokun Li . ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] e2fsck: delay quotas loading in release_orphan_inodes() 2023-08-24 12:56 ` Baokun Li @ 2023-08-24 19:37 ` Jan Kara 2023-08-25 2:08 ` Baokun Li 0 siblings, 1 reply; 11+ messages in thread From: Jan Kara @ 2023-08-24 19:37 UTC (permalink / raw) To: Baokun Li Cc: Jan Kara, linux-ext4, tytso, adilger.kernel, darrick.wong, yi.zhang, yangerkun, yukuai3 On Thu 24-08-23 20:56:14, Baokun Li wrote: > On 2023/8/24 18:08, Jan Kara wrote: > > On Thu 24-08-23 10:27:46, Baokun Li wrote: > > > Hello, Jan! > > > > > > On 2023/8/24 1:05, Jan Kara wrote: > > > > On Thu 17-08-23 16:18:28, Baokun Li wrote: > > > > > After 7d79b40b ("e2fsck: adjust quota counters when clearing orphaned > > > > > inodes"), we load all the quotas before we process the orphaned inodes, > > > > > and when we load the quotas, we check the checsum of the bbitmap for each > > > > > group. If one of the bbitmap checksums is wrong, the following error will > > > > > be reported: > > > > > > > > > > “Error initializing quota context in support library: > > > > > Block bitmap checksum does not match bitmap” > > > > > > > > > > But loading quotas comes before checking the current superblock for the > > > > > EXT2_ERROR_FS flag, which makes it impossible to use e2fsck to repair any > > > > > image that contains orphan inodes and has the wrong bbitmap checksum. > > > > > So delaying quota loading until after the EXT2_ERROR_FS judgment avoids > > > > > the above problem. > > > > > > > > > > Signed-off-by: Baokun Li <libaokun1@huawei.com> > > > > This certainly looks better but I wonder if there still isn't a problem if > > > > the bitmap checksums are wrong but EXT2_ERROR_FS is not set. Shouldn't we > > > > rather move the initialization of the quota files after the call to > > > > e2fsck_read_bitmaps()? > > > > > > > > Honza > > > When the bitmap checksums are wrong but EXT2_ERROR_FS is not set, we must > > > have lost some data (error flag or group descriptor or bitmap), so there > > > is something wrong with the kernel at this time, so I don't think we > > > should fix the image directly, but rather let the user realize that > > > something is wrong with the filesystem logic. > > I agree it means there is a problem somewhere (the storage, the kernel, or > > similar). But just ignoring bitmap checksums in release_orphan_inodes() is > > exactly how e2fsck behaves on filesystems without quota feature so I see no > > reason for quota feature to change that because the inconsistency has > > nothing to do with quotas... > > > > > Moreover, if we don't care how this happened, but just want to fix the > > > image, we only need to run "e2fsck -a" twice. After merging in the > > > current patch, we always empty the orphan list before loading the quotas, > > > and EXT2_ERROR_FS is set when loading the quotas fails, so this will be > > > fixed the second time you run e2fsck. It will not happen that every > > > e2fsck will fail like it did before. > > I see, you're right so it isn't as bad as I originally thought but still my > > argument above holds - IMO e2fsck should treat wrong bitmap checksums the > > same way with and without the quota feature. > > > > Honza > The original flow that went wrong here is as follows: > e2fsck > e2fsck_run_ext3_journal > check_super_block > release_orphan_inodes > e2fsck_read_all_quotas > quota_read_all_dquots > quota_file_open > ext2fs_read_bitmaps > ext2fs_rw_bitmaps > read_bitmaps_range > read_bitmaps_range_start > ext2fs_block_bitmap_csum_verify > !!! error > e2fsck_run > > Yes, the inconsistency has nothing to do with quota, but quota is loaded > here to keep track of space changes during the normal processing of > orphan list. If quota was not loaded, we would not have read and check > bitmaps until Pass5, and we had already done a lot of checking and > tweaking of inodes, blocks, and dirs before Pass5, and the bitmaps > inconsistency may have been fixed during that time. This is not true. release_orphan_inodes() calls e2fsck_read_bitmaps() which loads all the bitmaps while ignoring checksum failures. This is needed so that blocks released during orphan cleanup are properly tracked as free. All I want to do is to move the call to e2fsck_read_all_quotas() a bit further than you moved it to a place after the e2fsck_read_bitmaps() call... Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] e2fsck: delay quotas loading in release_orphan_inodes() 2023-08-24 19:37 ` Jan Kara @ 2023-08-25 2:08 ` Baokun Li 2023-08-25 12:50 ` Jan Kara 0 siblings, 1 reply; 11+ messages in thread From: Baokun Li @ 2023-08-25 2:08 UTC (permalink / raw) To: Jan Kara Cc: linux-ext4, tytso, adilger.kernel, darrick.wong, yi.zhang, yangerkun, yukuai3, Baokun Li On 2023/8/25 3:37, Jan Kara wrote: > On Thu 24-08-23 20:56:14, Baokun Li wrote: >> On 2023/8/24 18:08, Jan Kara wrote: >>> On Thu 24-08-23 10:27:46, Baokun Li wrote: >>>> Hello, Jan! >>>> >>>> On 2023/8/24 1:05, Jan Kara wrote: >>>>> On Thu 17-08-23 16:18:28, Baokun Li wrote: >>>>>> After 7d79b40b ("e2fsck: adjust quota counters when clearing orphaned >>>>>> inodes"), we load all the quotas before we process the orphaned inodes, >>>>>> and when we load the quotas, we check the checsum of the bbitmap for each >>>>>> group. If one of the bbitmap checksums is wrong, the following error will >>>>>> be reported: >>>>>> >>>>>> “Error initializing quota context in support library: >>>>>> Block bitmap checksum does not match bitmap” >>>>>> >>>>>> But loading quotas comes before checking the current superblock for the >>>>>> EXT2_ERROR_FS flag, which makes it impossible to use e2fsck to repair any >>>>>> image that contains orphan inodes and has the wrong bbitmap checksum. >>>>>> So delaying quota loading until after the EXT2_ERROR_FS judgment avoids >>>>>> the above problem. >>>>>> >>>>>> Signed-off-by: Baokun Li <libaokun1@huawei.com> >>>>> This certainly looks better but I wonder if there still isn't a problem if >>>>> the bitmap checksums are wrong but EXT2_ERROR_FS is not set. Shouldn't we >>>>> rather move the initialization of the quota files after the call to >>>>> e2fsck_read_bitmaps()? >>>>> >>>>> Honza >>>> When the bitmap checksums are wrong but EXT2_ERROR_FS is not set, we must >>>> have lost some data (error flag or group descriptor or bitmap), so there >>>> is something wrong with the kernel at this time, so I don't think we >>>> should fix the image directly, but rather let the user realize that >>>> something is wrong with the filesystem logic. >>> I agree it means there is a problem somewhere (the storage, the kernel, or >>> similar). But just ignoring bitmap checksums in release_orphan_inodes() is >>> exactly how e2fsck behaves on filesystems without quota feature so I see no >>> reason for quota feature to change that because the inconsistency has >>> nothing to do with quotas... >>> >>>> Moreover, if we don't care how this happened, but just want to fix the >>>> image, we only need to run "e2fsck -a" twice. After merging in the >>>> current patch, we always empty the orphan list before loading the quotas, >>>> and EXT2_ERROR_FS is set when loading the quotas fails, so this will be >>>> fixed the second time you run e2fsck. It will not happen that every >>>> e2fsck will fail like it did before. >>> I see, you're right so it isn't as bad as I originally thought but still my >>> argument above holds - IMO e2fsck should treat wrong bitmap checksums the >>> same way with and without the quota feature. >>> >>> Honza >> The original flow that went wrong here is as follows: >> e2fsck >> e2fsck_run_ext3_journal >> check_super_block >> release_orphan_inodes >> e2fsck_read_all_quotas >> quota_read_all_dquots >> quota_file_open >> ext2fs_read_bitmaps >> ext2fs_rw_bitmaps >> read_bitmaps_range >> read_bitmaps_range_start >> ext2fs_block_bitmap_csum_verify >> !!! error >> e2fsck_run >> >> Yes, the inconsistency has nothing to do with quota, but quota is loaded >> here to keep track of space changes during the normal processing of >> orphan list. If quota was not loaded, we would not have read and check >> bitmaps until Pass5, and we had already done a lot of checking and >> tweaking of inodes, blocks, and dirs before Pass5, and the bitmaps >> inconsistency may have been fixed during that time. > This is not true. release_orphan_inodes() calls e2fsck_read_bitmaps() which > loads all the bitmaps while ignoring checksum failures. This is needed so > that blocks released during orphan cleanup are properly tracked as free. > All I want to do is to move the call to e2fsck_read_all_quotas() a bit > further than you moved it to a place after the e2fsck_read_bitmaps() > call... > > Honza Yes, e2fsck_read_bitmaps() ignores checksum errors for reading bitmaps, which prevents us from exiting e2fsck due to checksum error in release_orphan_inodes(), but in the case of the previously mentioned checksum error but EXT2_ERROR_FS is not set, when we execute "e2fsck -a", since checksum is ignored, the filesystem is considered clean, so it exits e2fsck without performing a force check, but the error is still there. Baokun -- With Best Regards, Baokun Li . ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] e2fsck: delay quotas loading in release_orphan_inodes() 2023-08-25 2:08 ` Baokun Li @ 2023-08-25 12:50 ` Jan Kara 2023-08-25 13:15 ` Baokun Li 0 siblings, 1 reply; 11+ messages in thread From: Jan Kara @ 2023-08-25 12:50 UTC (permalink / raw) To: Baokun Li Cc: Jan Kara, linux-ext4, tytso, adilger.kernel, darrick.wong, yi.zhang, yangerkun, yukuai3 On Fri 25-08-23 10:08:17, Baokun Li wrote: > On 2023/8/25 3:37, Jan Kara wrote: > > On Thu 24-08-23 20:56:14, Baokun Li wrote: > > > On 2023/8/24 18:08, Jan Kara wrote: > > > > On Thu 24-08-23 10:27:46, Baokun Li wrote: > > > > > Hello, Jan! > > > > > > > > > > On 2023/8/24 1:05, Jan Kara wrote: > > > > > > On Thu 17-08-23 16:18:28, Baokun Li wrote: > > > > > > > After 7d79b40b ("e2fsck: adjust quota counters when clearing orphaned > > > > > > > inodes"), we load all the quotas before we process the orphaned inodes, > > > > > > > and when we load the quotas, we check the checsum of the bbitmap for each > > > > > > > group. If one of the bbitmap checksums is wrong, the following error will > > > > > > > be reported: > > > > > > > > > > > > > > “Error initializing quota context in support library: > > > > > > > Block bitmap checksum does not match bitmap” > > > > > > > > > > > > > > But loading quotas comes before checking the current superblock for the > > > > > > > EXT2_ERROR_FS flag, which makes it impossible to use e2fsck to repair any > > > > > > > image that contains orphan inodes and has the wrong bbitmap checksum. > > > > > > > So delaying quota loading until after the EXT2_ERROR_FS judgment avoids > > > > > > > the above problem. > > > > > > > > > > > > > > Signed-off-by: Baokun Li <libaokun1@huawei.com> > > > > > > This certainly looks better but I wonder if there still isn't a problem if > > > > > > the bitmap checksums are wrong but EXT2_ERROR_FS is not set. Shouldn't we > > > > > > rather move the initialization of the quota files after the call to > > > > > > e2fsck_read_bitmaps()? > > > > > > > > > > > > Honza > > > > > When the bitmap checksums are wrong but EXT2_ERROR_FS is not set, we must > > > > > have lost some data (error flag or group descriptor or bitmap), so there > > > > > is something wrong with the kernel at this time, so I don't think we > > > > > should fix the image directly, but rather let the user realize that > > > > > something is wrong with the filesystem logic. > > > > I agree it means there is a problem somewhere (the storage, the kernel, or > > > > similar). But just ignoring bitmap checksums in release_orphan_inodes() is > > > > exactly how e2fsck behaves on filesystems without quota feature so I see no > > > > reason for quota feature to change that because the inconsistency has > > > > nothing to do with quotas... > > > > > > > > > Moreover, if we don't care how this happened, but just want to fix the > > > > > image, we only need to run "e2fsck -a" twice. After merging in the > > > > > current patch, we always empty the orphan list before loading the quotas, > > > > > and EXT2_ERROR_FS is set when loading the quotas fails, so this will be > > > > > fixed the second time you run e2fsck. It will not happen that every > > > > > e2fsck will fail like it did before. > > > > I see, you're right so it isn't as bad as I originally thought but still my > > > > argument above holds - IMO e2fsck should treat wrong bitmap checksums the > > > > same way with and without the quota feature. > > > > > > > > Honza > > > The original flow that went wrong here is as follows: > > > e2fsck > > > e2fsck_run_ext3_journal > > > check_super_block > > > release_orphan_inodes > > > e2fsck_read_all_quotas > > > quota_read_all_dquots > > > quota_file_open > > > ext2fs_read_bitmaps > > > ext2fs_rw_bitmaps > > > read_bitmaps_range > > > read_bitmaps_range_start > > > ext2fs_block_bitmap_csum_verify > > > !!! error > > > e2fsck_run > > > > > > Yes, the inconsistency has nothing to do with quota, but quota is loaded > > > here to keep track of space changes during the normal processing of > > > orphan list. If quota was not loaded, we would not have read and check > > > bitmaps until Pass5, and we had already done a lot of checking and > > > tweaking of inodes, blocks, and dirs before Pass5, and the bitmaps > > > inconsistency may have been fixed during that time. > > This is not true. release_orphan_inodes() calls e2fsck_read_bitmaps() which > > loads all the bitmaps while ignoring checksum failures. This is needed so > > that blocks released during orphan cleanup are properly tracked as free. > > All I want to do is to move the call to e2fsck_read_all_quotas() a bit > > further than you moved it to a place after the e2fsck_read_bitmaps() > > call... > Yes, e2fsck_read_bitmaps() ignores checksum errors for reading bitmaps, > which prevents us from exiting e2fsck due to checksum error in > release_orphan_inodes(), but in the case of the previously mentioned > checksum error but EXT2_ERROR_FS is not set, when we execute "e2fsck -a", > since checksum is ignored, the filesystem is considered clean, so it > exits e2fsck without performing a force check, but the error is still > there. Yes, and I believe that is a correct behavior because "e2fsck -a" means "don't check the filesystem unless it is required" - i.e., too long since the last check, too many mounts, or errors detected state. And if the filesystem doesn't have the quota feature, this is indeed what is going to happen. We'll happily skip the filesystem with bitmap checksum errors. So why should we complain about it when quota feature is enabled? If you think bitmap checksums should be checked by e2fsck -a, then we can have that discussion (separate patch from your quota fixup) but then it should happen regardless of the quota feature. Because doing it only with quota feature enabled is really unexpected and is going to confuse users. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] e2fsck: delay quotas loading in release_orphan_inodes() 2023-08-25 12:50 ` Jan Kara @ 2023-08-25 13:15 ` Baokun Li 0 siblings, 0 replies; 11+ messages in thread From: Baokun Li @ 2023-08-25 13:15 UTC (permalink / raw) To: Jan Kara Cc: linux-ext4, tytso, adilger.kernel, darrick.wong, yi.zhang, yangerkun, yukuai3, Baokun Li On 2023/8/25 20:50, Jan Kara wrote: > On Fri 25-08-23 10:08:17, Baokun Li wrote: >> On 2023/8/25 3:37, Jan Kara wrote: >>> On Thu 24-08-23 20:56:14, Baokun Li wrote: >>>> On 2023/8/24 18:08, Jan Kara wrote: >>>>> On Thu 24-08-23 10:27:46, Baokun Li wrote: >>>>>> Hello, Jan! >>>>>> >>>>>> On 2023/8/24 1:05, Jan Kara wrote: >>>>>>> On Thu 17-08-23 16:18:28, Baokun Li wrote: >>>>>>>> After 7d79b40b ("e2fsck: adjust quota counters when clearing orphaned >>>>>>>> inodes"), we load all the quotas before we process the orphaned inodes, >>>>>>>> and when we load the quotas, we check the checsum of the bbitmap for each >>>>>>>> group. If one of the bbitmap checksums is wrong, the following error will >>>>>>>> be reported: >>>>>>>> >>>>>>>> “Error initializing quota context in support library: >>>>>>>> Block bitmap checksum does not match bitmap” >>>>>>>> >>>>>>>> But loading quotas comes before checking the current superblock for the >>>>>>>> EXT2_ERROR_FS flag, which makes it impossible to use e2fsck to repair any >>>>>>>> image that contains orphan inodes and has the wrong bbitmap checksum. >>>>>>>> So delaying quota loading until after the EXT2_ERROR_FS judgment avoids >>>>>>>> the above problem. >>>>>>>> >>>>>>>> Signed-off-by: Baokun Li <libaokun1@huawei.com> >>>>>>> This certainly looks better but I wonder if there still isn't a problem if >>>>>>> the bitmap checksums are wrong but EXT2_ERROR_FS is not set. Shouldn't we >>>>>>> rather move the initialization of the quota files after the call to >>>>>>> e2fsck_read_bitmaps()? >>>>>>> >>>>>>> Honza >>>>>> When the bitmap checksums are wrong but EXT2_ERROR_FS is not set, we must >>>>>> have lost some data (error flag or group descriptor or bitmap), so there >>>>>> is something wrong with the kernel at this time, so I don't think we >>>>>> should fix the image directly, but rather let the user realize that >>>>>> something is wrong with the filesystem logic. >>>>> I agree it means there is a problem somewhere (the storage, the kernel, or >>>>> similar). But just ignoring bitmap checksums in release_orphan_inodes() is >>>>> exactly how e2fsck behaves on filesystems without quota feature so I see no >>>>> reason for quota feature to change that because the inconsistency has >>>>> nothing to do with quotas... >>>>> >>>>>> Moreover, if we don't care how this happened, but just want to fix the >>>>>> image, we only need to run "e2fsck -a" twice. After merging in the >>>>>> current patch, we always empty the orphan list before loading the quotas, >>>>>> and EXT2_ERROR_FS is set when loading the quotas fails, so this will be >>>>>> fixed the second time you run e2fsck. It will not happen that every >>>>>> e2fsck will fail like it did before. >>>>> I see, you're right so it isn't as bad as I originally thought but still my >>>>> argument above holds - IMO e2fsck should treat wrong bitmap checksums the >>>>> same way with and without the quota feature. >>>>> >>>>> Honza >>>> The original flow that went wrong here is as follows: >>>> e2fsck >>>> e2fsck_run_ext3_journal >>>> check_super_block >>>> release_orphan_inodes >>>> e2fsck_read_all_quotas >>>> quota_read_all_dquots >>>> quota_file_open >>>> ext2fs_read_bitmaps >>>> ext2fs_rw_bitmaps >>>> read_bitmaps_range >>>> read_bitmaps_range_start >>>> ext2fs_block_bitmap_csum_verify >>>> !!! error >>>> e2fsck_run >>>> >>>> Yes, the inconsistency has nothing to do with quota, but quota is loaded >>>> here to keep track of space changes during the normal processing of >>>> orphan list. If quota was not loaded, we would not have read and check >>>> bitmaps until Pass5, and we had already done a lot of checking and >>>> tweaking of inodes, blocks, and dirs before Pass5, and the bitmaps >>>> inconsistency may have been fixed during that time. >>> This is not true. release_orphan_inodes() calls e2fsck_read_bitmaps() which >>> loads all the bitmaps while ignoring checksum failures. This is needed so >>> that blocks released during orphan cleanup are properly tracked as free. >>> All I want to do is to move the call to e2fsck_read_all_quotas() a bit >>> further than you moved it to a place after the e2fsck_read_bitmaps() >>> call... >> Yes, e2fsck_read_bitmaps() ignores checksum errors for reading bitmaps, >> which prevents us from exiting e2fsck due to checksum error in >> release_orphan_inodes(), but in the case of the previously mentioned >> checksum error but EXT2_ERROR_FS is not set, when we execute "e2fsck -a", >> since checksum is ignored, the filesystem is considered clean, so it >> exits e2fsck without performing a force check, but the error is still >> there. > Yes, and I believe that is a correct behavior because "e2fsck -a" means > "don't check the filesystem unless it is required" - i.e., too long since the > last check, too many mounts, or errors detected state. And if the > filesystem doesn't have the quota feature, this is indeed what is going to > happen. We'll happily skip the filesystem with bitmap checksum errors. So > why should we complain about it when quota feature is enabled? > > If you think bitmap checksums should be checked by e2fsck -a, then we can > have that discussion (separate patch from your quota fixup) but then it > should happen regardless of the quota feature. Because doing it only with > quota feature enabled is really unexpected and is going to confuse users. > > Honza Got it! We didn't care about bitmap checksum errors before Pass5, so it's as expected to ignore them! Thank you for your patient explanation! I'll update the patch with your suggestion in the next version! Thanks! -- With Best Regards, Baokun Li . ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] e2fsck: delay quotas loading in release_orphan_inodes() 2023-08-24 2:27 ` Baokun Li 2023-08-24 10:08 ` Jan Kara @ 2023-08-24 19:49 ` Andreas Dilger 2023-08-25 1:20 ` Baokun Li 1 sibling, 1 reply; 11+ messages in thread From: Andreas Dilger @ 2023-08-24 19:49 UTC (permalink / raw) To: Baokun Li Cc: Jan Kara, Ext4 Developers List, Theodore Ts'o, Darrick J. Wong, Zhang Yi, yangerkun, yukuai3 [-- Attachment #1: Type: text/plain, Size: 2250 bytes --] On Aug 23, 2023, at 8:27 PM, Baokun Li <libaokun1@huawei.com> wrote: > On 2023/8/24 1:05, Jan Kara wrote: >> On Thu 17-08-23 16:18:28, Baokun Li wrote: >>> After 7d79b40b ("e2fsck: adjust quota counters when clearing orphaned >>> inodes"), we load all the quotas before we process the orphaned inodes, >>> and when we load the quotas, we check the checsum of the bbitmap for each >>> group. If one of the bbitmap checksums is wrong, the following error will >>> be reported: >>> >>> “Error initializing quota context in support library: >>> Block bitmap checksum does not match bitmap” >>> >>> But loading quotas comes before checking the current superblock for the >>> EXT2_ERROR_FS flag, which makes it impossible to use e2fsck to repair any >>> image that contains orphan inodes and has the wrong bbitmap checksum. >>> So delaying quota loading until after the EXT2_ERROR_FS judgment avoids >>> the above problem. >>> >>> Signed-off-by: Baokun Li <libaokun1@huawei.com> >> This certainly looks better but I wonder if there still isn't a problem if >> the bitmap checksums are wrong but EXT2_ERROR_FS is not set. Shouldn't we >> rather move the initialization of the quota files after the call to >> e2fsck_read_bitmaps()? > > When the bitmap checksums are wrong but EXT2_ERROR_FS is not set, we must > have lost some data (error flag or group descriptor or bitmap), so there is > something wrong with the kernel at this time, so I don't think we should > fix the image directly, but rather let the user realize that something is > wrong with the filesystem logic. > > Moreover, if we don't care how this happened, but just want to fix the image, > we only need to run "e2fsck -a" twice. After merging in the current patch, we > always empty the orphan list before loading the quotas, and EXT2_ERROR_FS > is set when loading the quotas fails, so this will be fixed the second time > you run e2fsck. It will not happen that every e2fsck will fail like it did > before. I recall that Ted prefers e2fsck to fix everything in a single pass, or at worst if a fatal problem hit during the run it should restart itself so that it will fix all of the problems before exiting. Cheers, Andreas [-- Attachment #2: Message signed with OpenPGP --] [-- Type: application/pgp-signature, Size: 873 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] e2fsck: delay quotas loading in release_orphan_inodes() 2023-08-24 19:49 ` Andreas Dilger @ 2023-08-25 1:20 ` Baokun Li 0 siblings, 0 replies; 11+ messages in thread From: Baokun Li @ 2023-08-25 1:20 UTC (permalink / raw) To: Andreas Dilger Cc: Jan Kara, Ext4 Developers List, Theodore Ts'o, Darrick J. Wong, Zhang Yi, yangerkun, yukuai3, Baokun Li On 2023/8/25 3:49, Andreas Dilger wrote: > On Aug 23, 2023, at 8:27 PM, Baokun Li <libaokun1@huawei.com> wrote: >> On 2023/8/24 1:05, Jan Kara wrote: >>> On Thu 17-08-23 16:18:28, Baokun Li wrote: >>>> After 7d79b40b ("e2fsck: adjust quota counters when clearing orphaned >>>> inodes"), we load all the quotas before we process the orphaned inodes, >>>> and when we load the quotas, we check the checsum of the bbitmap for each >>>> group. If one of the bbitmap checksums is wrong, the following error will >>>> be reported: >>>> >>>> “Error initializing quota context in support library: >>>> Block bitmap checksum does not match bitmap” >>>> >>>> But loading quotas comes before checking the current superblock for the >>>> EXT2_ERROR_FS flag, which makes it impossible to use e2fsck to repair any >>>> image that contains orphan inodes and has the wrong bbitmap checksum. >>>> So delaying quota loading until after the EXT2_ERROR_FS judgment avoids >>>> the above problem. >>>> >>>> Signed-off-by: Baokun Li <libaokun1@huawei.com> >>> This certainly looks better but I wonder if there still isn't a problem if >>> the bitmap checksums are wrong but EXT2_ERROR_FS is not set. Shouldn't we >>> rather move the initialization of the quota files after the call to >>> e2fsck_read_bitmaps()? >> When the bitmap checksums are wrong but EXT2_ERROR_FS is not set, we must >> have lost some data (error flag or group descriptor or bitmap), so there is >> something wrong with the kernel at this time, so I don't think we should >> fix the image directly, but rather let the user realize that something is >> wrong with the filesystem logic. >> >> Moreover, if we don't care how this happened, but just want to fix the image, >> we only need to run "e2fsck -a" twice. After merging in the current patch, we >> always empty the orphan list before loading the quotas, and EXT2_ERROR_FS >> is set when loading the quotas fails, so this will be fixed the second time >> you run e2fsck. It will not happen that every e2fsck will fail like it did >> before. > I recall that Ted prefers e2fsck to fix everything in a single pass, or at > worst if a fatal problem hit during the run it should restart itself so > that it will fix all of the problems before exiting. > > Cheers, Andreas > > As mentioned earlier, when a bitmap checksums error occurs but EXT2_ERROR_FS is not set, something may be wrong, so we should stop and tell the developer what may be wrong, rather than just fixing it to hide the possible problem, which can make it harder to locate some of the issues. Cheers! -- With Best Regards, Baokun Li . ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-08-25 13:15 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-08-17 8:18 [PATCH] e2fsck: delay quotas loading in release_orphan_inodes() Baokun Li 2023-08-23 17:05 ` Jan Kara 2023-08-24 2:27 ` Baokun Li 2023-08-24 10:08 ` Jan Kara 2023-08-24 12:56 ` Baokun Li 2023-08-24 19:37 ` Jan Kara 2023-08-25 2:08 ` Baokun Li 2023-08-25 12:50 ` Jan Kara 2023-08-25 13:15 ` Baokun Li 2023-08-24 19:49 ` Andreas Dilger 2023-08-25 1:20 ` Baokun Li
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox