* [PATCH]xfs: fix mounting failed caused by sequencing problem ,in the log records
@ 2023-06-08 13:38 Wu Guanghao
2023-06-09 1:08 ` [PATCH]xfs_repair: fix the problem of repair failure caused by dirty flag being abnormally set on buffer Wu Guanghao
0 siblings, 1 reply; 8+ messages in thread
From: Wu Guanghao @ 2023-06-08 13:38 UTC (permalink / raw)
To: cem, Darrick J. Wong, linux-xfs; +Cc: louhongxiang, liuzhiqiang (I)
We found an issue where repair failed in the fault injection.
$ xfs_repair test.img
...
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
- agno = 3
Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
btree block 0/41901 is suspect, error -74
bad magic # 0x58534c4d in inode 3306572 (data fork) bmbt block 41901
bad data fork in inode 3306572
cleared inode 3306572
...
Phase 7 - verify and correct link counts...
Metadata corruption detected at 0x55a30e420b58, xfs_bmbt block 0x51d68/0x1000
libxfs_bwrite: write verifier failed on xfs_bmbt bno 0x51d68/0x8
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!
fatal error -- File system metadata writeout failed, err=117. Re-run xfs_repair.
$ xfs_db test.img
xfs_db> inode 3306572
xfs_db> p
core.magic = 0x494e
core.mode = 0100666 // regular file
core.version = 3
core.format = 3 (btree)
...
u3.bmbt.keys[1] = [startoff]
1:[6]
u3.bmbt.ptrs[1] = 41901 // btree root
...
$ hexdump -C -n 4096 41901.img
00000000 58 53 4c 4d 00 00 00 00 00 00 01 e8 d6 f4 03 14 |XSLM............|
00000010 09 f3 a6 1b 0a 3c 45 5a 96 39 41 ac 09 2f 66 99 |.....<EZ.9A../f.|
00000020 00 00 00 00 00 05 1f fb 00 00 00 00 00 05 1d 68 |...............h|
...
The block data associated with inode 3306572 is abnormal, but check the CRC first
when reading. If the CRC check fails, badcrc will be set. Then the dirty flag
will be set on bp when badcrc is set. In the final stage of repair, the dirty bp
will be refreshed in batches. When refresh to the disk, the data in bp will be
verified. At this time, if the data verification fails, resulting in a repair
error.
After scan_bmapbt returns an error, the inode will be cleaned up. Then bp
doesn't need to set dirty flag, so that it won't trigger writeback verification
failure.
Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
---
repair/scan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/repair/scan.c b/repair/scan.c
index 7b720131..b5458eb8 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -185,7 +185,7 @@ scan_lbtree(
ASSERT(dirty == 0 || (dirty && !no_modify));
- if ((dirty || badcrc) && !no_modify) {
+ if (!err && (dirty || badcrc) && !no_modify) {
libxfs_buf_mark_dirty(bp);
libxfs_buf_relse(bp);
}
--
2.27.0
.
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH]xfs_repair: fix the problem of repair failure caused by dirty flag being abnormally set on buffer
2023-06-08 13:38 [PATCH]xfs: fix mounting failed caused by sequencing problem ,in the log records Wu Guanghao
@ 2023-06-09 1:08 ` Wu Guanghao
2023-06-10 2:44 ` Darrick J. Wong
0 siblings, 1 reply; 8+ messages in thread
From: Wu Guanghao @ 2023-06-09 1:08 UTC (permalink / raw)
To: cem, Darrick J. Wong, linux-xfs; +Cc: louhongxiang, liuzhiqiang (I)
We found an issue where repair failed in the fault injection.
$ xfs_repair test.img
...
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
- agno = 3
Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
btree block 0/41901 is suspect, error -74
bad magic # 0x58534c4d in inode 3306572 (data fork) bmbt block 41901
bad data fork in inode 3306572
cleared inode 3306572
...
Phase 7 - verify and correct link counts...
Metadata corruption detected at 0x55a30e420b58, xfs_bmbt block 0x51d68/0x1000
libxfs_bwrite: write verifier failed on xfs_bmbt bno 0x51d68/0x8
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!
fatal error -- File system metadata writeout failed, err=117. Re-run xfs_repair.
$ xfs_db test.img
xfs_db> inode 3306572
xfs_db> p
core.magic = 0x494e
core.mode = 0100666 // regular file
core.version = 3
core.format = 3 (btree)
...
u3.bmbt.keys[1] = [startoff]
1:[6]
u3.bmbt.ptrs[1] = 41901 // btree root
...
$ hexdump -C -n 4096 41901.img
00000000 58 53 4c 4d 00 00 00 00 00 00 01 e8 d6 f4 03 14 |XSLM............|
00000010 09 f3 a6 1b 0a 3c 45 5a 96 39 41 ac 09 2f 66 99 |.....<EZ.9A../f.|
00000020 00 00 00 00 00 05 1f fb 00 00 00 00 00 05 1d 68 |...............h|
...
The block data associated with inode 3306572 is abnormal, but check the CRC first
when reading. If the CRC check fails, badcrc will be set. Then the dirty flag
will be set on bp when badcrc is set. In the final stage of repair, the dirty bp
will be refreshed in batches. When refresh to the disk, the data in bp will be
verified. At this time, if the data verification fails, resulting in a repair
error.
After scan_bmapbt returns an error, the inode will be cleaned up. Then bp
doesn't need to set dirty flag, so that it won't trigger writeback verification
failure.
Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
---
repair/scan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/repair/scan.c b/repair/scan.c
index 7b720131..b5458eb8 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -185,7 +185,7 @@ scan_lbtree(
ASSERT(dirty == 0 || (dirty && !no_modify));
- if ((dirty || badcrc) && !no_modify) {
+ if (!err && (dirty || badcrc) && !no_modify) {
libxfs_buf_mark_dirty(bp);
libxfs_buf_relse(bp);
}
--
2.27.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH]xfs_repair: fix the problem of repair failure caused by dirty flag being abnormally set on buffer
2023-06-09 1:08 ` [PATCH]xfs_repair: fix the problem of repair failure caused by dirty flag being abnormally set on buffer Wu Guanghao
@ 2023-06-10 2:44 ` Darrick J. Wong
2023-06-28 2:36 ` Wu Guanghao
0 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2023-06-10 2:44 UTC (permalink / raw)
To: Wu Guanghao; +Cc: cem, linux-xfs, louhongxiang, liuzhiqiang (I)
On Fri, Jun 09, 2023 at 09:08:01AM +0800, Wu Guanghao wrote:
> We found an issue where repair failed in the fault injection.
>
> $ xfs_repair test.img
> ...
> Phase 3 - for each AG...
> - scan and clear agi unlinked lists...
> - process known inodes and perform inode discovery...
> - agno = 0
> - agno = 1
> - agno = 2
> Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
> - agno = 3
> Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
> btree block 0/41901 is suspect, error -74
> bad magic # 0x58534c4d in inode 3306572 (data fork) bmbt block 41901
> bad data fork in inode 3306572
> cleared inode 3306572
> ...
> Phase 7 - verify and correct link counts...
> Metadata corruption detected at 0x55a30e420b58, xfs_bmbt block 0x51d68/0x1000
> libxfs_bwrite: write verifier failed on xfs_bmbt bno 0x51d68/0x8
> xfs_repair: Releasing dirty buffer to free list!
> xfs_repair: Refusing to write a corrupt buffer to the data device!
> xfs_repair: Lost a write to the data device!
>
> fatal error -- File system metadata writeout failed, err=117. Re-run xfs_repair.
>
>
> $ xfs_db test.img
> xfs_db> inode 3306572
> xfs_db> p
> core.magic = 0x494e
> core.mode = 0100666 // regular file
> core.version = 3
> core.format = 3 (btree)
> ...
> u3.bmbt.keys[1] = [startoff]
> 1:[6]
> u3.bmbt.ptrs[1] = 41901 // btree root
> ...
>
> $ hexdump -C -n 4096 41901.img
> 00000000 58 53 4c 4d 00 00 00 00 00 00 01 e8 d6 f4 03 14 |XSLM............|
> 00000010 09 f3 a6 1b 0a 3c 45 5a 96 39 41 ac 09 2f 66 99 |.....<EZ.9A../f.|
> 00000020 00 00 00 00 00 05 1f fb 00 00 00 00 00 05 1d 68 |...............h|
> ...
>
> The block data associated with inode 3306572 is abnormal, but check the CRC first
> when reading. If the CRC check fails, badcrc will be set. Then the dirty flag
> will be set on bp when badcrc is set. In the final stage of repair, the dirty bp
> will be refreshed in batches. When refresh to the disk, the data in bp will be
> verified. At this time, if the data verification fails, resulting in a repair
> error.
>
> After scan_bmapbt returns an error, the inode will be cleaned up. Then bp
> doesn't need to set dirty flag, so that it won't trigger writeback verification
> failure.
>
> Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
> ---
> repair/scan.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/repair/scan.c b/repair/scan.c
> index 7b720131..b5458eb8 100644
> --- a/repair/scan.c
> +++ b/repair/scan.c
> @@ -185,7 +185,7 @@ scan_lbtree(
>
> ASSERT(dirty == 0 || (dirty && !no_modify));
>
> - if ((dirty || badcrc) && !no_modify) {
> + if (!err && (dirty || badcrc) && !no_modify) {
> libxfs_buf_mark_dirty(bp);
> libxfs_buf_relse(bp);
Hm. So if scan_lbtree returns 1, that means that we clear the inode.
Hence there's no point in dirtying this buffer since we're going to zap
the whole inode anyway.
This looks correct to me, so
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
But that said, you could refactor this part:
if (!err && (dirty || badcrc) && !no_modify)
libxfs_buf_mark_dirty(bp);
libxfs_buf_relse(bp);
More questions: Let's say that the btree-format fork has this btree:
fork
/ | \
A B C
Are there any cases where A is corrupt enough that the write verifier
will trip but scan_lbtree/scan_bmapbt return 0?
Or, let's say that we dirty A, then scan_bmapbt decides that B is total
garbage and returns 1. Should we then mark A stale so that it doesn't
get written out unnecessarily?
Or, let's say that A is corrupt enough to trip the write verifier but
scan_lbtree/scan_bmapbt return 0; and B is corrupt enough that
scan_bmapbt returns 1. In that case, we'd need to mark A stale so that
we clear the inode and repair can complete without tripping over A or B.
Does that actually happen?
--D
> }
> --
> 2.27.0
>
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH]xfs_repair: fix the problem of repair failure caused by dirty flag being abnormally set on buffer
2023-06-10 2:44 ` Darrick J. Wong
@ 2023-06-28 2:36 ` Wu Guanghao
2023-07-13 5:20 ` Darrick J. Wong
0 siblings, 1 reply; 8+ messages in thread
From: Wu Guanghao @ 2023-06-28 2:36 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: cem, linux-xfs, louhongxiang, liuzhiqiang (I)
在 2023/6/10 10:44, Darrick J. Wong 写道:
> On Fri, Jun 09, 2023 at 09:08:01AM +0800, Wu Guanghao wrote:
>> We found an issue where repair failed in the fault injection.
>>
>> $ xfs_repair test.img
>> ...
>> Phase 3 - for each AG...
>> - scan and clear agi unlinked lists...
>> - process known inodes and perform inode discovery...
>> - agno = 0
>> - agno = 1
>> - agno = 2
>> Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
>> - agno = 3
>> Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
>> btree block 0/41901 is suspect, error -74
>> bad magic # 0x58534c4d in inode 3306572 (data fork) bmbt block 41901
>> bad data fork in inode 3306572
>> cleared inode 3306572
>> ...
>> Phase 7 - verify and correct link counts...
>> Metadata corruption detected at 0x55a30e420b58, xfs_bmbt block 0x51d68/0x1000
>> libxfs_bwrite: write verifier failed on xfs_bmbt bno 0x51d68/0x8
>> xfs_repair: Releasing dirty buffer to free list!
>> xfs_repair: Refusing to write a corrupt buffer to the data device!
>> xfs_repair: Lost a write to the data device!
>>
>> fatal error -- File system metadata writeout failed, err=117. Re-run xfs_repair.
>>
>>
>> $ xfs_db test.img
>> xfs_db> inode 3306572
>> xfs_db> p
>> core.magic = 0x494e
>> core.mode = 0100666 // regular file
>> core.version = 3
>> core.format = 3 (btree)
>> ...
>> u3.bmbt.keys[1] = [startoff]
>> 1:[6]
>> u3.bmbt.ptrs[1] = 41901 // btree root
>> ...
>>
>> $ hexdump -C -n 4096 41901.img
>> 00000000 58 53 4c 4d 00 00 00 00 00 00 01 e8 d6 f4 03 14 |XSLM............|
>> 00000010 09 f3 a6 1b 0a 3c 45 5a 96 39 41 ac 09 2f 66 99 |.....<EZ.9A../f.|
>> 00000020 00 00 00 00 00 05 1f fb 00 00 00 00 00 05 1d 68 |...............h|
>> ...
>>
>> The block data associated with inode 3306572 is abnormal, but check the CRC first
>> when reading. If the CRC check fails, badcrc will be set. Then the dirty flag
>> will be set on bp when badcrc is set. In the final stage of repair, the dirty bp
>> will be refreshed in batches. When refresh to the disk, the data in bp will be
>> verified. At this time, if the data verification fails, resulting in a repair
>> error.
>>
>> After scan_bmapbt returns an error, the inode will be cleaned up. Then bp
>> doesn't need to set dirty flag, so that it won't trigger writeback verification
>> failure.
>>
>> Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
>> ---
>> repair/scan.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/repair/scan.c b/repair/scan.c
>> index 7b720131..b5458eb8 100644
>> --- a/repair/scan.c
>> +++ b/repair/scan.c
>> @@ -185,7 +185,7 @@ scan_lbtree(
>>
>> ASSERT(dirty == 0 || (dirty && !no_modify));
>>
>> - if ((dirty || badcrc) && !no_modify) {
>> + if (!err && (dirty || badcrc) && !no_modify) {
>> libxfs_buf_mark_dirty(bp);
>> libxfs_buf_relse(bp);
>
> Hm. So if scan_lbtree returns 1, that means that we clear the inode.
> Hence there's no point in dirtying this buffer since we're going to zap
> the whole inode anyway.
>
> This looks correct to me, so
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>
> But that said, you could refactor this part:
>
> if (!err && (dirty || badcrc) && !no_modify)
> libxfs_buf_mark_dirty(bp);
> libxfs_buf_relse(bp);
>
> More questions: Let's say that the btree-format fork has this btree:
>
> fork
> / | \
> A B C
>
> Are there any cases where A is corrupt enough that the write verifier
> will trip but scan_lbtree/scan_bmapbt return 0?
>
I'm sorry for replying so late.
This situation should not exist.
scan_bmapbt() performs the following checks:
1. Check the magic of the block
2. Check the level of the block
3. Check which inode the owner of the block belongs to.
4. If it is a V5 filesystem,it will check three items: UUID, block consistency,
and which inode the block belongs to.
5. Calculate which AG the block belongs to and see the usage of the block
6. If it is a leaf node, it will check whether the number of records exceeds the maximum value
xfs_bmbt_write_verify() performs the following checks:
1. Check the magic of the block
2. If it is a V5 filesystem,it will check three items: UUID, block consistency,
and which inode the block belongs to.
3. Check if the level of the block is within the specified range
4. Check if the number of nodes in the block record exceeds the maximum limit
5. Check if the left and right nodes of the block are within the range of the file system
As can be seen from above, scan_bmapbt() checks more and in more detail than
xfs_bmbt_write_verify(). Therefore, if scan_bmapbt() returns 0,
xfs_bmbt_write_verify() will not report an error.
> Or, let's say that we dirty A, then scan_bmapbt decides that B is total
> garbage and returns 1. Should we then mark A stale so that it doesn't
> get written out unnecessarily?
>
We can allocate space in process_btinode() and pass it to scan_lbtree/scan_bmapbt.
If a bp is set as dirty, we record it. If the inode needs to be cleaned up,
we set all recorded bps as stale.However, this does not affect the repair process.
Even if no record is kept, it only adds some unnecessary data writing.
If there is nothing wrong with this,I will push V2 patch.
Thanks
Guanghao
> Or, let's say that A is corrupt enough to trip the write verifier but
> scan_lbtree/scan_bmapbt return 0; and B is corrupt enough that
> scan_bmapbt returns 1. In that case, we'd need to mark A stale so that
> we clear the inode and repair can complete without tripping over A or B.
> Does that actually happen?
>
> --D
>
>> }
>> --
>> 2.27.0
>>
> .
>
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH]xfs_repair: fix the problem of repair failure caused by dirty flag being abnormally set on buffer
2023-06-28 2:36 ` Wu Guanghao
@ 2023-07-13 5:20 ` Darrick J. Wong
2023-07-17 2:02 ` Wu Guanghao
0 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2023-07-13 5:20 UTC (permalink / raw)
To: Wu Guanghao; +Cc: cem, linux-xfs, louhongxiang, liuzhiqiang (I)
On Wed, Jun 28, 2023 at 10:36:09AM +0800, Wu Guanghao wrote:
>
>
> 在 2023/6/10 10:44, Darrick J. Wong 写道:
> > On Fri, Jun 09, 2023 at 09:08:01AM +0800, Wu Guanghao wrote:
> >> We found an issue where repair failed in the fault injection.
> >>
> >> $ xfs_repair test.img
> >> ...
> >> Phase 3 - for each AG...
> >> - scan and clear agi unlinked lists...
> >> - process known inodes and perform inode discovery...
> >> - agno = 0
> >> - agno = 1
> >> - agno = 2
> >> Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
> >> - agno = 3
> >> Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
> >> btree block 0/41901 is suspect, error -74
> >> bad magic # 0x58534c4d in inode 3306572 (data fork) bmbt block 41901
> >> bad data fork in inode 3306572
> >> cleared inode 3306572
> >> ...
> >> Phase 7 - verify and correct link counts...
> >> Metadata corruption detected at 0x55a30e420b58, xfs_bmbt block 0x51d68/0x1000
> >> libxfs_bwrite: write verifier failed on xfs_bmbt bno 0x51d68/0x8
> >> xfs_repair: Releasing dirty buffer to free list!
> >> xfs_repair: Refusing to write a corrupt buffer to the data device!
> >> xfs_repair: Lost a write to the data device!
> >>
> >> fatal error -- File system metadata writeout failed, err=117. Re-run xfs_repair.
> >>
> >>
> >> $ xfs_db test.img
> >> xfs_db> inode 3306572
> >> xfs_db> p
> >> core.magic = 0x494e
> >> core.mode = 0100666 // regular file
> >> core.version = 3
> >> core.format = 3 (btree)
> >> ...
> >> u3.bmbt.keys[1] = [startoff]
> >> 1:[6]
> >> u3.bmbt.ptrs[1] = 41901 // btree root
> >> ...
> >>
> >> $ hexdump -C -n 4096 41901.img
> >> 00000000 58 53 4c 4d 00 00 00 00 00 00 01 e8 d6 f4 03 14 |XSLM............|
> >> 00000010 09 f3 a6 1b 0a 3c 45 5a 96 39 41 ac 09 2f 66 99 |.....<EZ.9A../f.|
> >> 00000020 00 00 00 00 00 05 1f fb 00 00 00 00 00 05 1d 68 |...............h|
> >> ...
> >>
> >> The block data associated with inode 3306572 is abnormal, but check the CRC first
> >> when reading. If the CRC check fails, badcrc will be set. Then the dirty flag
> >> will be set on bp when badcrc is set. In the final stage of repair, the dirty bp
> >> will be refreshed in batches. When refresh to the disk, the data in bp will be
> >> verified. At this time, if the data verification fails, resulting in a repair
> >> error.
> >>
> >> After scan_bmapbt returns an error, the inode will be cleaned up. Then bp
> >> doesn't need to set dirty flag, so that it won't trigger writeback verification
> >> failure.
> >>
> >> Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
> >> ---
> >> repair/scan.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/repair/scan.c b/repair/scan.c
> >> index 7b720131..b5458eb8 100644
> >> --- a/repair/scan.c
> >> +++ b/repair/scan.c
> >> @@ -185,7 +185,7 @@ scan_lbtree(
> >>
> >> ASSERT(dirty == 0 || (dirty && !no_modify));
> >>
> >> - if ((dirty || badcrc) && !no_modify) {
> >> + if (!err && (dirty || badcrc) && !no_modify) {
> >> libxfs_buf_mark_dirty(bp);
> >> libxfs_buf_relse(bp);
> >
> > Hm. So if scan_lbtree returns 1, that means that we clear the inode.
> > Hence there's no point in dirtying this buffer since we're going to zap
> > the whole inode anyway.
> >
> > This looks correct to me, so
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> >
> > But that said, you could refactor this part:
> >
> > if (!err && (dirty || badcrc) && !no_modify)
> > libxfs_buf_mark_dirty(bp);
> > libxfs_buf_relse(bp);
> >
> > More questions: Let's say that the btree-format fork has this btree:
> >
> > fork
> > / | \
> > A B C
> >
> > Are there any cases where A is corrupt enough that the write verifier
> > will trip but scan_lbtree/scan_bmapbt return 0?
> >
> I'm sorry for replying so late.
Don't worry about it, I'm just as slow. :)
> This situation should not exist.
> scan_bmapbt() performs the following checks:
> 1. Check the magic of the block
> 2. Check the level of the block
> 3. Check which inode the owner of the block belongs to.
> 4. If it is a V5 filesystem,it will check three items: UUID, block consistency,
> and which inode the block belongs to.
> 5. Calculate which AG the block belongs to and see the usage of the block
> 6. If it is a leaf node, it will check whether the number of records exceeds the maximum value
>
> xfs_bmbt_write_verify() performs the following checks:
> 1. Check the magic of the block
> 2. If it is a V5 filesystem,it will check three items: UUID, block consistency,
> and which inode the block belongs to.
> 3. Check if the level of the block is within the specified range
> 4. Check if the number of nodes in the block record exceeds the maximum limit
> 5. Check if the left and right nodes of the block are within the range of the file system
>
> As can be seen from above, scan_bmapbt() checks more and in more detail than
> xfs_bmbt_write_verify(). Therefore, if scan_bmapbt() returns 0,
> xfs_bmbt_write_verify() will not report an error.
>
> > Or, let's say that we dirty A, then scan_bmapbt decides that B is total
> > garbage and returns 1. Should we then mark A stale so that it doesn't
> > get written out unnecessarily?
> >
> We can allocate space in process_btinode() and pass it to scan_lbtree/scan_bmapbt.
> If a bp is set as dirty, we record it. If the inode needs to be cleaned up,
> we set all recorded bps as stale.However, this does not affect the repair process.
> Even if no record is kept, it only adds some unnecessary data writing.
>
> If there is nothing wrong with this,I will push V2 patch.
Hmm. It's tempting to have scan_bmapbt put all the buffers it finds on
a list. The corrected ones would be marked dirty, the good ones just
end up on the list. If we decide to kill the bmbt we can then
invalidate all the buffers. If we keep it, then we can write the dirty
blocks.
Ugh. But that gets gross -- if the bmbt is larger than memory, we then
can end up OOMing xfs_repair. Creating an interval bitmap of fsblock
numbers visited buffers might be less bad, but who knows.
(Or I guess we could just apply this patch and see if anyone complains
about A being written after we decided to kill the bmbt due to B. ;))
--D
> Thanks
>
> Guanghao
>
> > Or, let's say that A is corrupt enough to trip the write verifier but
> > scan_lbtree/scan_bmapbt return 0; and B is corrupt enough that
> > scan_bmapbt returns 1. In that case, we'd need to mark A stale so that
> > we clear the inode and repair can complete without tripping over A or B.
> > Does that actually happen?
> >
>
> > --D
> >
> >> }
> >> --
> >> 2.27.0
> >>
> > .
> >
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH]xfs_repair: fix the problem of repair failure caused by dirty flag being abnormally set on buffer
2023-07-13 5:20 ` Darrick J. Wong
@ 2023-07-17 2:02 ` Wu Guanghao
2023-07-19 1:28 ` Darrick J. Wong
0 siblings, 1 reply; 8+ messages in thread
From: Wu Guanghao @ 2023-07-17 2:02 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: cem, linux-xfs, louhongxiang, liuzhiqiang (I)
在 2023/7/13 13:20, Darrick J. Wong 写道:
> On Wed, Jun 28, 2023 at 10:36:09AM +0800, Wu Guanghao wrote:
>>
>>
>> 在 2023/6/10 10:44, Darrick J. Wong 写道:
>>> On Fri, Jun 09, 2023 at 09:08:01AM +0800, Wu Guanghao wrote:
>>>> We found an issue where repair failed in the fault injection.
>>>>
>>>> $ xfs_repair test.img
>>>> ...
>>>> Phase 3 - for each AG...
>>>> - scan and clear agi unlinked lists...
>>>> - process known inodes and perform inode discovery...
>>>> - agno = 0
>>>> - agno = 1
>>>> - agno = 2
>>>> Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
>>>> - agno = 3
>>>> Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
>>>> btree block 0/41901 is suspect, error -74
>>>> bad magic # 0x58534c4d in inode 3306572 (data fork) bmbt block 41901
>>>> bad data fork in inode 3306572
>>>> cleared inode 3306572
>>>> ...
>>>> Phase 7 - verify and correct link counts...
>>>> Metadata corruption detected at 0x55a30e420b58, xfs_bmbt block 0x51d68/0x1000
>>>> libxfs_bwrite: write verifier failed on xfs_bmbt bno 0x51d68/0x8
>>>> xfs_repair: Releasing dirty buffer to free list!
>>>> xfs_repair: Refusing to write a corrupt buffer to the data device!
>>>> xfs_repair: Lost a write to the data device!
>>>>
>>>> fatal error -- File system metadata writeout failed, err=117. Re-run xfs_repair.
>>>>
>>>>
>>>> $ xfs_db test.img
>>>> xfs_db> inode 3306572
>>>> xfs_db> p
>>>> core.magic = 0x494e
>>>> core.mode = 0100666 // regular file
>>>> core.version = 3
>>>> core.format = 3 (btree)
>>>> ...
>>>> u3.bmbt.keys[1] = [startoff]
>>>> 1:[6]
>>>> u3.bmbt.ptrs[1] = 41901 // btree root
>>>> ...
>>>>
>>>> $ hexdump -C -n 4096 41901.img
>>>> 00000000 58 53 4c 4d 00 00 00 00 00 00 01 e8 d6 f4 03 14 |XSLM............|
>>>> 00000010 09 f3 a6 1b 0a 3c 45 5a 96 39 41 ac 09 2f 66 99 |.....<EZ.9A../f.|
>>>> 00000020 00 00 00 00 00 05 1f fb 00 00 00 00 00 05 1d 68 |...............h|
>>>> ...
>>>>
>>>> The block data associated with inode 3306572 is abnormal, but check the CRC first
>>>> when reading. If the CRC check fails, badcrc will be set. Then the dirty flag
>>>> will be set on bp when badcrc is set. In the final stage of repair, the dirty bp
>>>> will be refreshed in batches. When refresh to the disk, the data in bp will be
>>>> verified. At this time, if the data verification fails, resulting in a repair
>>>> error.
>>>>
>>>> After scan_bmapbt returns an error, the inode will be cleaned up. Then bp
>>>> doesn't need to set dirty flag, so that it won't trigger writeback verification
>>>> failure.
>>>>
>>>> Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
>>>> ---
>>>> repair/scan.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/repair/scan.c b/repair/scan.c
>>>> index 7b720131..b5458eb8 100644
>>>> --- a/repair/scan.c
>>>> +++ b/repair/scan.c
>>>> @@ -185,7 +185,7 @@ scan_lbtree(
>>>>
>>>> ASSERT(dirty == 0 || (dirty && !no_modify));
>>>>
>>>> - if ((dirty || badcrc) && !no_modify) {
>>>> + if (!err && (dirty || badcrc) && !no_modify) {
>>>> libxfs_buf_mark_dirty(bp);
>>>> libxfs_buf_relse(bp);
>>>
>>> Hm. So if scan_lbtree returns 1, that means that we clear the inode.
>>> Hence there's no point in dirtying this buffer since we're going to zap
>>> the whole inode anyway.
>>>
>>> This looks correct to me, so
>>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>>>
>>> But that said, you could refactor this part:
>>>
>>> if (!err && (dirty || badcrc) && !no_modify)
>>> libxfs_buf_mark_dirty(bp);
>>> libxfs_buf_relse(bp);
>>>
>>> More questions: Let's say that the btree-format fork has this btree:
>>>
>>> fork
>>> / | \
>>> A B C
>>>
>>> Are there any cases where A is corrupt enough that the write verifier
>>> will trip but scan_lbtree/scan_bmapbt return 0?
>>>
>> I'm sorry for replying so late.
>
> Don't worry about it, I'm just as slow. :)
>
>> This situation should not exist.
>> scan_bmapbt() performs the following checks:
>> 1. Check the magic of the block
>> 2. Check the level of the block
>> 3. Check which inode the owner of the block belongs to.
>> 4. If it is a V5 filesystem,it will check three items: UUID, block consistency,
>> and which inode the block belongs to.
>> 5. Calculate which AG the block belongs to and see the usage of the block
>> 6. If it is a leaf node, it will check whether the number of records exceeds the maximum value
>>
>> xfs_bmbt_write_verify() performs the following checks:
>> 1. Check the magic of the block
>> 2. If it is a V5 filesystem,it will check three items: UUID, block consistency,
>> and which inode the block belongs to.
>> 3. Check if the level of the block is within the specified range
>> 4. Check if the number of nodes in the block record exceeds the maximum limit
>> 5. Check if the left and right nodes of the block are within the range of the file system
>>
>> As can be seen from above, scan_bmapbt() checks more and in more detail than
>> xfs_bmbt_write_verify(). Therefore, if scan_bmapbt() returns 0,
>> xfs_bmbt_write_verify() will not report an error.
>>
>>> Or, let's say that we dirty A, then scan_bmapbt decides that B is total
>>> garbage and returns 1. Should we then mark A stale so that it doesn't
>>> get written out unnecessarily?
>>>
>> We can allocate space in process_btinode() and pass it to scan_lbtree/scan_bmapbt.
>> If a bp is set as dirty, we record it. If the inode needs to be cleaned up,
>> we set all recorded bps as stale.However, this does not affect the repair process.
>> Even if no record is kept, it only adds some unnecessary data writing.
>>
>> If there is nothing wrong with this,I will push V2 patch.
>
> Hmm. It's tempting to have scan_bmapbt put all the buffers it finds on
> a list. The corrected ones would be marked dirty, the good ones just
> end up on the list. If we decide to kill the bmbt we can then
> invalidate all the buffers. If we keep it, then we can write the dirty
> blocks.
>
> Ugh. But that gets gross -- if the bmbt is larger than memory, we then
> can end up OOMing xfs_repair. Creating an interval bitmap of fsblock
> numbers visited buffers might be less bad, but who knows.
>
> (Or I guess we could just apply this patch and see if anyone complains
> about A being written after we decided to kill the bmbt due to B. ;))
>
OK, I agree. Do I need to resend the patch or do something else?
> --D
>
>> Thanks
>>
>> Guanghao
>>
>>> Or, let's say that A is corrupt enough to trip the write verifier but
>>> scan_lbtree/scan_bmapbt return 0; and B is corrupt enough that
>>> scan_bmapbt returns 1. In that case, we'd need to mark A stale so that
>>> we clear the inode and repair can complete without tripping over A or B.
>>> Does that actually happen?
>>>
>>
>>> --D
>>>
>>>> }
>>>> --
>>>> 2.27.0
>>>>
>>> .
>>>
> .
>
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH]xfs_repair: fix the problem of repair failure caused by dirty flag being abnormally set on buffer
2023-07-17 2:02 ` Wu Guanghao
@ 2023-07-19 1:28 ` Darrick J. Wong
0 siblings, 0 replies; 8+ messages in thread
From: Darrick J. Wong @ 2023-07-19 1:28 UTC (permalink / raw)
To: Wu Guanghao; +Cc: cem, linux-xfs, louhongxiang, liuzhiqiang (I)
On Mon, Jul 17, 2023 at 10:02:41AM +0800, Wu Guanghao wrote:
>
>
> 在 2023/7/13 13:20, Darrick J. Wong 写道:
> > On Wed, Jun 28, 2023 at 10:36:09AM +0800, Wu Guanghao wrote:
> >>
> >>
> >> 在 2023/6/10 10:44, Darrick J. Wong 写道:
> >>> On Fri, Jun 09, 2023 at 09:08:01AM +0800, Wu Guanghao wrote:
> >>>> We found an issue where repair failed in the fault injection.
> >>>>
> >>>> $ xfs_repair test.img
> >>>> ...
> >>>> Phase 3 - for each AG...
> >>>> - scan and clear agi unlinked lists...
> >>>> - process known inodes and perform inode discovery...
> >>>> - agno = 0
> >>>> - agno = 1
> >>>> - agno = 2
> >>>> Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
> >>>> - agno = 3
> >>>> Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
> >>>> btree block 0/41901 is suspect, error -74
> >>>> bad magic # 0x58534c4d in inode 3306572 (data fork) bmbt block 41901
> >>>> bad data fork in inode 3306572
> >>>> cleared inode 3306572
> >>>> ...
> >>>> Phase 7 - verify and correct link counts...
> >>>> Metadata corruption detected at 0x55a30e420b58, xfs_bmbt block 0x51d68/0x1000
> >>>> libxfs_bwrite: write verifier failed on xfs_bmbt bno 0x51d68/0x8
> >>>> xfs_repair: Releasing dirty buffer to free list!
> >>>> xfs_repair: Refusing to write a corrupt buffer to the data device!
> >>>> xfs_repair: Lost a write to the data device!
> >>>>
> >>>> fatal error -- File system metadata writeout failed, err=117. Re-run xfs_repair.
> >>>>
> >>>>
> >>>> $ xfs_db test.img
> >>>> xfs_db> inode 3306572
> >>>> xfs_db> p
> >>>> core.magic = 0x494e
> >>>> core.mode = 0100666 // regular file
> >>>> core.version = 3
> >>>> core.format = 3 (btree)
> >>>> ...
> >>>> u3.bmbt.keys[1] = [startoff]
> >>>> 1:[6]
> >>>> u3.bmbt.ptrs[1] = 41901 // btree root
> >>>> ...
> >>>>
> >>>> $ hexdump -C -n 4096 41901.img
> >>>> 00000000 58 53 4c 4d 00 00 00 00 00 00 01 e8 d6 f4 03 14 |XSLM............|
> >>>> 00000010 09 f3 a6 1b 0a 3c 45 5a 96 39 41 ac 09 2f 66 99 |.....<EZ.9A../f.|
> >>>> 00000020 00 00 00 00 00 05 1f fb 00 00 00 00 00 05 1d 68 |...............h|
> >>>> ...
> >>>>
> >>>> The block data associated with inode 3306572 is abnormal, but check the CRC first
> >>>> when reading. If the CRC check fails, badcrc will be set. Then the dirty flag
> >>>> will be set on bp when badcrc is set. In the final stage of repair, the dirty bp
> >>>> will be refreshed in batches. When refresh to the disk, the data in bp will be
> >>>> verified. At this time, if the data verification fails, resulting in a repair
> >>>> error.
> >>>>
> >>>> After scan_bmapbt returns an error, the inode will be cleaned up. Then bp
> >>>> doesn't need to set dirty flag, so that it won't trigger writeback verification
> >>>> failure.
> >>>>
> >>>> Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
> >>>> ---
> >>>> repair/scan.c | 2 +-
> >>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/repair/scan.c b/repair/scan.c
> >>>> index 7b720131..b5458eb8 100644
> >>>> --- a/repair/scan.c
> >>>> +++ b/repair/scan.c
> >>>> @@ -185,7 +185,7 @@ scan_lbtree(
> >>>>
> >>>> ASSERT(dirty == 0 || (dirty && !no_modify));
> >>>>
> >>>> - if ((dirty || badcrc) && !no_modify) {
> >>>> + if (!err && (dirty || badcrc) && !no_modify) {
> >>>> libxfs_buf_mark_dirty(bp);
> >>>> libxfs_buf_relse(bp);
> >>>
> >>> Hm. So if scan_lbtree returns 1, that means that we clear the inode.
> >>> Hence there's no point in dirtying this buffer since we're going to zap
> >>> the whole inode anyway.
> >>>
> >>> This looks correct to me, so
> >>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> >>>
> >>> But that said, you could refactor this part:
> >>>
> >>> if (!err && (dirty || badcrc) && !no_modify)
> >>> libxfs_buf_mark_dirty(bp);
> >>> libxfs_buf_relse(bp);
> >>>
> >>> More questions: Let's say that the btree-format fork has this btree:
> >>>
> >>> fork
> >>> / | \
> >>> A B C
> >>>
> >>> Are there any cases where A is corrupt enough that the write verifier
> >>> will trip but scan_lbtree/scan_bmapbt return 0?
> >>>
> >> I'm sorry for replying so late.
> >
> > Don't worry about it, I'm just as slow. :)
> >
> >> This situation should not exist.
> >> scan_bmapbt() performs the following checks:
> >> 1. Check the magic of the block
> >> 2. Check the level of the block
> >> 3. Check which inode the owner of the block belongs to.
> >> 4. If it is a V5 filesystem,it will check three items: UUID, block consistency,
> >> and which inode the block belongs to.
> >> 5. Calculate which AG the block belongs to and see the usage of the block
> >> 6. If it is a leaf node, it will check whether the number of records exceeds the maximum value
> >>
> >> xfs_bmbt_write_verify() performs the following checks:
> >> 1. Check the magic of the block
> >> 2. If it is a V5 filesystem,it will check three items: UUID, block consistency,
> >> and which inode the block belongs to.
> >> 3. Check if the level of the block is within the specified range
> >> 4. Check if the number of nodes in the block record exceeds the maximum limit
> >> 5. Check if the left and right nodes of the block are within the range of the file system
> >>
> >> As can be seen from above, scan_bmapbt() checks more and in more detail than
> >> xfs_bmbt_write_verify(). Therefore, if scan_bmapbt() returns 0,
> >> xfs_bmbt_write_verify() will not report an error.
> >>
> >>> Or, let's say that we dirty A, then scan_bmapbt decides that B is total
> >>> garbage and returns 1. Should we then mark A stale so that it doesn't
> >>> get written out unnecessarily?
> >>>
> >> We can allocate space in process_btinode() and pass it to scan_lbtree/scan_bmapbt.
> >> If a bp is set as dirty, we record it. If the inode needs to be cleaned up,
> >> we set all recorded bps as stale.However, this does not affect the repair process.
> >> Even if no record is kept, it only adds some unnecessary data writing.
> >>
> >> If there is nothing wrong with this,I will push V2 patch.
> >
> > Hmm. It's tempting to have scan_bmapbt put all the buffers it finds on
> > a list. The corrected ones would be marked dirty, the good ones just
> > end up on the list. If we decide to kill the bmbt we can then
> > invalidate all the buffers. If we keep it, then we can write the dirty
> > blocks.
> >
> > Ugh. But that gets gross -- if the bmbt is larger than memory, we then
> > can end up OOMing xfs_repair. Creating an interval bitmap of fsblock
> > numbers visited buffers might be less bad, but who knows.
> >
> > (Or I guess we could just apply this patch and see if anyone complains
> > about A being written after we decided to kill the bmbt due to B. ;))
> >
>
> OK, I agree. Do I need to resend the patch or do something else?
Yeah, you might as well resend it with my RVB tag attached.
--D
> > --D
> >
> >> Thanks
> >>
> >> Guanghao
> >>
> >>> Or, let's say that A is corrupt enough to trip the write verifier but
> >>> scan_lbtree/scan_bmapbt return 0; and B is corrupt enough that
> >>> scan_bmapbt returns 1. In that case, we'd need to mark A stale so that
> >>> we clear the inode and repair can complete without tripping over A or B.
> >>> Does that actually happen?
> >>>
> >>
> >>> --D
> >>>
> >>>> }
> >>>> --
> >>>> 2.27.0
> >>>>
> >>> .
> >>>
> > .
> >
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH]xfs_repair: fix the problem of repair failure caused by dirty flag being abnormally set on buffer
@ 2023-07-26 1:43 Wu Guanghao
0 siblings, 0 replies; 8+ messages in thread
From: Wu Guanghao @ 2023-07-26 1:43 UTC (permalink / raw)
To: cem, Darrick J. Wong, linux-xfs; +Cc: louhongxiang, liuzhiqiang26
We found an issue where repair failed in the fault injection.
$ xfs_repair test.img
...
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
- agno = 3
Metadata CRC error detected at 0x55a30e420c7d, xfs_bmbt block 0x51d68/0x1000
btree block 0/41901 is suspect, error -74
bad magic # 0x58534c4d in inode 3306572 (data fork) bmbt block 41901
bad data fork in inode 3306572
cleared inode 3306572
...
Phase 7 - verify and correct link counts...
Metadata corruption detected at 0x55a30e420b58, xfs_bmbt block 0x51d68/0x1000
libxfs_bwrite: write verifier failed on xfs_bmbt bno 0x51d68/0x8
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!
fatal error -- File system metadata writeout failed, err=117. Re-run xfs_repair.
$ xfs_db test.img
xfs_db> inode 3306572
xfs_db> p
core.magic = 0x494e
core.mode = 0100666 // regular file
core.version = 3
core.format = 3 (btree)
...
u3.bmbt.keys[1] = [startoff]
1:[6]
u3.bmbt.ptrs[1] = 41901 // btree root
...
$ hexdump -C -n 4096 41901.img
00000000 58 53 4c 4d 00 00 00 00 00 00 01 e8 d6 f4 03 14 |XSLM............|
00000010 09 f3 a6 1b 0a 3c 45 5a 96 39 41 ac 09 2f 66 99 |.....<EZ.9A../f.|
00000020 00 00 00 00 00 05 1f fb 00 00 00 00 00 05 1d 68 |...............h|
...
The block data associated with inode 3306572 is abnormal, but check the CRC first
when reading. If the CRC check fails, badcrc will be set. Then the dirty flag
will be set on bp when badcrc is set. In the final stage of repair, the dirty bp
will be refreshed in batches. When refresh to the disk, the data in bp will be
verified. At this time, if the data verification fails, resulting in a repair
error.
After scan_bmapbt returns an error, the inode will be cleaned up. Then bp
doesn't need to set dirty flag, so that it won't trigger writeback verification
failure.
Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
repair/scan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/repair/scan.c b/repair/scan.c
index 7b720131..b5458eb8 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -185,7 +185,7 @@ scan_lbtree(
ASSERT(dirty == 0 || (dirty && !no_modify));
- if ((dirty || badcrc) && !no_modify) {
+ if (!err && (dirty || badcrc) && !no_modify) {
libxfs_buf_mark_dirty(bp);
libxfs_buf_relse(bp);
}
--
2.27.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-07-26 1:43 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-08 13:38 [PATCH]xfs: fix mounting failed caused by sequencing problem ,in the log records Wu Guanghao
2023-06-09 1:08 ` [PATCH]xfs_repair: fix the problem of repair failure caused by dirty flag being abnormally set on buffer Wu Guanghao
2023-06-10 2:44 ` Darrick J. Wong
2023-06-28 2:36 ` Wu Guanghao
2023-07-13 5:20 ` Darrick J. Wong
2023-07-17 2:02 ` Wu Guanghao
2023-07-19 1:28 ` Darrick J. Wong
-- strict thread matches above, loose matches on Subject: below --
2023-07-26 1:43 Wu Guanghao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox