[PATCH] ocfs2: fix unexpected zeroing of virtual disk

* [PATCH] ocfs2: fix unexpected zeroing of virtual disk
@ 2024-08-15  9:21 Chi Zhiling
  2024-08-18 10:31 ` Heming Zhao
  0 siblings, 1 reply; 4+ messages in thread
From: Chi Zhiling @ 2024-08-15  9:21 UTC (permalink / raw)
  To: mark, jlbec, joseph.qi
  Cc: ocfs2-devel, linux-kernel, starzhangzsd, Chi Zhiling, Shida Zhang

From: Chi Zhiling <chizhiling@kylinos.cn>

In a guest virtual machine, we occasionally detected unexpected zeroing
of data:

XFS (vdb): Mounting V5 Filesystem
XFS (vdb): Ending clean mount
XFS (vdb): Metadata CRC error detected at xfs_refcountbt_read_verify+0x2c/0xf0, xfs_refcountbt block 0x200028
XFS (vdb): Unmount and run xfs_repair
XFS (vdb): First 128 bytes of corrupted metadata buffer:
00000000e0cd2f5e: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000000cafd57f5: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000000d0298d7d: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000000f0698484: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000000adb789a7: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000000005292b878: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000000885b4700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000000fd4b4df7: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
XFS (vdb): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x200028 len 8 error 74
XFS (vdb): Error -117 recovering leftover CoW allocations.
XFS (vdb): xfs_do_force_shutdown(0x8) called from line 994 of file fs/xfs/xfs_mount.c.  Return address = 000000003a53523a
XFS (vdb): Corruption of in-memory data detected.  Shutting down filesystem
XFS (vdb): Please umount the filesystem and rectify the problem(s)

The root cause turns out to be on the physical host machine, more
specifically in ocfs2.

When the page size is 64k and the block size is 4k, p_blkno should
advance by 16 (the number of blocks per page) each time a direct I/O
target page is skipped, instead of by 1.
Advancing by only 1 leads to a wrong mapping from the following pages
to the disk, which zeroes some adjacent part of the disk.
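
The arithmetic behind the new increment can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not
the ocfs2 code: blocks_per_page() and block_after_skips() are
hypothetical helper names, and the shift values passed to them stand in
for PAGE_SHIFT and s_blocksize_bits.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel function): number of filesystem
 * blocks covered by one page, i.e. 1 << (page_shift - blocksize_bits).
 * Assumes page_shift >= blocksize_bits.
 */
unsigned long blocks_per_page(unsigned int page_shift,
			      unsigned int blocksize_bits)
{
	return 1UL << (page_shift - blocksize_bits);
}

/*
 * Hypothetical helper: physical block number reached after skipping
 * `skipped_pages` pages starting from block `start`, when the block
 * counter is advanced by `step` for every skipped page.
 */
unsigned long long block_after_skips(unsigned long long start,
				     unsigned int skipped_pages,
				     unsigned long step)
{
	return start + (unsigned long long)skipped_pages * step;
}

/* Print the old and new results for one skipped page. */
void print_skip_example(unsigned long long start)
{
	/* 64k pages (shift 16) and 4k blocks (shift 12): 16 blocks/page. */
	unsigned long bpp = blocks_per_page(16, 12);

	printf("blocks per page: %lu\n", bpp);
	printf("step 1:   next page starts at block %llu\n",
	       block_after_skips(start, 1, 1));
	printf("step bpp: next page starts at block %llu\n",
	       block_after_skips(start, 1, bpp));
}
```

With a starting block of 1000, skipping one page with a step of 1 lands
on block 1001, which still belongs to the skipped page, while a step of
16 lands on block 1016, the first block of the next page. When the page
size equals the block size the helper returns 1, so the old and new
increments coincide.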

Suggested-by: Shida Zhang <zhangshida@kylinos.cn>
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
---
 fs/ocfs2/aops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index d6c985cc6353..1fea43c33b6b 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1187,7 +1187,7 @@ static int ocfs2_write_cluster(struct address_space *mapping,
 
 		/* This is the direct io target page. */
 		if (wc->w_pages[i] == NULL) {
-			p_blkno++;
+			p_blkno += (1 << (PAGE_SHIFT - inode->i_sb->s_blocksize_bits));
 			continue;
 		}
 
-- 
2.27.0



* Re: [PATCH] ocfs2: fix unexpected zeroing of virtual disk
  2024-08-15  9:21 [PATCH] ocfs2: fix unexpected zeroing of virtual disk Chi Zhiling
@ 2024-08-18 10:31 ` Heming Zhao
  2024-08-19  2:32   ` Joseph Qi
  0 siblings, 1 reply; 4+ messages in thread
From: Heming Zhao @ 2024-08-18 10:31 UTC (permalink / raw)
  To: Chi Zhiling, mark, jlbec, joseph.qi
  Cc: ocfs2-devel, linux-kernel, starzhangzsd, Chi Zhiling, Shida Zhang

On 8/15/24 17:21, Chi Zhiling wrote:
> From: Chi Zhiling <chizhiling@kylinos.cn>
> 
> In a guest virtual machine, we occasionally detected unexpected zeroing
> of data:
> 
> XFS (vdb): Mounting V5 Filesystem
> XFS (vdb): Ending clean mount
> XFS (vdb): Metadata CRC error detected at xfs_refcountbt_read_verify+0x2c/0xf0, xfs_refcountbt block 0x200028
> XFS (vdb): Unmount and run xfs_repair
> XFS (vdb): First 128 bytes of corrupted metadata buffer:
> 00000000e0cd2f5e: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> 00000000cafd57f5: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> 00000000d0298d7d: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> 00000000f0698484: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> 00000000adb789a7: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> 000000005292b878: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> 00000000885b4700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> 00000000fd4b4df7: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> XFS (vdb): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x200028 len 8 error 74
> XFS (vdb): Error -117 recovering leftover CoW allocations.
> XFS (vdb): xfs_do_force_shutdown(0x8) called from line 994 of file fs/xfs/xfs_mount.c.  Return address = 000000003a53523a
> XFS (vdb): Corruption of in-memory data detected.  Shutting down filesystem
> XFS (vdb): Please umount the filesystem and rectify the problem(s)
> 
> The root cause turns out to be on the physical host machine, more
> specifically in ocfs2.
> 
> When the page size is 64k and the block size is 4k, p_blkno should
> advance by 16 (the number of blocks per page) each time a direct I/O
> target page is skipped, instead of by 1.
> Advancing by only 1 leads to a wrong mapping from the following pages
> to the disk, which zeroes some adjacent part of the disk.
> 
> Suggested-by: Shida Zhang <zhangshida@kylinos.cn>
> Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
> ---
>   fs/ocfs2/aops.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index d6c985cc6353..1fea43c33b6b 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -1187,7 +1187,7 @@ static int ocfs2_write_cluster(struct address_space *mapping,
>   
>   		/* This is the direct io target page. */
>   		if (wc->w_pages[i] == NULL) {
> -			p_blkno++;
> +			p_blkno += (1 << (PAGE_SHIFT - inode->i_sb->s_blocksize_bits));
>   			continue;
>   		}
>   

Looks good to me.
Signed-off-by: Heming Zhao <heming.zhao@suse.com>


* Re: [PATCH] ocfs2: fix unexpected zeroing of virtual disk
  2024-08-18 10:31 ` Heming Zhao
@ 2024-08-19  2:32   ` Joseph Qi
  2024-08-19  2:40     ` Heming Zhao
  0 siblings, 1 reply; 4+ messages in thread
From: Joseph Qi @ 2024-08-19  2:32 UTC (permalink / raw)
  To: Heming Zhao, Chi Zhiling, akpm
  Cc: ocfs2-devel, linux-kernel, starzhangzsd, Chi Zhiling, Shida Zhang,
	Mark Fasheh, Joel Becker

Looks good.
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

BTW, ocfs2 hasn't been tested thoroughly with 64k pages, so I'm afraid
there may be other bugs when running ocfs2 with 64k pages.



* Re: [PATCH] ocfs2: fix unexpected zeroing of virtual disk
  2024-08-19  2:32   ` Joseph Qi
@ 2024-08-19  2:40     ` Heming Zhao
  0 siblings, 0 replies; 4+ messages in thread
From: Heming Zhao @ 2024-08-19  2:40 UTC (permalink / raw)
  To: Joseph Qi, Chi Zhiling, akpm
  Cc: ocfs2-devel, linux-kernel, starzhangzsd, Chi Zhiling, Shida Zhang,
	Mark Fasheh, Joel Becker

Sorry, I just realized I posted the wrong tag "Signed-off-by: Heming Zhao ...".
The correct tag should be Reviewed-by: Heming Zhao <heming.zhao@suse.com>.


