* [bug report][stable] xfstests:generic/538 failed on xfs
From: Alvin Zheng @ 2019-06-05 12:21 UTC
To: darrick.wong, axboe, gregkh, linux-block, linux-xfs; +Cc: caspar, joseph.qi
Hi,
I am running kernel v4.19.48 and found that it fails generic/538 on xfs. The error output is as follows:
FSTYP -- xfs (non-debug)
PLATFORM -- Linux/x86_64 alinux2-6 4.19.48
MKFS_OPTIONS -- -f -bsize=4096 /dev/vdc
MOUNT_OPTIONS -- /dev/vdc /mnt/testarea/scra
generic/538 0s ... - output mismatch (see /root/usr/local/src/xfstests/results//generic/538.out.bad)
--- tests/generic/538.out 2019-05-27 13:57:06.505666465 +0800
+++ /root/usr/local/src/xfstests/results//generic/538.out.bad 2019-06-05 16:43:14.702002326 +0800
@@ -1,2 +1,10 @@
QA output created by 538
+Data verification fails
+Find corruption
+00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+*
+00000200 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
+00002000
...
(Run 'diff -u /root/usr/local/src/xfstests/tests/generic/538.out /root/usr/local/src/xfstests/results//generic/538.out.bad' to see the entire diff)
Ran: generic/538
Failures: generic/538
Failed 1 of 1 tests
I also found that the latest upstream kernel (v5.2.0-rc2) passes generic/538, so I bisected and found that the first good commit is 3110fc79606. That commit adds the hardware queue to the sort function. In addition, the sort function now returns a non-positive value when the offset and queue (software and hardware) of two I/O requests are the same.
I think the second part of the change makes sense: the kernel should not change the relative order of two I/O requests when their offset and queue are the same. So I made the following change and applied it to kernel 4.19.48. With this modification, generic/538 passes on xfs.
The same test passes on ext4, because ext4 has the corresponding fix 0db24122bd7f ("ext4: fix data corruption caused by overlapping unaligned and aligned IO"). Although I think xfs should be responsible for this issue, the block layer code below is also problematic. Any ideas?
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4e563ee..a7309cd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1610,7 +1610,7 @@ static int plug_ctx_cmp(void *priv, struct list_head *a, struct list_head *b)
 
 	return !(rqa->mq_ctx < rqb->mq_ctx ||
 		 (rqa->mq_ctx == rqb->mq_ctx &&
-		  blk_rq_pos(rqa) < blk_rq_pos(rqb)));
+		  blk_rq_pos(rqa) <= blk_rq_pos(rqb)));
 }
 
 void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
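To make the tie-handling concrete, here is a small userspace sketch of the two comparators. It is not kernel code: struct req, its seq field, and sort_reqs() are hypothetical stand-ins, and the merge step assumes (as I understand list_sort() to behave) that the earlier element is taken whenever the comparator returns a value <= 0.

```c
#include <string.h>

/* Hypothetical stand-in for struct request; only what the comparator reads. */
struct req {
	int ctx;            /* stands in for rq->mq_ctx */
	unsigned long pos;  /* stands in for blk_rq_pos(rq) */
	int seq;            /* submission order, only used to observe reordering */
};

typedef int (*cmp_fn)(const struct req *a, const struct req *b);

/* Same expression as the 4.19 comparator: evaluates to 1 on a tie. */
static int cmp_old(const struct req *a, const struct req *b)
{
	return !(a->ctx < b->ctx ||
		 (a->ctx == b->ctx && a->pos < b->pos));
}

/* With '<=' as in the diff above: evaluates to 0 on a tie. */
static int cmp_patched(const struct req *a, const struct req *b)
{
	return !(a->ctx < b->ctx ||
		 (a->ctx == b->ctx && a->pos <= b->pos));
}

/*
 * Stable top-down merge sort over an array; tmp must hold n elements.
 * Assumption: like list_sort(), the merge takes the earlier element
 * whenever cmp() <= 0 and the later one otherwise.
 */
static void sort_reqs(struct req *v, struct req *tmp, int n, cmp_fn cmp)
{
	int mid = n / 2, i = 0, j = mid, k = 0;

	if (n < 2)
		return;
	sort_reqs(v, tmp, mid, cmp);
	sort_reqs(v + mid, tmp, n - mid, cmp);
	while (i < mid && j < n)
		tmp[k++] = (cmp(&v[i], &v[j]) <= 0) ? v[i++] : v[j++];
	while (i < mid)
		tmp[k++] = v[i++];
	while (j < n)
		tmp[k++] = v[j++];
	memcpy(v, tmp, (size_t)n * sizeof(*v));
}
```

With three requests on the same ctx and the same sector, submitted with seq 0, 1, 2, sorting with cmp_old leaves them in the order 2, 1, 0, while cmp_patched keeps 0, 1, 2.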
Best regards,
Alvin
* Re: [bug report][stable] xfstests:generic/538 failed on xfs
From: gregkh @ 2019-06-05 12:42 UTC
To: Alvin Zheng
Cc: darrick.wong, axboe, linux-block, linux-xfs, caspar, joseph.qi
On Wed, Jun 05, 2019 at 08:21:44PM +0800, Alvin Zheng wrote:
> Hi,
> I was using kernel v4.19.48 and found that it cannot pass the generic/538 on xfs. The error output is as follows:
Has 4.19 ever been able to pass that test? If not, I wouldn't worry
about it :)
>
> FSTYP -- xfs (non-debug)
> PLATFORM -- Linux/x86_64 alinux2-6 4.19.48
> MKFS_OPTIONS -- -f -bsize=4096 /dev/vdc
> MOUNT_OPTIONS -- /dev/vdc /mnt/testarea/scra
> generic/538 0s ... - output mismatch (see /root/usr/local/src/xfstests/results//generic/538.out.bad)
> --- tests/generic/538.out 2019-05-27 13:57:06.505666465 +0800
> +++ /root/usr/local/src/xfstests/results//generic/538.out.bad 2019-06-05 16:43:14.702002326 +0800
> @@ -1,2 +1,10 @@
> QA output created by 538
> +Data verification fails
> +Find corruption
> +00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> +*
> +00000200 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> +00002000
> ...
> (Run 'diff -u /root/usr/local/src/xfstests/tests/generic/538.out /root/usr/local/src/xfstests/results//generic/538.out.bad' to see the entire diff)
> Ran: generic/538
> Failures: generic/538
> Failed 1 of 1 tests
>
> I also found that the latest upstream kernel (v5.2.0-rc2) passes generic/538, so I bisected and found that the first good commit is 3110fc79606. That commit adds the hardware queue to the sort function. In addition, the sort function now returns a non-positive value when the offset and queue (software and hardware) of two I/O requests are the same.
> I think the second part of the change makes sense: the kernel should not change the relative order of two I/O requests when their offset and queue are the same. So I made the following change and applied it to kernel 4.19.48. With this modification, generic/538 passes on xfs.
> The same test passes on ext4, because ext4 has the corresponding fix 0db24122bd7f ("ext4: fix data corruption caused by overlapping unaligned and aligned IO"). Although I think xfs should be responsible for this issue, the block layer code below is also problematic. Any ideas?
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4e563ee..a7309cd 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1610,7 +1610,7 @@ static int plug_ctx_cmp(void *priv, struct list_head *a, struct list_head *b)
>
> return !(rqa->mq_ctx < rqb->mq_ctx ||
> (rqa->mq_ctx == rqb->mq_ctx &&
> - blk_rq_pos(rqa) < blk_rq_pos(rqb)));
> + blk_rq_pos(rqa) <= blk_rq_pos(rqb)));
> }
>
> void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
I would not like to take a patch that is not upstream, but rather take
the original commit.
Can 3110fc79606f ("blk-mq: improve plug list sorting") on its own
resolve this issue for 4.19.y?
thanks,
greg k-h
* Re: [bug report][stable] xfstests:generic/538 failed on xfs
From: Brian Foster @ 2019-06-05 13:57 UTC
To: gregkh
Cc: Alvin Zheng, darrick.wong, axboe, linux-block, linux-xfs, caspar, joseph.qi
On Wed, Jun 05, 2019 at 02:42:27PM +0200, gregkh wrote:
> On Wed, Jun 05, 2019 at 08:21:44PM +0800, Alvin Zheng wrote:
> > Hi,
> > I was using kernel v4.19.48 and found that it cannot pass the generic/538 on xfs. The error output is as follows:
>
> Has 4.19 ever been able to pass that test? If not, I wouldn't worry
> about it :)
>
FWIW, the fstests commit references the following kernel patches for
fixes in XFS and ext4:
xfs: serialize unaligned dio writes against all other dio writes
ext4: fix data corruption caused by unaligned direct AIO
It looks like both of those patches landed in 5.1.
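Both patch titles turn on whether a direct write is aligned to the filesystem block size. As a rough illustration only (this is not code from either patch; the function name and parameters are hypothetical, and the block size is assumed to be a power of two, e.g. the 4096 used in the report), such a check could look like:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when a direct write does not both start and
 * end on a filesystem block boundary.
 * Assumption: block_size is a power of two.
 */
static bool dio_write_is_unaligned(uint64_t pos, uint64_t len,
				   uint32_t block_size)
{
	uint64_t mask = (uint64_t)block_size - 1;

	return (pos & mask) != 0 || ((pos + len) & mask) != 0;
}
```

For example, a 512-byte write at offset 512 on a 4096-byte block size is unaligned under this check, while an 8192-byte write at offset 4096 is aligned.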
Brian
* Re: [bug report][stable] xfstests:generic/538 failed on xfs
From: Joseph Qi @ 2019-06-05 22:51 UTC
To: Brian Foster, gregkh
Cc: Alvin Zheng, darrick.wong, axboe, linux-block, linux-xfs, caspar
On 19/6/5 21:57, Brian Foster wrote:
> On Wed, Jun 05, 2019 at 02:42:27PM +0200, gregkh wrote:
>> On Wed, Jun 05, 2019 at 08:21:44PM +0800, Alvin Zheng wrote:
>>> Hi,
>>> I was using kernel v4.19.48 and found that it cannot pass the generic/538 on xfs. The error output is as follows:
>>
>> Has 4.19 ever been able to pass that test? If not, I wouldn't worry
>> about it :)
>>
>
> FWIW, the fstests commit references the following kernel patches for
> fixes in XFS and ext4:
>
> xfs: serialize unaligned dio writes against all other dio writes
> ext4: fix data corruption caused by unaligned direct AIO
IIUC, the corresponding ext4 fix is:
ext4: fix data corruption caused by overlapping unaligned and aligned IO
It was backported to 4.19.45.
Thanks,
Joseph
* Re: [bug report][stable] xfstests:generic/538 failed on xfs
From: Alvin Zheng @ 2019-06-06 3:13 UTC
To: gregkh; +Cc: darrick.wong, axboe, linux-block, linux-xfs, caspar, joseph.qi
The xfs patch ("xfs: serialize unaligned dio writes against all other dio writes") does fix the data corruption bug on xfs in kernel 4.19.
As for 3110fc79606f ("blk-mq: improve plug list sorting"), it happens to fix the logic error in the block layer's sort function, but it depends on the blk-mq multiple queue maps that were introduced in v5.0. Backporting this commit would therefore pull in a lot of related code.
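For reference, the ordering described earlier in this thread (software queue, then hardware queue, then offset, with no reordering on a tie) can be sketched as below. This is only an illustration of that description, not the upstream code: struct req_key and its integer fields are hypothetical stand-ins, and the hctx field in particular stands for the per-request hardware queue information that, as far as I can tell, 4.19 does not carry in this form.

```c
/* Hypothetical, simplified sort key; not the kernel's struct request. */
struct req_key {
	int ctx;            /* software queue, stands in for rq->mq_ctx */
	int hctx;           /* hardware queue (assumed to be unavailable in 4.19) */
	unsigned long pos;  /* start sector, stands in for blk_rq_pos(rq) */
};

/*
 * Three-key comparator: order by ctx, then hctx, then pos.
 * Returns 0 when all three keys match, so a stable sort keeps the
 * submission order of such requests.
 */
static int cmp_three_key(const struct req_key *a, const struct req_key *b)
{
	if (a->ctx != b->ctx)
		return a->ctx < b->ctx ? -1 : 1;
	if (a->hctx != b->hctx)
		return a->hctx < b->hctx ? -1 : 1;
	return a->pos > b->pos;
}
```

The middle key is the part that a 4.19 backport would have to bring its dependencies for; the tie-handling in the last line is the part that matters for generic/538.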
Regards,
Alvin
------------------------------------------------------------------
From:gregkh <gregkh@linuxfoundation.org>
Send Time: June 5, 2019 (Wednesday) 20:42
To:Alvin Zheng <Alvin@linux.alibaba.com>
Cc:darrick.wong <darrick.wong@oracle.com>; axboe <axboe@kernel.dk>; linux-block <linux-block@vger.kernel.org>; linux-xfs <linux-xfs@vger.kernel.org>; caspar <caspar@linux.alibaba.com>; joseph.qi <joseph.qi@linux.alibaba.com>
Subject:Re: [bug report][stable] xfstests:generic/538 failed on xfs
I would not like to take a patch that is not upstream, but rather take
the original commit.
Can 3110fc79606f ("blk-mq: improve plug list sorting") on its own
resolve this issue for 4.19.y?
thanks,
greg k-h