linux-block.vger.kernel.org archive mirror
* blktests dm/002 always fails for me
@ 2024-06-13  8:20 Christoph Hellwig
  2024-06-14  6:03 ` Shinichiro Kawasaki
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2024-06-13  8:20 UTC (permalink / raw)
  To: Bryan Gurney; +Cc: Shin'ichiro Kawasaki, linux-block

Hi all,

ever since the test was added, dm/002 has failed for me as shown below.
Does it need fixes that are not in mainline yet?

dm/002 => vdb (dm-dust general functionality test)
[failed]tors
    runtime  0.044s  ...  0.049son dev dm-0, logical block 8, async page read
    --- tests/dm/002.out	2024-06-02 08:38:27.252957357 +0000
    +++ /root/blktests/results/vdb/dm/002.out.bad 2024-06-13 08:19:31.526336224 +0000
    @@ -6,5 +6,3 @@
     dust_clear_badblocks: badblocks cleared
     countbadblocks: 0 badblock(s) found
     countbadblocks: 3 badblock(s) found
    -countbadblocks: 0 badblock(s) found
    -Test complete
     modprobe: FATAL: Module dm_dust is in use.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: blktests dm/002 always fails for me
  2024-06-13  8:20 blktests dm/002 always fails for me Christoph Hellwig
@ 2024-06-14  6:03 ` Shinichiro Kawasaki
  2024-06-14  6:10   ` hch
  0 siblings, 1 reply; 10+ messages in thread
From: Shinichiro Kawasaki @ 2024-06-14  6:03 UTC (permalink / raw)
  To: hch@infradead.org; +Cc: Bryan Gurney, linux-block@vger.kernel.org

On Jun 13, 2024 / 01:20, Christoph Hellwig wrote:
> Hi all,
> 
> ever since the test was added, dm/002 has failed for me as shown below.
> Does it need fixes that are not in mainline yet?

The test case always passes for me, so I did not expect this failure. I'm not
aware of any changes waiting for mainline, on either the kernel side or the
blktests side.

I would like to take a closer look at the failure, but I cannot recreate it on
my test systems. I wonder what the difference is between your test system and
mine; a kernel config difference seems the most likely cause. Could you share
your kernel config?

> 
> dm/002 => vdb (dm-dust general functionality test)
> [failed]tors
>     runtime  0.044s  ...  0.049son dev dm-0, logical block 8, async page read
>     --- tests/dm/002.out	2024-06-02 08:38:27.252957357 +0000
>     +++ /root/blktests/results/vdb/dm/002.out.bad 2024-06-13 08:19:31.526336224 +0000
>     @@ -6,5 +6,3 @@
>      dust_clear_badblocks: badblocks cleared
>      countbadblocks: 0 badblock(s) found
>      countbadblocks: 3 badblock(s) found
>     -countbadblocks: 0 badblock(s) found
>     -Test complete
>      modprobe: FATAL: Module dm_dust is in use.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: blktests dm/002 always fails for me
  2024-06-14  6:03 ` Shinichiro Kawasaki
@ 2024-06-14  6:10   ` hch
  2024-06-24  4:58     ` Shinichiro Kawasaki
  0 siblings, 1 reply; 10+ messages in thread
From: hch @ 2024-06-14  6:10 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: hch@infradead.org, Bryan Gurney, linux-block@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 347 bytes --]

On Fri, Jun 14, 2024 at 06:03:36AM +0000, Shinichiro Kawasaki wrote:
> I would like to take a closer look at the failure, but I cannot recreate it on
> my test systems. I wonder what the difference is between your test system and
> mine; a kernel config difference seems the most likely cause. Could you share
> your kernel config?

Attached.


[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 39549 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: blktests dm/002 always fails for me
  2024-06-14  6:10   ` hch
@ 2024-06-24  4:58     ` Shinichiro Kawasaki
  2024-06-24  8:43       ` hch
  0 siblings, 1 reply; 10+ messages in thread
From: Shinichiro Kawasaki @ 2024-06-24  4:58 UTC (permalink / raw)
  To: hch@infradead.org
  Cc: Bryan Gurney, linux-block@vger.kernel.org,
	dm-devel@lists.linux.dev

CC+: dm-devel

On Jun 13, 2024 / 23:10, hch@infradead.org wrote:
> On Fri, Jun 14, 2024 at 06:03:36AM +0000, Shinichiro Kawasaki wrote:
> > I would like to take a closer look at the failure, but I cannot recreate it on
> > my test systems. I wonder what the difference is between your test system and
> > mine; a kernel config difference seems the most likely cause. Could you share
> > your kernel config?
> 
> Attached.

Thank you for the kernel config. I tried running dm/002 with that kernel config
on several machines. One of the machines made dm/002 fail, but unfortunately,
its failure symptom was different [1]. I will chase it as a separate failure.

As for the failure you observe, I have one guess about the cause. According to
the error log output, it looks like the dd command did not cause an error, even
though the previous "dmsetup message dust1 0 enable" command enabled the I/O
error on the dm-dust device. I looked into the dm-dust code and found that the
fail_read_on_bb flag of struct dust_device is not guarded by any lock. So, I
guess the fail_read_on_bb flag change by the "dmsetup .. enable" command on
CPU1 was not visible to the dd running on CPU2. IOW, it looks like a memory
barrier issue to me.

Based on this guess, the change below may avoid the failure.

Christoph, may I ask you to see if this change avoids the failure you observe?

diff --git a/drivers/md/dm-dust.c b/drivers/md/dm-dust.c
index 1a33820c9f46..da3ebdde287a 100644
--- a/drivers/md/dm-dust.c
+++ b/drivers/md/dm-dust.c
@@ -229,6 +229,7 @@ static int dust_map(struct dm_target *ti, struct bio *bio)
 	bio_set_dev(bio, dd->dev->bdev);
 	bio->bi_iter.bi_sector = dd->start + dm_target_offset(ti, bio->bi_iter.bi_sector);
 
+	smp_rmb();
 	if (bio_data_dir(bio) == READ)
 		r = dust_map_read(dd, bio->bi_iter.bi_sector, dd->fail_read_on_bb);
 	else
@@ -433,10 +434,12 @@ static int dust_message(struct dm_target *ti, unsigned int argc, char **argv,
 		} else if (!strcasecmp(argv[0], "disable")) {
 			DMINFO("disabling read failures on bad sectors");
 			dd->fail_read_on_bb = false;
+			smp_wmb();
 			r = 0;
 		} else if (!strcasecmp(argv[0], "enable")) {
 			DMINFO("enabling read failures on bad sectors");
 			dd->fail_read_on_bb = true;
+			smp_wmb();
 			r = 0;
 		} else if (!strcasecmp(argv[0], "countbadblocks")) {
 			spin_lock_irqsave(&dd->dust_lock, flags);




[1]

dm/002 => nvme0n1 (dm-dust general functionality test)       [failed]
    runtime  0.204s  ...  0.174s
    --- tests/dm/002.out        2024-06-14 14:37:40.480794693 +0900
    +++ /home/shin/Blktests/blktests/results/nvme0n1/dm/002.out.bad     2024-06-14 21:38:18.588976499 +0900
    @@ -7,4 +7,6 @@
     countbadblocks: 0 badblock(s) found
     countbadblocks: 3 badblock(s) found
     countbadblocks: 0 badblock(s) found
    +device-mapper: remove ioctl on dust1  failed: Device or resource busy
    +Command failed.
     Test complete
modprobe: FATAL: Module dm_dust is in use.



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: blktests dm/002 always fails for me
  2024-06-24  4:58     ` Shinichiro Kawasaki
@ 2024-06-24  8:43       ` hch
  2024-06-24 11:21         ` Shinichiro Kawasaki
  0 siblings, 1 reply; 10+ messages in thread
From: hch @ 2024-06-24  8:43 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: hch@infradead.org, Bryan Gurney, linux-block@vger.kernel.org,
	dm-devel@lists.linux.dev

On Mon, Jun 24, 2024 at 04:58:29AM +0000, Shinichiro Kawasaki wrote:
> Based on this guess, the change below may avoid the failure.
> 
> Christoph, may I ask you to see if this change avoids the failure you observe?

Still fails in exactly the same way with that patch.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: blktests dm/002 always fails for me
  2024-06-24  8:43       ` hch
@ 2024-06-24 11:21         ` Shinichiro Kawasaki
  2024-06-24 11:46           ` hch
  0 siblings, 1 reply; 10+ messages in thread
From: Shinichiro Kawasaki @ 2024-06-24 11:21 UTC (permalink / raw)
  To: hch@infradead.org
  Cc: Bryan Gurney, linux-block@vger.kernel.org,
	dm-devel@lists.linux.dev

On Jun 24, 2024 / 01:43, hch@infradead.org wrote:
> On Mon, Jun 24, 2024 at 04:58:29AM +0000, Shinichiro Kawasaki wrote:
> > Based on this guess, the change below may avoid the failure.
> > 
> > Christoph, may I ask you to see if this change avoids the failure you observe?
> 
> Still fails in exactly the same way with that patch.

Hmm, sorry for wasting your time. My guess was wrong.

I took another look at the test script and the dm-dust code, and now I think the
dd command is expected to succeed. The added bad blocks have the default
wr_fail_cnt value of 0, so a write error should not happen for the dd command.
(Bryan, if this understanding is wrong, please let me know.)

So the error log that Christoph observes indicates that the dd command failed,
and this failure is unexpected. I cannot think of any cause of the failure.

Christoph, may I ask you to share the kernel messages during the test run?
Also, I would like to check the dd command output. The one-liner patch below
to blktests will create results/vdb/dm/002.full with the dd output.

diff --git a/tests/dm/002 b/tests/dm/002
index 6635c43..ad3b6c0 100755
--- a/tests/dm/002
+++ b/tests/dm/002
@@ -30,7 +30,7 @@ test_device() {
 	dmsetup message dust1 0 addbadblock 72
 	dmsetup message dust1 0 countbadblocks
 	dmsetup message dust1 0 enable
-	dd if=/dev/zero of=/dev/mapper/dust1 bs=512 count=128 oflag=direct >/dev/null 2>&1 || return $?
+	dd if=/dev/zero of=/dev/mapper/dust1 bs=512 count=128 oflag=direct >"$FULL" 2>&1 || return $?
 	sync
 	dmsetup message dust1 0 countbadblocks
 	sync

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: blktests dm/002 always fails for me
  2024-06-24 11:21         ` Shinichiro Kawasaki
@ 2024-06-24 11:46           ` hch
  2024-06-24 13:01             ` Shinichiro Kawasaki
  0 siblings, 1 reply; 10+ messages in thread
From: hch @ 2024-06-24 11:46 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: hch@infradead.org, Bryan Gurney, linux-block@vger.kernel.org,
	dm-devel@lists.linux.dev

On Mon, Jun 24, 2024 at 11:21:33AM +0000, Shinichiro Kawasaki wrote:
> I took another look at the test script and the dm-dust code, and now I think the
> dd command is expected to succeed. The added bad blocks have the default
> wr_fail_cnt value of 0, so a write error should not happen for the dd command.
> (Bryan, if this understanding is wrong, please let me know.)
> 
> So the error log that Christoph observes indicates that the dd command failed,
> and this failure is unexpected. I cannot think of any cause of the failure.

Yes, it does indeed fail; this is 002.full with your patch:

dd: error writing '/dev/mapper/dust1': Invalid argument
1+0 records in
0+0 records out
0 bytes copied, 0.000373943 s, 0.0 kB/s

> 
> Christoph, may I ask you to share the kernel messages during the test run?
> Also, I would like to check the dd command output. The one-liner patch below
> to blktests will create results/vdb/dm/002.full with the dd output.

the relevant lines of dmesg output below:

[   57.773967] run blktests dm/002 at 2024-06-24 11:43:53
[   57.791251] I/O error, dev vdb, sector 774 op 0x0:(READ) flags 0x80700 phys_seg 250 prio class 0
[   57.791849] I/O error, dev vdb, sector 520 op 0x0:(READ) flags 0x84700 phys_seg 254 prio class 0
[   57.792420] I/O error, dev vdb, sector 520 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   57.792805] I/O error, dev vdb, sector 521 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   57.793190] I/O error, dev vdb, sector 522 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   57.793578] I/O error, dev vdb, sector 523 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   57.793955] I/O error, dev vdb, sector 524 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   57.794318] I/O error, dev vdb, sector 525 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   57.794700] I/O error, dev vdb, sector 526 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   57.795130] I/O error, dev vdb, sector 527 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   57.795516] Buffer I/O error on dev dm-0, logical block 65, async page read
[   57.800743] device-mapper: dust: dust_add_block: badblock added at block 60 with write fail count 0
[   57.802587] device-mapper: dust: dust_add_block: badblock added at block 67 with write fail count 0
[   57.804359] device-mapper: dust: dust_add_block: badblock added at block 72 with write fail count 0
[   57.811253] device-mapper: dust: dust_add_block: badblock added at block 60 with write fail count 0
[   57.813065] device-mapper: dust: dust_add_block: badblock added at block 67 with write fail count 0
[   57.814786] device-mapper: dust: dust_add_block: badblock added at block 72 with write fail count 0
[   57.818023] device-mapper: dust: enabling read failures on bad sectors
[   57.826500] Buffer I/O error on dev dm-0, logical block 8, async page read

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: blktests dm/002 always fails for me
  2024-06-24 11:46           ` hch
@ 2024-06-24 13:01             ` Shinichiro Kawasaki
  2024-06-24 13:08               ` hch
  0 siblings, 1 reply; 10+ messages in thread
From: Shinichiro Kawasaki @ 2024-06-24 13:01 UTC (permalink / raw)
  To: hch@infradead.org
  Cc: Bryan Gurney, linux-block@vger.kernel.org,
	dm-devel@lists.linux.dev

On Jun 24, 2024 / 04:46, hch@infradead.org wrote:
> On Mon, Jun 24, 2024 at 11:21:33AM +0000, Shinichiro Kawasaki wrote:
> > I took another look at the test script and the dm-dust code, and now I think the
> > dd command is expected to succeed. The added bad blocks have the default
> > wr_fail_cnt value of 0, so a write error should not happen for the dd command.
> > (Bryan, if this understanding is wrong, please let me know.)
> > 
> > So the error log that Christoph observes indicates that the dd command failed,
> > and this failure is unexpected. I cannot think of any cause of the failure.
> 
> Yes, it does indeed fail; this is 002.full with your patch:
> 
> dd: error writing '/dev/mapper/dust1': Invalid argument
> 1+0 records in
> 0+0 records out
> 0 bytes copied, 0.000373943 s, 0.0 kB/s

Thank you for sharing the logs. The dd failed at the very first block.

> 
> > 
> > Christoph, may I ask you to share the kernel messages during the test run?
> > Also, I would like to check the dd command output. The one-liner patch below
> > to blktests will create results/vdb/dm/002.full with the dd output.
> 
> the relevant lines of dmesg output below:
> 
> [   57.773967] run blktests dm/002 at 2024-06-24 11:43:53
> [   57.791251] I/O error, dev vdb, sector 774 op 0x0:(READ) flags 0x80700 phys_seg 250 prio class 0
> [   57.791849] I/O error, dev vdb, sector 520 op 0x0:(READ) flags 0x84700 phys_seg 254 prio class 0
> [   57.792420] I/O error, dev vdb, sector 520 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [   57.792805] I/O error, dev vdb, sector 521 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [   57.793190] I/O error, dev vdb, sector 522 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [   57.793578] I/O error, dev vdb, sector 523 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [   57.793955] I/O error, dev vdb, sector 524 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [   57.794318] I/O error, dev vdb, sector 525 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [   57.794700] I/O error, dev vdb, sector 526 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [   57.795130] I/O error, dev vdb, sector 527 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [   57.795516] Buffer I/O error on dev dm-0, logical block 65, async page read
> [   57.800743] device-mapper: dust: dust_add_block: badblock added at block 60 with write fail count 0
> [   57.802587] device-mapper: dust: dust_add_block: badblock added at block 67 with write fail count 0
> [   57.804359] device-mapper: dust: dust_add_block: badblock added at block 72 with write fail count 0
> [   57.811253] device-mapper: dust: dust_add_block: badblock added at block 60 with write fail count 0
> [   57.813065] device-mapper: dust: dust_add_block: badblock added at block 67 with write fail count 0
> [   57.814786] device-mapper: dust: dust_add_block: badblock added at block 72 with write fail count 0
> [   57.818023] device-mapper: dust: enabling read failures on bad sectors
> [   57.826500] Buffer I/O error on dev dm-0, logical block 8, async page read

Hmm, there are many I/O errors for /dev/vdb and /dev/mapper/dust1. When the test
case succeeds, no I/O error is reported [2]. So, the next question is "why do the
I/O errors happen?" My guess is that the TEST_DEV has a non-512 block size,
probably 4096. The test case dm/002 specifies 512 bytes as the block size for
/dev/mapper/dust1, so a 4096 block size will cause I/O errors due to the
mismatched block sizes. I ran dm/002 with a null_blk device with a 4096 block
size and observed a very similar symptom to what Christoph observed.

Christoph, what is the output of "blockdev --getbsz /dev/vdb" on your system?
If it is not 512, could you try the blktests-side fix patch candidate [3]?
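
For reference, here is a rough sketch of how I reproduced the symptom locally
(the null_blk module parameters below are from memory, so please treat them as
an assumption and adjust as needed):

# Check the logical block size that dm/002 is running against.
blockdev --getbsz /dev/vdb

# Create a null_blk device with a 4096-byte block size and point blktests
# at it; with the current dm/002 this fails in a very similar way for me.
modprobe null_blk nr_devices=1 bs=4096
echo 'TEST_DEVS=(/dev/nullb0)' > config
./check dm/002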


[2] Kernel message on success case:

[17024.868423] [ T160674] run blktests dm/002 at 2024-06-24 20:07:17
[17024.983252] [ T160697] device-mapper: dust: dust_add_block: badblock added at block 60 with write fail count 0
[17024.992010] [ T160698] device-mapper: dust: dust_add_block: badblock added at block 67 with write fail count 0
[17024.999706] [ T160699] device-mapper: dust: dust_add_block: badblock added at block 72 with write fail count 0
[17025.029553] [ T160704] device-mapper: dust: dust_add_block: badblock added at block 60 with write fail count 0
[17025.036658] [ T160705] device-mapper: dust: dust_add_block: badblock added at block 67 with write fail count 0
[17025.044143] [ T160706] device-mapper: dust: dust_add_block: badblock added at block 72 with write fail count 0
[17025.058206] [ T160708] device-mapper: dust: enabling read failures on bad sectors
[17025.067598] [ T160709] device-mapper: dust: block 60 removed from badblocklist by write
[17025.069738] [ T160709] device-mapper: dust: block 67 removed from badblocklist by write
[17025.071753] [ T160709] device-mapper: dust: block 72 removed from badblocklist by write


[3]

diff --git a/tests/dm/002 b/tests/dm/002
index 6635c43..f60e3bf 100755
--- a/tests/dm/002
+++ b/tests/dm/002
@@ -14,10 +14,12 @@ requires() {
 
 
 test_device() {
+	local sz bsz
 	echo "Running ${TEST_NAME}"
 
-	TEST_DEV_SZ=$(blockdev --getsz "$TEST_DEV")
-	dmsetup create dust1 --table "0 $TEST_DEV_SZ dust $TEST_DEV 0 512"
+	sz=$(blockdev --getsz "$TEST_DEV")
+	bsz=$(blockdev --getbsz "$TEST_DEV")
+	dmsetup create dust1 --table "0 $sz dust $TEST_DEV 0 $bsz"
 	dmsetup message dust1 0 addbadblock 60
 	dmsetup message dust1 0 addbadblock 67
 	dmsetup message dust1 0 addbadblock 72
@@ -30,7 +32,7 @@ test_device() {
 	dmsetup message dust1 0 addbadblock 72
 	dmsetup message dust1 0 countbadblocks
 	dmsetup message dust1 0 enable
-	dd if=/dev/zero of=/dev/mapper/dust1 bs=512 count=128 oflag=direct >/dev/null 2>&1 || return $?
+	dd if=/dev/zero of=/dev/mapper/dust1 bs=$bsz count=128 oflag=direct >$FULL 2>&1 || return $?
 	sync
 	dmsetup message dust1 0 countbadblocks
 	sync

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: blktests dm/002 always fails for me
  2024-06-24 13:01             ` Shinichiro Kawasaki
@ 2024-06-24 13:08               ` hch
  2024-06-24 13:43                 ` Shinichiro Kawasaki
  0 siblings, 1 reply; 10+ messages in thread
From: hch @ 2024-06-24 13:08 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: hch@infradead.org, Bryan Gurney, linux-block@vger.kernel.org,
	dm-devel@lists.linux.dev

Yes, it is a 4k block size device.  With your patch the test passes
fine.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: blktests dm/002 always fails for me
  2024-06-24 13:08               ` hch
@ 2024-06-24 13:43                 ` Shinichiro Kawasaki
  0 siblings, 0 replies; 10+ messages in thread
From: Shinichiro Kawasaki @ 2024-06-24 13:43 UTC (permalink / raw)
  To: hch@infradead.org
  Cc: Bryan Gurney, linux-block@vger.kernel.org,
	dm-devel@lists.linux.dev

On Jun 24, 2024 / 06:08, hch@infradead.org wrote:
> Yes, it is a 4k block size device.  With your patch the test passes
> fine.

Thank you very much! Will post the formal fix patch tomorrow.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-06-24 13:43 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-13  8:20 blktests dm/002 always fails for me Christoph Hellwig
2024-06-14  6:03 ` Shinichiro Kawasaki
2024-06-14  6:10   ` hch
2024-06-24  4:58     ` Shinichiro Kawasaki
2024-06-24  8:43       ` hch
2024-06-24 11:21         ` Shinichiro Kawasaki
2024-06-24 11:46           ` hch
2024-06-24 13:01             ` Shinichiro Kawasaki
2024-06-24 13:08               ` hch
2024-06-24 13:43                 ` Shinichiro Kawasaki
