Linux block layer
 help / color / mirror / Atom feed
* Re: [block] 4e366a7962: blktests.nvme.005.fail
       [not found] <20190325010409.GB6379@shao2-debian>
@ 2019-03-28 16:31 ` Johannes Thumshirn
  2019-03-28 17:48   ` Christoph Hellwig
  2019-03-29  0:33   ` Ming Lei
  0 siblings, 2 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2019-03-28 16:31 UTC (permalink / raw)
  To: Linux Block Layer Mailinglist
  Cc: Hannes Reinecke, Bart Van Assche, Christoph Hellwig, Jan Kara,
	Linux FSDEVEL Mailinglist, Jens Axboe, lkp, Ming Lei

On 25/03/2019 02:04, kernel test robot wrote:
[...]
> nvme/005 (reset local loopback target)                      
>     runtime  ...
> nvme/005 (reset local loopback target)                       [failed]
>     runtime  ...  0.596s
>     something found in dmesg:
>     [   24.160182] run blktests nvme/005 at 2019-03-24 07:35:16
>     [   24.346189] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>     [   24.373089] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:dbd85962-80d1-4872-ac0f-d0214f3b1131.
>     [   24.394247] WARNING: CPU: 1 PID: 881 at /kbuild/src/x86_64/block/blk-merge.c:608 blk_rq_map_sg+0x5f0/0x6c0
>     [   24.396082] Modules linked in: nvme_loop nvme_fabrics nvmet nvme_core loop crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ata_generic sd_mod pata_acpi sg bochs_drm ttm ppdev drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops snd_pcm drm ata_piix snd_timer aesni_intel crypto_simd snd cryptd glue_helper libata joydev soundcore pcspkr serio_raw virtio_scsi i2c_piix4 parport_pc floppy parport ip_tables
>     [   24.403009] CPU: 1 PID: 881 Comm: nvme Not tainted 5.0.0-rc6-00148-g4e366a7 #1
>     [   24.404372] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>     [   24.405930] RIP: 0010:blk_rq_map_sg+0x5f0/0x6c0
>     [   24.406808] Code: 41 8b 91 10 01 00 00 8b 43 24 85 c2 0f 84 e3 fb ff ff f7 d0 21 d0 83 c0 01 41 01 40 0c 01 83 d0 00 00 00 e9 cd fb ff ff 0f 0b <0f> 0b e9 c3 fc ff ff 80 3d 66 f3 1d 01 00 0f 85 88 fb ff ff 48 c7
>     [   24.410274] RSP: 0000:ffffb95c409d7908 EFLAGS: 00010202
>     ...
>     (See '/lkp/benchmarks/blktests/blktests/results/nodev/nvme/005.dmesg' for the entire message)

OK we're tripping over this in blk_rq_map_sg():

607    /*
608     * Something must have been wrong if the figured number of
609     * segment is bigger than number of req's physical segments
610     */
611    WARN_ON(nsegs > blk_rq_nr_phys_segments(rq));

In all cases I could reproduce it I had nsegs == 2 and
rq->nr_phys_segments == 1 (rq->bio->bi_phys_segments == 1).

Adding this into nvme-loop "fixes" the issue but I don't think this is
the correct way to do

--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -387,6 +387,7 @@ static int nvme_loop_configure_admin_que
                goto out_free_tagset;
        }

+       blk_queue_segment_boundary(ctrl->ctrl.admin_q, PAGE_SIZE - 1);
        error = nvmf_connect_admin_queue(&ctrl->ctrl);
        if (error)
                goto out_cleanup_queue;

What raises my suspicion is this code fragment from bio_add_pc_page():

 731
 732   /* If we may be able to merge these biovecs, force a recount */
 733   if (bio->bi_vcnt > 1 && biovec_phys_mergeable(q, bvec - 1, bvec))
 734           bio->bi_phys_segments = -1;

This -1 (from this patch) would fit the one less than 'nsegs' but after
days of searching and staring at the code, I cannot find where we loose
the one. Before this patch set we would only have cleared the
BIO_SEG_VALID flag and thus we never run out of sync.

Adding a call to blk_recount_segments() doesn't do any difference here.

It can of be a read herring as well.

What makes me suspicious is the fact that I can't reliably trigger it by
running nvme/005 but have to run it in a loop ranging between several
seconds to some minutes.

Anyone any ideas? I've been staring at this for several days now but
can't find out whats wrong.

Thanks a lot,
	Johannes
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [block] 4e366a7962: blktests.nvme.005.fail
  2019-03-28 16:31 ` [block] 4e366a7962: blktests.nvme.005.fail Johannes Thumshirn
@ 2019-03-28 17:48   ` Christoph Hellwig
  2019-03-28 17:52     ` Jens Axboe
  2019-03-29  0:33   ` Ming Lei
  1 sibling, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2019-03-28 17:48 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Linux Block Layer Mailinglist, Hannes Reinecke, Bart Van Assche,
	Christoph Hellwig, Jan Kara, Linux FSDEVEL Mailinglist,
	Jens Axboe, lkp, Ming Lei

On Thu, Mar 28, 2019 at 05:31:29PM +0100, Johannes Thumshirn wrote:
> In all cases I could reproduce it I had nsegs == 2 and
> rq->nr_phys_segments == 1 (rq->bio->bi_phys_segments == 1).

And all cases are pass through as well?

Can you check if this still shows up with the series from Ming which
switches the passthrough requests to use multi-page biovecs aswell
and gets rid of the hairy code you quote below.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [block] 4e366a7962: blktests.nvme.005.fail
  2019-03-28 17:48   ` Christoph Hellwig
@ 2019-03-28 17:52     ` Jens Axboe
  2019-03-29  8:06       ` Johannes Thumshirn
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2019-03-28 17:52 UTC (permalink / raw)
  To: Christoph Hellwig, Johannes Thumshirn
  Cc: Linux Block Layer Mailinglist, Hannes Reinecke, Bart Van Assche,
	Jan Kara, Linux FSDEVEL Mailinglist, lkp, Ming Lei

On 3/28/19 11:48 AM, Christoph Hellwig wrote:
> On Thu, Mar 28, 2019 at 05:31:29PM +0100, Johannes Thumshirn wrote:
>> In all cases I could reproduce it I had nsegs == 2 and
>> rq->nr_phys_segments == 1 (rq->bio->bi_phys_segments == 1).
> 
> And all cases are pass through as well?
> 
> Can you check if this still shows up with the series from Ming which
> switches the passthrough requests to use multi-page biovecs aswell
> and gets rid of the hairy code you quote below.

Since I already merged that in my 5.2 but didn't push it out yet, I've
made a temporary branch available if you want to test through that:

git://git.kernel.dk/linux-block 5.2-tmp


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [block] 4e366a7962: blktests.nvme.005.fail
  2019-03-28 16:31 ` [block] 4e366a7962: blktests.nvme.005.fail Johannes Thumshirn
  2019-03-28 17:48   ` Christoph Hellwig
@ 2019-03-29  0:33   ` Ming Lei
  2019-03-29  8:18     ` Johannes Thumshirn
  1 sibling, 1 reply; 6+ messages in thread
From: Ming Lei @ 2019-03-29  0:33 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Linux Block Layer Mailinglist, Hannes Reinecke, Bart Van Assche,
	Christoph Hellwig, Jan Kara, Linux FSDEVEL Mailinglist,
	Jens Axboe, LKP ML, Ming Lei

On Fri, Mar 29, 2019 at 12:32 AM Johannes Thumshirn <jthumshirn@suse.de> wrote:
>
> On 25/03/2019 02:04, kernel test robot wrote:
> [...]
> > nvme/005 (reset local loopback target)
> >     runtime  ...
> > nvme/005 (reset local loopback target)                       [failed]
> >     runtime  ...  0.596s
> >     something found in dmesg:
> >     [   24.160182] run blktests nvme/005 at 2019-03-24 07:35:16
> >     [   24.346189] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> >     [   24.373089] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:dbd85962-80d1-4872-ac0f-d0214f3b1131.
> >     [   24.394247] WARNING: CPU: 1 PID: 881 at /kbuild/src/x86_64/block/blk-merge.c:608 blk_rq_map_sg+0x5f0/0x6c0
> >     [   24.396082] Modules linked in: nvme_loop nvme_fabrics nvmet nvme_core loop crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ata_generic sd_mod pata_acpi sg bochs_drm ttm ppdev drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops snd_pcm drm ata_piix snd_timer aesni_intel crypto_simd snd cryptd glue_helper libata joydev soundcore pcspkr serio_raw virtio_scsi i2c_piix4 parport_pc floppy parport ip_tables
> >     [   24.403009] CPU: 1 PID: 881 Comm: nvme Not tainted 5.0.0-rc6-00148-g4e366a7 #1
> >     [   24.404372] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> >     [   24.405930] RIP: 0010:blk_rq_map_sg+0x5f0/0x6c0
> >     [   24.406808] Code: 41 8b 91 10 01 00 00 8b 43 24 85 c2 0f 84 e3 fb ff ff f7 d0 21 d0 83 c0 01 41 01 40 0c 01 83 d0 00 00 00 e9 cd fb ff ff 0f 0b <0f> 0b e9 c3 fc ff ff 80 3d 66 f3 1d 01 00 0f 85 88 fb ff ff 48 c7
> >     [   24.410274] RSP: 0000:ffffb95c409d7908 EFLAGS: 00010202
> >     ...
> >     (See '/lkp/benchmarks/blktests/blktests/results/nodev/nvme/005.dmesg' for the entire message)
>
> OK we're tripping over this in blk_rq_map_sg():
>
> 607    /*
> 608     * Something must have been wrong if the figured number of
> 609     * segment is bigger than number of req's physical segments
> 610     */
> 611    WARN_ON(nsegs > blk_rq_nr_phys_segments(rq));
>
> In all cases I could reproduce it I had nsegs == 2 and
> rq->nr_phys_segments == 1 (rq->bio->bi_phys_segments == 1).
>
> Adding this into nvme-loop "fixes" the issue but I don't think this is
> the correct way to do
>
> --- a/drivers/nvme/target/loop.c
> +++ b/drivers/nvme/target/loop.c
> @@ -387,6 +387,7 @@ static int nvme_loop_configure_admin_que
>                 goto out_free_tagset;
>         }
>
> +       blk_queue_segment_boundary(ctrl->ctrl.admin_q, PAGE_SIZE - 1);

The above isn't needed for linus tree, however it is required for 5.0 and older
releases.

But nvme-loop target have other problems, and I'd suggest you to apply
the following patch:

https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=for-linus&id=02db99548d3608a625cf481cff2bb7b626829b3f

>         error = nvmf_connect_admin_queue(&ctrl->ctrl);
>         if (error)
>                 goto out_cleanup_queue;
>
> What raises my suspicion is this code fragment from bio_add_pc_page():
>
>  731
>  732   /* If we may be able to merge these biovecs, force a recount */
>  733   if (bio->bi_vcnt > 1 && biovec_phys_mergeable(q, bvec - 1, bvec))
>  734           bio->bi_phys_segments = -1;

The above change from you might cause issue given some queue's .bi_phys_segments
is 255.

>
> This -1 (from this patch) would fit the one less than 'nsegs' but after
> days of searching and staring at the code, I cannot find where we loose
> the one. Before this patch set we would only have cleared the
> BIO_SEG_VALID flag and thus we never run out of sync.

I'd suggest you to address the following comments first before working on
your patch:

https://lore.kernel.org/linux-block/CACVXFVNPW1TOsCaAibc-YJHDEdK95Y8-v1rHNaab7oV=eb+8TA@mail.gmail.com/

https://lore.kernel.org/linux-block/CACVXFVOq1ngjpD62SreSZda5k_WJjxOZRHwpqt6si7bRv=O46Q@mail.gmail.com/

>
> Adding a call to blk_recount_segments() doesn't do any difference here.
>
> It can of be a read herring as well.
>
> What makes me suspicious is the fact that I can't reliably trigger it by
> running nvme/005 but have to run it in a loop ranging between several
> seconds to some minutes.
>
> Anyone any ideas? I've been staring at this for several days now but
> can't find out whats wrong.

Can you trigger this issue on linus tree or 5.2-tmp of block tree?


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [block] 4e366a7962: blktests.nvme.005.fail
  2019-03-28 17:52     ` Jens Axboe
@ 2019-03-29  8:06       ` Johannes Thumshirn
  0 siblings, 0 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2019-03-29  8:06 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: Linux Block Layer Mailinglist, Hannes Reinecke, Bart Van Assche,
	Jan Kara, Linux FSDEVEL Mailinglist, lkp, Ming Lei

On 28/03/2019 18:52, Jens Axboe wrote:
> On 3/28/19 11:48 AM, Christoph Hellwig wrote:
>> On Thu, Mar 28, 2019 at 05:31:29PM +0100, Johannes Thumshirn wrote:
>>> In all cases I could reproduce it I had nsegs == 2 and
>>> rq->nr_phys_segments == 1 (rq->bio->bi_phys_segments == 1).
>>
>> And all cases are pass through as well?
>>
>> Can you check if this still shows up with the series from Ming which
>> switches the passthrough requests to use multi-page biovecs aswell
>> and gets rid of the hairy code you quote below.

Good idea, thanks.

> Since I already merged that in my 5.2 but didn't push it out yet, I've
> made a temporary branch available if you want to test through that:
> 
> git://git.kernel.dk/linux-block 5.2-tmp

Awesome thanks.


-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [block] 4e366a7962: blktests.nvme.005.fail
  2019-03-29  0:33   ` Ming Lei
@ 2019-03-29  8:18     ` Johannes Thumshirn
  0 siblings, 0 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2019-03-29  8:18 UTC (permalink / raw)
  To: Ming Lei
  Cc: Linux Block Layer Mailinglist, Hannes Reinecke, Bart Van Assche,
	Christoph Hellwig, Jan Kara, Linux FSDEVEL Mailinglist,
	Jens Axboe, LKP ML, Ming Lei

On 29/03/2019 01:33, Ming Lei wrote:
[...]

>> +       blk_queue_segment_boundary(ctrl->ctrl.admin_q, PAGE_SIZE - 1);
> 
> The above isn't needed for linus tree, however it is required for 5.0 and older
> releases.

Yes but it was worth a try after days of failing to find the issue.

> But nvme-loop target have other problems, and I'd suggest you to apply
> the following patch:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=for-linus&id=02db99548d3608a625cf481cff2bb7b626829b3f

This only applies to the file backed target, doesn't it? My testcase
uses bdev backed namespaces.

[...]

>>  731
>>  732   /* If we may be able to merge these biovecs, force a recount */
>>  733   if (bio->bi_vcnt > 1 && biovec_phys_mergeable(q, bvec - 1, bvec))
>>  734           bio->bi_phys_segments = -1;
> 
> The above change from you might cause issue given some queue's .bi_phys_segments
> is 255.

65535 but yes there are drivers setting it to USHRT_MAX. I'll have to
see if I can get hold of one for testing.
 [...]

> I'd suggest you to address the following comments first before working on
> your patch:
> 
> https://lore.kernel.org/linux-block/CACVXFVNPW1TOsCaAibc-YJHDEdK95Y8-v1rHNaab7oV=eb+8TA@mail.gmail.com/
> 
> https://lore.kernel.org/linux-block/CACVXFVOq1ngjpD62SreSZda5k_WJjxOZRHwpqt6si7bRv=O46Q@mail.gmail.com/

I'll have a look, thanks.

[...]

> Can you trigger this issue on linus tree or 5.2-tmp of block tree?

I haven't been able to trigger it on vanilla so far, so it's most likely
related to the patch. Will rebase the patchset to 5.2-tmp and retest
from there.

Thanks,
	Johannes
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-03-29  8:19 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20190325010409.GB6379@shao2-debian>
2019-03-28 16:31 ` [block] 4e366a7962: blktests.nvme.005.fail Johannes Thumshirn
2019-03-28 17:48   ` Christoph Hellwig
2019-03-28 17:52     ` Jens Axboe
2019-03-29  8:06       ` Johannes Thumshirn
2019-03-29  0:33   ` Ming Lei
2019-03-29  8:18     ` Johannes Thumshirn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox