From: Johannes Thumshirn <jthumshirn@suse.de>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@fb.com>, Omar Sandoval <osandov@osandov.com>,
	Bart Van Assche <Bart.VanAssche@sandisk.com>,
	Hannes Reinecke <hare@suse.de>, Christoph Hellwig <hch@lst.de>,
	Linux Block Layer Mailinglist <linux-block@vger.kernel.org>,
	Linux Kernel Mailinglist <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] block: bios with an offset are always gappy
Date: Thu, 13 Apr 2017 13:53:28 +0200
Message-ID: <20170413115328.GH6734@linux-x5ow.site>
In-Reply-To: <20170413100110.GB5964@ming.t460p>

On Thu, Apr 13, 2017 at 06:02:21PM +0800, Ming Lei wrote:
> On Thu, Apr 13, 2017 at 10:06:29AM +0200, Johannes Thumshirn wrote:
> > Doing a mkfs.btrfs on a (qemu emulated) PCIe NVMe causes a kernel panic
> > in nvme_setup_prps() because the dma_len will drop below zero but the
> > length not.
> 
> Looks I can't reproduce the issue in QEMU(32G nvme, either partitioned
> or not, just use 'mkfs.btrfs /dev/nvme0n1p1'), could you share the exact
> mkfs command line and size of your emulated NVMe?

The exact command line is mkfs.btrfs -f /dev/nvme0n1p1 (-f because there was
an existing btrfs on the image). The image is 17179869184 bytes (a.k.a. 16G).

[...]

> Could you try the following patch to see if it fixes your issue?

With this patch applied it's back to the old, erratic behaviour; see the log
below.
> 
> ---
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 7548f332121a..65d1510681c6 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1659,16 +1659,28 @@ static inline bool bvec_gap_to_prev(struct request_queue *q,
>   * and the 1st bvec in the 2nd bio can be handled in one segment.
>   */
>  static inline bool bios_segs_mergeable(struct request_queue *q,
> -		struct bio *prev, struct bio_vec *prev_last_bv,
> +		struct bio *prev, struct bio *next,
> +		struct bio_vec *prev_last_bv,
>  		struct bio_vec *next_first_bv)
>  {
>  	if (!BIOVEC_PHYS_MERGEABLE(prev_last_bv, next_first_bv))
>  		return false;
>  	if (!BIOVEC_SEG_BOUNDARY(q, prev_last_bv, next_first_bv))
>  		return false;
> -	if (prev->bi_seg_back_size + next_first_bv->bv_len >
> +	if (prev->bi_seg_back_size + next->bi_seg_front_size >
>  			queue_max_segment_size(q))
>  		return false;
> +
> +	/*
> +	 * if 'next' has multiple segments, we need to make
> +	 * sure the merged segment from 'pb' and the 1st segment
> +	 * of 'next' ends at aligned virt boundary.
> +	 */
> +	if ((next->bi_seg_front_size < next->bi_iter.bi_size) &&
> +	    ((prev_last_bv->bv_offset + prev_last_bv->bv_len +
> +	     next->bi_seg_front_size) & queue_virt_boundary(q)))
> +		return false;
> +
>  	return true;
>  }
>  
> @@ -1681,7 +1693,7 @@ static inline bool bio_will_gap(struct request_queue *q, struct bio *prev,
>  		bio_get_last_bvec(prev, &pb);
>  		bio_get_first_bvec(next, &nb);
>  
> -		if (!bios_segs_mergeable(q, prev, &pb, &nb))
> +		if (!bios_segs_mergeable(q, prev, next, &pb, &nb))
>  			return __bvec_gap_to_prev(q, &pb, nb.bv_offset);
>  	}



dracut:/# [    1.211567] tsc: Refined TSC clocksource calibration: 2297.338 MHz
[    1.212601] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x211d6274d86, max_idle_ns: 440795243673 ns
dracut:/# modprobe btrfs
[    8.139179] raid6_pq: module verification failed: signature and/or required key missing - tainting kernel
[    8.207509] raid6: sse2x1   gen()  6827 MB/s
[    8.275512] raid6: sse2x1   xor()  5654 MB/s
[    8.343507] raid6: sse2x2   gen() 11573 MB/s
[    8.411503] raid6: sse2x2   xor()  8826 MB/s
[    8.479504] raid6: sse2x4   gen() 14794 MB/s
[    8.547504] raid6: sse2x4   xor() 10618 MB/s
[    8.547830] raid6: using algorithm sse2x4 gen() 14794 MB/s
[    8.548218] raid6: .... xor() 10618 MB/s, rmw enabled
[    8.548558] raid6: using intx1 recovery algorithm
[    8.549341] xor: measuring software checksum speed
[    8.587533]    prefetch64-sse: 15090.000 MB/sec
[    8.627553]    generic_sse: 13530.000 MB/sec
[    8.627945] xor: using function: prefetch64-sse (15090.000 MB/sec)
[    8.633795] Btrfs loaded, crc32c=crc32c-generic, assert=on
dracut:/# modprobe nvme
[   12.348762] nvme nvme0: pci function 0000:00:04.0
[   12.386300] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
dracut:/# [   12.391707]  nvme0n1: p1
dracut:/# mkfs.b[   36.553376] random: fast init done
tr
dracut:/# mkfs.btrfs -f /dev/nvme0n1p1
btrfs-progs v4.5.3+20160729
See http://btrfs.wiki.kernel.org for more information.

Detected a SSD, turning off metadata duplication.  Mkfs with -m dup if you want to force metadata duplication.
[   46.696671] ------------[ cut here ]------------
[   46.697338] kernel BUG at drivers/nvme/host/pci.c:494!
[   46.697806] invalid opcode: 0000 [#1] SMP
[   46.698175] Modules linked in: nvme(E) nvme_core(E) btrfs(E) xor(E) raid6_pq(E)
[   46.698879] CPU: 1 PID: 18 Comm: kworker/1:0H Tainted: G            E   4.11.0-rc6-default+ #43
[   46.699686] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[   46.700737] Workqueue: kblockd blk_mq_run_work_fn
[   46.701169] task: ffff88007bd24540 task.stack: ffffc900003bc000
[   46.701709] RIP: 0010:nvme_queue_rq+0x85d/0x886 [nvme]
[   46.702185] RSP: 0018:ffffc900003bfc78 EFLAGS: 00010286
[   46.702670] RAX: 0000000000000078 RBX: 0000000000001000 RCX: 000000007f625000
[   46.703318] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000246
[   46.703968] RBP: ffffc900003bfd50 R08: 000000000013ee00 R09: 0000000000001000
[   46.704624] R10: ffff88007f1ed000 R11: ffff88007f220000 R12: ffff88007f1ed000
[   46.705276] R13: 00000000fffffe00 R14: 0000000000000010 R15: 000000000012fe00
[   46.705927] FS:  0000000000000000(0000) GS:ffff88007ea80000(0000) knlGS:0000000000000000
[   46.706673] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   46.707199] CR2: 00007ffdd1294000 CR3: 000000007b742000 CR4: 00000000000006e0
[   46.707846] Call Trace:
[   46.708082]  blk_mq_dispatch_rq_list+0x2a0/0x3d0
[   46.708510]  blk_mq_sched_dispatch_requests+0x138/0x160
[   46.708991]  __blk_mq_run_hw_queue+0x8c/0xa0
[   46.709407]  blk_mq_run_work_fn+0x12/0x20
[   46.709781]  process_one_work+0x153/0x400
[   46.710152]  worker_thread+0x12b/0x4b0
[   46.711698]  kthread+0x109/0x140
[   46.712013]  ? rescuer_thread+0x340/0x340
[   46.712391]  ? kthread_park+0x90/0x90
[   46.712741]  ret_from_fork+0x2c/0x40
[   46.713081] Code: 01 00 48 8b 40 10 48 89 45 a8 49 8b 87 70 01 00 00 48 89 45 b0 0f 84 3e fa ff ff 49 8b 87 88 01 00 00 48 89 45 a0 e9 2e fa ff ff <0f> 0b 4c 8b 0d e2 35 8f e1 eb 80 0f 0b 4c 89 ef c6 07 00 0f 1f
[   46.714861] RIP: nvme_queue_rq+0x85d/0x886 [nvme] RSP: ffffc900003bfc78
[   46.715810] ---[ end trace 280a594163a124fb ]---
[   46.796265] ------------[ cut here ]------------


Thanks,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
