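From: Ming Lin <mlin@kernel.org>
To: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	lkml <linux-kernel@vger.kernel.org>, Jens Axboe <axboe@kernel.dk>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	Dongsu Park <dpark@posteo.net>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Ming Lin <ming.l@ssi.samsung.com>,
	linux-nvme@lists.infradead.org
Subject: Re: [PATCH v6 05/11] block: remove split code in blkdev_issue_{discard,write_same}
Date: Wed, 21 Oct 2015 13:13:09 -0700
Message-ID: <1445458389.26847.10.camel@ssi>
In-Reply-To: <20151021181812.GA5807@redhat.com>
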
On Wed, 2015-10-21 at 14:18 -0400, Mike Snitzer wrote:
> On Wed, Oct 21 2015 at 1:33pm -0400,
> Ming Lin <mlin@kernel.org> wrote:
>
> > On Wed, 2015-10-21 at 12:19 -0400, Mike Snitzer wrote:
> > > On Wed, Oct 21 2015 at 12:02pm -0400,
> > > Mike Snitzer <snitzer@redhat.com> wrote:
> > >
> > > > On Wed, Oct 14 2015 at 9:27am -0400,
> > > > Christoph Hellwig <hch@infradead.org> wrote:
> > > >
> > > > > On Tue, Oct 13, 2015 at 10:44:11AM -0700, Ming Lin wrote:
> > > > > > I just did a quick test with a Samsung 900G NVMe device.
> > > > > > mkfs.xfs is OK on 4.3-rc5.
> > > > > >
> > > > > > What's your device model? I may find a similar one to try.
> > > > >
> > > > > This is a HGST Ultrastar SN100
> > > > >
> > > > > > Analysis and tentative fix below:
> > > > >
> > > > > blktrace for before the commit:
> > > > >
> > > > > 259,0 1 2 0.000002543 2394 G D 0 + 8388607 [mkfs.xfs]
> > > > > 259,0 1 3 0.000008230 2394 I D 0 + 8388607 [mkfs.xfs]
> > > > > 259,0 1 4 0.000031090 207 D D 0 + 8388607 [kworker/1:1H]
> > > > > 259,0 1 5 0.000044869 2394 Q D 8388607 + 8388607 [mkfs.xfs]
> > > > > 259,0 1 6 0.000045992 2394 G D 8388607 + 8388607 [mkfs.xfs]
> > > > > 259,0 1 7 0.000049559 2394 I D 8388607 + 8388607 [mkfs.xfs]
> > > > > 259,0 1 8 0.000061551 207 D D 8388607 + 8388607 [kworker/1:1H]
> > > > >
> > > > > .. and so on.
> > > > >
> > > > > blktrace with the commit:
> > > > >
> > > > > 259,0 2 1 0.000000000 1228 Q D 0 + 4194304 [mkfs.xfs]
> > > > > 259,0 2 2 0.000002543 1228 G D 0 + 4194304 [mkfs.xfs]
> > > > > 259,0 2 3 0.000010080 1228 I D 0 + 4194304 [mkfs.xfs]
> > > > > 259,0 2 4 0.000082187 267 D D 0 + 4194304 [kworker/2:1H]
> > > > > 259,0 2 5 0.000224869 1228 Q D 4194304 + 4194304 [mkfs.xfs]
> > > > > 259,0 2 6 0.000225835 1228 G D 4194304 + 4194304 [mkfs.xfs]
> > > > > 259,0 2 7 0.000229457 1228 I D 4194304 + 4194304 [mkfs.xfs]
> > > > > 259,0 2 8 0.000238507 267 D D 4194304 + 4194304 [kworker/2:1H]
> > > > >
> > > > > So discards are smaller, but better aligned. Now if I tweak a single
> > > > > line in blk-lib.c to be able to use all of bi_size I get the old I/O
> > > > > pattern back and everything works fine again:
> > > > >
> > > > > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > > > > index bd40292..65b61dc 100644
> > > > > --- a/block/blk-lib.c
> > > > > +++ b/block/blk-lib.c
> > > > > @@ -82,7 +82,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> > > > > break;
> > > > > }
> > > > >
> > > > > - req_sects = min_t(sector_t, nr_sects, MAX_BIO_SECTORS);
> > > > > + req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
> > > > > end_sect = sector + req_sects;
> > > > >
> > > > > bio->bi_iter.bi_sector = sector;
> > > >
> > > > Can we change UINT_MAX >> 9 to rounddown to the first factor of
> > > > minimum_io_size?
> > > >
> > > > That should work for all devices and for dm-thinp (and dm-cache) in
> > > > particular will ensure that all discards that are issued will be a
> > > > multiple of the underlying device's blocksize.
> > >
> > > Jeff Moyer pointed out having req_sects be a factor of
> > > discard_granularity makes more sense. And I agree. Same difference in
> > > the end (since dm-thinp sets discard_granularity to the thinp
> > > blocksize).
> >
> > An old version of this patch did use discard_granularity
> > https://www.redhat.com/archives/dm-devel/2015-August/msg00000.html
> >
> > But you didn't agree.
> > https://www.redhat.com/archives/dm-devel/2015-August/msg00001.html
> >
> > Maybe we can re-add discard_granularity now?
>
> I disagreed on a more generic level than discard_granularity shaping the
> split boundary.
>
> But we are where we are. If we're going to split (due to 32-bit limits
> in bio->bi_iter.bi_size) then we should at least do so in terms of the
> supported discard_granularity.

How about the patch below?

It effectively reverts commit b49a0871 and adds the patch from
https://www.redhat.com/archives/dm-devel/2015-August/msg00000.html

Christoph, could you help test it?

commit 122bf0a43cb1611ed62aaf945f25b649c27a71ed
Author: Ming Lin <mlin@kernel.org>
Date:   Wed Oct 21 11:24:48 2015 -0700

    block: check discard_granularity and alignment

    Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
---
 block/blk-lib.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index bd40292..9ebf653 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -26,13 +26,6 @@ static void bio_batch_end_io(struct bio *bio)
 	bio_put(bio);
 }
 
-/*
- * Ensure that max discard sectors doesn't overflow bi_size and hopefully
- * it is of the proper granularity as long as the granularity is a power
- * of two.
- */
-#define MAX_BIO_SECTORS ((1U << 31) >> 9)
-
 /**
  * blkdev_issue_discard - queue a discard
  * @bdev: blockdev to issue discard for
@@ -50,6 +43,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q = bdev_get_queue(bdev);
 	int type = REQ_WRITE | REQ_DISCARD;
+	unsigned int granularity;
+	int alignment;
 	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
@@ -61,6 +56,10 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	if (!blk_queue_discard(q))
 		return -EOPNOTSUPP;
 
+	/* Zero-sector (unknown) and one-sector granularities are the same. */
+	granularity = max(q->limits.discard_granularity >> 9, 1U);
+	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
+
 	if (flags & BLKDEV_DISCARD_SECURE) {
 		if (!blk_queue_secdiscard(q))
 			return -EOPNOTSUPP;
@@ -74,7 +73,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	blk_start_plug(&plug);
 	while (nr_sects) {
 		unsigned int req_sects;
-		sector_t end_sect;
+		sector_t end_sect, tmp;
 
 		bio = bio_alloc(gfp_mask, 1);
 		if (!bio) {
@@ -82,8 +81,22 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 			break;
 		}
 
-		req_sects = min_t(sector_t, nr_sects, MAX_BIO_SECTORS);
+		/* Make sure bi_size doesn't overflow */
+		req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
+
+		/*
+		 * If splitting a request, and the next starting sector would be
+		 * misaligned, stop the discard at the previous aligned sector.
+		 */
 		end_sect = sector + req_sects;
+		tmp = end_sect;
+		if (req_sects < nr_sects &&
+		    sector_div(tmp, granularity) != alignment) {
+			end_sect = end_sect - alignment;
+			sector_div(end_sect, granularity);
+			end_sect = end_sect * granularity + alignment;
+			req_sects = end_sect - sector;
+		}
 
 		bio->bi_iter.bi_sector = sector;
 		bio->bi_end_io = bio_batch_end_io;
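
For anyone who wants to check the arithmetic outside the kernel, here is a
rough userspace sketch of the split-size calculation in the patch above. It
is only a model under stated assumptions: next_req_sects() is a hypothetical
helper name, sector_t is replaced by a uint64_t typedef, and the kernel's
sector_div() (which divides in place and returns the remainder) is replaced
by plain % and /. The two macros reproduce the request sizes seen in the
blktrace output quoted earlier (4194304 and 8388607 sectors).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

typedef uint64_t sector_t;	/* userspace stand-in for the kernel type */

/* Cap removed by the patch: 2^31 bytes expressed in 512-byte sectors. */
#define OLD_MAX_BIO_SECTORS	((1U << 31) >> 9)
/* Cap used by the patch: the most sectors a 32-bit bi_size can describe. */
#define NEW_MAX_BIO_SECTORS	(UINT_MAX >> 9)

/*
 * Hypothetical helper (not a kernel function): number of sectors the next
 * discard bio would cover. granularity and alignment are in sectors, with
 * granularity >= 1 and alignment < granularity, as computed in the patch.
 */
static sector_t next_req_sects(sector_t sector, sector_t nr_sects,
			       unsigned int granularity,
			       unsigned int alignment)
{
	sector_t req_sects, end_sect;

	assert(granularity >= 1 && alignment < granularity);

	/* Make sure the byte count still fits in 32 bits. */
	req_sects = nr_sects < NEW_MAX_BIO_SECTORS ? nr_sects
						   : NEW_MAX_BIO_SECTORS;
	end_sect = sector + req_sects;

	/* Only a split that would end off-boundary is pulled back. */
	if (req_sects < nr_sects && end_sect % granularity != alignment) {
		end_sect = (end_sect - alignment) / granularity * granularity
			   + alignment;
		req_sects = end_sect - sector;
	}
	return req_sects;
}
```

With an 8-sector (4 KiB) granularity and zero alignment,
next_req_sects(0, 20000000, 8, 0) gives 8388600 rather than 8388607, so the
next bio starts on a granularity boundary. A request that fits in one bio,
e.g. next_req_sects(0, 1000, 8, 0), is left untouched.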