* Re: [PATCH] blk-mq: don't complete un-started request in timeout handler
From: Ming Lei @ 2017-03-23 12:14 UTC (permalink / raw)
To: Keith Busch
Cc: Jens Axboe, linux-kernel, linux-block, Christoph Hellwig,
Yi Zhang, Bart Van Assche, Hannes Reinecke
In-Reply-To: <20170322155817.GA18960@localhost.localdomain>
On Wed, Mar 22, 2017 at 11:58:17AM -0400, Keith Busch wrote:
> On Tue, Mar 21, 2017 at 11:03:59PM -0400, Jens Axboe wrote:
> > On 03/21/2017 10:14 PM, Ming Lei wrote:
> > > When iterating busy requests in timeout handler,
> > > if the STARTED flag of one request isn't set, that means
> > > the request is being processed in block layer or driver, and
> > > isn't submitted to hardware yet.
> > >
> > > In current implementation of blk_mq_check_expired(),
> > > if the request queue becomes dying, un-started requests are
> > > handled as being completed/freed immediately. This way is
> > > wrong, and can cause rq corruption or double allocation[1][2],
> > > when doing I/O and removing&resetting NVMe device at the sametime.
> >
> > I agree, completing it looks bogus. If the request is in a scheduler or
> > on a software queue, this won't end well at all. Looks like it was
> > introduced by this patch:
> >
> > commit eb130dbfc40eabcd4e10797310bda6b9f6dd7e76
> > Author: Keith Busch <keith.busch@intel.com>
> > Date: Thu Jan 8 08:59:53 2015 -0700
> >
> > blk-mq: End unstarted requests on a dying queue
> >
> > Before that, we just ignored it. Keith?
>
> The above was intended for a stopped hctx on a dying queue such that
> there's nothing in flight to the driver. Nvme had been relying on this
> to end unstarted requests so we may progress when a controller dies.
So the brokenness started just from the begining.
>
> We've since obviated the need: we restart the hw queues to flush entered
> requests to failure, so we don't need that brokenness.
Looks the following commit need to be backported too if we port this patch.
commit 69d9a99c258eb1d6478fd9608a2070890797eed7
Author: Keith Busch <keith.busch@intel.com>
Date: Wed Feb 24 09:15:56 2016 -0700
NVMe: Move error handling to failed reset handler
Thanks,
Ming
^ permalink raw reply
* Re: support ranges TRIM for libata
From: James Bottomley @ 2017-03-23 13:47 UTC (permalink / raw)
To: Christoph Hellwig, Tejun Heo
Cc: martin.petersen, axboe, linux-ide, linux-scsi, linux-block
In-Reply-To: <20170322181947.GA4733@lst.de>
On Wed, 2017-03-22 at 19:19 +0100, Christoph Hellwig wrote:
> On Tue, Mar 21, 2017 at 02:59:01PM -0400, Tejun Heo wrote:
> > I do like the fact that this is a lot simpler than the previous
> > implementation but am not quite sure we want to deviate
> > significantly from what we do for other commands (command
> > translation). Is it because fixing the existing implementation
> > would involve invaisve changes including memory allocations?
>
> The current implementation already has the issue of that it does
> corrupt user data reliably if the using SG_IO for WRITE SAME
> commands.
That does need fixing.
> Doing ranges using translation would turn into a nightmare because
> ATA TRIM ranges are 16 bits long while SCSI UNAMP ranges are 32-bit,
> so we effectively can't translated them without introducing a
> non-standard hook between libata and scsi to communicate that
> limit.
Why can't we do what the t10 sat document recommends: if the ATA device
doesn't support the XL version (32 bit ranges) then translate unmap to
multiple non-XL commands?
I don't necessarily object to the vendor specific 1<->1 approach, it's
just it won't fix the problem you cited above (SG_IO WRITE SAME), its
just that now we error the command, which may cause some surprise. I
also wonder if we couldn't simply do an ATA_16 TRIM if we're already
going to all the trouble of recognising ATA devices in the sd discard
path?
James
> And once we're down that path we might as well just do the
> right thing directly.
^ permalink raw reply
* Re: support ranges TRIM for libata
From: Christoph Hellwig @ 2017-03-23 13:55 UTC (permalink / raw)
To: James Bottomley
Cc: Christoph Hellwig, Tejun Heo, martin.petersen, axboe, linux-ide,
linux-scsi, linux-block
In-Reply-To: <1490276861.2202.8.camel@HansenPartnership.com>
On Thu, Mar 23, 2017 at 09:47:41AM -0400, James Bottomley wrote:
> > The current implementation already has the issue of that it does
> > corrupt user data reliably if the using SG_IO for WRITE SAME
> > commands.
>
> That does need fixing.
I don't think it's fixable as long as we translate the data payload.
> Why can't we do what the t10 sat document recommends: if the ATA device
> doesn't support the XL version (32 bit ranges) then translate unmap to
> multiple non-XL commands?
Because libata sits underneath the tag allocator we'd getting into
a giant set of problems. Where do you expect the new commands to
magically come from? And adhere to the queueing limits and not actually
deadlock in one way or another?
> I don't necessarily object to the vendor specific 1<->1 approach, it's
> just it won't fix the problem you cited above (SG_IO WRITE SAME), its
> just that now we error the command, which may cause some surprise.
We now error WRITE SAME for passthrough consistently. Before we only
accepted it only with the unmap bit set, and even did so incorrectly
(not checking that the payload was all zeros).
> I
> also wonder if we couldn't simply do an ATA_16 TRIM if we're already
> going to all the trouble of recognising ATA devices in the sd discard
> path?
We probably could do that as well. But that would drag a lot more
ATA-specific code into sd.c than just formatting the ranges.
^ permalink raw reply
* Re: [PATCH] block: make nr_iovecs unsigned in bio_alloc_bioset()
From: Christoph Hellwig @ 2017-03-23 14:07 UTC (permalink / raw)
To: Dan Carpenter
Cc: Jens Axboe, Ming Lei, Mike Christie, Kent Overstreet,
Hannes Reinecke, Chaitanya Kulkarni, Mike Snitzer, Adrian Hunter,
linux-block, kernel-janitors
In-Reply-To: <20170323102455.GA20154@mwanda>
Looks fine,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply
* Re: [PATCH] block: make nr_iovecs unsigned in bio_alloc_bioset()
From: Jens Axboe @ 2017-03-23 14:16 UTC (permalink / raw)
To: Dan Carpenter
Cc: Ming Lei, Mike Christie, Kent Overstreet, Hannes Reinecke,
Chaitanya Kulkarni, Mike Snitzer, Adrian Hunter, linux-block,
kernel-janitors
In-Reply-To: <20170323102455.GA20154@mwanda>
On 03/23/2017 04:24 AM, Dan Carpenter wrote:
> There isn't a bug here, but Smatch is not smart enough to know that
> "nr_iovecs" can't be negative so it complains about underflows.
> Really, it's slightly cleaner to make this parameter unsigned.
Applied, thanks Dan.
--
Jens Axboe
^ permalink raw reply
* RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
This series makes REQ_OP_WRITE_ZEROES the only zeroing offload
supported by the block layer, and switches existing implementations
of REQ_OP_DISCARD that correctly set discard_zeroes_data to it,
removes incorrect discard_zeroes_data, and also switches WRITE SAME
based zeroing in SCSI to this new method.
I've done testing with ATA, SCSI and NVMe setups, but there are
a few things that will need more attention:
- what is dm-kcopyd doing with the current WRITE SAME usage
that gets multiple biovecs? Can we fix it up in any way.
- what are we going to do with discards through parity raid?
I suspect we should either just disable it for now entirely
or modify raid5.c to issue REQ_OP_WRITE_ZEROES to the underlying
devices. But I need someone with such a setup to test it.
- The DRBD code in this area was very odd, and will need an
audit from the maintainers.
Note that this series needs to be applied on top of the
"support ranges TRIM for libata" series.
A git tree is also avaiable at:
git://git.infradead.org/users/hch/block.git discard-rework
Gitweb:
http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/discard-rework
^ permalink raw reply
* [PATCH 01/23] block: renumber REQ_OP_WRITE_ZEROES
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
Make life easy for implementations that needs to send a data buffer
to the device (e.g. SCSI) by numbering it as a data out command.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/blk_types.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d703acb55d0f..6393c13a6498 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -160,7 +160,7 @@ enum req_opf {
/* write the same sector many times */
REQ_OP_WRITE_SAME = 7,
/* write the zero filled sector many times */
- REQ_OP_WRITE_ZEROES = 8,
+ REQ_OP_WRITE_ZEROES = 9,
/* SCSI passthrough using struct scsi_request */
REQ_OP_SCSI_IN = 32,
--
2.11.0
^ permalink raw reply related
* [PATCH 02/23] block: implement splitting of REQ_OP_WRITE_ZEROES bios
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
Copy and past the REQ_OP_WRITE_SAME code to prepare to implementations
that limit the write zeroes size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-merge.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/block/blk-merge.c b/block/blk-merge.c
index c62a6f0325e0..f9e5a3e0944b 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -63,6 +63,20 @@ static struct bio *blk_bio_discard_split(struct request_queue *q,
return bio_split(bio, split_sectors, GFP_NOIO, bs);
}
+static struct bio *blk_bio_write_zeroes_split(struct request_queue *q,
+ struct bio *bio, struct bio_set *bs, unsigned *nsegs)
+{
+ *nsegs = 1;
+
+ if (!q->limits.max_write_zeroes_sectors)
+ return NULL;
+
+ if (bio_sectors(bio) <= q->limits.max_write_zeroes_sectors)
+ return NULL;
+
+ return bio_split(bio, q->limits.max_write_zeroes_sectors, GFP_NOIO, bs);
+}
+
static struct bio *blk_bio_write_same_split(struct request_queue *q,
struct bio *bio,
struct bio_set *bs,
@@ -209,8 +223,7 @@ void blk_queue_split(struct request_queue *q, struct bio **bio,
split = blk_bio_discard_split(q, *bio, bs, &nsegs);
break;
case REQ_OP_WRITE_ZEROES:
- split = NULL;
- nsegs = (*bio)->bi_phys_segments;
+ split = blk_bio_write_zeroes_split(q, *bio, bs, &nsegs);
break;
case REQ_OP_WRITE_SAME:
split = blk_bio_write_same_split(q, *bio, bs, &nsegs);
--
2.11.0
^ permalink raw reply related
* [PATCH 03/23] sd: implement REQ_OP_WRITE_ZEROES
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/scsi/sd.c | 45 ++++++++++++++++++++++++++++++++++++++++-----
drivers/scsi/sd_zbc.c | 1 +
2 files changed, 41 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index af632e350ab4..b6f70a09a301 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -748,7 +748,7 @@ static int sd_setup_unmap_cmnd(struct scsi_cmnd *cmd)
return scsi_init_io(cmd);
}
-static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd)
+static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd, bool unmap)
{
struct scsi_device *sdp = cmd->device;
struct request *rq = cmd->request;
@@ -765,13 +765,14 @@ static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd)
cmd->cmd_len = 16;
cmd->cmnd[0] = WRITE_SAME_16;
- cmd->cmnd[1] = 0x8; /* UNMAP */
+ if (unmap)
+ cmd->cmnd[1] = 0x8; /* UNMAP */
put_unaligned_be64(sector, &cmd->cmnd[2]);
put_unaligned_be32(nr_sectors, &cmd->cmnd[10]);
cmd->allowed = SD_MAX_RETRIES;
cmd->transfersize = data_len;
- rq->timeout = SD_TIMEOUT;
+ rq->timeout = unmap ? SD_TIMEOUT : SD_WRITE_SAME_TIMEOUT;
scsi_req(rq)->resid_len = data_len;
return scsi_init_io(cmd);
@@ -801,7 +802,7 @@ static int sd_setup_write_same10_cmnd(struct scsi_cmnd *cmd, bool unmap)
cmd->allowed = SD_MAX_RETRIES;
cmd->transfersize = data_len;
- rq->timeout = SD_TIMEOUT;
+ rq->timeout = unmap ? SD_TIMEOUT : SD_WRITE_SAME_TIMEOUT;
scsi_req(rq)->resid_len = data_len;
return scsi_init_io(cmd);
@@ -857,11 +858,39 @@ static int sd_setup_ata_trim_cmnd(struct scsi_cmnd *cmd)
return scsi_init_io(cmd);
}
+static int sd_setup_write_zeroes_cmnd(struct scsi_cmnd *cmd)
+{
+ struct request *rq = cmd->request;
+ struct scsi_device *sdp = cmd->device;
+ struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+ u64 sector = blk_rq_pos(rq) >> (ilog2(sdp->sector_size) - 9);
+ u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
+
+ if (sdp->ata_trim) {
+ if (!sdp->ata_trim_zeroes_data)
+ return BLKPREP_INVALID;
+ return sd_setup_ata_trim_cmnd(cmd);
+ }
+ if (sdp->no_write_same)
+ return BLKPREP_INVALID;
+ if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)
+ return sd_setup_write_same16_cmnd(cmd, false);
+ return sd_setup_write_same10_cmnd(cmd, false);
+}
+
static void sd_config_write_same(struct scsi_disk *sdkp)
{
struct request_queue *q = sdkp->disk->queue;
unsigned int logical_block_size = sdkp->device->sector_size;
+ if (sdkp->device->ata_trim) {
+ if (sdkp->device->ata_trim_zeroes_data)
+ sdkp->max_ws_blocks = 65535 * (512 / sizeof(__le64));
+ else
+ sdkp->max_ws_blocks = 0;
+ goto config_write_zeroes;
+ }
+
if (sdkp->device->no_write_same) {
sdkp->max_ws_blocks = 0;
goto out;
@@ -886,6 +915,9 @@ static void sd_config_write_same(struct scsi_disk *sdkp)
out:
blk_queue_max_write_same_sectors(q, sdkp->max_ws_blocks *
(logical_block_size >> 9));
+config_write_zeroes:
+ blk_queue_max_write_zeroes_sectors(q, sdkp->max_ws_blocks *
+ (logical_block_size >> 9));
}
/**
@@ -1226,7 +1258,7 @@ static int sd_init_command(struct scsi_cmnd *cmd)
case SD_LBP_UNMAP:
return sd_setup_unmap_cmnd(cmd);
case SD_LBP_WS16:
- return sd_setup_write_same16_cmnd(cmd);
+ return sd_setup_write_same16_cmnd(cmd, true);
case SD_LBP_WS10:
return sd_setup_write_same10_cmnd(cmd, true);
case SD_LBP_ZERO:
@@ -1236,6 +1268,8 @@ static int sd_init_command(struct scsi_cmnd *cmd)
default:
return BLKPREP_INVALID;
}
+ case REQ_OP_WRITE_ZEROES:
+ return sd_setup_write_zeroes_cmnd(cmd);
case REQ_OP_WRITE_SAME:
return sd_setup_write_same_cmnd(cmd);
case REQ_OP_FLUSH:
@@ -1876,6 +1910,7 @@ static int sd_done(struct scsi_cmnd *SCpnt)
switch (req_op(req)) {
case REQ_OP_DISCARD:
+ case REQ_OP_WRITE_ZEROES:
case REQ_OP_WRITE_SAME:
case REQ_OP_ZONE_RESET:
if (!result) {
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 92620c8ea8ad..1994f7799fce 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -329,6 +329,7 @@ void sd_zbc_complete(struct scsi_cmnd *cmd,
switch (req_op(rq)) {
case REQ_OP_WRITE:
+ case REQ_OP_WRITE_ZEROES:
case REQ_OP_WRITE_SAME:
case REQ_OP_ZONE_RESET:
--
2.11.0
^ permalink raw reply related
* [PATCH 04/23] md: support REQ_OP_WRITE_ZEROES
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
Copy & paste from the REQ_OP_WRITE_SAME code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/linear.c | 1 +
drivers/md/md.h | 7 +++++++
drivers/md/multipath.c | 1 +
drivers/md/raid0.c | 2 ++
drivers/md/raid1.c | 4 +++-
drivers/md/raid10.c | 1 +
drivers/md/raid5.c | 1 +
7 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 3e38e0207a3e..377a8a3672e3 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -293,6 +293,7 @@ static void linear_make_request(struct mddev *mddev, struct bio *bio)
split, disk_devt(mddev->gendisk),
bio_sector);
mddev_check_writesame(mddev, split);
+ mddev_check_write_zeroes(mddev, split);
generic_make_request(split);
}
} while (split != bio);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index dde8ecb760c8..1e76d64ce180 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -709,4 +709,11 @@ static inline void mddev_check_writesame(struct mddev *mddev, struct bio *bio)
!bdev_get_queue(bio->bi_bdev)->limits.max_write_same_sectors)
mddev->queue->limits.max_write_same_sectors = 0;
}
+
+static inline void mddev_check_write_zeroes(struct mddev *mddev, struct bio *bio)
+{
+ if (bio_op(bio) == REQ_OP_WRITE_ZEROES &&
+ !bdev_get_queue(bio->bi_bdev)->limits.max_write_zeroes_sectors)
+ mddev->queue->limits.max_write_zeroes_sectors = 0;
+}
#endif /* _MD_MD_H */
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 79a12b59250b..e95d521d93e9 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -139,6 +139,7 @@ static void multipath_make_request(struct mddev *mddev, struct bio * bio)
mp_bh->bio.bi_end_io = multipath_end_request;
mp_bh->bio.bi_private = mp_bh;
mddev_check_writesame(mddev, &mp_bh->bio);
+ mddev_check_write_zeroes(mddev, &mp_bh->bio);
generic_make_request(&mp_bh->bio);
return;
}
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 93347ca7c7a6..ce7a6a56cf73 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -383,6 +383,7 @@ static int raid0_run(struct mddev *mddev)
blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
blk_queue_max_write_same_sectors(mddev->queue, mddev->chunk_sectors);
+ blk_queue_max_write_zeroes_sectors(mddev->queue, mddev->chunk_sectors);
blk_queue_max_discard_sectors(mddev->queue, mddev->chunk_sectors);
blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9);
@@ -504,6 +505,7 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
split, disk_devt(mddev->gendisk),
bio_sector);
mddev_check_writesame(mddev, split);
+ mddev_check_write_zeroes(mddev, split);
generic_make_request(split);
}
} while (split != bio);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index a34f58772022..b59cc100320a 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3177,8 +3177,10 @@ static int raid1_run(struct mddev *mddev)
if (IS_ERR(conf))
return PTR_ERR(conf);
- if (mddev->queue)
+ if (mddev->queue) {
blk_queue_max_write_same_sectors(mddev->queue, 0);
+ blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
+ }
rdev_for_each(rdev, mddev) {
if (!mddev->gendisk)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index e89a8d78a9ed..28ec3a93acee 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3749,6 +3749,7 @@ static int raid10_run(struct mddev *mddev)
blk_queue_max_discard_sectors(mddev->queue,
mddev->chunk_sectors);
blk_queue_max_write_same_sectors(mddev->queue, 0);
+ blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
blk_queue_io_min(mddev->queue, chunk_size);
if (conf->geo.raid_disks % conf->geo.near_copies)
blk_queue_io_opt(mddev->queue, chunk_size * conf->geo.raid_disks);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ed5cd705b985..8cf1f86dcd05 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7272,6 +7272,7 @@ static int raid5_run(struct mddev *mddev)
mddev->queue->limits.discard_zeroes_data = 0;
blk_queue_max_write_same_sectors(mddev->queue, 0);
+ blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
rdev_for_each(rdev, mddev) {
disk_stack_limits(mddev->gendisk, rdev->bdev,
--
2.11.0
^ permalink raw reply related
* [PATCH 05/23] dm: support REQ_OP_WRITE_ZEROES
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
Copy & paste from the REQ_OP_WRITE_SAME code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/dm-core.h | 1 +
drivers/md/dm-io.c | 10 +++++++---
drivers/md/dm-linear.c | 1 +
drivers/md/dm-mpath.c | 1 +
drivers/md/dm-rq.c | 11 ++++++++---
drivers/md/dm-stripe.c | 2 ++
drivers/md/dm-table.c | 30 ++++++++++++++++++++++++++++++
drivers/md/dm.c | 31 ++++++++++++++++++++++++++++---
include/linux/device-mapper.h | 6 ++++++
9 files changed, 84 insertions(+), 9 deletions(-)
diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 136fda3ff9e5..fea5bd52ada8 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -132,6 +132,7 @@ void dm_init_md_queue(struct mapped_device *md);
void dm_init_normal_md_queue(struct mapped_device *md);
int md_in_flight(struct mapped_device *md);
void disable_write_same(struct mapped_device *md);
+void disable_write_zeroes(struct mapped_device *md);
static inline struct completion *dm_get_completion_from_kobject(struct kobject *kobj)
{
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 03940bf36f6c..fad2d5d1888e 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -312,9 +312,12 @@ static void do_region(int op, int op_flags, unsigned region,
*/
if (op == REQ_OP_DISCARD)
special_cmd_max_sectors = q->limits.max_discard_sectors;
+ else if (op == REQ_OP_WRITE_ZEROES)
+ special_cmd_max_sectors = q->limits.max_write_zeroes_sectors;
else if (op == REQ_OP_WRITE_SAME)
special_cmd_max_sectors = q->limits.max_write_same_sectors;
- if ((op == REQ_OP_DISCARD || op == REQ_OP_WRITE_SAME) &&
+ if ((op == REQ_OP_DISCARD || op == REQ_OP_WRITE_ZEROES ||
+ op == REQ_OP_WRITE_SAME) &&
special_cmd_max_sectors == 0) {
dec_count(io, region, -EOPNOTSUPP);
return;
@@ -328,7 +331,8 @@ static void do_region(int op, int op_flags, unsigned region,
/*
* Allocate a suitably sized-bio.
*/
- if ((op == REQ_OP_DISCARD) || (op == REQ_OP_WRITE_SAME))
+ if ((op == REQ_OP_DISCARD) || (op == REQ_OP_WRITE_ZEROES) ||
+ (op == REQ_OP_WRITE_SAME))
num_bvecs = 1;
else
num_bvecs = min_t(int, BIO_MAX_PAGES,
@@ -341,7 +345,7 @@ static void do_region(int op, int op_flags, unsigned region,
bio_set_op_attrs(bio, op, op_flags);
store_io_and_region_in_bio(bio, io, region);
- if (op == REQ_OP_DISCARD) {
+ if (op == REQ_OP_DISCARD || op == REQ_OP_WRITE_ZEROES) {
num_sectors = min_t(sector_t, special_cmd_max_sectors, remaining);
bio->bi_iter.bi_size = num_sectors << SECTOR_SHIFT;
remaining -= num_sectors;
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 4788b0b989a9..e17fd44ceef5 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -59,6 +59,7 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
ti->num_flush_bios = 1;
ti->num_discard_bios = 1;
ti->num_write_same_bios = 1;
+ ti->num_write_zeroes_bios = 1;
ti->private = lc;
return 0;
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 7f223dbed49f..ab55955ed704 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1103,6 +1103,7 @@ static int multipath_ctr(struct dm_target *ti, unsigned argc, char **argv)
ti->num_flush_bios = 1;
ti->num_discard_bios = 1;
ti->num_write_same_bios = 1;
+ ti->num_write_zeroes_bios = 1;
if (m->queue_mode == DM_TYPE_BIO_BASED)
ti->per_io_data_size = multipath_per_bio_data_size();
else
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 28955b94d2b2..6006d5d4b06e 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -298,9 +298,14 @@ static void dm_done(struct request *clone, int error, bool mapped)
r = rq_end_io(tio->ti, clone, error, &tio->info);
}
- if (unlikely(r == -EREMOTEIO && (req_op(clone) == REQ_OP_WRITE_SAME) &&
- !clone->q->limits.max_write_same_sectors))
- disable_write_same(tio->md);
+ if (unlikely(r == -EREMOTEIO)) {
+ if (req_op(clone) == REQ_OP_WRITE_SAME &&
+ !clone->q->limits.max_write_same_sectors)
+ disable_write_same(tio->md);
+ if (req_op(clone) == REQ_OP_WRITE_ZEROES &&
+ !clone->q->limits.max_write_zeroes_sectors)
+ disable_write_zeroes(tio->md);
+ }
if (r <= 0)
/* The target wants to complete the I/O */
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 28193a57bf47..5ef49c121d99 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -169,6 +169,7 @@ static int stripe_ctr(struct dm_target *ti, unsigned int argc, char **argv)
ti->num_flush_bios = stripes;
ti->num_discard_bios = stripes;
ti->num_write_same_bios = stripes;
+ ti->num_write_zeroes_bios = stripes;
sc->chunk_size = chunk_size;
if (chunk_size & (chunk_size - 1))
@@ -293,6 +294,7 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
return DM_MAPIO_REMAPPED;
}
if (unlikely(bio_op(bio) == REQ_OP_DISCARD) ||
+ unlikely(bio_op(bio) == REQ_OP_WRITE_ZEROES) ||
unlikely(bio_op(bio) == REQ_OP_WRITE_SAME)) {
target_bio_nr = dm_bio_get_target_bio_nr(bio);
BUG_ON(target_bio_nr >= sc->stripes);
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 3ad16d9c9d5a..5cd665c91ead 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1533,6 +1533,34 @@ static bool dm_table_supports_write_same(struct dm_table *t)
return true;
}
+static int device_not_write_zeroes_capable(struct dm_target *ti, struct dm_dev *dev,
+ sector_t start, sector_t len, void *data)
+{
+ struct request_queue *q = bdev_get_queue(dev->bdev);
+
+ return q && !q->limits.max_write_zeroes_sectors;
+}
+
+static bool dm_table_supports_write_zeroes(struct dm_table *t)
+{
+ struct dm_target *ti;
+ unsigned i = 0;
+
+ while (i < dm_table_get_num_targets(t)) {
+ ti = dm_table_get_target(t, i++);
+
+ if (!ti->num_write_zeroes_bios)
+ return false;
+
+ if (!ti->type->iterate_devices ||
+ ti->type->iterate_devices(ti, device_not_write_zeroes_capable, NULL))
+ return false;
+ }
+
+ return true;
+}
+
+
static int device_discard_capable(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
{
@@ -1603,6 +1631,8 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
if (!dm_table_supports_write_same(t))
q->limits.max_write_same_sectors = 0;
+ if (!dm_table_supports_write_zeroes(t))
+ q->limits.max_write_zeroes_sectors = 0;
if (dm_table_all_devices_attribute(t, queue_supports_sg_merge))
queue_flag_clear_unlocked(QUEUE_FLAG_NO_SG_MERGE, q);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index dfb75979e455..e8226359c8f7 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -825,6 +825,14 @@ void disable_write_same(struct mapped_device *md)
limits->max_write_same_sectors = 0;
}
+void disable_write_zeroes(struct mapped_device *md)
+{
+ struct queue_limits *limits = dm_get_queue_limits(md);
+
+ /* device doesn't really support WRITE ZEROES, disable it */
+ limits->max_write_zeroes_sectors = 0;
+}
+
static void clone_endio(struct bio *bio)
{
int error = bio->bi_error;
@@ -851,9 +859,14 @@ static void clone_endio(struct bio *bio)
}
}
- if (unlikely(r == -EREMOTEIO && (bio_op(bio) == REQ_OP_WRITE_SAME) &&
- !bdev_get_queue(bio->bi_bdev)->limits.max_write_same_sectors))
- disable_write_same(md);
+ if (unlikely(r == -EREMOTEIO)) {
+ if (bio_op(bio) == REQ_OP_WRITE_SAME &&
+ !bdev_get_queue(bio->bi_bdev)->limits.max_write_same_sectors)
+ disable_write_same(md);
+ if (bio_op(bio) == REQ_OP_WRITE_ZEROES &&
+ !bdev_get_queue(bio->bi_bdev)->limits.max_write_zeroes_sectors)
+ disable_write_zeroes(md);
+ }
free_tio(tio);
dec_pending(io, error);
@@ -1202,6 +1215,11 @@ static unsigned get_num_write_same_bios(struct dm_target *ti)
return ti->num_write_same_bios;
}
+static unsigned get_num_write_zeroes_bios(struct dm_target *ti)
+{
+ return ti->num_write_zeroes_bios;
+}
+
typedef bool (*is_split_required_fn)(struct dm_target *ti);
static bool is_split_required_for_discard(struct dm_target *ti)
@@ -1256,6 +1274,11 @@ static int __send_write_same(struct clone_info *ci)
return __send_changing_extent_only(ci, get_num_write_same_bios, NULL);
}
+static int __send_write_zeroes(struct clone_info *ci)
+{
+ return __send_changing_extent_only(ci, get_num_write_zeroes_bios, NULL);
+}
+
/*
* Select the correct strategy for processing a non-flush bio.
*/
@@ -1270,6 +1293,8 @@ static int __split_and_process_non_flush(struct clone_info *ci)
return __send_discard(ci);
else if (unlikely(bio_op(bio) == REQ_OP_WRITE_SAME))
return __send_write_same(ci);
+ else if (unlikely(bio_op(bio) == REQ_OP_WRITE_ZEROES))
+ return __send_write_zeroes(ci);
ti = dm_table_find_target(ci->map, ci->sector);
if (!dm_target_is_valid(ti))
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index a7e6903866fd..3829bee2302a 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -255,6 +255,12 @@ struct dm_target {
unsigned num_write_same_bios;
/*
+ * The number of WRITE ZEROES bios that will be submitted to the target.
+ * The bio number can be accessed with dm_bio_get_target_bio_nr.
+ */
+ unsigned num_write_zeroes_bios;
+
+ /*
* The minimum number of extra bytes allocated in each io for the
* target to use.
*/
--
2.11.0
^ permalink raw reply related
* [PATCH 06/23] dm-kcopyd: switch to use REQ_OP_WRITE_ZEROES
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
It seems like the code currently passes whatever it was using for writes
to WRITE SAME. Just switch it to WRITE ZEROES, although that doesn't
need any payload.
Untested, and confused by the code, maybe someone who understands it
better than me can help..
Not-yet-signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/dm-kcopyd.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
index 9e9d04cb7d51..f85846741d50 100644
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
@@ -733,11 +733,11 @@ int dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
job->pages = &zero_page_list;
/*
- * Use WRITE SAME to optimize zeroing if all dests support it.
+ * Use WRITE ZEROES to optimize zeroing if all dests support it.
*/
- job->rw = REQ_OP_WRITE_SAME;
+ job->rw = REQ_OP_WRITE_ZEROES;
for (i = 0; i < job->num_dests; i++)
- if (!bdev_write_same(job->dests[i].bdev)) {
+ if (!bdev_write_zeroes_sectors(job->dests[i].bdev)) {
job->rw = WRITE;
break;
}
--
2.11.0
^ permalink raw reply related
* [PATCH 07/23] block: stop using blkdev_issue_write_same for zeroing
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
We'll always use the WRITE ZEROES code for zeroing now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-lib.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index ed1e78e24db0..a93115ccf461 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -364,10 +364,6 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
return 0;
}
- if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
- ZERO_PAGE(0)))
- return 0;
-
blk_start_plug(&plug);
ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
&bio, discard);
--
2.11.0
^ permalink raw reply related
* [PATCH 08/23] block: add a flags argument to (__)blkdev_issue_zeroout
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
Turn the existin discard flag into a new BLKDEV_ZERO_UNMAP flag with
similar semantics, but without referring to diѕcard.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-lib.c | 31 ++++++++++++++-----------------
block/ioctl.c | 3 +--
drivers/nvme/target/io-cmd.c | 2 +-
fs/block_dev.c | 2 +-
fs/dax.c | 2 +-
fs/xfs/xfs_bmap_util.c | 2 +-
include/linux/blkdev.h | 14 +++++++++-----
7 files changed, 28 insertions(+), 28 deletions(-)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index a93115ccf461..1e849f904d4c 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -282,14 +282,18 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
* @nr_sects: number of sectors to write
* @gfp_mask: memory allocation flags (for bio_alloc)
* @biop: pointer to anchor bio
- * @discard: discard flag
+ * @flags: controls detailed behavior
*
* Description:
- * Generate and issue number of bios with zerofiled pages.
+ * Zero-fill a block range, either using hardware offload or by explicitly
+ * writing zeroes to the device.
+ *
+ * If a device is using logical block provisioning, the underlying space will
+ * only be release if %flags contains BLKDEV_ZERO_UNMAP.
*/
int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
- bool discard)
+ unsigned flags)
{
int ret;
int bi_size = 0;
@@ -337,28 +341,21 @@ EXPORT_SYMBOL(__blkdev_issue_zeroout);
* @sector: start sector
* @nr_sects: number of sectors to write
* @gfp_mask: memory allocation flags (for bio_alloc)
- * @discard: whether to discard the block range
+ * @flags: controls detailed behavior
*
* Description:
- * Zero-fill a block range. If the discard flag is set and the block
- * device guarantees that subsequent READ operations to the block range
- * in question will return zeroes, the blocks will be discarded. Should
- * the discard request fail, if the discard flag is not set, or if
- * discard_zeroes_data is not supported, this function will resort to
- * zeroing the blocks manually, thus provisioning (allocating,
- * anchoring) them. If the block device supports WRITE ZEROES or WRITE SAME
- * command(s), blkdev_issue_zeroout() will use it to optimize the process of
- * clearing the block range. Otherwise the zeroing will be performed
- * using regular WRITE calls.
+ * Zero-fill a block range, either using hardware offload or by explicitly
+ * writing zeroes to the device. See __blkdev_issue_zeroout() for the
+ * valid values for %flags.
*/
int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
- sector_t nr_sects, gfp_t gfp_mask, bool discard)
+ sector_t nr_sects, gfp_t gfp_mask, unsigned flags)
{
int ret;
struct bio *bio = NULL;
struct blk_plug plug;
- if (discard) {
+ if (flags & BLKDEV_ZERO_UNMAP) {
if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask,
BLKDEV_DISCARD_ZERO))
return 0;
@@ -366,7 +363,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
blk_start_plug(&plug);
ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
- &bio, discard);
+ &bio, flags);
if (ret == 0 && bio) {
ret = submit_bio_wait(bio);
bio_put(bio);
diff --git a/block/ioctl.c b/block/ioctl.c
index 7b88820b93d9..304685022d9f 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -254,8 +254,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
mapping = bdev->bd_inode->i_mapping;
truncate_inode_pages_range(mapping, start, end);
- return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
- false);
+ return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL, 0);
}
static int put_ushort(unsigned long arg, unsigned short val)
diff --git a/drivers/nvme/target/io-cmd.c b/drivers/nvme/target/io-cmd.c
index 4195115c7e54..1366babaee50 100644
--- a/drivers/nvme/target/io-cmd.c
+++ b/drivers/nvme/target/io-cmd.c
@@ -184,7 +184,7 @@ static void nvmet_execute_write_zeroes(struct nvmet_req *req)
(req->ns->blksize_shift - 9)) + 1;
if (__blkdev_issue_zeroout(req->ns->bdev, sector, nr_sector,
- GFP_KERNEL, &bio, true))
+ GFP_KERNEL, &bio, BLKDEV_ZERO_UNMAP))
status = NVME_SC_INTERNAL | NVME_SC_DNR;
if (bio) {
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 2eca00ec4370..6128283b7830 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -2110,7 +2110,7 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
case FALLOC_FL_ZERO_RANGE:
case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
- GFP_KERNEL, false);
+ GFP_KERNEL, 0);
break;
case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
/* Only punch if the device can do zeroing discard. */
diff --git a/fs/dax.c b/fs/dax.c
index de622d4282a6..7a107355d33f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -982,7 +982,7 @@ int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
sector_t start_sector = dax.sector + (offset >> 9);
return blkdev_issue_zeroout(bdev, start_sector,
- length >> 9, GFP_NOFS, true);
+ length >> 9, GFP_NOFS, BLKDEV_ZERO_UNMAP);
} else {
if (dax_map_atomic(bdev, &dax) < 0)
return PTR_ERR(dax.addr);
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 8b75dcea5966..6b87c4296787 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -81,7 +81,7 @@ xfs_zero_extent(
return blkdev_issue_zeroout(xfs_find_bdev_for_inode(VFS_I(ip)),
block << (mp->m_super->s_blocksize_bits - 9),
count_fsb << (mp->m_super->s_blocksize_bits - 9),
- GFP_NOFS, true);
+ GFP_NOFS, BLKDEV_ZERO_UNMAP);
}
int
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3b3bd646f580..19209416225d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1333,23 +1333,27 @@ static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt,
return bqt->tag_index[tag];
}
+extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *);
+extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
+ sector_t nr_sects, gfp_t gfp_mask, struct page *page);
#define BLKDEV_DISCARD_SECURE (1 << 0) /* issue a secure erase */
#define BLKDEV_DISCARD_ZERO (1 << 1) /* must reliably zero data */
-extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *);
extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, unsigned long flags);
extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, int flags,
struct bio **biop);
-extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
- sector_t nr_sects, gfp_t gfp_mask, struct page *page);
+
+#define BLKDEV_ZERO_UNMAP (1 << 0) /* free unused blocks */
+
extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
- bool discard);
+ unsigned flags);
extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
- sector_t nr_sects, gfp_t gfp_mask, bool discard);
+ sector_t nr_sects, gfp_t gfp_mask, unsigned flags);
+
static inline int sb_issue_discard(struct super_block *sb, sector_t block,
sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags)
{
--
2.11.0
^ permalink raw reply related
* [PATCH 09/23] block: add a REQ_UNMAP flag for REQ_OP_WRITE_ZEROES
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
If this flag is set logical provisioning capable device should
release space for the zeroed blocks if possible, if it is not set
devices should keep the blocks anchored.
Also remove an out of sync kerneldoc comment for a static function
that would have become even more out of data with this change.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-lib.c | 19 +++++--------------
include/linux/blk_types.h | 5 +++++
2 files changed, 10 insertions(+), 14 deletions(-)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 1e849f904d4c..a59bb54ac7c7 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -226,20 +226,9 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
}
EXPORT_SYMBOL(blkdev_issue_write_same);
-/**
- * __blkdev_issue_write_zeroes - generate number of bios with WRITE ZEROES
- * @bdev: blockdev to issue
- * @sector: start sector
- * @nr_sects: number of sectors to write
- * @gfp_mask: memory allocation flags (for bio_alloc)
- * @biop: pointer to anchor bio
- *
- * Description:
- * Generate and issue number of bios(REQ_OP_WRITE_ZEROES) with zerofiled pages.
- */
static int __blkdev_issue_write_zeroes(struct block_device *bdev,
sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
- struct bio **biop)
+ struct bio **biop, unsigned flags)
{
struct bio *bio = *biop;
unsigned int max_write_zeroes_sectors;
@@ -258,7 +247,9 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
bio = next_bio(bio, 0, gfp_mask);
bio->bi_iter.bi_sector = sector;
bio->bi_bdev = bdev;
- bio_set_op_attrs(bio, REQ_OP_WRITE_ZEROES, 0);
+ bio->bi_opf = REQ_OP_WRITE_ZEROES;
+ if (flags & BLKDEV_ZERO_UNMAP)
+ bio->bi_opf |= REQ_UNMAP;
if (nr_sects > max_write_zeroes_sectors) {
bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
@@ -306,7 +297,7 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
return -EINVAL;
ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp_mask,
- biop);
+ biop, flags);
if (ret == 0 || (ret && ret != -EOPNOTSUPP))
goto out;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 6393c13a6498..4da61fd12c94 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -187,6 +187,10 @@ enum req_flag_bits {
__REQ_PREFLUSH, /* request for cache flush */
__REQ_RAHEAD, /* read ahead, can fail anytime */
__REQ_BACKGROUND, /* background IO */
+
+ /* command specific flags for REQ_OP_WRITE_ZEROES: */
+ __REQ_UNMAP, /* explicitly release space when zeroing */
+
__REQ_NR_BITS, /* stops here */
};
@@ -203,6 +207,7 @@ enum req_flag_bits {
#define REQ_PREFLUSH (1ULL << __REQ_PREFLUSH)
#define REQ_RAHEAD (1ULL << __REQ_RAHEAD)
#define REQ_BACKGROUND (1ULL << __REQ_BACKGROUND)
+#define REQ_UNMAP (1ULL << __REQ_UNMAP)
#define REQ_FAILFAST_MASK \
(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
--
2.11.0
^ permalink raw reply related
* [PATCH 10/23] block: add a new BLKDEV_ZERO_NOFALLBACK flag
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
This avoids fallbacks to explicit zeroing in (__)blkdev_issue_zeroout if
the caller doesn't want them.
Also clean up the convoluted check for the return condition that this
new flag is added to.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-lib.c | 5 ++++-
include/linux/blkdev.h | 1 +
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index a59bb54ac7c7..33c5bf373b7f 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -281,6 +281,9 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
*
* If a device is using logical block provisioning, the underlying space will
* only be release if %flags contains BLKDEV_ZERO_UNMAP.
+ *
+ * If %flags contains BLKDEV_ZERO_NOFALLBACK, the function will return
+ * -EOPNOTSUPP if no explicit hardware offload for zeroing is provided.
*/
int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
@@ -298,7 +301,7 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp_mask,
biop, flags);
- if (ret == 0 || (ret && ret != -EOPNOTSUPP))
+ if (ret != -EOPNOTSUPP || (flags & BLKDEV_ZERO_NOFALLBACK))
goto out;
ret = 0;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 19209416225d..1bddcfc631a4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1347,6 +1347,7 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
struct bio **biop);
#define BLKDEV_ZERO_UNMAP (1 << 0) /* free unused blocks */
+#define BLKDEV_ZERO_NOFALLBACK (1 << 1) /* don't write explicit zeroes */
extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
--
2.11.0
^ permalink raw reply related
* [PATCH 11/23] block_dev: use blkdev_issue_zerout for hole punches
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
This gets us support for non-discard efficient write of zeroes (e.g. NVMe)
and preparse for removing the discard_zeroes_data flag.
Also remove a pointless discard support check, which is done in
blkdev_issue_discard already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/block_dev.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 6128283b7830..5c701c16e8ff 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -2074,10 +2074,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
loff_t len)
{
struct block_device *bdev = I_BDEV(bdev_file_inode(file));
- struct request_queue *q = bdev_get_queue(bdev);
struct address_space *mapping;
loff_t end = start + len - 1;
loff_t isize;
+ unsigned flags = 0;
int error;
/* Fail if we don't recognize the flags. */
@@ -2107,21 +2107,15 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
truncate_inode_pages_range(mapping, start, end);
switch (mode) {
+ case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+ flags = BLKDEV_ZERO_NOFALLBACK;
+ /*FALLTHRU*/
case FALLOC_FL_ZERO_RANGE:
case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
- GFP_KERNEL, 0);
- break;
- case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
- /* Only punch if the device can do zeroing discard. */
- if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
- return -EOPNOTSUPP;
- error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
- GFP_KERNEL, 0);
+ GFP_KERNEL, flags);
break;
case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
- if (!blk_queue_discard(q))
- return -EOPNOTSUPP;
error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
GFP_KERNEL, 0);
break;
--
2.11.0
^ permalink raw reply related
* [PATCH 12/23] sd: handle REQ_UNMAP
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
Try to use a write same with unmap bit variant if the device supports it
and the caller asks for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/scsi/sd.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index b6f70a09a301..ca96bb33471b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -871,6 +871,16 @@ static int sd_setup_write_zeroes_cmnd(struct scsi_cmnd *cmd)
return BLKPREP_INVALID;
return sd_setup_ata_trim_cmnd(cmd);
}
+
+ if (rq->cmd_flags & REQ_UNMAP) {
+ switch (sdkp->provisioning_mode) {
+ case SD_LBP_WS16:
+ return sd_setup_write_same16_cmnd(cmd, true);
+ case SD_LBP_WS10:
+ return sd_setup_write_same10_cmnd(cmd, true);
+ }
+ }
+
if (sdp->no_write_same)
return BLKPREP_INVALID;
if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)
--
2.11.0
^ permalink raw reply related
* [PATCH 13/23] nvme: implement REQ_OP_WRITE_ZEROES
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
But now for the real NVMe Write Zeroes yet, just to get rid of the
discard abuse for zeroing. Also rename the quirk flag to be a bit
more self-explanatory.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/host/core.c | 10 +++++-----
drivers/nvme/host/nvme.h | 6 +++---
drivers/nvme/host/pci.c | 6 +++---
3 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9b3b57fef446..bfa0595dc767 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -335,6 +335,8 @@ int nvme_setup_cmd(struct nvme_ns *ns, struct request *req,
case REQ_OP_FLUSH:
nvme_setup_flush(ns, cmd);
break;
+ case REQ_OP_WRITE_ZEROES:
+ /* currently only aliased to deallocate for a few ctrls: */
case REQ_OP_DISCARD:
ret = nvme_setup_discard(ns, req, cmd);
break;
@@ -900,16 +902,14 @@ static void nvme_config_discard(struct nvme_ns *ns)
BUILD_BUG_ON(PAGE_SIZE / sizeof(struct nvme_dsm_range) <
NVME_DSM_MAX_RANGES);
- if (ctrl->quirks & NVME_QUIRK_DISCARD_ZEROES)
- ns->queue->limits.discard_zeroes_data = 1;
- else
- ns->queue->limits.discard_zeroes_data = 0;
-
ns->queue->limits.discard_alignment = logical_block_size;
ns->queue->limits.discard_granularity = logical_block_size;
blk_queue_max_discard_sectors(ns->queue, UINT_MAX);
blk_queue_max_discard_segments(ns->queue, NVME_DSM_MAX_RANGES);
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, ns->queue);
+
+ if (ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES)
+ blk_queue_max_write_zeroes_sectors(ns->queue, UINT_MAX);
}
static int nvme_revalidate_ns(struct nvme_ns *ns, struct nvme_id_ns **id)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 2aa20e3e5675..07ebc4a1c8fc 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -68,10 +68,10 @@ enum nvme_quirks {
NVME_QUIRK_IDENTIFY_CNS = (1 << 1),
/*
- * The controller deterministically returns O's on reads to discarded
- * logical blocks.
+ * The controller deterministically returns O's on reads to
+ * logical blocks that deallocate was called on.
*/
- NVME_QUIRK_DISCARD_ZEROES = (1 << 2),
+ NVME_QUIRK_DEALLOCATE_ZEROES = (1 << 2),
/*
* The controller needs a delay before starts checking the device
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 26a5fd05fe88..0a28787267f0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2135,13 +2135,13 @@ static const struct pci_error_handlers nvme_err_handler = {
static const struct pci_device_id nvme_id_table[] = {
{ PCI_VDEVICE(INTEL, 0x0953),
.driver_data = NVME_QUIRK_STRIPE_SIZE |
- NVME_QUIRK_DISCARD_ZEROES, },
+ NVME_QUIRK_DEALLOCATE_ZEROES, },
{ PCI_VDEVICE(INTEL, 0x0a53),
.driver_data = NVME_QUIRK_STRIPE_SIZE |
- NVME_QUIRK_DISCARD_ZEROES, },
+ NVME_QUIRK_DEALLOCATE_ZEROES, },
{ PCI_VDEVICE(INTEL, 0x0a54),
.driver_data = NVME_QUIRK_STRIPE_SIZE |
- NVME_QUIRK_DISCARD_ZEROES, },
+ NVME_QUIRK_DEALLOCATE_ZEROES, },
{ PCI_VDEVICE(INTEL, 0x5845), /* Qemu emulated controller */
.driver_data = NVME_QUIRK_IDENTIFY_CNS, },
{ PCI_DEVICE(0x1c58, 0x0003), /* HGST adapter */
--
2.11.0
^ permalink raw reply related
* [PATCH 14/23] zram: implement REQ_OP_WRITE_ZEROES
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
Just the same as discard if the block size equals the system page size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/zram/zram_drv.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index dceb5edd1e54..1710b06f04a7 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -829,10 +829,14 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
offset = (bio->bi_iter.bi_sector &
(SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
- if (unlikely(bio_op(bio) == REQ_OP_DISCARD)) {
+ switch (bio_op(bio)) {
+ case REQ_OP_DISCARD:
+ case REQ_OP_WRITE_ZEROES:
zram_bio_discard(zram, index, offset, bio);
bio_endio(bio);
return;
+ default:
+ break;
}
bio_for_each_segment(bvec, bio, iter) {
@@ -1192,6 +1196,8 @@ static int zram_add(void)
zram->disk->queue->limits.max_sectors = SECTORS_PER_PAGE;
zram->disk->queue->limits.chunk_sectors = 0;
blk_queue_max_discard_sectors(zram->disk->queue, UINT_MAX);
+ queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, zram->disk->queue);
+
/*
* zram_bio_discard() will clear all logical blocks if logical block
* size is identical with physical block size(PAGE_SIZE). But if it is
@@ -1201,10 +1207,7 @@ static int zram_add(void)
* zeroed.
*/
if (ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE)
- zram->disk->queue->limits.discard_zeroes_data = 1;
- else
- zram->disk->queue->limits.discard_zeroes_data = 0;
- queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, zram->disk->queue);
+ blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);
add_disk(zram->disk);
--
2.11.0
^ permalink raw reply related
* [PATCH 15/23] loop: implement REQ_OP_WRITE_ZEROES
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
It's identical to discard as hole punches will always leave us with
zeroes on reads.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/loop.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0ecb6461ed81..265cd2e33ff0 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -528,6 +528,7 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
case REQ_OP_FLUSH:
return lo_req_flush(lo, rq);
case REQ_OP_DISCARD:
+ case REQ_OP_WRITE_ZEROES:
return lo_discard(lo, rq, pos);
case REQ_OP_WRITE:
if (lo->transfer)
@@ -826,6 +827,7 @@ static void loop_config_discard(struct loop_device *lo)
q->limits.discard_granularity = 0;
q->limits.discard_alignment = 0;
blk_queue_max_discard_sectors(q, 0);
+ blk_queue_max_write_zeroes_sectors(q, 0);
q->limits.discard_zeroes_data = 0;
queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
return;
@@ -834,6 +836,7 @@ static void loop_config_discard(struct loop_device *lo)
q->limits.discard_granularity = inode->i_sb->s_blocksize;
q->limits.discard_alignment = 0;
blk_queue_max_discard_sectors(q, UINT_MAX >> 9);
+ blk_queue_max_write_zeroes_sectors(q, UINT_MAX >> 9);
q->limits.discard_zeroes_data = 1;
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
}
@@ -1660,6 +1663,7 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
switch (req_op(cmd->rq)) {
case REQ_OP_FLUSH:
case REQ_OP_DISCARD:
+ case REQ_OP_WRITE_ZEROES:
cmd->use_aio = false;
break;
default:
--
2.11.0
^ permalink raw reply related
* [PATCH 16/23] brd: remove discard support
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
It's just a in-driver reimplementation of writing zeroes to the pages,
which fails if the discards aren't page aligned.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/brd.c | 56 -----------------------------------------------------
1 file changed, 56 deletions(-)
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 3adc32a3153b..c15d0581531f 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -134,28 +134,6 @@ static struct page *brd_insert_page(struct brd_device *brd, sector_t sector)
return page;
}
-static void brd_free_page(struct brd_device *brd, sector_t sector)
-{
- struct page *page;
- pgoff_t idx;
-
- spin_lock(&brd->brd_lock);
- idx = sector >> PAGE_SECTORS_SHIFT;
- page = radix_tree_delete(&brd->brd_pages, idx);
- spin_unlock(&brd->brd_lock);
- if (page)
- __free_page(page);
-}
-
-static void brd_zero_page(struct brd_device *brd, sector_t sector)
-{
- struct page *page;
-
- page = brd_lookup_page(brd, sector);
- if (page)
- clear_highpage(page);
-}
-
/*
* Free all backing store pages and radix tree. This must only be called when
* there are no other users of the device.
@@ -212,24 +190,6 @@ static int copy_to_brd_setup(struct brd_device *brd, sector_t sector, size_t n)
return 0;
}
-static void discard_from_brd(struct brd_device *brd,
- sector_t sector, size_t n)
-{
- while (n >= PAGE_SIZE) {
- /*
- * Don't want to actually discard pages here because
- * re-allocating the pages can result in writeback
- * deadlocks under heavy load.
- */
- if (0)
- brd_free_page(brd, sector);
- else
- brd_zero_page(brd, sector);
- sector += PAGE_SIZE >> SECTOR_SHIFT;
- n -= PAGE_SIZE;
- }
-}
-
/*
* Copy n bytes from src to the brd starting at sector. Does not sleep.
*/
@@ -338,14 +298,6 @@ static blk_qc_t brd_make_request(struct request_queue *q, struct bio *bio)
if (bio_end_sector(bio) > get_capacity(bdev->bd_disk))
goto io_error;
- if (unlikely(bio_op(bio) == REQ_OP_DISCARD)) {
- if (sector & ((PAGE_SIZE >> SECTOR_SHIFT) - 1) ||
- bio->bi_iter.bi_size & ~PAGE_MASK)
- goto io_error;
- discard_from_brd(brd, sector, bio->bi_iter.bi_size);
- goto out;
- }
-
bio_for_each_segment(bvec, bio, iter) {
unsigned int len = bvec.bv_len;
int err;
@@ -357,9 +309,6 @@ static blk_qc_t brd_make_request(struct request_queue *q, struct bio *bio)
sector += len >> SECTOR_SHIFT;
}
-out:
- bio_endio(bio);
- return BLK_QC_T_NONE;
io_error:
bio_io_error(bio);
return BLK_QC_T_NONE;
@@ -464,11 +413,6 @@ static struct brd_device *brd_alloc(int i)
* is harmless)
*/
blk_queue_physical_block_size(brd->brd_queue, PAGE_SIZE);
-
- brd->brd_queue->limits.discard_granularity = PAGE_SIZE;
- blk_queue_max_discard_sectors(brd->brd_queue, UINT_MAX);
- brd->brd_queue->limits.discard_zeroes_data = 1;
- queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, brd->brd_queue);
#ifdef CONFIG_BLK_DEV_RAM_DAX
queue_flag_set_unlocked(QUEUE_FLAG_DAX, brd->brd_queue);
#endif
--
2.11.0
^ permalink raw reply related
* [PATCH 17/23] rbd: remove the discard_zeroes_data flag
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
rbd only supports discarding on large alignments, so the zeroing code
would always fall back to explicit writings of zeroes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/rbd.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 517838b65964..0ec3b430e81d 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -4380,7 +4380,6 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
q->limits.discard_granularity = segment_size;
q->limits.discard_alignment = segment_size;
blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE);
- q->limits.discard_zeroes_data = 1;
if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
--
2.11.0
^ permalink raw reply related
* [PATCH 18/23] rsxx: remove the discard_zeroes_data flag
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
rsxx only supports discarding on large alignments, so the zeroing code
would always fall back to explicit writings of zeroes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/rsxx/dev.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
index f81d70b39d10..9c566364ac9c 100644
--- a/drivers/block/rsxx/dev.c
+++ b/drivers/block/rsxx/dev.c
@@ -300,7 +300,6 @@ int rsxx_setup_dev(struct rsxx_cardinfo *card)
RSXX_HW_BLK_SIZE >> 9);
card->queue->limits.discard_granularity = RSXX_HW_BLK_SIZE;
card->queue->limits.discard_alignment = RSXX_HW_BLK_SIZE;
- card->queue->limits.discard_zeroes_data = 1;
}
card->queue->queuedata = card;
--
2.11.0
^ permalink raw reply related
* [PATCH 19/23] mmc: remove the discard_zeroes_data flag
From: Christoph Hellwig @ 2017-03-23 14:33 UTC (permalink / raw)
To: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
lars.ellenberg
Cc: linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-1-hch@lst.de>
mmc only supports discarding on large alignments, so the zeroing code
would always fall back to explicit writings of zeroes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/mmc/core/queue.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 493eb10ce580..4c54ad34e17a 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -167,8 +167,6 @@ static void mmc_queue_setup_discard(struct request_queue *q,
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
blk_queue_max_discard_sectors(q, max_discard);
- if (card->erased_byte == 0 && !mmc_can_discard(card))
- q->limits.discard_zeroes_data = 1;
q->limits.discard_granularity = card->pref_erase << 9;
/* granularity must not be greater than max. discard */
if (card->pref_erase > max_discard)
--
2.11.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox