* [PATCH] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
@ 2016-06-20 10:28 Christoph Hellwig
2016-06-20 18:29 ` Jeff Moyer
2016-06-21 0:44 ` Martin K. Petersen
0 siblings, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2016-06-20 10:28 UTC (permalink / raw)
To: axboe; +Cc: shli, martin.petersen, snitzer, sitsofe, linux-block
Currently blkdev_issue_zeroout cascades down from discards (if the driver
gurantees that discards zero data), to WRITE SAME and then to a loop
writing zeroes. Unfortunately we ignore tun-time EOPNOTSUPP errors in the
block layer blkdev_issue_discard helper to work around DM volumes that
may have mixed discard support underneath.
This path intoroduces a new BLKDEV_DISCARD_ZERO flag to
blkdev_issue_discard that indicates we are called for zeroing operation.
This allows both to ignore the EOPNOTSUPP hack and actually consolidating
the discard_zeroes_data check into the function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-lib.c | 17 +++++++++++------
include/linux/blkdev.h | 4 +++-
2 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 78626c2..45b35b1 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -36,12 +36,17 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
return -ENXIO;
if (flags & BLKDEV_DISCARD_SECURE) {
+ if (flags & BLKDEV_DISCARD_ZERO)
+ return -EOPNOTSUPP;
if (!blk_queue_secure_erase(q))
return -EOPNOTSUPP;
op = REQ_OP_SECURE_ERASE;
} else {
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
+ if ((flags & BLKDEV_DISCARD_ZERO) &&
+ !q->limits.discard_zeroes_data)
+ return -EOPNOTSUPP;
op = REQ_OP_DISCARD;
}
@@ -116,7 +121,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
&bio);
if (!ret && bio) {
ret = submit_bio_wait(bio);
- if (ret == -EOPNOTSUPP)
+ if (ret == -EOPNOTSUPP && !(flags & BLKDEV_DISCARD_ZERO))
ret = 0;
}
blk_finish_plug(&plug);
@@ -241,11 +246,11 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, bool discard)
{
- struct request_queue *q = bdev_get_queue(bdev);
-
- if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
- blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
- return 0;
+ if (discard) {
+ if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask,
+ BLKDEV_DISCARD_ZERO))
+ return 0;
+ }
if (bdev_write_same(bdev) &&
blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9d1e0a4..b65ca66 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1136,7 +1136,9 @@ static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt,
return bqt->tag_index[tag];
}
-#define BLKDEV_DISCARD_SECURE 0x01 /* secure discard */
+
+#define BLKDEV_DISCARD_SECURE (1 << 0) /* issue a secure erase */
+#define BLKDEV_DISCARD_ZERO (1 << 1) /* must reliably zero data */
extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *);
extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
--
2.1.4
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
2016-06-20 10:28 [PATCH] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
@ 2016-06-20 18:29 ` Jeff Moyer
2016-06-21 12:39 ` Christoph Hellwig
2016-06-21 0:44 ` Martin K. Petersen
1 sibling, 1 reply; 10+ messages in thread
From: Jeff Moyer @ 2016-06-20 18:29 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, shli, martin.petersen, snitzer, sitsofe, linux-block
Christoph Hellwig <hch@lst.de> writes:
> Currently blkdev_issue_zeroout cascades down from discards (if the driver
> gurantees that discards zero data), to WRITE SAME and then to a loop
> writing zeroes. Unfortunately we ignore tun-time EOPNOTSUPP errors in the
> block layer blkdev_issue_discard helper to work around DM volumes that
> may have mixed discard support underneath.
>
> This path intoroduces a new BLKDEV_DISCARD_ZERO flag to
> blkdev_issue_discard that indicates we are called for zeroing operation.
> This allows both to ignore the EOPNOTSUPP hack and actually consolidating
> the discard_zeroes_data check into the function.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/blk-lib.c | 17 +++++++++++------
> include/linux/blkdev.h | 4 +++-
> 2 files changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 78626c2..45b35b1 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -36,12 +36,17 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> return -ENXIO;
>
> if (flags & BLKDEV_DISCARD_SECURE) {
> + if (flags & BLKDEV_DISCARD_ZERO)
> + return -EOPNOTSUPP;
Should this be -EINVAL?
-Jeff
> if (!blk_queue_secure_erase(q))
> return -EOPNOTSUPP;
> op = REQ_OP_SECURE_ERASE;
> } else {
> if (!blk_queue_discard(q))
> return -EOPNOTSUPP;
> + if ((flags & BLKDEV_DISCARD_ZERO) &&
> + !q->limits.discard_zeroes_data)
> + return -EOPNOTSUPP;
> op = REQ_OP_DISCARD;
> }
>
> @@ -116,7 +121,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> &bio);
> if (!ret && bio) {
> ret = submit_bio_wait(bio);
> - if (ret == -EOPNOTSUPP)
> + if (ret == -EOPNOTSUPP && !(flags & BLKDEV_DISCARD_ZERO))
> ret = 0;
> }
> blk_finish_plug(&plug);
> @@ -241,11 +246,11 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> sector_t nr_sects, gfp_t gfp_mask, bool discard)
> {
> - struct request_queue *q = bdev_get_queue(bdev);
> -
> - if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
> - blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
> - return 0;
> + if (discard) {
> + if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask,
> + BLKDEV_DISCARD_ZERO))
> + return 0;
> + }
>
> if (bdev_write_same(bdev) &&
> blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 9d1e0a4..b65ca66 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1136,7 +1136,9 @@ static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt,
> return bqt->tag_index[tag];
> }
>
> -#define BLKDEV_DISCARD_SECURE 0x01 /* secure discard */
> +
> +#define BLKDEV_DISCARD_SECURE (1 << 0) /* issue a secure erase */
> +#define BLKDEV_DISCARD_ZERO (1 << 1) /* must reliably zero data */
>
> extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *);
> extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
2016-06-20 18:29 ` Jeff Moyer
@ 2016-06-21 12:39 ` Christoph Hellwig
2016-06-21 14:02 ` Jeff Moyer
0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2016-06-21 12:39 UTC (permalink / raw)
To: Jeff Moyer
Cc: Christoph Hellwig, axboe, shli, martin.petersen, snitzer, sitsofe,
linux-block
> > if (flags & BLKDEV_DISCARD_SECURE) {
> > + if (flags & BLKDEV_DISCARD_ZERO)
> > + return -EOPNOTSUPP;
>
> Should this be -EINVAL?
BLKDEV_DISCARD_SECURE | BLKDEV_DISCARD_ZERO seems like a theoretically
possible case, but I don't really care..
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
2016-06-21 12:39 ` Christoph Hellwig
@ 2016-06-21 14:02 ` Jeff Moyer
0 siblings, 0 replies; 10+ messages in thread
From: Jeff Moyer @ 2016-06-21 14:02 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, shli, martin.petersen, snitzer, sitsofe, linux-block
Christoph Hellwig <hch@lst.de> writes:
>> > if (flags & BLKDEV_DISCARD_SECURE) {
>> > + if (flags & BLKDEV_DISCARD_ZERO)
>> > + return -EOPNOTSUPP;
>>
>> Should this be -EINVAL?
>
> BLKDEV_DISCARD_SECURE | BLKDEV_DISCARD_ZERO seems like a theoretically
> possible case, but I don't really care..
I agree, that is a possible combination. EOPNOTSUPP is fine.
-Jeff
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
2016-06-20 10:28 [PATCH] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
2016-06-20 18:29 ` Jeff Moyer
@ 2016-06-21 0:44 ` Martin K. Petersen
2016-06-21 12:40 ` Christoph Hellwig
1 sibling, 1 reply; 10+ messages in thread
From: Martin K. Petersen @ 2016-06-21 0:44 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, shli, martin.petersen, snitzer, sitsofe, linux-block
>>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes:
Christoph> Currently blkdev_issue_zeroout cascades down from discards
Christoph> (if the driver gurantees that discards zero data), to WRITE
guarantees
Christoph> SAME and then to a loop writing zeroes. Unfortunately we
Christoph> ignore tun-time EOPNOTSUPP errors in the block layer
run-time
Christoph> blkdev_issue_discard helper to work around DM volumes that
Christoph> may have mixed discard support underneath.
Christoph> This path intoroduces a new BLKDEV_DISCARD_ZERO flag to
patch introduces
Christoph> blkdev_issue_discard that indicates we are called for zeroing
Christoph> operation. This allows both to ignore the EOPNOTSUPP hack
Christoph> and actually consolidating the discard_zeroes_data check into
Christoph> the function.
I like this better than the I/O error tracking approach. The
blkdev_issue_write_same() -EOPNOTSUPP regression is still present,
though.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
2016-06-21 0:44 ` Martin K. Petersen
@ 2016-06-21 12:40 ` Christoph Hellwig
2016-06-22 2:02 ` Martin K. Petersen
0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2016-06-21 12:40 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Christoph Hellwig, axboe, shli, snitzer, sitsofe, linux-block
On Mon, Jun 20, 2016 at 08:44:02PM -0400, Martin K. Petersen wrote:
> Christoph> operation. This allows both to ignore the EOPNOTSUPP hack
> Christoph> and actually consolidating the discard_zeroes_data check into
> Christoph> the function.
>
> I like this better than the I/O error tracking approach. The
> blkdev_issue_write_same() -EOPNOTSUPP regression is still present,
> though.
Which one? Can't think of how that has anything to do with
blkdev_issue_discard.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
2016-06-21 12:40 ` Christoph Hellwig
@ 2016-06-22 2:02 ` Martin K. Petersen
2016-06-22 7:52 ` Christoph Hellwig
0 siblings, 1 reply; 10+ messages in thread
From: Martin K. Petersen @ 2016-06-22 2:02 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Martin K. Petersen, axboe, shli, snitzer, sitsofe, linux-block
>>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes:
Christoph,
>> I like this better than the I/O error tracking approach. The
>> blkdev_issue_write_same() -EOPNOTSUPP regression is still present,
>> though.
Christoph> Which one? Can't think of how that has anything to do with
Christoph> blkdev_issue_discard.
In 9082e87bfbf8 ("block: remove struct bio_batch") you changed the error
return for blkdev_issue_write_same() to look like the one for discard:
- if (bb.error)
- return bb.error;
- return ret;
+ if (bio)
+ ret = submit_bio_wait(REQ_WRITE | REQ_WRITE_SAME, bio);
+ return ret != -EOPNOTSUPP ? ret : 0;
It's not a problem for SCSI since I'd be returning -EREMOTEIO. But if a
stacking driver returns -EOPNOTSUPP in response to a WRITE SAME request
we'll get data corruption.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
2016-06-22 2:02 ` Martin K. Petersen
@ 2016-06-22 7:52 ` Christoph Hellwig
2016-06-22 13:53 ` Martin K. Petersen
0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2016-06-22 7:52 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Christoph Hellwig, axboe, shli, snitzer, sitsofe, linux-block
On Tue, Jun 21, 2016 at 10:02:10PM -0400, Martin K. Petersen wrote:
> In 9082e87bfbf8 ("block: remove struct bio_batch") you changed the error
> return for blkdev_issue_write_same() to look like the one for discard:
>
> - if (bb.error)
> - return bb.error;
> - return ret;
> + if (bio)
> + ret = submit_bio_wait(REQ_WRITE | REQ_WRITE_SAME, bio);
> + return ret != -EOPNOTSUPP ? ret : 0;
>
> It's not a problem for SCSI since I'd be returning -EREMOTEIO. But if a
> stacking driver returns -EOPNOTSUPP in response to a WRITE SAME request
> we'll get data corruption.
Oh, I see what you mean, but I disagree with the analysis. Unlike
discard outside the zeroout path, write same is a data integrity operation.
Just like in the zero out case turning an EOPNOTSUPP into 0 will get you
data corruption, as the caller will see a successful return for an
operation that did not actually write data to disk. Same is true for
writing the actual zeroes in __blkdev_issue_zeroout.
Re stacking drivers and discard / write same: why does
blk_set_stacking_limits set discard_zeroes_data to 1 and
max_write_same_sectors to UINT_MAX? These seem like inherently dangerous
defaults.
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
2016-06-22 7:52 ` Christoph Hellwig
@ 2016-06-22 13:53 ` Martin K. Petersen
2016-06-22 14:04 ` Christoph Hellwig
0 siblings, 1 reply; 10+ messages in thread
From: Martin K. Petersen @ 2016-06-22 13:53 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Martin K. Petersen, axboe, shli, snitzer, sitsofe, linux-block
>>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes:
Christoph,
Christoph> Unlike discard outside the zeroout path, write same is a data
Christoph> integrity operation. Just like in the zero out case turning
Christoph> an EOPNOTSUPP into 0 will get you data corruption, as the
Christoph> caller will see a successful return for an operation that did
Christoph> not actually write data to disk.
Exactly. So why add the dreaded -EOPNOTSUPP special casing to
blkdev_issue_write_same()?
Christoph> Re stacking drivers and discard / write same: why does
Christoph> blk_set_stacking_limits set discard_zeroes_data to 1 and
Christoph> max_write_same_sectors to UINT_MAX? These seem like
Christoph> inherently dangerous defaults.
It's just shorthand for "stacking driver does not impose any limits, use
whatever the low-level device sets".
If the stacking driver has an actual constraint it is free to set a
different limit prior to calling the stacking function.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
2016-06-22 13:53 ` Martin K. Petersen
@ 2016-06-22 14:04 ` Christoph Hellwig
0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2016-06-22 14:04 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Christoph Hellwig, axboe, shli, snitzer, sitsofe, linux-block
On Wed, Jun 22, 2016 at 09:53:04AM -0400, Martin K. Petersen wrote:
> >>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes:
>
> Christoph,
>
> Christoph> Unlike discard outside the zeroout path, write same is a data
> Christoph> integrity operation. Just like in the zero out case turning
> Christoph> an EOPNOTSUPP into 0 will get you data corruption, as the
> Christoph> caller will see a successful return for an operation that did
> Christoph> not actually write data to disk.
>
> Exactly. So why add the dreaded -EOPNOTSUPP special casing to
> blkdev_issue_write_same()?
This is a leftover from bio_batch_end_io and not new behavior. But
I agree that we should kill it.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2016-06-22 14:05 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-20 10:28 [PATCH] block: introduce BLKDEV_DISCARD_ZERO to fix zeroout Christoph Hellwig
2016-06-20 18:29 ` Jeff Moyer
2016-06-21 12:39 ` Christoph Hellwig
2016-06-21 14:02 ` Jeff Moyer
2016-06-21 0:44 ` Martin K. Petersen
2016-06-21 12:40 ` Christoph Hellwig
2016-06-22 2:02 ` Martin K. Petersen
2016-06-22 7:52 ` Christoph Hellwig
2016-06-22 13:53 ` Martin K. Petersen
2016-06-22 14:04 ` Christoph Hellwig
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.