Linux block layer


* Re: [PATCH] blk-mq: Export queue state through /sys/kernel/debug/block/*/state
From: Jens Axboe @ 2017-03-30 15:19 UTC (permalink / raw)
  To: Bart Van Assche, hare@suse.de; +Cc: osandov@fb.com, linux-block@vger.kernel.org
In-Reply-To: <1490886955.2753.3.camel@sandisk.com>

On 03/30/2017 09:16 AM, Bart Van Assche wrote:
> On Thu, 2017-03-30 at 07:50 +0200, Hannes Reinecke wrote:
>> On 03/29/2017 10:20 PM, Bart Van Assche wrote:
>>> Make it possible to check whether or not a block layer queue has
>>> been stopped. Make it possible to run a blk-mq queue from user
>>> space.
>>>
>>> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
>>> Cc: Omar Sandoval <osandov@fb.com>
>>> Cc: Hannes Reinecke <hare@suse.com>
>>> ---
>>>  block/blk-mq-debugfs.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 84 insertions(+)
>>>
>>
>> About bloody time :-)
>>
>> Reviewed-by: Hannes Reinecke <hare@suse.com>
> 
> Hello Hannes,
> 
> Thanks for the review :-) However, had you noticed that I had already
> posted a v2 of this patch? Anyway, since I have improved v2 further
> after I had posted it, I will post a v3 today.

I didn't see a v2 posting?

-- 
Jens Axboe

* Re: [PATCH] blk-mq: Export queue state through /sys/kernel/debug/block/*/state
From: Bart Van Assche @ 2017-03-30 15:16 UTC (permalink / raw)
  To: hare@suse.de, axboe@fb.com; +Cc: osandov@fb.com, linux-block@vger.kernel.org
In-Reply-To: <824697ce-3fe4-f7ab-e23b-237b280196fb@suse.de>

On Thu, 2017-03-30 at 07:50 +0200, Hannes Reinecke wrote:
> On 03/29/2017 10:20 PM, Bart Van Assche wrote:
> > Make it possible to check whether or not a block layer queue has
> > been stopped. Make it possible to run a blk-mq queue from user
> > space.
> >
> > Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> > Cc: Omar Sandoval <osandov@fb.com>
> > Cc: Hannes Reinecke <hare@suse.com>
> > ---
> >  block/blk-mq-debugfs.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 84 insertions(+)
> >
>
> About bloody time :-)
>
> Reviewed-by: Hannes Reinecke <hare@suse.com>

Hello Hannes,

Thanks for the review :-) However, had you noticed that I had already
posted a v2 of this patch? Anyway, since I have improved v2 further
after I had posted it, I will post a v3 today.

Bart.

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
From: Mike Snitzer @ 2017-03-30 15:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-raid, dm-devel, linux-scsi,
	drbd-dev
In-Reply-To: <20170323143341.31549-1-hch@lst.de>

Would be very useful, particularly for testing, if
drivers/scsi/scsi_debug.c were updated to support WRITE ZEROES.

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
From: Minchan Kim @ 2017-03-30 15:08 UTC (permalink / raw)
  To: Johannes Thumshirn, Jens Axboe
  Cc: Hannes Reinecke, Nitin Gupta, Christoph Hellwig,
	Sergey Senozhatsky, yizhan, Linux Block Layer Mailinglist,
	Linux Kernel Mailinglist, Andrew Morton
In-Reply-To: <20170309052829.GA854@bbox>

Hi Jens,

It seems you missed this.
Could you take care of it?

Thanks.

On Thu, Mar 9, 2017 at 2:28 PM, Minchan Kim <minchan@kernel.org> wrote:

<snip>

> Jens, could you replace the merged one with this? I don't want to add
> a stable tag to this patch because I feel it needs more testing on a
> 64K page system, which I don't have. ;(
>
> From bb73e75ab0e21016f60858fd61e7dc6a6813e359 Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@kernel.org>
> Date: Thu, 9 Mar 2017 14:00:40 +0900
> Subject: [PATCH] zram: handle multiple pages attached bio's bvec
>
> Johannes Thumshirn reported that the system panics when using an NVMe
> over Fabrics loopback target with zram.
>
> The reason is that zram expects each bvec in a bio to contain a single
> page, but nvme can attach many pages to a single bvec. zram's index
> arithmetic then goes wrong, and the resulting out-of-bounds access
> causes the panic.
>
> This patch solves the problem by removing the limit (that a bvec may
> contain only a single page).
>
> Cc: Hannes Reinecke <hare@suse.com>
> Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
> Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
> I intentionally did not add a stable tag because I think it is rather
> risky without enough testing on a 64K page system (i.e., the partial I/O part).
>
> Thanks for the help, Johannes and Hannes!!
>
>  drivers/block/zram/zram_drv.c | 37 ++++++++++---------------------------
>  1 file changed, 10 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 01944419b1f3..fefdf260503a 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -137,8 +137,7 @@ static inline bool valid_io_request(struct zram *zram,
>
>  static void update_position(u32 *index, int *offset, struct bio_vec *bvec)
>  {
> -       if (*offset + bvec->bv_len >= PAGE_SIZE)
> -               (*index)++;
> +       *index  += (*offset + bvec->bv_len) / PAGE_SIZE;
>         *offset = (*offset + bvec->bv_len) % PAGE_SIZE;
>  }
>
> @@ -838,34 +837,20 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
>         }
>
>         bio_for_each_segment(bvec, bio, iter) {
> -               int max_transfer_size = PAGE_SIZE - offset;
> -
> -               if (bvec.bv_len > max_transfer_size) {
> -                       /*
> -                        * zram_bvec_rw() can only make operation on a single
> -                        * zram page. Split the bio vector.
> -                        */
> -                       struct bio_vec bv;
> -
> -                       bv.bv_page = bvec.bv_page;
> -                       bv.bv_len = max_transfer_size;
> -                       bv.bv_offset = bvec.bv_offset;
> +               struct bio_vec bv = bvec;
> +               unsigned int remained = bvec.bv_len;
>
> +               do {
> +                       bv.bv_len = min_t(unsigned int, PAGE_SIZE, remained);
>                         if (zram_bvec_rw(zram, &bv, index, offset,
> -                                        op_is_write(bio_op(bio))) < 0)
> +                                       op_is_write(bio_op(bio))) < 0)
>                                 goto out;
>
> -                       bv.bv_len = bvec.bv_len - max_transfer_size;
> -                       bv.bv_offset += max_transfer_size;
> -                       if (zram_bvec_rw(zram, &bv, index + 1, 0,
> -                                        op_is_write(bio_op(bio))) < 0)
> -                               goto out;
> -               } else
> -                       if (zram_bvec_rw(zram, &bvec, index, offset,
> -                                        op_is_write(bio_op(bio))) < 0)
> -                               goto out;
> +                       bv.bv_offset += bv.bv_len;
> +                       remained -= bv.bv_len;
>
> -               update_position(&index, &offset, &bvec);
> +                       update_position(&index, &offset, &bv);
> +               } while (remained);
>         }
>
>         bio_endio(bio);
> @@ -882,8 +867,6 @@ static blk_qc_t zram_make_request(struct request_queue *queue, struct bio *bio)
>  {
>         struct zram *zram = queue->queuedata;
>
> -       blk_queue_split(queue, &bio, queue->bio_split);
> -
>         if (!valid_io_request(zram, bio->bi_iter.bi_sector,
>                                         bio->bi_iter.bi_size)) {
>                 atomic64_inc(&zram->stats.invalid_io);
> --
> 2.7.4
>
>



-- 
Kind regards,
Minchan Kim
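
The index arithmetic changed by the quoted patch can be tried outside the
kernel. What follows is a minimal user-space sketch, not the driver code:
the 4096-byte page size, the SKETCH_PAGE_SIZE name and the two function
names are assumptions made for illustration, and the functions take a plain
length instead of a struct bio_vec.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch only: 4K pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Old logic: moves the index forward by at most one page per bvec. */
static void update_position_old(uint32_t *index, unsigned int *offset,
				unsigned int bv_len)
{
	assert(*offset < SKETCH_PAGE_SIZE);
	if (*offset + bv_len >= SKETCH_PAGE_SIZE)
		(*index)++;
	*offset = (*offset + bv_len) % SKETCH_PAGE_SIZE;
}

/* New logic: moves the index forward by every page the bvec covers. */
static void update_position_new(uint32_t *index, unsigned int *offset,
				unsigned int bv_len)
{
	assert(*offset < SKETCH_PAGE_SIZE);
	*index += (*offset + bv_len) / SKETCH_PAGE_SIZE;
	*offset = (*offset + bv_len) % SKETCH_PAGE_SIZE;
}
```

With a bvec of three pages starting at index 0 and offset 0, the old
variant ends at index 1 while the new variant ends at index 3, which is
the mismatch behind the out-of-bounds access described above.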

* Re: [PATCH] block: do not put mq context in blk_mq_alloc_request_hctx
From: Jens Axboe @ 2017-03-30 14:12 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kernel-team, linux-block, linux-kernel, Sagi Grimberg,
	Omar Sandoval
In-Reply-To: <1490851245-32245-1-git-send-email-minchan@kernel.org>

On 03/29/2017 11:20 PM, Minchan Kim wrote:
> In blk_mq_alloc_request_hctx, blk_mq_sched_get_request doesn't
> get the sw context, so we don't need to put the context with
> blk_mq_put_ctx. Otherwise, we will see a preempt counter underflow.

Good catch, that's definitely a bug. I have applied your patch
for 4.11.

-- 
Jens Axboe

* Re: [PATCH] block: do not put mq context in blk_mq_alloc_request_hctx
From: Sagi Grimberg @ 2017-03-30 14:02 UTC (permalink / raw)
  To: Minchan Kim, Jens Axboe
  Cc: kernel-team, linux-block, linux-kernel, Omar Sandoval
In-Reply-To: <1490851245-32245-1-git-send-email-minchan@kernel.org>

Looks good,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

* [PATCH 8/8] tcm_fileio: Prevent information leak for short reads
From: Dmitry Monakhov @ 2017-03-30 13:49 UTC (permalink / raw)
  To: linux-kernel, linux-block, martin.petersen; +Cc: Dmitry Monakhov
In-Reply-To: <1490881776-28735-1-git-send-email-dmonakhov@openvz.org>

If we fail to read data from the backing file (probably because someone
truncated the file under us), we must zero-fill the cmd's data; otherwise
it will be returned as is. Most likely the cmd's data are uninitialized
pages from the page cache, which results in an information leak.

xfstests: generic/420
http://marc.info/?l=linux-scsi&m=149087996913448&w=2

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 drivers/target/target_core_file.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c
index 87aa376..d69908d 100644
--- a/drivers/target/target_core_file.c
+++ b/drivers/target/target_core_file.c
@@ -277,12 +277,11 @@ static int fd_do_rw(struct se_cmd *cmd, struct file *fd,
 	else
 		ret = vfs_iter_read(fd, &iter, &pos);
 
-	kfree(bvec);
-
 	if (is_write) {
 		if (ret < 0 || ret != data_length) {
 			pr_err("%s() write returned %d\n", __func__, ret);
-			return (ret < 0 ? ret : -EINVAL);
+			if (ret >= 0)
+				ret = -EINVAL;
 		}
 	} else {
 		/*
@@ -295,17 +294,27 @@ static int fd_do_rw(struct se_cmd *cmd, struct file *fd,
 				pr_err("%s() returned %d, expecting %u for "
 						"S_ISBLK\n", __func__, ret,
 						data_length);
-				return (ret < 0 ? ret : -EINVAL);
+				if (ret >= 0)
+					ret = -EINVAL;
 			}
 		} else {
 			if (ret < 0) {
 				pr_err("%s() returned %d for non S_ISBLK\n",
 						__func__, ret);
-				return ret;
+			} else if (ret != data_length) {
+				/*
+				 * Short read case:
+				 * Probably someone truncated the file under us.
+				 * We must explicitly zero sg-pages to prevent
+				 * exposing uninitialized pages to userspace.
+				 */
+				BUG_ON(ret > data_length);
+				ret += iov_iter_zero(data_length - ret, &iter);
 			}
 		}
 	}
-	return 1;
+	kfree(bvec);
+	return ret;
 }
 
 static sense_reason_t
-- 
2.9.3
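
The short-read handling can be illustrated in user space. This is a
minimal sketch of the idea only, not the target code: the helper name
zero_fill_short_read and its flat-buffer interface are hypothetical, and
memset() stands in for iov_iter_zero() on the command's sg pages.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: a read that was asked for 'want' bytes filled only
 * the first 'got' bytes of 'buf'. Zero the remainder so stale buffer
 * contents never reach the initiator, and report 'want' valid bytes.
 */
static size_t zero_fill_short_read(unsigned char *buf, size_t want, size_t got)
{
	assert(got <= want);	/* plays the role of BUG_ON(ret > data_length) */
	memset(buf + got, 0, want - got);
	return want;
}
```

The bytes that were really read stay untouched; only the tail that the
backing file could not provide is cleared.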

* Re: [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES
From: Mike Snitzer @ 2017-03-30 13:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, shli, philipp.reisner, linux-block,
	linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170330114408.GA15777@lst.de>

On Thu, Mar 30 2017 at  7:44am -0400,
Christoph Hellwig <hch@lst.de> wrote:

> On Thu, Mar 30, 2017 at 12:06:41PM +0200, Lars Ellenberg wrote:
> 
> > Will it make an fstrim cause thinly provisioned
> > devices to suddenly be fully allocated?
> 
> Not for SCSI devices.  Yes for dm-thinp until it implements
> REQ_OP_WRITE_ZEROES, which will hopefully be soon.

I can work on this now.  Only question I have is: should DM thinp take
care to zero any misaligned head and tail?  (I assume so but with all
the back and forth between Bart, Paolo and Martin I figured I'd ask
explicitly).
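
To make the head/tail question concrete, here is a small user-space
sketch of how a zeroing range splits against a block size. It is purely
illustrative and not dm-thinp code: struct zero_split and split_range()
are hypothetical names, and all values are in the same unit (for example
sectors).

```c
#include <assert.h>
#include <stdint.h>

struct zero_split {
	uint64_t head_len;	/* partial block before the first boundary */
	uint64_t body_len;	/* whole blocks in the middle */
	uint64_t tail_len;	/* partial block after the last boundary */
};

/* Split [start, start + len) into misaligned head, aligned body and tail. */
static struct zero_split split_range(uint64_t start, uint64_t len,
				     uint64_t block)
{
	struct zero_split s = { 0, 0, 0 };
	uint64_t end, first, last;

	assert(block > 0);
	end = start + len;
	first = (start + block - 1) / block * block;	/* round start up */
	last = end / block * block;			/* round end down */

	if (first >= last) {
		/* No whole block is covered: everything is "head". */
		s.head_len = len;
		return s;
	}
	s.head_len = first - start;
	s.body_len = last - first;
	s.tail_len = end - last;
	return s;
}
```

Only body_len covers whole blocks; whether head_len and tail_len must be
zeroed explicitly is exactly the question asked above.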

* [PATCH 5/8] bio-integrity: fix interface for bio_integrity_trim
From: Dmitry Monakhov @ 2017-03-30 13:49 UTC (permalink / raw)
  To: linux-kernel, linux-block, martin.petersen; +Cc: Dmitry Monakhov
In-Reply-To: <1490881776-28735-1-git-send-email-dmonakhov@openvz.org>

bio_integrity_trim inherited its interface from bio_trim and accepts an
offset and a size, but this API is error-prone because the integrity
offset must always be in sync with the bio's data offset. That is why we
have an integrity update hook in bio_advance().

So the only meaningful offset is 0. Let's just remove it completely.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 block/bio-integrity.c | 8 +-------
 block/bio.c           | 4 ++--
 drivers/md/dm.c       | 2 +-
 include/linux/bio.h   | 5 ++---
 4 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 43a4476..43895a0 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -433,21 +433,15 @@ EXPORT_SYMBOL(bio_integrity_advance);
 /**
  * bio_integrity_trim - Trim integrity vector
  * @bio:	bio whose integrity vector to update
- * @offset:	offset to first data sector
  * @sectors:	number of data sectors
  *
  * Description: Used to trim the integrity vector in a cloned bio.
- * The ivec will be advanced corresponding to 'offset' data sectors
- * and the length will be truncated corresponding to 'len' data
- * sectors.
  */
-void bio_integrity_trim(struct bio *bio, unsigned int offset,
-			unsigned int sectors)
+void bio_integrity_trim(struct bio *bio, unsigned int sectors)
 {
 	struct bio_integrity_payload *bip = bio_integrity(bio);
 	struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
 
-	bio_integrity_advance(bio, offset << 9);
 	bip->bip_iter.bi_size = bio_integrity_bytes(bi, sectors);
 }
 EXPORT_SYMBOL(bio_integrity_trim);
diff --git a/block/bio.c b/block/bio.c
index fa84323..6895986 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1878,7 +1878,7 @@ struct bio *bio_split(struct bio *bio, int sectors,
 	split->bi_iter.bi_size = sectors << 9;
 
 	if (bio_integrity(split))
-		bio_integrity_trim(split, 0, sectors);
+		bio_integrity_trim(split, sectors);
 
 	bio_advance(bio, split->bi_iter.bi_size);
 
@@ -1909,7 +1909,7 @@ void bio_trim(struct bio *bio, int offset, int size)
 	bio->bi_iter.bi_size = size;
 
 	if (bio_integrity(bio))
-		bio_integrity_trim(bio, 0, size);
+		bio_integrity_trim(bio, size);
 
 }
 EXPORT_SYMBOL_GPL(bio_trim);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index dfb7597..e54ecdd 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1102,7 +1102,7 @@ static int clone_bio(struct dm_target_io *tio, struct bio *bio,
 	clone->bi_iter.bi_size = to_bytes(len);
 
 	if (bio_integrity(bio))
-		bio_integrity_trim(clone, 0, len);
+		bio_integrity_trim(clone, len);
 
 	return 0;
 }
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 00b086a..350f71d 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -732,7 +732,7 @@ extern bool bio_integrity_enabled(struct bio *bio);
 extern int bio_integrity_prep(struct bio *);
 extern void bio_integrity_endio(struct bio *);
 extern void bio_integrity_advance(struct bio *, unsigned int);
-extern void bio_integrity_trim(struct bio *, unsigned int, unsigned int);
+extern void bio_integrity_trim(struct bio *, unsigned int);
 extern int bio_integrity_clone(struct bio *, struct bio *, gfp_t);
 extern int bioset_integrity_create(struct bio_set *, int);
 extern void bioset_integrity_free(struct bio_set *);
@@ -782,8 +782,7 @@ static inline void bio_integrity_advance(struct bio *bio,
 	return;
 }
 
-static inline void bio_integrity_trim(struct bio *bio, unsigned int offset,
-				      unsigned int sectors)
+static inline void bio_integrity_trim(struct bio *bio, unsigned int sectors)
 {
 	return;
 }
-- 
2.9.3
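
After this change bio_integrity_trim() only converts a sector count into
an integrity payload size. The sketch below shows that conversion in user
space under the assumption that there is one protection tuple per
protection interval; the function name integrity_bytes and its parameters
are hypothetical stand-ins for the values kept in struct blk_integrity.

```c
#include <assert.h>

/*
 * sectors:      number of 512-byte data sectors
 * interval_exp: log2 of the protection interval size in bytes (>= 9)
 * tuple_size:   bytes of protection information per interval
 */
static unsigned int integrity_bytes(unsigned int sectors,
				    unsigned int interval_exp,
				    unsigned int tuple_size)
{
	assert(interval_exp >= 9);
	/* Sectors per interval is 2^(interval_exp - 9). */
	return (sectors >> (interval_exp - 9)) * tuple_size;
}
```

For example, 8 sectors with a 512-byte interval and 8-byte tuples need 64
bytes of integrity data, while the same 8 sectors with a 4096-byte
interval need 8 bytes.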

* [PATCH 1/8] Guard bvec iteration logic
From: Dmitry Monakhov @ 2017-03-30 13:49 UTC (permalink / raw)
  To: linux-kernel, linux-block, martin.petersen; +Cc: Dmitry Monakhov
In-Reply-To: <1490881776-28735-1-git-send-email-dmonakhov@openvz.org>

If someone attempts to advance a bvec beyond its size, we simply emit a
WARN_ONCE and continue to iterate beyond the bvec array boundaries.
This means that we end up dereferencing or corrupting a random memory
region.

The code was added a long time ago in 4550dd6c; luckily no one has hit it
in real life :)

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 include/linux/bvec.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 89b65b8..86b914f 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -70,8 +70,7 @@ static inline void bvec_iter_advance(const struct bio_vec *bv,
 				     struct bvec_iter *iter,
 				     unsigned bytes)
 {
-	WARN_ONCE(bytes > iter->bi_size,
-		  "Attempted to advance past end of bvec iter\n");
+	BUG_ON(bytes > iter->bi_size);
 
 	while (bytes) {
 		unsigned iter_len = bvec_iter_len(bv, *iter);
-- 
2.9.3
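
The effect of the guard can be shown with a simplified user-space
iterator. This is a sketch under stated assumptions, not the kernel code:
struct seg and struct seg_iter are hypothetical stand-ins for struct
bio_vec and struct bvec_iter, 'size' is assumed to equal the bytes left
in the array, and the function reports the overrun by returning false
where the patch uses BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct seg {
	unsigned int len;	/* bytes in this segment */
};

struct seg_iter {
	unsigned int size;	/* bytes remaining in the whole vector */
	unsigned int idx;	/* current segment */
	unsigned int done;	/* bytes already consumed in that segment */
};

/* Advance by 'bytes'; refuse, and change nothing, if that overruns. */
static bool seg_iter_advance(const struct seg *sv, struct seg_iter *it,
			     unsigned int bytes)
{
	assert(sv != NULL && it != NULL);

	if (bytes > it->size)
		return false;	/* without this check the loop leaves sv[] */

	while (bytes) {
		unsigned int left = sv[it->idx].len - it->done;
		unsigned int step = bytes < left ? bytes : left;

		bytes -= step;
		it->size -= step;
		it->done += step;
		if (it->done == sv[it->idx].len) {
			it->done = 0;
			it->idx++;
		}
	}
	return true;
}
```

Without the size check, a too-large 'bytes' keeps incrementing idx and
reads segment lengths from memory behind the array, which is the failure
mode described in the commit message.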

* [PATCH 7/8] T10: Move opencoded constants to common header
From: Dmitry Monakhov @ 2017-03-30 13:49 UTC (permalink / raw)
  To: linux-kernel, linux-block, martin.petersen; +Cc: Dmitry Monakhov
In-Reply-To: <1490881776-28735-1-git-send-email-dmonakhov@openvz.org>

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 block/t10-pi.c                   | 9 +++------
 drivers/scsi/lpfc/lpfc_scsi.c    | 4 ++--
 drivers/scsi/qla2xxx/qla_isr.c   | 8 ++++----
 drivers/target/target_core_sbc.c | 2 +-
 include/linux/t10-pi.h           | 3 +++
 5 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/block/t10-pi.c b/block/t10-pi.c
index 2c97912..485cecd 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -28,9 +28,6 @@
 
 typedef __be16 (csum_fn) (void *, unsigned int);
 
-static const __be16 APP_ESCAPE = (__force __be16) 0xffff;
-static const __be32 REF_ESCAPE = (__force __be32) 0xffffffff;
-
 static __be16 t10_pi_crc_fn(void *data, unsigned int len)
 {
 	return cpu_to_be16(crc_t10dif(data, len));
@@ -82,7 +79,7 @@ static int t10_pi_verify(struct blk_integrity_iter *iter, csum_fn *fn,
 		switch (type) {
 		case 1:
 		case 2:
-			if (pi->app_tag == APP_ESCAPE)
+			if (pi->app_tag == T10_APP_ESCAPE)
 				goto next;
 
 			if (be32_to_cpu(pi->ref_tag) !=
@@ -95,8 +92,8 @@ static int t10_pi_verify(struct blk_integrity_iter *iter, csum_fn *fn,
 			}
 			break;
 		case 3:
-			if (pi->app_tag == APP_ESCAPE &&
-			    pi->ref_tag == REF_ESCAPE)
+			if (pi->app_tag == T10_APP_ESCAPE &&
+			    pi->ref_tag == T10_REF_ESCAPE)
 				goto next;
 			break;
 		}
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index 54fd0c8..6703512 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -2934,8 +2934,8 @@ lpfc_calc_bg_err(struct lpfc_hba *phba, struct lpfc_scsi_buf *lpfc_cmd)
 				 * First check to see if a protection data
 				 * check is valid
 				 */
-				if ((src->ref_tag == 0xffffffff) ||
-				    (src->app_tag == 0xffff)) {
+				if ((src->ref_tag == T10_REF_ESCAPE) ||
+				    (src->app_tag == T10_APP_ESCAPE)) {
 					start_ref_tag++;
 					goto skipit;
 				}
diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index 3203367..dfab093 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -1950,9 +1950,9 @@ qla2x00_handle_dif_error(srb_t *sp, struct sts_entry_24xx *sts24)
 	 * For type     3: ref & app tag is all 'f's
 	 * For type 0,1,2: app tag is all 'f's
 	 */
-	if ((a_app_tag == 0xffff) &&
+	if ((a_app_tag == T10_APP_ESCAPE) &&
 	    ((scsi_get_prot_type(cmd) != SCSI_PROT_DIF_TYPE3) ||
-	     (a_ref_tag == 0xffffffff))) {
+	     (a_ref_tag == T10_REF_ESCAPE))) {
 		uint32_t blocks_done, resid;
 		sector_t lba_s = scsi_get_lba(cmd);
 
@@ -1994,9 +1994,9 @@ qla2x00_handle_dif_error(srb_t *sp, struct sts_entry_24xx *sts24)
 			spt = page_address(sg_page(sg)) + sg->offset;
 			spt += j;
 
-			spt->app_tag = 0xffff;
+			spt->app_tag = T10_APP_ESCAPE;
 			if (scsi_get_prot_type(cmd) == SCSI_PROT_DIF_TYPE3)
-				spt->ref_tag = 0xffffffff;
+				spt->ref_tag = T10_REF_ESCAPE;
 		}
 
 		return 0;
diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
index c194063..927ef44 100644
--- a/drivers/target/target_core_sbc.c
+++ b/drivers/target/target_core_sbc.c
@@ -1446,7 +1446,7 @@ sbc_dif_verify(struct se_cmd *cmd, sector_t start, unsigned int sectors,
 				 (unsigned long long)sector, sdt->guard_tag,
 				 sdt->app_tag, be32_to_cpu(sdt->ref_tag));
 
-			if (sdt->app_tag == cpu_to_be16(0xffff)) {
+			if (sdt->app_tag == T10_APP_ESCAPE) {
 				dsg_off += block_size;
 				goto next;
 			}
diff --git a/include/linux/t10-pi.h b/include/linux/t10-pi.h
index 9fba9dd..c96845c 100644
--- a/include/linux/t10-pi.h
+++ b/include/linux/t10-pi.h
@@ -24,6 +24,9 @@ enum t10_dif_type {
 	T10_PI_TYPE3_PROTECTION = 0x3,
 };
 
+static const __be16 T10_APP_ESCAPE = (__force __be16) 0xffff;
+static const __be32 T10_REF_ESCAPE = (__force __be32) 0xffffffff;
+
 /*
  * T10 Protection Information tuple.
  */
-- 
2.9.3

* [PATCH 6/8] bio-integrity: add bio_integrity_setup helper
From: Dmitry Monakhov @ 2017-03-30 13:49 UTC (permalink / raw)
  To: linux-kernel, linux-block, martin.petersen; +Cc: Dmitry Monakhov
In-Reply-To: <1490881776-28735-1-git-send-email-dmonakhov@openvz.org>

Currently all integrity prep hooks are open-coded, and if prepare fails
we ignore its error code and fail the bio with EIO. Let's return the real
error to the upper layer, so that the caller may react accordingly, for
example by retrying in case of ENOMEM.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 block/blk-core.c     |  5 +----
 block/blk-mq.c       |  8 ++------
 drivers/nvdimm/blk.c | 13 ++-----------
 drivers/nvdimm/btt.c | 13 ++-----------
 include/linux/bio.h  | 25 +++++++++++++++++++++++++
 5 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d772c22..071a998 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1637,11 +1637,8 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
 
 	blk_queue_split(q, &bio, q->bio_split);
 
-	if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
-		bio->bi_error = -EIO;
-		bio_endio(bio);
+	if (bio_integrity_setup(bio))
 		return BLK_QC_T_NONE;
-	}
 
 	if (op_is_flush(bio->bi_opf)) {
 		spin_lock_irq(q->queue_lock);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 08a49c6..a9931ec 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1489,10 +1489,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 
 	blk_queue_bounce(q, &bio);
 
-	if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
-		bio_io_error(bio);
+	if (bio_integrity_setup(bio))
 		return BLK_QC_T_NONE;
-	}
 
 	blk_queue_split(q, &bio, q->bio_split);
 
@@ -1611,10 +1609,8 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
 
 	blk_queue_bounce(q, &bio);
 
-	if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
-		bio_io_error(bio);
+	if (bio_integrity_setup(bio))
 		return BLK_QC_T_NONE;
-	}
 
 	blk_queue_split(q, &bio, q->bio_split);
 
diff --git a/drivers/nvdimm/blk.c b/drivers/nvdimm/blk.c
index 9faaa96..1edb3f3 100644
--- a/drivers/nvdimm/blk.c
+++ b/drivers/nvdimm/blk.c
@@ -179,16 +179,8 @@ static blk_qc_t nd_blk_make_request(struct request_queue *q, struct bio *bio)
 	int err = 0, rw;
 	bool do_acct;
 
-	/*
-	 * bio_integrity_enabled also checks if the bio already has an
-	 * integrity payload attached. If it does, we *don't* do a
-	 * bio_integrity_prep here - the payload has been generated by
-	 * another kernel subsystem, and we just pass it through.
-	 */
-	if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
-		bio->bi_error = -EIO;
-		goto out;
-	}
+	if (bio_integrity_setup(bio))
+		return BLK_QC_T_NONE;
 
 	bip = bio_integrity(bio);
 	nsblk = q->queuedata;
@@ -212,7 +204,6 @@ static blk_qc_t nd_blk_make_request(struct request_queue *q, struct bio *bio)
 	if (do_acct)
 		nd_iostat_end(bio, start);
 
- out:
 	bio_endio(bio);
 	return BLK_QC_T_NONE;
 }
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 368795a..03ded8d 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1158,16 +1158,8 @@ static blk_qc_t btt_make_request(struct request_queue *q, struct bio *bio)
 	int err = 0;
 	bool do_acct;
 
-	/*
-	 * bio_integrity_enabled also checks if the bio already has an
-	 * integrity payload attached. If it does, we *don't* do a
-	 * bio_integrity_prep here - the payload has been generated by
-	 * another kernel subsystem, and we just pass it through.
-	 */
-	if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
-		bio->bi_error = -EIO;
-		goto out;
-	}
+	if (bio_integrity_setup(bio))
+		return BLK_QC_T_NONE;
 
 	do_acct = nd_iostat_start(bio, &start);
 	bio_for_each_segment(bvec, bio, iter) {
@@ -1194,7 +1186,6 @@ static blk_qc_t btt_make_request(struct request_queue *q, struct bio *bio)
 	if (do_acct)
 		nd_iostat_end(bio, start);
 
-out:
 	bio_endio(bio);
 	return BLK_QC_T_NONE;
 }
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 350f71d..f477327 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -738,6 +738,26 @@ extern int bioset_integrity_create(struct bio_set *, int);
 extern void bioset_integrity_free(struct bio_set *);
 extern void bio_integrity_init(void);
 
+static inline int bio_integrity_setup(struct bio *bio)
+{
+	int err = 0;
+
+	/*
+	 * bio_integrity_enabled also checks if the bio already has an
+	 * integrity payload attached. If it does, we *don't* do a
+	 * bio_integrity_prep here - the payload has been generated by
+	 * another kernel subsystem, and we just pass it through.
+	 */
+	if (bio_integrity_enabled(bio)) {
+		err = bio_integrity_prep(bio);
+		if (err) {
+			bio->bi_error = err;
+			bio_endio(bio);
+		}
+	}
+	return err;
+}
+
 #else /* CONFIG_BLK_DEV_INTEGRITY */
 
 static inline void *bio_integrity(struct bio *bio)
@@ -765,6 +785,11 @@ static inline int bio_integrity_prep(struct bio *bio)
 	return 0;
 }
 
+static inline int bio_integrity_setup(struct bio *bio)
+{
+	return 0;
+}
+
 static inline void bio_integrity_free(struct bio *bio)
 {
 	return;
-- 
2.9.3
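
The control flow of the new helper can be modelled in user space. This is
a sketch with mocked types, not kernel code: struct fake_bio and
fake_integrity_setup() are hypothetical, and the struct fields stand in
for bio_integrity_enabled(), bio_integrity_prep(), bio->bi_error and
bio_endio().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_bio {
	bool wants_integrity;	/* stand-in for bio_integrity_enabled() */
	int prep_result;	/* what the mocked prep step returns */
	int error;		/* stand-in for bio->bi_error */
	bool ended;		/* set once the bio has been completed */
};

/* Returns 0 to continue, or the prep error after completing the bio. */
static int fake_integrity_setup(struct fake_bio *bio)
{
	int err = 0;

	assert(bio != NULL);
	if (bio->wants_integrity) {
		err = bio->prep_result;
		if (err) {
			bio->error = err;	/* real error, not a blanket -EIO */
			bio->ended = true;
		}
	}
	return err;
}
```

A caller only needs to return early when the helper reports a non-zero
value; the bio has already been completed with the original error code.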

* [PATCH 4/8] bio-integrity: bio_trim should truncate integrity vector accordingly
From: Dmitry Monakhov @ 2017-03-30 13:49 UTC (permalink / raw)
  To: linux-kernel, linux-block, martin.petersen; +Cc: Dmitry Monakhov
In-Reply-To: <1490881776-28735-1-git-send-email-dmonakhov@openvz.org>

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 block/bio.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index e75878f..fa84323 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1907,6 +1907,10 @@ void bio_trim(struct bio *bio, int offset, int size)
 	bio_advance(bio, offset << 9);
 
 	bio->bi_iter.bi_size = size;
+
+	if (bio_integrity(bio))
+		bio_integrity_trim(bio, 0, size);
+
 }
 EXPORT_SYMBOL_GPL(bio_trim);
 
-- 
2.9.3

* [PATCH 3/8] bio-integrity: save original iterator for verify stage
From: Dmitry Monakhov @ 2017-03-30 13:49 UTC (permalink / raw)
  To: linux-kernel, linux-block, martin.petersen; +Cc: Dmitry Monakhov
In-Reply-To: <1490881776-28735-1-git-send-email-dmonakhov@openvz.org>

In order to perform verification we need to know the original data vector.
But after a bio traverses the I/O stack it may have been advanced, split
and relocated many times, so it is hard to guess the original data vector.

In fact, ->verify_fn currently does not work at all, because at the moment
it is called bio->bi_iter.bi_size == 0.

The simplest way to fix that is to save the original data vector and treat
it as immutable.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 block/bio-integrity.c | 6 ++++--
 include/linux/bio.h   | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index b5009a8..43a4476 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -238,10 +238,10 @@ static int bio_integrity_process(struct bio *bio,
 
 	iter.disk_name = bio->bi_bdev->bd_disk->disk_name;
 	iter.interval = 1 << bi->interval_exp;
-	iter.seed = bip_get_seed(bip);
+	iter.seed = bip->bip_verify_iter.bi_sector;
 	iter.prot_buf = prot_buf;
 
-	bio_for_each_segment(bv, bio, bviter) {
+	__bio_for_each_segment(bv, bio, bviter, bip->bip_verify_iter) {
 		void *kaddr = kmap_atomic(bv.bv_page);
 
 		iter.data_buf = kaddr + bv.bv_offset;
@@ -310,6 +310,7 @@ int bio_integrity_prep(struct bio *bio)
 	bip->bip_flags |= BIP_BLOCK_INTEGRITY;
 	bip->bip_iter.bi_size = len;
 	bip_set_seed(bip, bio->bi_iter.bi_sector);
+	bip->bip_verify_iter = bio->bi_iter;
 
 	if (bi->flags & BLK_INTEGRITY_IP_CHECKSUM)
 		bip->bip_flags |= BIP_IP_CHECKSUM;
@@ -476,6 +477,7 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
 
 	bip->bip_vcnt = bip_src->bip_vcnt;
 	bip->bip_iter = bip_src->bip_iter;
+	bip->bip_verify_iter = bip_src->bip_verify_iter;
 
 	return 0;
 }
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 8e52119..00b086a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -308,6 +308,7 @@ struct bio_integrity_payload {
 	struct bio		*bip_bio;	/* parent bio */
 
 	struct bvec_iter	bip_iter;
+	struct bvec_iter	bip_verify_iter;/* saved orig data iterator */
 
 	bio_end_io_t		*bip_end_io;	/* saved I/O completion fn */
 
-- 
2.9.3

* [PATCH 2/8] bio-integrity: Do not allocate integrity context for bio w/o data
From: Dmitry Monakhov @ 2017-03-30 13:49 UTC (permalink / raw)
  To: linux-kernel, linux-block, martin.petersen; +Cc: Dmitry Monakhov
In-Reply-To: <1490881776-28735-1-git-send-email-dmonakhov@openvz.org>

If a bio has no data, such as the ones from blkdev_issue_flush(),
then we have nothing to protect.

This patch prevents a BUG like the following:

kfree_debugcheck: out of range ptr ac1fa1d106742a5ah
kernel BUG at mm/slab.c:2773!
invalid opcode: 0000 [#1] SMP
Modules linked in: bcache
CPU: 0 PID: 4428 Comm: xfs_io Tainted: G        W       4.11.0-rc4-ext4-00041-g2ef0043-dirty #43
Hardware name: Virtuozzo KVM, BIOS seabios-1.7.5-11.vz7.4 04/01/2014
task: ffff880137786440 task.stack: ffffc90000ba8000
RIP: 0010:kfree_debugcheck+0x25/0x2a
RSP: 0018:ffffc90000babde0 EFLAGS: 00010082
RAX: 0000000000000034 RBX: ac1fa1d106742a5a RCX: 0000000000000007
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88013f3ccb40
RBP: ffffc90000babde8 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000fcb76420 R11: 00000000725172ed R12: 0000000000000282
R13: ffffffff8150e766 R14: ffff88013a145e00 R15: 0000000000000001
FS:  00007fb09384bf40(0000) GS:ffff88013f200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd0172f9e40 CR3: 0000000137fa9000 CR4: 00000000000006f0
Call Trace:
 kfree+0xc8/0x1b3
 bio_integrity_free+0xc3/0x16b
 bio_free+0x25/0x66
 bio_put+0x14/0x26
 blkdev_issue_flush+0x7a/0x85
 blkdev_fsync+0x35/0x42
 vfs_fsync_range+0x8e/0x9f
 vfs_fsync+0x1c/0x1e
 do_fsync+0x31/0x4a
 SyS_fsync+0x10/0x14
 entry_SYSCALL_64_fastpath+0x1f/0xc2

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 block/bio-integrity.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 5384713..b5009a8 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -175,6 +175,9 @@ bool bio_integrity_enabled(struct bio *bio)
 	if (bio_op(bio) != REQ_OP_READ && bio_op(bio) != REQ_OP_WRITE)
 		return false;
 
+	if (!bio_sectors(bio))
+		return false;
+
 	/* Already protected? */
 	if (bio_integrity(bio))
 		return false;
-- 
2.9.3

* [PATCH 0/8] block: T10/DIF Fixes and cleanups
From: Dmitry Monakhov @ 2017-03-30 13:49 UTC (permalink / raw)
  To: linux-kernel, linux-block, martin.petersen; +Cc: Dmitry Monakhov

This patch set fixes various problems spotted while testing the T10/DIF integrity machinery.

TOC:
## General bulletproof protection for block layer
0001 Guard bvec iteration logic
## Fix various bugs in T10/DIF/DIX infrastructure
0002 bio integrity: Do not allocate integrity context for 
0003 bio integrity: save original iterator for verify stag
0004 bio integrity: bio_trim should truncate integrity vec
0005 bio integrity: fix interface for bio_integrity_trim
## Cleanup T10/DIF/DIX infrastructure
0006 bio integrity add bio_integrity_setup helper
0007 T10 Move open-coded constants to common header
## Fix tcm_fileio info leak
0008 tcm_fileio: Prevent information leak for short reads

Testcase: http://marc.info/?l=linux-scsi&m=149087997013452&w=2

^ permalink raw reply

* Re: [Drbd-dev] [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES
From: Lars Ellenberg @ 2017-03-30 12:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170330114408.GA15777@lst.de>

On Thu, Mar 30, 2017 at 01:44:09PM +0200, Christoph Hellwig wrote:
> On Thu, Mar 30, 2017 at 12:06:41PM +0200, Lars Ellenberg wrote:
> > On Thu, Mar 23, 2017 at 10:33:40AM -0400, Christoph Hellwig wrote:
> > > It seems like DRBD assumes its on the wire TRIM request always zeroes data.
> > > Use that fact to implement REQ_OP_WRITE_ZEROES.
> > > 
> > > XXX: will need a careful audit from the drbd team!
> > 
> > Thanks, this one looks ok to me.
> 
> So the DRBD protocol requires the TRIM operation to always zero?

"users" (both as in submitting entities, and people using DRBD)
expect that DRBD guarantees replicas to be identical after whatever
operations have been completed by all replicas.

Which means that for trim/discard/unmap, we can only expose that to
upper layers (or use it for internal purposes) if the operation has
a well defined, and on all backends identical, result.

Short answer: Yes.

> > The real question for me is, will the previous one (21/23)
> > return != 0 (some EOPNOTSUPP or else) to DRBD in more situations than
> > what we have now?
> 
> No, blkdev_issue_zeroout should never return -EOPNOTSUPP.
> 
> > Will it make an fstrim cause thinly provisioned
> > devices to suddenly be fully allocated?
> 
> Not for SCSI devices.  Yes for dm-thinp until it implements
> REQ_OP_WRITE_ZEROES, which will hopefully be soon.

"good enough for me" ;-)

Thanks,

    Lars

^ permalink raw reply

* Re: Outstanding MQ questions from MMC
From: Arnd Bergmann @ 2017-03-30 12:42 UTC (permalink / raw)
  To: Linus Walleij
  Cc: linux-mmc@vger.kernel.org, linux-block, Jens Axboe,
	Christoph Hellwig, Ulf Hansson, Adrian Hunter, Paolo Valente
In-Reply-To: <CACRpkdbZrkAqEdP1-QqDww0ZuFcY47Jgj9dPqkG_whrx7pwC-w@mail.gmail.com>

On Wed, Mar 29, 2017 at 5:09 AM, Linus Walleij <linus.walleij@linaro.org> wrote:
> Hi folks,
>
> I earlier left some unanswered questions in my MMC to MQ conversion series
> but I figured it is better if I collect them and ask the blk-mq
> maintainers directly
> how to deal with the following situations that occur in the MMC block layer:
>
>
> 1. The current MMC code locks the host when the first request comes in
> from blk_fetch_request() and unlocks it when blk_fetch_request() returns
> NULL twice in a row. Then the polling thread terminated and is not restarted
> until we get called by the mmc_request_fn.
>
> Host locking means that we will not send other commands to the MMC
> card from i.e. userspace, which sometimes can send spurious stuff orthogonal
> to the block layer. If the block layer has locked the host, userspace
> has to wait
> and vice versa. It is not a common contention point but it still happens.
>
> In MQ, I have simply locked the host on the first request and then I never
> release it. Clearly this does not work. I am uncertain on how to handle this
> and whether MQ has a way to tell us that the queue is empty so we may release
> the host. I toyed with the idea to just set up a timer, but a "queue
> empty" callback
> from the block layer is what would be ideal.

Would it be possible to change the userspace code to go through
the block layer instead and queue a request there, to avoid having
to lock the card at all?

       Arnd

^ permalink raw reply

* Re: [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES
From: Christoph Hellwig @ 2017-03-30 11:44 UTC (permalink / raw)
  To: Christoph Hellwig, axboe, martin.petersen, agk, snitzer, shli,
	philipp.reisner, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid
In-Reply-To: <20170330100641.GI5939@soda.linbit>

On Thu, Mar 30, 2017 at 12:06:41PM +0200, Lars Ellenberg wrote:
> On Thu, Mar 23, 2017 at 10:33:40AM -0400, Christoph Hellwig wrote:
> > It seems like DRBD assumes its on the wire TRIM request always zeroes data.
> > Use that fact to implement REQ_OP_WRITE_ZEROES.
> > 
> > XXX: will need a careful audit from the drbd team!
> 
> Thanks, this one looks ok to me.

So the DRBD protocol requires the TRIM operation to always zero?

> The real question for me is, will the previous one (21/23)
> return != 0 (some EOPNOTSUPP or else) to DRBD in more situations than
> what we have now?

No, blkdev_issue_zeroout should never return -EOPNOTSUPP.

> Will it make an fstrim cause thinly provisioned
> devices to suddenly be fully allocated?

Not for SCSI devices.  Yes for dm-thinp until it implements
REQ_OP_WRITE_ZEROES, which will hopefully be soon.

^ permalink raw reply

* Re: [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES
From: Lars Ellenberg @ 2017-03-30 10:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	linux-block, linux-scsi, drbd-dev, dm-devel, linux-raid
In-Reply-To: <20170323143341.31549-23-hch@lst.de>

On Thu, Mar 23, 2017 at 10:33:40AM -0400, Christoph Hellwig wrote:
> It seems like DRBD assumes its on the wire TRIM request always zeroes data.
> Use that fact to implement REQ_OP_WRITE_ZEROES.
> 
> XXX: will need a careful audit from the drbd team!

Thanks, this one looks ok to me.

The real question for me is, will the previous one (21/23)
return != 0 (some EOPNOTSUPP or else) to DRBD in more situations than
what we have now?  Will it make an fstrim cause thinly provisioned
devices to suddenly be fully allocated?
Or does it unmap "the same" as what we have now?
Especially on top of dm-thin, but also on top of any other device.
That's something that is not really "obvious" to me yet.

Cheers,
    Lars

^ permalink raw reply

* [PATCH] block/sed-opal: fix spelling mistake: "Lifcycle" -> "Lifecycle"
From: Colin King @ 2017-03-30  9:58 UTC (permalink / raw)
  To: Scott Bauer, Jonathan Derrick, Rafael Antognolli, Jens Axboe,
	linux-block
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Trivial fix to a spelling mistake in a pr_err error message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 block/sed-opal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/sed-opal.c b/block/sed-opal.c
index 14035f826b5e..6736c7873d4a 100644
--- a/block/sed-opal.c
+++ b/block/sed-opal.c
@@ -1831,7 +1831,7 @@ static int get_lsp_lifecycle_cont(struct opal_dev *dev)
 	/* 0x08 is Manufacured Inactive */
 	/* 0x09 is Manufactured */
 	if (lc_status != OPAL_MANUFACTURED_INACTIVE) {
-		pr_err("Couldn't determine the status of the Lifcycle state\n");
+		pr_err("Couldn't determine the status of the Lifecycle state\n");
 		return -ENODEV;
 	}
 
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH 23/23] block: remove the discard_zeroes_data flag
From: hch @ 2017-03-30  9:06 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: agk@redhat.com, lars.ellenberg@linbit.com, snitzer@redhat.com,
	hch@lst.de, martin.petersen@oracle.com,
	philipp.reisner@linbit.com, axboe@kernel.dk, shli@kernel.org,
	linux-scsi@vger.kernel.org, dm-devel@redhat.com,
	drbd-dev@lists.linbit.com, linux-block@vger.kernel.org,
	linux-raid@vger.kernel.org
In-Reply-To: <1490720411.2573.11.camel@sandisk.com>

On Tue, Mar 28, 2017 at 05:00:48PM +0000, Bart Van Assche wrote:
> 
> It seems to me like the documentation in Documentation/ABI/testing/sysfs-block
> and the above code are not in sync. I think the above code will cause reading
> from the discard_zeroes_data attribute to return an empty string ("") instead
> of "0\n".

Thanks, fine with me.

> 
> BTW, my personal preference is to remove this attribute entirely because keeping
> it will cause confusion, no matter how well we document the behavior of this
> attribute.

Jens, any opinion?  I'd like to remove it too, but I fear it might
break things.  We could deprecate it first with a warning when read
and then remove it a few releases down the road.
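
A rough user-space sketch of the "deprecate first" idea, assuming the attribute
keeps returning "0\n" as Bart describes and warns only on the first read. The
function name and the message text are hypothetical; in the kernel the one-time
warning would more likely use pr_warn_once() than a static flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Set once the deprecation message has been printed. */
static bool model_warned;

/*
 * Hypothetical model of a deprecated sysfs "show" handler: it always
 * reports "0\n" and prints a warning on the first read only.
 * Returns the number of characters written, excluding the NUL.
 */
static int model_discard_zeroes_data_show(char *page, size_t len)
{
	assert(page != NULL && len >= 3);

	if (!model_warned) {
		model_warned = true;
		fprintf(stderr,
			"discard_zeroes_data is deprecated and always reads 0\n");
	}
	return snprintf(page, len, "0\n");
}
```

Returning a fixed "0\n" rather than an empty string keeps existing readers of
the attribute working while the warning gives them time to migrate.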

^ permalink raw reply

* Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
From: Christoph Hellwig @ 2017-03-30  9:04 UTC (permalink / raw)
  To: Christoph Hellwig, axboe, martin.petersen, agk, snitzer, shli,
	philipp.reisner, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid
In-Reply-To: <20170323155410.GD1138@soda.linbit>

Lars, can you please take a look at patch 22 and check if it's safe?

That's the big thing I want to know before posting the next version
of the series.  If it's not safe I'd like to drop that patch.

^ permalink raw reply

* Re: [PATCH 12/23] sd: handle REQ_UNMAP
From: hch @ 2017-03-30  9:02 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: agk@redhat.com, lars.ellenberg@linbit.com, snitzer@redhat.com,
	hch@lst.de, martin.petersen@oracle.com,
	philipp.reisner@linbit.com, axboe@kernel.dk, shli@kernel.org,
	linux-scsi@vger.kernel.org, dm-devel@redhat.com,
	drbd-dev@lists.linbit.com, linux-block@vger.kernel.org,
	linux-raid@vger.kernel.org
In-Reply-To: <1490719722.2573.8.camel@sandisk.com>

On Tue, Mar 28, 2017 at 04:48:55PM +0000, Bart Van Assche wrote:
> >  	if (sdp->no_write_same)
> >  		return BLKPREP_INVALID;
> >  	if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)
> 
> Users can change the provisioning mode from user space from SD_LBP_WS16 into
> SD_LBP_WS10 so I'm not sure it's safe to skip the (sdkp->ws16 || sector >
> 0xffffffff || nr_sectors > 0xffff) check if REQ_UNMAP is set.

They can, and if the device has too many sectors that will already cause
discard to fail, and in this case it will cause write zeroes to fail as
well.  The intent behind this patch is to keep the behavior the same
as the old path that uses discards for zeroing.  The logic looks a bit
clumsy, but I'd rather keep it as-is.
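
For readers following the quoted condition, a minimal user-space sketch. It
assumes the 0xffffffff and 0xffff bounds come from the WRITE SAME(10) command
format, which has a 32-bit LBA and a 16-bit block count, and that a true result
presumably selects the 16-byte command. The function names are hypothetical and
this is not the sd driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the range cannot be encoded in a WRITE SAME(10) command. */
static bool exceeds_ws10_limits(uint64_t sector, uint32_t nr_sectors)
{
	return sector > 0xffffffffULL || nr_sectors > 0xffff;
}

/*
 * Mirrors the quoted condition:
 *   sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff
 * ws16 stands for the device preferring the 16-byte command.
 */
static bool want_ws16(bool ws16, uint64_t sector, uint32_t nr_sectors)
{
	assert(nr_sectors > 0);	/* a zero-length request is not modeled */

	return ws16 || exceeds_ws10_limits(sector, nr_sectors);
}
```

The sketch also shows why forcing the 10-byte variant from user space fails on
large devices: a start sector above 0xffffffff cannot be expressed in it.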

^ permalink raw reply

* Re: [PATCH 11/23] block_dev: use blkdev_issue_zerout for hole punches
From: hch @ 2017-03-30  8:59 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: agk@redhat.com, lars.ellenberg@linbit.com, snitzer@redhat.com,
	hch@lst.de, martin.petersen@oracle.com,
	philipp.reisner@linbit.com, axboe@kernel.dk, shli@kernel.org,
	linux-scsi@vger.kernel.org, dm-devel@redhat.com,
	drbd-dev@lists.linbit.com, linux-block@vger.kernel.org,
	linux-raid@vger.kernel.org
In-Reply-To: <1490719834.2573.9.camel@sandisk.com>

On Tue, Mar 28, 2017 at 04:50:47PM +0000, Bart Van Assche wrote:
> On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> > This gets us support for non-discard efficient write of zeroes (e.g. NVMe)
> > and preparse for removing the discard_zeroes_data flag.
> 
> Hello Christoph,
> 
> "preparse" probably should have been "prepare"?

Yes, fixed.

^ permalink raw reply
