[dm-devel] [PATCH] dm-mpath: fix a tiny case which can cause an infinite loop

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [dm-devel] [PATCH] dm-mpath: fix a tiny case which can cause an infinite loop
@ 2016-02-04  2:08 jiangyiwen
  2016-02-04  3:24 ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: jiangyiwen @ 2016-02-04  2:08 UTC (permalink / raw)
  To: snitzer
  Cc: dm-devel, linux-scsi, Christoph Hellwig, James Bottomley,
	xuejiufei, Qijiang (Joseph, Euler), martin.petersen

When two processes submit WRTIE SAME bio simultaneously and
first IO return failed because of INVALID FIELD IN CDB, and
then second IO can enter into an infinite loop.
The problem can be described as follows:

process 1                              process 2
submit_bio(REQ_WRITE_SAME) and
wait io completion
                                       begin submit_bio(REQ_WRITE_SAME)
                                       -> blk_queue_bio()
                                         -> dm_request_fn()
                                           -> map_request()
                                             -> scsi_request_fn()
                                               -> blk_peek_request()
                                                 -> scsi_prep_fn()
In the moment, IO return and
sense_key with illegal_request,
sense_code with 0x24(INVALID FIELD IN CDB).
In this way it will set no_write_same = 1
in sd_done() and disable write same
                                       In sd_setup_write_same_cmnd()
                                       when find (no_write_same == 1)
                                       will return BLKPREP_KILL,
                                       and then in blk_peek_request()
                                       set error to EIO.
                                       However, in multipath_end_io()
                                       when find error is EIO it will
                                       fail path and retry even if
                                       device doesn't support WRITE SAME.

In this situation, when process of multipathd reinstate
path by using test_unit_ready periodically, it will cause
second WRITE SAME IO enters into an infinite loop and
loop never terminates.

In do_end_io(), when finding device doesn't support WRITE SAME and
request is WRITE SAME, return EOPNOTSUPP to applications.

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: xuejiufei <xuejiufei@huawei.com>
---
 drivers/md/dm-mpath.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index cfa29f5..ad1602a 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1269,6 +1269,11 @@ static int do_end_io(struct multipath *m, struct request *clone,
 	if (noretry_error(error))
 		return error;

+	/* do not retry in case of WRITE SAME not supporting */
+	if ((clone->cmd_flags & REQ_WRITE_SAME) &&
+			!clone->q->limits.max_write_same_sectors)
+		return -EOPNOTSUPP;
+
 	if (mpio->pgpath)
 		fail_path(mpio->pgpath);

-- 
1.8.4.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: dm-mpath: fix a tiny case which can cause an infinite loop
  2016-02-04  2:08 [dm-devel] [PATCH] dm-mpath: fix a tiny case which can cause an infinite loop jiangyiwen
@ 2016-02-04  3:24 ` Mike Snitzer
  2016-02-04  3:49   ` jiangyiwen
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2016-02-04  3:24 UTC (permalink / raw)
  To: jiangyiwen
  Cc: dm-devel, linux-scsi, Christoph Hellwig, James Bottomley,
	xuejiufei, Qijiang (Joseph, Euler), martin.petersen

On Wed, Feb 03 2016 at  9:08pm -0500,
jiangyiwen <jiangyiwen@huawei.com> wrote:

> When two processes submit WRTIE SAME bio simultaneously and
> first IO return failed because of INVALID FIELD IN CDB, and
> then second IO can enter into an infinite loop.
> The problem can be described as follows:
> 
> process 1                              process 2
> submit_bio(REQ_WRITE_SAME) and
> wait io completion
>                                        begin submit_bio(REQ_WRITE_SAME)
>                                        -> blk_queue_bio()
>                                          -> dm_request_fn()
>                                            -> map_request()
>                                              -> scsi_request_fn()
>                                                -> blk_peek_request()
>                                                  -> scsi_prep_fn()
> In the moment, IO return and
> sense_key with illegal_request,
> sense_code with 0x24(INVALID FIELD IN CDB).
> In this way it will set no_write_same = 1
> in sd_done() and disable write same
>                                        In sd_setup_write_same_cmnd()
>                                        when find (no_write_same == 1)
>                                        will return BLKPREP_KILL,
>                                        and then in blk_peek_request()
>                                        set error to EIO.
>                                        However, in multipath_end_io()
>                                        when find error is EIO it will
>                                        fail path and retry even if
>                                        device doesn't support WRITE SAME.
> 
> In this situation, when process of multipathd reinstate
> path by using test_unit_ready periodically, it will cause
> second WRITE SAME IO enters into an infinite loop and
> loop never terminates.
> 
> In do_end_io(), when finding device doesn't support WRITE SAME and
> request is WRITE SAME, return EOPNOTSUPP to applications.
> 
> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
> Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
> Reviewed-by: xuejiufei <xuejiufei@huawei.com>
> ---
>  drivers/md/dm-mpath.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index cfa29f5..ad1602a 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -1269,6 +1269,11 @@ static int do_end_io(struct multipath *m, struct request *clone,
>  	if (noretry_error(error))
>  		return error;
> 
> +	/* do not retry in case of WRITE SAME not supporting */
> +	if ((clone->cmd_flags & REQ_WRITE_SAME) &&
> +			!clone->q->limits.max_write_same_sectors)
> +		return -EOPNOTSUPP;
> +
>  	if (mpio->pgpath)
>  		fail_path(mpio->pgpath);
> 

Did you test this patch?  Looks like it isn't going to make a
difference.  'error' will be -EREMOTEIO, which will be caught by
noretry_error().  So we'll never go on to run your new code.

Please see commit 7eee4ae2db ("dm: disable WRITE SAME if it fails").

I think DM already handles this case properly.  The only thing is it'll
return -EREMOTEIO (rather than -EOPNOTSUPP) further up the stack.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: dm-mpath: fix a tiny case which can cause an infinite loop
  2016-02-04  3:24 ` Mike Snitzer
@ 2016-02-04  3:49   ` jiangyiwen
  2016-02-04  4:25     ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: jiangyiwen @ 2016-02-04  3:49 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: dm-devel, linux-scsi, Christoph Hellwig, James Bottomley,
	xuejiufei, Qijiang (Joseph, Euler), martin.petersen

On 2016/2/4 11:24, Mike Snitzer wrote:
> On Wed, Feb 03 2016 at  9:08pm -0500,
> jiangyiwen <jiangyiwen@huawei.com> wrote:
> 
>> When two processes submit WRTIE SAME bio simultaneously and
>> first IO return failed because of INVALID FIELD IN CDB, and
>> then second IO can enter into an infinite loop.
>> The problem can be described as follows:
>>
>> process 1                              process 2
>> submit_bio(REQ_WRITE_SAME) and
>> wait io completion
>>                                        begin submit_bio(REQ_WRITE_SAME)
>>                                        -> blk_queue_bio()
>>                                          -> dm_request_fn()
>>                                            -> map_request()
>>                                              -> scsi_request_fn()
>>                                                -> blk_peek_request()
>>                                                  -> scsi_prep_fn()
>> In the moment, IO return and
>> sense_key with illegal_request,
>> sense_code with 0x24(INVALID FIELD IN CDB).
>> In this way it will set no_write_same = 1
>> in sd_done() and disable write same
>>                                        In sd_setup_write_same_cmnd()
>>                                        when find (no_write_same == 1)
>>                                        will return BLKPREP_KILL,
>>                                        and then in blk_peek_request()
>>                                        set error to EIO.
>>                                        However, in multipath_end_io()
>>                                        when find error is EIO it will
>>                                        fail path and retry even if
>>                                        device doesn't support WRITE SAME.
>>
>> In this situation, when process of multipathd reinstate
>> path by using test_unit_ready periodically, it will cause
>> second WRITE SAME IO enters into an infinite loop and
>> loop never terminates.
>>
>> In do_end_io(), when finding device doesn't support WRITE SAME and
>> request is WRITE SAME, return EOPNOTSUPP to applications.
>>
>> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
>> Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
>> Reviewed-by: xuejiufei <xuejiufei@huawei.com>
>> ---
>>  drivers/md/dm-mpath.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
>> index cfa29f5..ad1602a 100644
>> --- a/drivers/md/dm-mpath.c
>> +++ b/drivers/md/dm-mpath.c
>> @@ -1269,6 +1269,11 @@ static int do_end_io(struct multipath *m, struct request *clone,
>>  	if (noretry_error(error))
>>  		return error;
>>
>> +	/* do not retry in case of WRITE SAME not supporting */
>> +	if ((clone->cmd_flags & REQ_WRITE_SAME) &&
>> +			!clone->q->limits.max_write_same_sectors)
>> +		return -EOPNOTSUPP;
>> +
>>  	if (mpio->pgpath)
>>  		fail_path(mpio->pgpath);
>>
> 
> Did you test this patch?  Looks like it isn't going to make a
> difference.  'error' will be -EREMOTEIO, which will be caught by
> noretry_error().  So we'll never go on to run your new code.
> 
> Please see commit 7eee4ae2db ("dm: disable WRITE SAME if it fails").
> 
> I think DM already handles this case properly.  The only thing is it'll
> return -EREMOTEIO (rather than -EOPNOTSUPP) further up the stack.
> 
> .
> 
Hi Mike,
Yes, I have already test in version linux-3.8, and I also have already
carefully checked latest kernel code, and find that it also exists this
problem. I know about this commit 7eee4ae2db ("dm: disable WRITE SAME
if it fails"), it only can solve first IO situation which I mentioned
above, in other words, when WRITE SAME IO truly send to device, it
actually return -EREMOTEIO if device doesn't support WRITE SAME.
But in above situation which issues two WRITE SAME IO simultaneously,
second IO will not truly send to device instead of returning
BLKPREP_KILL in sd_setup_write_same_cmnd() because find
(no_write_same == 1) which no_write_same is set when first IO returned,
and then in blk_peek_request() will return EIO to MD/DM which caused
the problem above mentioned.

I have described detailed process of problem, you will find actually
it is a problem when reviewed carefully.

Thanks,
Yiwen Jiang



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: dm-mpath: fix a tiny case which can cause an infinite loop
  2016-02-04  3:49   ` jiangyiwen
@ 2016-02-04  4:25     ` Mike Snitzer
  2016-02-04  5:03       ` Martin K. Petersen
                         ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Mike Snitzer @ 2016-02-04  4:25 UTC (permalink / raw)
  To: jiangyiwen
  Cc: dm-devel, linux-scsi, Christoph Hellwig, James Bottomley,
	xuejiufei, Qijiang (Joseph, Euler), martin.petersen

On Wed, Feb 03 2016 at 10:49pm -0500,
jiangyiwen <jiangyiwen@huawei.com> wrote:

> On 2016/2/4 11:24, Mike Snitzer wrote:
> > On Wed, Feb 03 2016 at  9:08pm -0500,
> > jiangyiwen <jiangyiwen@huawei.com> wrote:
> > 
> >> When two processes submit WRTIE SAME bio simultaneously and
> >> first IO return failed because of INVALID FIELD IN CDB, and
> >> then second IO can enter into an infinite loop.
> >> The problem can be described as follows:
> >>
> >> process 1                              process 2
> >> submit_bio(REQ_WRITE_SAME) and
> >> wait io completion
> >>                                        begin submit_bio(REQ_WRITE_SAME)
> >>                                        -> blk_queue_bio()
> >>                                          -> dm_request_fn()
> >>                                            -> map_request()
> >>                                              -> scsi_request_fn()
> >>                                                -> blk_peek_request()
> >>                                                  -> scsi_prep_fn()
> >> In the moment, IO return and
> >> sense_key with illegal_request,
> >> sense_code with 0x24(INVALID FIELD IN CDB).
> >> In this way it will set no_write_same = 1
> >> in sd_done() and disable write same
> >>                                        In sd_setup_write_same_cmnd()
> >>                                        when find (no_write_same == 1)
> >>                                        will return BLKPREP_KILL,
> >>                                        and then in blk_peek_request()
> >>                                        set error to EIO.
> >>                                        However, in multipath_end_io()
> >>                                        when find error is EIO it will
> >>                                        fail path and retry even if
> >>                                        device doesn't support WRITE SAME.
> >>
> >> In this situation, when process of multipathd reinstate
> >> path by using test_unit_ready periodically, it will cause
> >> second WRITE SAME IO enters into an infinite loop and
> >> loop never terminates.
> >>
> >> In do_end_io(), when finding device doesn't support WRITE SAME and
> >> request is WRITE SAME, return EOPNOTSUPP to applications.
> >>
> >> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
> >> Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
> >> Reviewed-by: xuejiufei <xuejiufei@huawei.com>
> >> ---
> >>  drivers/md/dm-mpath.c | 5 +++++
> >>  1 file changed, 5 insertions(+)
> >>
> >> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> >> index cfa29f5..ad1602a 100644
> >> --- a/drivers/md/dm-mpath.c
> >> +++ b/drivers/md/dm-mpath.c
> >> @@ -1269,6 +1269,11 @@ static int do_end_io(struct multipath *m, struct request *clone,
> >>  	if (noretry_error(error))
> >>  		return error;
> >>
> >> +	/* do not retry in case of WRITE SAME not supporting */
> >> +	if ((clone->cmd_flags & REQ_WRITE_SAME) &&
> >> +			!clone->q->limits.max_write_same_sectors)
> >> +		return -EOPNOTSUPP;
> >> +
> >>  	if (mpio->pgpath)
> >>  		fail_path(mpio->pgpath);
> >>
> > 
> > Did you test this patch?  Looks like it isn't going to make a
> > difference.  'error' will be -EREMOTEIO, which will be caught by
> > noretry_error().  So we'll never go on to run your new code.
> > 
> > Please see commit 7eee4ae2db ("dm: disable WRITE SAME if it fails").
> > 
> > I think DM already handles this case properly.  The only thing is it'll
> > return -EREMOTEIO (rather than -EOPNOTSUPP) further up the stack.
> > 
> > .
> > 
> Hi Mike,
> Yes, I have already test in version linux-3.8, and I also have already
> carefully checked latest kernel code, and find that it also exists this
> problem. I know about this commit 7eee4ae2db ("dm: disable WRITE SAME
> if it fails"), it only can solve first IO situation which I mentioned
> above, in other words, when WRITE SAME IO truly send to device, it
> actually return -EREMOTEIO if device doesn't support WRITE SAME.
> But in above situation which issues two WRITE SAME IO simultaneously,
> second IO will not truly send to device instead of returning
> BLKPREP_KILL in sd_setup_write_same_cmnd() because find
> (no_write_same == 1) which no_write_same is set when first IO returned,
> and then in blk_peek_request() will return EIO to MD/DM which caused
> the problem above mentioned.
> 
> I have described detailed process of problem, you will find actually
> it is a problem when reviewed carefully.

OK, so -EIO is getting returned.  That shouldn't happen.  -EIO is the
generic catch-all error that we're going to retry.  We have
differentiated IO errors in SCSI that get returned up the IO stack
(e.g. -EREMOTEIO, etc).

The SCSI, or block layer, should return a non-retryable error for this
case.  But we only have the differentiated IO errors for SCSI cmds that
are issued, so it seems we still need to train SCSI (and block by
association/dependency) to return permanent errors that are identified
during request preparation.

But it may be that we need to widen the WRITE_SAME error handling code
in DM to check for -EREMOTEIO or -EOPNOTSUPP.  But I'm really not in
favor of special-casing -EIO for WRITE_SAME, we need to be sprinkling
less special-case code to make up for lack of information for lower
layers.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: dm-mpath: fix a tiny case which can cause an infinite loop
  2016-02-04  4:25     ` Mike Snitzer
@ 2016-02-04  5:03       ` Martin K. Petersen
  2016-02-04  6:48       ` [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled Martin K. Petersen
  2016-02-04  8:25       ` dm-mpath: fix a tiny case which can cause an infinite loop jiangyiwen
  2 siblings, 0 replies; 15+ messages in thread
From: Martin K. Petersen @ 2016-02-04  5:03 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: jiangyiwen, dm-devel, linux-scsi, Christoph Hellwig,
	James Bottomley, xuejiufei, Qijiang (Joseph, Euler),
	martin.petersen

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike> The SCSI, or block layer, should return a non-retryable error for
Mike> this case.  But we only have the differentiated IO errors for SCSI
Mike> cmds that are issued, so it seems we still need to train SCSI (and
Mike> block by association/dependency) to return permanent errors that
Mike> are identified during request preparation.

There currently isn't any way to make BLKPREP_KILL return anything other
than -EIO. I'll take a look tomorrow...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled
  2016-02-04  4:25     ` Mike Snitzer
  2016-02-04  5:03       ` Martin K. Petersen
@ 2016-02-04  6:48       ` Martin K. Petersen
  2016-02-04  9:16         ` jiangyiwen
                           ` (2 more replies)
  2016-02-04  8:25       ` dm-mpath: fix a tiny case which can cause an infinite loop jiangyiwen
  2 siblings, 3 replies; 15+ messages in thread
From: Martin K. Petersen @ 2016-02-04  6:48 UTC (permalink / raw)
  To: linux-scsi, linux-block; +Cc: jiangyiwen, snitzer, Martin K. Petersen

When a storage device rejects a WRITE SAME command we will disable write
same functionality for the device and return -EREMOTEIO to the block
layer. -EREMOTEIO will in turn prevent DM from retrying the I/O and/or
failing the path.

Yiwen Jiang discovered a small race where WRITE SAME requests issued
simultaneously would cause -EIO to be returned. This happened because
any requests being prepared after WRITE SAME had been disabled for the
device caused us to return BLKPREP_KILL. The latter caused the block
layer to return -EIO upon completion.

To overcome this we introduce BLKPREP_INVALID which indicates that this
is an invalid request for the device. blk_peek_request() is modified to
return -EREMOTEIO in that case.

Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
Suggested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---

I contemplated making blk_peek_request() use rq->errors to decide what
to return up the stack. However, I cringed at mixing errnos and SCSI
midlayer status information in the same location.
---
 block/blk-core.c       | 6 ++++--
 drivers/scsi/sd.c      | 4 ++--
 include/linux/blkdev.h | 9 ++++++---
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 33e2f62d5062..ee833af2f892 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2446,14 +2446,16 @@ struct request *blk_peek_request(struct request_queue *q)
 
 			rq = NULL;
 			break;
-		} else if (ret == BLKPREP_KILL) {
+		} else if (ret == BLKPREP_KILL || ret == BLKPREP_INVALID) {
+			int err = ret == BLKPREP_INVALID ? -EREMOTEIO : -EIO;
+
 			rq->cmd_flags |= REQ_QUIET;
 			/*
 			 * Mark this request as started so we don't trigger
 			 * any debug logic in the end I/O path.
 			 */
 			blk_start_request(rq);
-			__blk_end_request_all(rq, -EIO);
+			__blk_end_request_all(rq, err);
 		} else {
 			printk(KERN_ERR "%s: bad return=%d\n", __func__, ret);
 			break;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index ec163d08f6c3..6e841c6da632 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -761,7 +761,7 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
 		break;
 
 	default:
-		ret = BLKPREP_KILL;
+		ret = BLKPREP_INVALID;
 		goto out;
 	}
 
@@ -839,7 +839,7 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
 	int ret;
 
 	if (sdkp->device->no_write_same)
-		return BLKPREP_KILL;
+		return BLKPREP_INVALID;
 
 	BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c70e3588a48c..e990d181625a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -680,9 +680,12 @@ static inline bool blk_write_same_mergeable(struct bio *a, struct bio *b)
 /*
  * q->prep_rq_fn return values
  */
-#define BLKPREP_OK		0	/* serve it */
-#define BLKPREP_KILL		1	/* fatal error, kill */
-#define BLKPREP_DEFER		2	/* leave on queue */
+enum {
+	BLKPREP_OK,		/* serve it */
+	BLKPREP_KILL,		/* fatal error, kill, return -EIO */
+	BLKPREP_INVALID,	/* invalid command, kill, return -EREMOTEIO */
+	BLKPREP_DEFER,		/* leave on queue */
+};
 
 extern unsigned long blk_max_low_pfn, blk_max_pfn;
 
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled
  2016-02-04  6:48       ` [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled Martin K. Petersen
@ 2016-02-04  9:16         ` jiangyiwen
  2016-02-05  3:13           ` Martin K. Petersen
  2016-02-04 14:14         ` Hannes Reinecke
  2016-02-04 16:36         ` Ewan Milne
  2 siblings, 1 reply; 15+ messages in thread
From: jiangyiwen @ 2016-02-04  9:16 UTC (permalink / raw)
  To: Martin K. Petersen, linux-scsi, linux-block; +Cc: snitzer

On 2016/2/4 14:48, Martin K. Petersen wrote:
> When a storage device rejects a WRITE SAME command we will disable write
> same functionality for the device and return -EREMOTEIO to the block
> layer. -EREMOTEIO will in turn prevent DM from retrying the I/O and/or
> failing the path.
> 
> Yiwen Jiang discovered a small race where WRITE SAME requests issued
> simultaneously would cause -EIO to be returned. This happened because
> any requests being prepared after WRITE SAME had been disabled for the
> device caused us to return BLKPREP_KILL. The latter caused the block
> layer to return -EIO upon completion.
> 
> To overcome this we introduce BLKPREP_INVALID which indicates that this
> is an invalid request for the device. blk_peek_request() is modified to
> return -EREMOTEIO in that case.
> 
> Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
> Suggested-by: Mike Snitzer <snitzer@redhat.com>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> ---
> 
> I contemplated making blk_peek_request() use rq->errors to decide what
> to return up the stack. However, I cringed at mixing errnos and SCSI
> midlayer status information in the same location.
> ---
>  block/blk-core.c       | 6 ++++--
>  drivers/scsi/sd.c      | 4 ++--
>  include/linux/blkdev.h | 9 ++++++---
>  3 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 33e2f62d5062..ee833af2f892 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2446,14 +2446,16 @@ struct request *blk_peek_request(struct request_queue *q)
>  
>  			rq = NULL;
>  			break;
> -		} else if (ret == BLKPREP_KILL) {
> +		} else if (ret == BLKPREP_KILL || ret == BLKPREP_INVALID) {
> +			int err = ret == BLKPREP_INVALID ? -EREMOTEIO : -EIO;
> +
>  			rq->cmd_flags |= REQ_QUIET;
>  			/*
>  			 * Mark this request as started so we don't trigger
>  			 * any debug logic in the end I/O path.
>  			 */
>  			blk_start_request(rq);
> -			__blk_end_request_all(rq, -EIO);
> +			__blk_end_request_all(rq, err);
>  		} else {
>  			printk(KERN_ERR "%s: bad return=%d\n", __func__, ret);
>  			break;
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index ec163d08f6c3..6e841c6da632 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -761,7 +761,7 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
>  		break;
>  
>  	default:
> -		ret = BLKPREP_KILL;
> +		ret = BLKPREP_INVALID;
>  		goto out;
>  	}
>  
> @@ -839,7 +839,7 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
>  	int ret;
>  
>  	if (sdkp->device->no_write_same)
> -		return BLKPREP_KILL;
> +		return BLKPREP_INVALID;
>  
>  	BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
>  
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c70e3588a48c..e990d181625a 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -680,9 +680,12 @@ static inline bool blk_write_same_mergeable(struct bio *a, struct bio *b)
>  /*
>   * q->prep_rq_fn return values
>   */
> -#define BLKPREP_OK		0	/* serve it */
> -#define BLKPREP_KILL		1	/* fatal error, kill */
> -#define BLKPREP_DEFER		2	/* leave on queue */
> +enum {
> +	BLKPREP_OK,		/* serve it */
> +	BLKPREP_KILL,		/* fatal error, kill, return -EIO */
> +	BLKPREP_INVALID,	/* invalid command, kill, return -EREMOTEIO */
> +	BLKPREP_DEFER,		/* leave on queue */
> +};
>  
>  extern unsigned long blk_max_low_pfn, blk_max_pfn;
>  
> 
Hi Martin,
It is very good, I totally agree with this patch. But I have
three questions:

First, I don't understand why blk_peek_request() return EREMOTEIO,
as I know, in this situation we only prepare scsi command
without sending to device, and I think EREMOTEIO should
be returned only when IO has already sent to device, maybe
I don't understand definition of EREMOTEIO.
So, Why don't return the errno with EOPNOTSUPP?

In addition, I still worried with whether there has other
situations which will return EIO or other error. In this
way, MD/DM still can happen this type of problem, so I think
may be in multipath we still needs a protection to avoid it.

At last, I have a additional problem, I remember that you
previously send a series of patches about XCOPY, why don't
have any news latter later? I very much expect that I can
see these patches which are merged into kernel.

Thanks,
Yiwen Jiang.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled
  2016-02-04  9:16         ` jiangyiwen
@ 2016-02-05  3:13           ` Martin K. Petersen
  2016-02-05  3:39             ` jiangyiwen
  2016-02-16 12:14             ` jiangyiwen
  0 siblings, 2 replies; 15+ messages in thread
From: Martin K. Petersen @ 2016-02-05  3:13 UTC (permalink / raw)
  To: jiangyiwen; +Cc: Martin K. Petersen, linux-scsi, linux-block, snitzer

>>>>> "Yiwen" == jiangyiwen  <jiangyiwen@huawei.com> writes:

Yiwen,

Yiwen> First, I don't understand why blk_peek_request() return
Yiwen> EREMOTEIO, as I know, in this situation we only prepare scsi
Yiwen> command without sending to device, and I think EREMOTEIO should
Yiwen> be returned only when IO has already sent to device, maybe I
Yiwen> don't understand definition of EREMOTEIO.  So, Why don't return
Yiwen> the errno with EOPNOTSUPP?

DM currently has special handling for EREMOTEIO failures (because that's
what we'd return when a device responds with ILLEGAL REQUEST).

I am not opposed to returning EOPNOTSUPP but it would require more
changes and since this is a bugfix for stable I want to keep it as small
as possible.

Yiwen> In addition, I still worried with whether there has other
Yiwen> situations which will return EIO or other error. In this way,
Yiwen> MD/DM still can happen this type of problem, so I think may be in
Yiwen> multipath we still needs a protection to avoid it.

There are various error scenarios where we can end up bailing with a
BLKPREP_KILL. But the general rule of thumb is that these conditions all
demand a retry. The optional commands like WRITE SAME and UNMAP are
special in that they are irrecoverable.

Yiwen> At last, I have a additional problem, I remember that you
Yiwen> previously send a series of patches about XCOPY, why don't have
Yiwen> any news latter later? I very much expect that I can see these
Yiwen> patches which are merged into kernel.

I am working on a refresh of the series that includes token-based copy
offload support in addition to EXTENDED COPY. The patches depend on Mike
Christie's request flag patch series which has yet to be merged.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled
  2016-02-05  3:13           ` Martin K. Petersen
@ 2016-02-05  3:39             ` jiangyiwen
  2016-02-16 12:14             ` jiangyiwen
  1 sibling, 0 replies; 15+ messages in thread
From: jiangyiwen @ 2016-02-05  3:39 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, linux-block, snitzer

On 2016/2/5 11:13, Martin K. Petersen wrote:
>>>>>> "Yiwen" == jiangyiwen  <jiangyiwen@huawei.com> writes:
> 
> Yiwen,
> 
> Yiwen> First, I don't understand why blk_peek_request() return
> Yiwen> EREMOTEIO, as I know, in this situation we only prepare scsi
> Yiwen> command without sending to device, and I think EREMOTEIO should
> Yiwen> be returned only when IO has already sent to device, maybe I
> Yiwen> don't understand definition of EREMOTEIO.  So, Why don't return
> Yiwen> the errno with EOPNOTSUPP?
> 
> DM currently has special handling for EREMOTEIO failures (because that's
> what we'd return when a device responds with ILLEGAL REQUEST).
> 
> I am not opposed to returning EOPNOTSUPP but it would require more
> changes and since this is a bugfix for stable I want to keep it as small
> as possible.
> 
> Yiwen> In addition, I still worried with whether there has other
> Yiwen> situations which will return EIO or other error. In this way,
> Yiwen> MD/DM still can happen this type of problem, so I think may be in
> Yiwen> multipath we still needs a protection to avoid it.
> 
> There are various error scenarios where we can end up bailing with a
> BLKPREP_KILL. But the general rule of thumb is that these conditions all
> demand a retry. The optional commands like WRITE SAME and UNMAP are
> special in that they are irrecoverable.
> 
> Yiwen> At last, I have a additional problem, I remember that you
> Yiwen> previously send a series of patches about XCOPY, why don't have
> Yiwen> any news latter later? I very much expect that I can see these
> Yiwen> patches which are merged into kernel.
> 
> I am working on a refresh of the series that includes token-based copy
> offload support in addition to EXTENDED COPY. The patches depend on Mike
> Christie's request flag patch series which has yet to be merged.
> 
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>

Thanks,
Yiwen Jiang.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled
  2016-02-05  3:13           ` Martin K. Petersen
  2016-02-05  3:39             ` jiangyiwen
@ 2016-02-16 12:14             ` jiangyiwen
  2016-02-18  0:22               ` Martin K. Petersen
  1 sibling, 1 reply; 15+ messages in thread
From: jiangyiwen @ 2016-02-16 12:14 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, linux-block, snitzer, James Bottomley

On 2016/2/5 11:13, Martin K. Petersen wrote:
>>>>>> "Yiwen" == jiangyiwen  <jiangyiwen@huawei.com> writes:
> 
> Yiwen,
> 
> Yiwen> First, I don't understand why blk_peek_request() return
> Yiwen> EREMOTEIO, as I know, in this situation we only prepare scsi
> Yiwen> command without sending to device, and I think EREMOTEIO should
> Yiwen> be returned only when IO has already sent to device, maybe I
> Yiwen> don't understand definition of EREMOTEIO.  So, Why don't return
> Yiwen> the errno with EOPNOTSUPP?
> 
> DM currently has special handling for EREMOTEIO failures (because that's
> what we'd return when a device responds with ILLEGAL REQUEST).
> 
> I am not opposed to returning EOPNOTSUPP but it would require more
> changes and since this is a bugfix for stable I want to keep it as small
> as possible.
> 
> Yiwen> In addition, I still worried with whether there has other
> Yiwen> situations which will return EIO or other error. In this way,
> Yiwen> MD/DM still can happen this type of problem, so I think may be in
> Yiwen> multipath we still needs a protection to avoid it.
> 
> There are various error scenarios where we can end up bailing with a
> BLKPREP_KILL. But the general rule of thumb is that these conditions all
> demand a retry. The optional commands like WRITE SAME and UNMAP are
> special in that they are irrecoverable.
> 
> Yiwen> At last, I have a additional problem, I remember that you
> Yiwen> previously send a series of patches about XCOPY, why don't have
> Yiwen> any news latter later? I very much expect that I can see these
> Yiwen> patches which are merged into kernel.
> 
> I am working on a refresh of the series that includes token-based copy
> offload support in addition to EXTENDED COPY. The patches depend on Mike
> Christie's request flag patch series which has yet to be merged.
> 
Hi Martin,
I tested this code, but found a problem.

When called scsi_prep_fn return BLKPREP_INVALID, we should
use the same code with BLKPREP_KILL in scsi_prep_return. This
patch should add a line of code as follows:

---
 drivers/scsi/scsi_lib.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index dd8ad2a..d8d2198 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1343,6 +1343,7 @@ scsi_prep_return(struct request_queue *q, struct request *req, int ret)

 	switch (ret) {
 	case BLKPREP_KILL:
+	case BLKPREP_INVALID:
 		req->errors = DID_NO_CONNECT << 16;
 		/* release the command and kill it */
 		if (req->special) {
-- 
1.8.4.3

Thanks,
Yiwen Jiang.


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled
  2016-02-16 12:14             ` jiangyiwen
@ 2016-02-18  0:22               ` Martin K. Petersen
  0 siblings, 0 replies; 15+ messages in thread
From: Martin K. Petersen @ 2016-02-18  0:22 UTC (permalink / raw)
  To: jiangyiwen
  Cc: Martin K. Petersen, linux-scsi, linux-block, snitzer,
	James Bottomley

>>>>> "Yiwen" == jiangyiwen  <jiangyiwen@huawei.com> writes:

Yiwen,

Yiwen> When called scsi_prep_fn return BLKPREP_INVALID, we should use
Yiwen> the same code with BLKPREP_KILL in scsi_prep_return.

You are right!

Applied to 4.5/scsi-fixes.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled
  2016-02-04  6:48       ` [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled Martin K. Petersen
  2016-02-04  9:16         ` jiangyiwen
@ 2016-02-04 14:14         ` Hannes Reinecke
  2016-02-04 16:36         ` Ewan Milne
  2 siblings, 0 replies; 15+ messages in thread
From: Hannes Reinecke @ 2016-02-04 14:14 UTC (permalink / raw)
  To: Martin K. Petersen, linux-scsi, linux-block; +Cc: jiangyiwen, snitzer

On 02/04/2016 07:48 AM, Martin K. Petersen wrote:
> When a storage device rejects a WRITE SAME command we will disable write
> same functionality for the device and return -EREMOTEIO to the block
> layer. -EREMOTEIO will in turn prevent DM from retrying the I/O and/or
> failing the path.
> 
> Yiwen Jiang discovered a small race where WRITE SAME requests issued
> simultaneously would cause -EIO to be returned. This happened because
> any requests being prepared after WRITE SAME had been disabled for the
> device caused us to return BLKPREP_KILL. The latter caused the block
> layer to return -EIO upon completion.
> 
> To overcome this we introduce BLKPREP_INVALID which indicates that this
> is an invalid request for the device. blk_peek_request() is modified to
> return -EREMOTEIO in that case.
> 
> Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
> Suggested-by: Mike Snitzer <snitzer@redhat.com>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> ---
> 
> I contemplated making blk_peek_request() use rq->errors to decide what
> to return up the stack. However, I cringed at mixing errnos and SCSI
> midlayer status information in the same location.

Well, rq->errors has a mixed heritage anyway; occasionally it's an
errno, sometimes a SCSI midlayer status, you name it.
One should clean it up eventually ...

> ---
>  block/blk-core.c       | 6 ++++--
>  drivers/scsi/sd.c      | 4 ++--
>  include/linux/blkdev.h | 9 ++++++---
>  3 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 33e2f62d5062..ee833af2f892 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2446,14 +2446,16 @@ struct request *blk_peek_request(struct request_queue *q)
>  
>  			rq = NULL;
>  			break;
> -		} else if (ret == BLKPREP_KILL) {
> +		} else if (ret == BLKPREP_KILL || ret == BLKPREP_INVALID) {
> +			int err = ret == BLKPREP_INVALID ? -EREMOTEIO : -EIO;
> +
>  			rq->cmd_flags |= REQ_QUIET;
>  			/*
>  			 * Mark this request as started so we don't trigger
>  			 * any debug logic in the end I/O path.
>  			 */
>  			blk_start_request(rq);
> -			__blk_end_request_all(rq, -EIO);
> +			__blk_end_request_all(rq, err);
>  		} else {
>  			printk(KERN_ERR "%s: bad return=%d\n", __func__, ret);
>  			break;
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index ec163d08f6c3..6e841c6da632 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -761,7 +761,7 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
>  		break;
>  
>  	default:
> -		ret = BLKPREP_KILL;
> +		ret = BLKPREP_INVALID;
>  		goto out;
>  	}
>  
> @@ -839,7 +839,7 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
>  	int ret;
>  
>  	if (sdkp->device->no_write_same)
> -		return BLKPREP_KILL;
> +		return BLKPREP_INVALID;
>  
>  	BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
>  
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c70e3588a48c..e990d181625a 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -680,9 +680,12 @@ static inline bool blk_write_same_mergeable(struct bio *a, struct bio *b)
>  /*
>   * q->prep_rq_fn return values
>   */
> -#define BLKPREP_OK		0	/* serve it */
> -#define BLKPREP_KILL		1	/* fatal error, kill */
> -#define BLKPREP_DEFER		2	/* leave on queue */
> +enum {
> +	BLKPREP_OK,		/* serve it */
> +	BLKPREP_KILL,		/* fatal error, kill, return -EIO */
> +	BLKPREP_INVALID,	/* invalid command, kill, return -EREMOTEIO */
> +	BLKPREP_DEFER,		/* leave on queue */
> +};
>  
>  extern unsigned long blk_max_low_pfn, blk_max_pfn;
>  
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled
  2016-02-04  6:48       ` [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled Martin K. Petersen
  2016-02-04  9:16         ` jiangyiwen
  2016-02-04 14:14         ` Hannes Reinecke
@ 2016-02-04 16:36         ` Ewan Milne
  2016-02-05  3:16           ` Martin K. Petersen
  2 siblings, 1 reply; 15+ messages in thread
From: Ewan Milne @ 2016-02-04 16:36 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, linux-block, jiangyiwen, snitzer

See below.

On Thu, 2016-02-04 at 01:48 -0500, Martin K. Petersen wrote:
> When a storage device rejects a WRITE SAME command we will disable write
> same functionality for the device and return -EREMOTEIO to the block
> layer. -EREMOTEIO will in turn prevent DM from retrying the I/O and/or
> failing the path.
> 
> Yiwen Jiang discovered a small race where WRITE SAME requests issued
> simultaneously would cause -EIO to be returned. This happened because
> any requests being prepared after WRITE SAME had been disabled for the
> device caused us to return BLKPREP_KILL. The latter caused the block
> layer to return -EIO upon completion.
> 
> To overcome this we introduce BLKPREP_INVALID which indicates that this
> is an invalid request for the device. blk_peek_request() is modified to
> return -EREMOTEIO in that case.
> 
> Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
> Suggested-by: Mike Snitzer <snitzer@redhat.com>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> ---
> 
> I contemplated making blk_peek_request() use rq->errors to decide what
> to return up the stack. However, I cringed at mixing errnos and SCSI
> midlayer status information in the same location.
> ---
>  block/blk-core.c       | 6 ++++--
>  drivers/scsi/sd.c      | 4 ++--
>  include/linux/blkdev.h | 9 ++++++---
>  3 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 33e2f62d5062..ee833af2f892 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2446,14 +2446,16 @@ struct request *blk_peek_request(struct request_queue *q)
>  
>  			rq = NULL;
>  			break;
> -		} else if (ret == BLKPREP_KILL) {
> +		} else if (ret == BLKPREP_KILL || ret == BLKPREP_INVALID) {
> +			int err = ret == BLKPREP_INVALID ? -EREMOTEIO : -EIO;

Maybe it's unnecessary, but I would put parenthesis around (ret == ... -EIO); for clarity.

> +
>  			rq->cmd_flags |= REQ_QUIET;
>  			/*
>  			 * Mark this request as started so we don't trigger
>  			 * any debug logic in the end I/O path.
>  			 */
>  			blk_start_request(rq);
> -			__blk_end_request_all(rq, -EIO);
> +			__blk_end_request_all(rq, err);
>  		} else {
>  			printk(KERN_ERR "%s: bad return=%d\n", __func__, ret);
>  			break;
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index ec163d08f6c3..6e841c6da632 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -761,7 +761,7 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
>  		break;
>  
>  	default:
> -		ret = BLKPREP_KILL;
> +		ret = BLKPREP_INVALID;
>  		goto out;
>  	}
>  
> @@ -839,7 +839,7 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
>  	int ret;
>  
>  	if (sdkp->device->no_write_same)
> -		return BLKPREP_KILL;
> +		return BLKPREP_INVALID;
>  
>  	BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
>  
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c70e3588a48c..e990d181625a 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -680,9 +680,12 @@ static inline bool blk_write_same_mergeable(struct bio *a, struct bio *b)
>  /*
>   * q->prep_rq_fn return values
>   */
> -#define BLKPREP_OK		0	/* serve it */
> -#define BLKPREP_KILL		1	/* fatal error, kill */
> -#define BLKPREP_DEFER		2	/* leave on queue */
> +enum {
> +	BLKPREP_OK,		/* serve it */
> +	BLKPREP_KILL,		/* fatal error, kill, return -EIO */
> +	BLKPREP_INVALID,	/* invalid command, kill, return -EREMOTEIO */
> +	BLKPREP_DEFER,		/* leave on queue */
> +};

I would prefer that additional enum/constant values be added to the end
of the series, here you are changing the ordinal value of BLKPREP_DEFER
from 2 to 3.  Could you swap these?

>  extern unsigned long blk_max_low_pfn, blk_max_pfn;
>  

Reviewed-by: Ewan D. Milne <emilne@redhat.com>



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled
  2016-02-04 16:36         ` Ewan Milne
@ 2016-02-05  3:16           ` Martin K. Petersen
  0 siblings, 0 replies; 15+ messages in thread
From: Martin K. Petersen @ 2016-02-05  3:16 UTC (permalink / raw)
  To: Ewan Milne
  Cc: Martin K. Petersen, linux-scsi, linux-block, jiangyiwen, snitzer

>>>>> "Ewan" == Ewan Milne <emilne@redhat.com> writes:

Ewan> Maybe it's unnecessary, but I would put parenthesis around (ret ==
Ewan> ... -EIO); for clarity.

Sure.

Ewan> I would prefer that additional enum/constant values be added to
Ewan> the end of the series, here you are changing the ordinal value of
Ewan> BLKPREP_DEFER from 2 to 3.  Could you swap these?

I felt that KILL and INVALID were closely related. That's why I kept
them together. But it doesn't matter to me.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: dm-mpath: fix a tiny case which can cause an infinite loop
  2016-02-04  4:25     ` Mike Snitzer
  2016-02-04  5:03       ` Martin K. Petersen
  2016-02-04  6:48       ` [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled Martin K. Petersen
@ 2016-02-04  8:25       ` jiangyiwen
  2 siblings, 0 replies; 15+ messages in thread
From: jiangyiwen @ 2016-02-04  8:25 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: dm-devel, linux-scsi, Christoph Hellwig, James Bottomley,
	xuejiufei, Qijiang (Joseph, Euler), martin.petersen

On 2016/2/4 12:25, Mike Snitzer wrote:
> On Wed, Feb 03 2016 at 10:49pm -0500,
> jiangyiwen <jiangyiwen@huawei.com> wrote:
> 
>> On 2016/2/4 11:24, Mike Snitzer wrote:
>>> On Wed, Feb 03 2016 at  9:08pm -0500,
>>> jiangyiwen <jiangyiwen@huawei.com> wrote:
>>>
>>>> When two processes submit WRTIE SAME bio simultaneously and
>>>> first IO return failed because of INVALID FIELD IN CDB, and
>>>> then second IO can enter into an infinite loop.
>>>> The problem can be described as follows:
>>>>
>>>> process 1                              process 2
>>>> submit_bio(REQ_WRITE_SAME) and
>>>> wait io completion
>>>>                                        begin submit_bio(REQ_WRITE_SAME)
>>>>                                        -> blk_queue_bio()
>>>>                                          -> dm_request_fn()
>>>>                                            -> map_request()
>>>>                                              -> scsi_request_fn()
>>>>                                                -> blk_peek_request()
>>>>                                                  -> scsi_prep_fn()
>>>> In the moment, IO return and
>>>> sense_key with illegal_request,
>>>> sense_code with 0x24(INVALID FIELD IN CDB).
>>>> In this way it will set no_write_same = 1
>>>> in sd_done() and disable write same
>>>>                                        In sd_setup_write_same_cmnd()
>>>>                                        when find (no_write_same == 1)
>>>>                                        will return BLKPREP_KILL,
>>>>                                        and then in blk_peek_request()
>>>>                                        set error to EIO.
>>>>                                        However, in multipath_end_io()
>>>>                                        when find error is EIO it will
>>>>                                        fail path and retry even if
>>>>                                        device doesn't support WRITE SAME.
>>>>
>>>> In this situation, when process of multipathd reinstate
>>>> path by using test_unit_ready periodically, it will cause
>>>> second WRITE SAME IO enters into an infinite loop and
>>>> loop never terminates.
>>>>
>>>> In do_end_io(), when finding device doesn't support WRITE SAME and
>>>> request is WRITE SAME, return EOPNOTSUPP to applications.
>>>>
>>>> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
>>>> Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
>>>> Reviewed-by: xuejiufei <xuejiufei@huawei.com>
>>>> ---
>>>>  drivers/md/dm-mpath.c | 5 +++++
>>>>  1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
>>>> index cfa29f5..ad1602a 100644
>>>> --- a/drivers/md/dm-mpath.c
>>>> +++ b/drivers/md/dm-mpath.c
>>>> @@ -1269,6 +1269,11 @@ static int do_end_io(struct multipath *m, struct request *clone,
>>>>  	if (noretry_error(error))
>>>>  		return error;
>>>>
>>>> +	/* do not retry in case of WRITE SAME not supporting */
>>>> +	if ((clone->cmd_flags & REQ_WRITE_SAME) &&
>>>> +			!clone->q->limits.max_write_same_sectors)
>>>> +		return -EOPNOTSUPP;
>>>> +
>>>>  	if (mpio->pgpath)
>>>>  		fail_path(mpio->pgpath);
>>>>
>>>
>>> Did you test this patch?  Looks like it isn't going to make a
>>> difference.  'error' will be -EREMOTEIO, which will be caught by
>>> noretry_error().  So we'll never go on to run your new code.
>>>
>>> Please see commit 7eee4ae2db ("dm: disable WRITE SAME if it fails").
>>>
>>> I think DM already handles this case properly.  The only thing is it'll
>>> return -EREMOTEIO (rather than -EOPNOTSUPP) further up the stack.
>>>
>>> .
>>>
>> Hi Mike,
>> Yes, I have already test in version linux-3.8, and I also have already
>> carefully checked latest kernel code, and find that it also exists this
>> problem. I know about this commit 7eee4ae2db ("dm: disable WRITE SAME
>> if it fails"), it only can solve first IO situation which I mentioned
>> above, in other words, when WRITE SAME IO truly send to device, it
>> actually return -EREMOTEIO if device doesn't support WRITE SAME.
>> But in above situation which issues two WRITE SAME IO simultaneously,
>> second IO will not truly send to device instead of returning
>> BLKPREP_KILL in sd_setup_write_same_cmnd() because find
>> (no_write_same == 1) which no_write_same is set when first IO returned,
>> and then in blk_peek_request() will return EIO to MD/DM which caused
>> the problem above mentioned.
>>
>> I have described detailed process of problem, you will find actually
>> it is a problem when reviewed carefully.
> 
> OK, so -EIO is getting returned.  That shouldn't happen.  -EIO is the
> generic catch-all error that we're going to retry.  We have
> differentiated IO errors in SCSI that get returned up the IO stack
> (e.g. -EREMOTEIO, etc).
> 
> The SCSI, or block layer, should return a non-retryable error for this
> case.  But we only have the differentiated IO errors for SCSI cmds that
> are issued, so it seems we still need to train SCSI (and block by
> association/dependency) to return permanent errors that are identified
> during request preparation.
> 
> But it may be that we need to widen the WRITE_SAME error handling code
> in DM to check for -EREMOTEIO or -EOPNOTSUPP.  But I'm really not in
> favor of special-casing -EIO for WRITE_SAME, we need to be sprinkling
> less special-case code to make up for lack of information for lower
> layers.
> 
> .
> 
I totally agree with you. SCSI or block layer should not return EIO
to MD/DM, but the return code associated with many other process,
it will influence more if I modify blk_peek_request. So I use this
method to avoid it, and I also expect martin can get a better idea.

In addition, with increasing number of various storage, I worried
whether multipath still can occur an infinite loop as I mentioned
above someday. Thus I think whether multipath should have a
timeout mechanism to solve this type of problem which will cause a
ping-pong effect, for instance, return error to applications when
try a certain counts or time in multipath. This is my idea, I hope
it will have more people to discuss it.



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-02-18  0:22 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-04  2:08 [dm-devel] [PATCH] dm-mpath: fix a tiny case which can cause an infinite loop jiangyiwen
2016-02-04  3:24 ` Mike Snitzer
2016-02-04  3:49   ` jiangyiwen
2016-02-04  4:25     ` Mike Snitzer
2016-02-04  5:03       ` Martin K. Petersen
2016-02-04  6:48       ` [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled Martin K. Petersen
2016-02-04  9:16         ` jiangyiwen
2016-02-05  3:13           ` Martin K. Petersen
2016-02-05  3:39             ` jiangyiwen
2016-02-16 12:14             ` jiangyiwen
2016-02-18  0:22               ` Martin K. Petersen
2016-02-04 14:14         ` Hannes Reinecke
2016-02-04 16:36         ` Ewan Milne
2016-02-05  3:16           ` Martin K. Petersen
2016-02-04  8:25       ` dm-mpath: fix a tiny case which can cause an infinite loop jiangyiwen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).