* [PATCH] nvme: fix handling single range discard request
@ 2023-03-03 23:13 Ming Lei
2023-03-04 8:00 ` Hannes Reinecke
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: Ming Lei @ 2023-03-03 23:13 UTC (permalink / raw)
To: Christoph Hellwig, Sagi Grimberg, linux-nvme; +Cc: linux-block, Ming Lei
When investigating one customer report on warning in nvme_setup_discard,
we observed the controller(nvme/tcp) actually exposes
queue_max_discard_segments(req->q) == 1.
Obviously the current code can't handle this situation, since contiguity
merge like normal RW request is taken.
Fix the issue by building range from request sector/nr_sectors directly.
Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/nvme/host/core.c | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c2730b116dc6..d4be525f8100 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -781,16 +781,26 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
range = page_address(ns->ctrl->discard_page);
}
- __rq_for_each_bio(bio, req) {
- u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
- u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
-
- if (n < segments) {
- range[n].cattr = cpu_to_le32(0);
- range[n].nlb = cpu_to_le32(nlb);
- range[n].slba = cpu_to_le64(slba);
+ if (queue_max_discard_segments(req->q) == 1) {
+ u64 slba = nvme_sect_to_lba(ns, blk_rq_pos(req));
+ u32 nlb = blk_rq_sectors(req) >> (ns->lba_shift - 9);
+
+ range[0].cattr = cpu_to_le32(0);
+ range[0].nlb = cpu_to_le32(nlb);
+ range[0].slba = cpu_to_le64(slba);
+ n = 1;
+ } else {
+ __rq_for_each_bio(bio, req) {
+ u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
+ u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
+
+ if (n < segments) {
+ range[n].cattr = cpu_to_le32(0);
+ range[n].nlb = cpu_to_le32(nlb);
+ range[n].slba = cpu_to_le64(slba);
+ }
+ n++;
}
- n++;
}
if (WARN_ON_ONCE(n != segments)) {
--
2.39.2
* Re: [PATCH] nvme: fix handling single range discard request
2023-03-03 23:13 [PATCH] nvme: fix handling single range discard request Ming Lei
@ 2023-03-04 8:00 ` Hannes Reinecke
2023-03-04 10:22 ` Ming Lei
2023-03-06 14:21 ` Sagi Grimberg
` (2 subsequent siblings)
3 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2023-03-04 8:00 UTC (permalink / raw)
To: Ming Lei, Christoph Hellwig, Sagi Grimberg, linux-nvme; +Cc: linux-block
On 3/4/23 00:13, Ming Lei wrote:
> When investigating one customer report on warning in nvme_setup_discard,
> we observed the controller(nvme/tcp) actually exposes
> queue_max_discard_segments(req->q) == 1.
>
> Obviously the current code can't handle this situation, since contiguity
> merge like normal RW request is taken.
>
> Fix the issue by building range from request sector/nr_sectors directly.
>
> Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> drivers/nvme/host/core.c | 28 +++++++++++++++++++---------
> 1 file changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index c2730b116dc6..d4be525f8100 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -781,16 +781,26 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
> range = page_address(ns->ctrl->discard_page);
> }
>
> - __rq_for_each_bio(bio, req) {
> - u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
> - u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
> -
> - if (n < segments) {
> - range[n].cattr = cpu_to_le32(0);
> - range[n].nlb = cpu_to_le32(nlb);
> - range[n].slba = cpu_to_le64(slba);
> + if (queue_max_discard_segments(req->q) == 1) {
> + u64 slba = nvme_sect_to_lba(ns, blk_rq_pos(req));
> + u32 nlb = blk_rq_sectors(req) >> (ns->lba_shift - 9);
> +
> + range[0].cattr = cpu_to_le32(0);
> + range[0].nlb = cpu_to_le32(nlb);
> + range[0].slba = cpu_to_le64(slba);
> + n = 1;
> + } else {
> + __rq_for_each_bio(bio, req) {
> + u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
> + u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
> +
> + if (n < segments) {
> + range[n].cattr = cpu_to_le32(0);
> + range[n].nlb = cpu_to_le32(nlb);
> + range[n].slba = cpu_to_le64(slba);
> + }
> + n++;
> }
> - n++;
> }
>
> if (WARN_ON_ONCE(n != segments)) {
Now _that_ is odd.
Looks like 'req' is not formatted according to the 'max_discard_sectors'
setting.
But if that's the case, then this 'fix' would fail whenever
'max_discard_sectors' < 'max_hw_sectors', right?
Shouldn't we rather modify the merge algorithm to check for
max_discard_sectors for DISCARD requests, such that we never _have_
mis-matched requests and this patch would be pointless?
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman
* Re: [PATCH] nvme: fix handling single range discard request
2023-03-04 8:00 ` Hannes Reinecke
@ 2023-03-04 10:22 ` Ming Lei
2023-03-04 11:14 ` Hannes Reinecke
0 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2023-03-04 10:22 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Christoph Hellwig, Sagi Grimberg, linux-nvme, linux-block,
ming.lei
On Sat, Mar 04, 2023 at 09:00:28AM +0100, Hannes Reinecke wrote:
> On 3/4/23 00:13, Ming Lei wrote:
> > When investigating one customer report on warning in nvme_setup_discard,
> > we observed the controller(nvme/tcp) actually exposes
> > queue_max_discard_segments(req->q) == 1.
> >
> > Obviously the current code can't handle this situation, since contiguity
> > merge like normal RW request is taken.
> >
> > Fix the issue by building range from request sector/nr_sectors directly.
> >
> > Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > drivers/nvme/host/core.c | 28 +++++++++++++++++++---------
> > 1 file changed, 19 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index c2730b116dc6..d4be525f8100 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -781,16 +781,26 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
> > range = page_address(ns->ctrl->discard_page);
> > }
> > - __rq_for_each_bio(bio, req) {
> > - u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
> > - u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
> > -
> > - if (n < segments) {
> > - range[n].cattr = cpu_to_le32(0);
> > - range[n].nlb = cpu_to_le32(nlb);
> > - range[n].slba = cpu_to_le64(slba);
> > + if (queue_max_discard_segments(req->q) == 1) {
> > + u64 slba = nvme_sect_to_lba(ns, blk_rq_pos(req));
> > + u32 nlb = blk_rq_sectors(req) >> (ns->lba_shift - 9);
> > +
> > + range[0].cattr = cpu_to_le32(0);
> > + range[0].nlb = cpu_to_le32(nlb);
> > + range[0].slba = cpu_to_le64(slba);
> > + n = 1;
> > + } else {
> > + __rq_for_each_bio(bio, req) {
> > + u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
> > + u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
> > +
> > + if (n < segments) {
> > + range[n].cattr = cpu_to_le32(0);
> > + range[n].nlb = cpu_to_le32(nlb);
> > + range[n].slba = cpu_to_le64(slba);
> > + }
> > + n++;
> > }
> > - n++;
> > }
> > if (WARN_ON_ONCE(n != segments)) {
> Now _that_ is odd.
> Looks like 'req' is not formatted according to the 'max_discard_sectors'
> setting.
> But if that's the case, then this 'fix' would fail whenever
> 'max_discard_sectors' < 'max_hw_sectors', right?
No, it isn't the case.
> Shouldn't we rather modify the merge algorithm to check for
> max_discard_sectors for DISCARD requests, such that we never _have_
> mis-matched requests and this patch would be pointless?
But it is related to the discard merge code.
If queue_max_discard_segments() is 1, the block layer merges discard
requests/bios just like normal RW IO, i.e. only contiguous LBA ranges
are merged.
However, if queue_max_discard_segments() is > 1, the block layer simply
'merges' bios into one request regardless of whether their LBAs are
adjacent, and treats each bio as one discard segment; that is what
multi-range discard means.
That is why queue_max_discard_segments() == 1 has to be treated
specially, and you can see the same handling in virtio_blk.
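For reference, the virtio_blk side looks roughly like the sketch below
(paraphrased from memory, so the helper and field names may not match the
tree exactly):

--
	if (queue_max_discard_segments(req->q) == 1) {
		/*
		 * A single max discard segment means the block layer only did
		 * contiguity merge, so build the one range from the request
		 * itself rather than from individual bios.
		 */
		range[0].flags = cpu_to_le32(flags);
		range[0].num_sectors = cpu_to_le32(blk_rq_sectors(req));
		range[0].sector = cpu_to_le64(blk_rq_pos(req));
		n = 1;
	} else {
		/* multi-range discard: each bio is one range */
		__rq_for_each_bio(bio, req) {
			u64 sector = bio->bi_iter.bi_sector;
			u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT;

			range[n].flags = cpu_to_le32(flags);
			range[n].num_sectors = cpu_to_le32(num_sectors);
			range[n].sector = cpu_to_le64(sector);
			n++;
		}
	}
--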
Thanks,
Ming
* Re: [PATCH] nvme: fix handling single range discard request
2023-03-04 10:22 ` Ming Lei
@ 2023-03-04 11:14 ` Hannes Reinecke
2023-03-04 12:02 ` Ming Lei
0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2023-03-04 11:14 UTC (permalink / raw)
To: Ming Lei; +Cc: Christoph Hellwig, Sagi Grimberg, linux-nvme, linux-block
On 3/4/23 11:22, Ming Lei wrote:
> On Sat, Mar 04, 2023 at 09:00:28AM +0100, Hannes Reinecke wrote:
>> On 3/4/23 00:13, Ming Lei wrote:
>>> When investigating one customer report on warning in nvme_setup_discard,
>>> we observed the controller(nvme/tcp) actually exposes
>>> queue_max_discard_segments(req->q) == 1.
>>>
>>> Obviously the current code can't handle this situation, since contiguity
>>> merge like normal RW request is taken.
>>>
>>> Fix the issue by building range from request sector/nr_sectors directly.
>>>
>>> Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>> ---
>>> drivers/nvme/host/core.c | 28 +++++++++++++++++++---------
>>> 1 file changed, 19 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index c2730b116dc6..d4be525f8100 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -781,16 +781,26 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
>>> range = page_address(ns->ctrl->discard_page);
>>> }
>>> - __rq_for_each_bio(bio, req) {
>>> - u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
>>> - u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
>>> -
>>> - if (n < segments) {
>>> - range[n].cattr = cpu_to_le32(0);
>>> - range[n].nlb = cpu_to_le32(nlb);
>>> - range[n].slba = cpu_to_le64(slba);
>>> + if (queue_max_discard_segments(req->q) == 1) {
>>> + u64 slba = nvme_sect_to_lba(ns, blk_rq_pos(req));
>>> + u32 nlb = blk_rq_sectors(req) >> (ns->lba_shift - 9);
>>> +
>>> + range[0].cattr = cpu_to_le32(0);
>>> + range[0].nlb = cpu_to_le32(nlb);
>>> + range[0].slba = cpu_to_le64(slba);
>>> + n = 1;
>>> + } else {
>>> + __rq_for_each_bio(bio, req) {
>>> + u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
>>> + u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
>>> +
>>> + if (n < segments) {
>>> + range[n].cattr = cpu_to_le32(0);
>>> + range[n].nlb = cpu_to_le32(nlb);
>>> + range[n].slba = cpu_to_le64(slba);
>>> + }
>>> + n++;
>>> }
>>> - n++;
>>> }
>>> if (WARN_ON_ONCE(n != segments)) {
>> Now _that_ is odd.
>> Looks like 'req' is not formatted according to the 'max_discard_sectors'
>> setting.
>> But if that's the case, then this 'fix' would fail whenever
>> 'max_discard_sectors' < 'max_hw_sectors', right?
>
> No, it isn't the case.
>
>> Shouldn't we rather modify the merge algorithm to check for
>> max_discard_sectors for DISCARD requests, such that we never _have_
>> mis-matched requests and this patch would be pointless?
>
> But it is related with discard merge.
>
> If queue_max_discard_segments() is 1, block layer merges discard
> request/bios just like normal RW IO.
>
> However, if queue_max_discard_segments() is > 1, block layer simply
> 'merges' all bios into one request, no matter if the LBA is adjacent
> or not, and treat each bio as one discard segment, that is called
> multi range discard too.
>
But wouldn't the number of bios be subject to
'queue_max_discard_segments', too?
What guarantees we're not overflowing that for multi-segment discard merge?
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman
* Re: [PATCH] nvme: fix handling single range discard request
2023-03-04 11:14 ` Hannes Reinecke
@ 2023-03-04 12:02 ` Ming Lei
0 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2023-03-04 12:02 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: Christoph Hellwig, Sagi Grimberg, linux-nvme, linux-block
On Sat, Mar 04, 2023 at 12:14:30PM +0100, Hannes Reinecke wrote:
> On 3/4/23 11:22, Ming Lei wrote:
> > On Sat, Mar 04, 2023 at 09:00:28AM +0100, Hannes Reinecke wrote:
> > > On 3/4/23 00:13, Ming Lei wrote:
> > > > When investigating one customer report on warning in nvme_setup_discard,
> > > > we observed the controller(nvme/tcp) actually exposes
> > > > queue_max_discard_segments(req->q) == 1.
> > > >
> > > > Obviously the current code can't handle this situation, since contiguity
> > > > merge like normal RW request is taken.
> > > >
> > > > Fix the issue by building range from request sector/nr_sectors directly.
> > > >
> > > > Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > > drivers/nvme/host/core.c | 28 +++++++++++++++++++---------
> > > > 1 file changed, 19 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > > > index c2730b116dc6..d4be525f8100 100644
> > > > --- a/drivers/nvme/host/core.c
> > > > +++ b/drivers/nvme/host/core.c
> > > > @@ -781,16 +781,26 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
> > > > range = page_address(ns->ctrl->discard_page);
> > > > }
> > > > - __rq_for_each_bio(bio, req) {
> > > > - u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
> > > > - u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
> > > > -
> > > > - if (n < segments) {
> > > > - range[n].cattr = cpu_to_le32(0);
> > > > - range[n].nlb = cpu_to_le32(nlb);
> > > > - range[n].slba = cpu_to_le64(slba);
> > > > + if (queue_max_discard_segments(req->q) == 1) {
> > > > + u64 slba = nvme_sect_to_lba(ns, blk_rq_pos(req));
> > > > + u32 nlb = blk_rq_sectors(req) >> (ns->lba_shift - 9);
> > > > +
> > > > + range[0].cattr = cpu_to_le32(0);
> > > > + range[0].nlb = cpu_to_le32(nlb);
> > > > + range[0].slba = cpu_to_le64(slba);
> > > > + n = 1;
> > > > + } else {
> > > > + __rq_for_each_bio(bio, req) {
> > > > + u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
> > > > + u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
> > > > +
> > > > + if (n < segments) {
> > > > + range[n].cattr = cpu_to_le32(0);
> > > > + range[n].nlb = cpu_to_le32(nlb);
> > > > + range[n].slba = cpu_to_le64(slba);
> > > > + }
> > > > + n++;
> > > > }
> > > > - n++;
> > > > }
> > > > if (WARN_ON_ONCE(n != segments)) {
> > > Now _that_ is odd.
> > > Looks like 'req' is not formatted according to the 'max_discard_sectors'
> > > setting.
> > > But if that's the case, then this 'fix' would fail whenever
> > > 'max_discard_sectors' < 'max_hw_sectors', right?
> >
> > No, it isn't the case.
> >
> > > Shouldn't we rather modify the merge algorithm to check for
> > > max_discard_sectors for DISCARD requests, such that we never _have_
> > > mis-matched requests and this patch would be pointless?
> >
> > But it is related with discard merge.
> >
> > If queue_max_discard_segments() is 1, block layer merges discard
> > request/bios just like normal RW IO.
> >
> > However, if queue_max_discard_segments() is > 1, block layer simply
> > 'merges' all bios into one request, no matter if the LBA is adjacent
> > or not, and treat each bio as one discard segment, that is called
> > multi range discard too.
> >
> But wouldn't the number of bios be subject to 'queue_max_discard_segment',
> too?
> What guarantees we're not overflowing that for multi-segment discard merge?
The block layer merge code makes sure that the max discard segment limit
is respected; please see:
req_attempt_discard_merge()
bio_attempt_discard_merge()
blk_recalc_rq_segments()
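If I read those helpers correctly, the gist of the limit handling is
roughly the following (paraphrased sketch, not verbatim kernel code):

--
	unsigned short segments = blk_rq_nr_discard_segments(req);

	/* refuse to merge another discard bio once the segment limit is hit */
	if (segments >= queue_max_discard_segments(q))
		goto no_merge;
--

and blk_recalc_rq_segments() counts one segment per bio for discards only
when queue_max_discard_segments() > 1; with a single-segment limit it
reports one segment for the whole request.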
Thanks,
Ming
* Re: [PATCH] nvme: fix handling single range discard request
2023-03-03 23:13 [PATCH] nvme: fix handling single range discard request Ming Lei
2023-03-04 8:00 ` Hannes Reinecke
@ 2023-03-06 14:21 ` Sagi Grimberg
2023-03-06 21:49 ` Ming Lei
2023-03-08 5:38 ` Chaitanya Kulkarni
2023-03-09 9:41 ` Christoph Hellwig
3 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2023-03-06 14:21 UTC (permalink / raw)
To: Ming Lei, Christoph Hellwig, linux-nvme; +Cc: linux-block
On 3/4/23 01:13, Ming Lei wrote:
> When investigating one customer report on warning in nvme_setup_discard,
> we observed the controller(nvme/tcp) actually exposes
> queue_max_discard_segments(req->q) == 1.
>
> Obviously the current code can't handle this situation, since contiguity
> merge like normal RW request is taken.
>
> Fix the issue by building range from request sector/nr_sectors directly.
>
> Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> drivers/nvme/host/core.c | 28 +++++++++++++++++++---------
> 1 file changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index c2730b116dc6..d4be525f8100 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -781,16 +781,26 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
> range = page_address(ns->ctrl->discard_page);
> }
>
> - __rq_for_each_bio(bio, req) {
> - u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
> - u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
> -
> - if (n < segments) {
> - range[n].cattr = cpu_to_le32(0);
> - range[n].nlb = cpu_to_le32(nlb);
> - range[n].slba = cpu_to_le64(slba);
> + if (queue_max_discard_segments(req->q) == 1) {
> + u64 slba = nvme_sect_to_lba(ns, blk_rq_pos(req));
> + u32 nlb = blk_rq_sectors(req) >> (ns->lba_shift - 9);
> +
> + range[0].cattr = cpu_to_le32(0);
> + range[0].nlb = cpu_to_le32(nlb);
> + range[0].slba = cpu_to_le64(slba);
> + n = 1;
> + } else {
> + __rq_for_each_bio(bio, req) {
> + u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
> + u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
> +
> + if (n < segments) {
> + range[n].cattr = cpu_to_le32(0);
> + range[n].nlb = cpu_to_le32(nlb);
> + range[n].slba = cpu_to_le64(slba);
> + }
> + n++;
> }
> - n++;
> }
>
> if (WARN_ON_ONCE(n != segments)) {
Maybe just set segments to min(blk_rq_nr_discard_segments(req),
queue_max_discard_segments(req->q)) and let the existing code do
its thing?
* Re: [PATCH] nvme: fix handling single range discard request
2023-03-06 14:21 ` Sagi Grimberg
@ 2023-03-06 21:49 ` Ming Lei
2023-03-07 11:39 ` Sagi Grimberg
0 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2023-03-06 21:49 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: Christoph Hellwig, linux-nvme, linux-block, ming.lei
On Mon, Mar 06, 2023 at 04:21:08PM +0200, Sagi Grimberg wrote:
>
>
> On 3/4/23 01:13, Ming Lei wrote:
> > When investigating one customer report on warning in nvme_setup_discard,
> > we observed the controller(nvme/tcp) actually exposes
> > queue_max_discard_segments(req->q) == 1.
> >
> > Obviously the current code can't handle this situation, since contiguity
> > merge like normal RW request is taken.
> >
> > Fix the issue by building range from request sector/nr_sectors directly.
> >
> > Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > drivers/nvme/host/core.c | 28 +++++++++++++++++++---------
> > 1 file changed, 19 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index c2730b116dc6..d4be525f8100 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -781,16 +781,26 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
> > range = page_address(ns->ctrl->discard_page);
> > }
> > - __rq_for_each_bio(bio, req) {
> > - u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
> > - u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
> > -
> > - if (n < segments) {
> > - range[n].cattr = cpu_to_le32(0);
> > - range[n].nlb = cpu_to_le32(nlb);
> > - range[n].slba = cpu_to_le64(slba);
> > + if (queue_max_discard_segments(req->q) == 1) {
> > + u64 slba = nvme_sect_to_lba(ns, blk_rq_pos(req));
> > + u32 nlb = blk_rq_sectors(req) >> (ns->lba_shift - 9);
> > +
> > + range[0].cattr = cpu_to_le32(0);
> > + range[0].nlb = cpu_to_le32(nlb);
> > + range[0].slba = cpu_to_le64(slba);
> > + n = 1;
> > + } else {
> > + __rq_for_each_bio(bio, req) {
> > + u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
> > + u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
> > +
> > + if (n < segments) {
> > + range[n].cattr = cpu_to_le32(0);
> > + range[n].nlb = cpu_to_le32(nlb);
> > + range[n].slba = cpu_to_le64(slba);
> > + }
> > + n++;
> > }
> > - n++;
> > }
> > if (WARN_ON_ONCE(n != segments)) {
>
>
> Maybe just set segments to min(blk_rq_nr_discard_segments(req),
> queue_max_discard_segments(req->q)) and let the existing code do
> its thing?
What is the existing code for applying min()?
The block layer merge code has to cover two kinds of discard merge:
- the traditional single-range discard used by most devices
- multi-range discard merge for nvme and virtio-blk, which take the
same fix
On the driver side, similar handling is needed when both single-range and
multi-range discard are supported:
- for multi-range discard, each bio is one discard range
- for single-range discard, the whole request (which may include more
than one bio) is the single discard range
Thanks,
Ming
* Re: [PATCH] nvme: fix handling single range discard request
2023-03-06 21:49 ` Ming Lei
@ 2023-03-07 11:39 ` Sagi Grimberg
2023-03-07 12:14 ` Ming Lei
0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2023-03-07 11:39 UTC (permalink / raw)
To: Ming Lei; +Cc: Christoph Hellwig, linux-nvme, linux-block
On 3/6/23 23:49, Ming Lei wrote:
> On Mon, Mar 06, 2023 at 04:21:08PM +0200, Sagi Grimberg wrote:
>>
>>
>> On 3/4/23 01:13, Ming Lei wrote:
>>> When investigating one customer report on warning in nvme_setup_discard,
>>> we observed the controller(nvme/tcp) actually exposes
>>> queue_max_discard_segments(req->q) == 1.
>>>
>>> Obviously the current code can't handle this situation, since contiguity
>>> merge like normal RW request is taken.
>>>
>>> Fix the issue by building range from request sector/nr_sectors directly.
>>>
>>> Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>> ---
>>> drivers/nvme/host/core.c | 28 +++++++++++++++++++---------
>>> 1 file changed, 19 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index c2730b116dc6..d4be525f8100 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -781,16 +781,26 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
>>> range = page_address(ns->ctrl->discard_page);
>>> }
>>> - __rq_for_each_bio(bio, req) {
>>> - u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
>>> - u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
>>> -
>>> - if (n < segments) {
>>> - range[n].cattr = cpu_to_le32(0);
>>> - range[n].nlb = cpu_to_le32(nlb);
>>> - range[n].slba = cpu_to_le64(slba);
>>> + if (queue_max_discard_segments(req->q) == 1) {
>>> + u64 slba = nvme_sect_to_lba(ns, blk_rq_pos(req));
>>> + u32 nlb = blk_rq_sectors(req) >> (ns->lba_shift - 9);
>>> +
>>> + range[0].cattr = cpu_to_le32(0);
>>> + range[0].nlb = cpu_to_le32(nlb);
>>> + range[0].slba = cpu_to_le64(slba);
>>> + n = 1;
>>> + } else {
>>> + __rq_for_each_bio(bio, req) {
>>> + u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
>>> + u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
>>> +
>>> + if (n < segments) {
>>> + range[n].cattr = cpu_to_le32(0);
>>> + range[n].nlb = cpu_to_le32(nlb);
>>> + range[n].slba = cpu_to_le64(slba);
>>> + }
>>> + n++;
>>> }
>>> - n++;
>>> }
>>> if (WARN_ON_ONCE(n != segments)) {
>>
>>
>> Maybe just set segments to min(blk_rq_nr_discard_segments(req),
>> queue_max_discard_segments(req->q)) and let the existing code do
>> its thing?
>
> What is the existing code for applying min()?
Was referring to this:
--
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3345f866178e..dbc402587431 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -781,6 +781,7 @@ static blk_status_t nvme_setup_discard(struct
nvme_ns *ns, struct request *req,
range = page_address(ns->ctrl->discard_page);
}
+ segments = min(segments, queue_max_discard_segments(req->q));
__rq_for_each_bio(bio, req) {
u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
--
* Re: [PATCH] nvme: fix handling single range discard request
2023-03-07 11:39 ` Sagi Grimberg
@ 2023-03-07 12:14 ` Ming Lei
2023-03-07 12:31 ` Sagi Grimberg
0 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2023-03-07 12:14 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: Christoph Hellwig, linux-nvme, linux-block, ming.lei
On Tue, Mar 07, 2023 at 01:39:27PM +0200, Sagi Grimberg wrote:
>
>
> On 3/6/23 23:49, Ming Lei wrote:
> > On Mon, Mar 06, 2023 at 04:21:08PM +0200, Sagi Grimberg wrote:
> > >
> > >
> > > On 3/4/23 01:13, Ming Lei wrote:
> > > > When investigating one customer report on warning in nvme_setup_discard,
> > > > we observed the controller(nvme/tcp) actually exposes
> > > > queue_max_discard_segments(req->q) == 1.
> > > >
> > > > Obviously the current code can't handle this situation, since contiguity
> > > > merge like normal RW request is taken.
> > > >
> > > > Fix the issue by building range from request sector/nr_sectors directly.
> > > >
> > > > Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > > drivers/nvme/host/core.c | 28 +++++++++++++++++++---------
> > > > 1 file changed, 19 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > > > index c2730b116dc6..d4be525f8100 100644
> > > > --- a/drivers/nvme/host/core.c
> > > > +++ b/drivers/nvme/host/core.c
> > > > @@ -781,16 +781,26 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
> > > > range = page_address(ns->ctrl->discard_page);
> > > > }
> > > > - __rq_for_each_bio(bio, req) {
> > > > - u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
> > > > - u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
> > > > -
> > > > - if (n < segments) {
> > > > - range[n].cattr = cpu_to_le32(0);
> > > > - range[n].nlb = cpu_to_le32(nlb);
> > > > - range[n].slba = cpu_to_le64(slba);
> > > > + if (queue_max_discard_segments(req->q) == 1) {
> > > > + u64 slba = nvme_sect_to_lba(ns, blk_rq_pos(req));
> > > > + u32 nlb = blk_rq_sectors(req) >> (ns->lba_shift - 9);
> > > > +
> > > > + range[0].cattr = cpu_to_le32(0);
> > > > + range[0].nlb = cpu_to_le32(nlb);
> > > > + range[0].slba = cpu_to_le64(slba);
> > > > + n = 1;
> > > > + } else {
> > > > + __rq_for_each_bio(bio, req) {
> > > > + u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
> > > > + u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
> > > > +
> > > > + if (n < segments) {
> > > > + range[n].cattr = cpu_to_le32(0);
> > > > + range[n].nlb = cpu_to_le32(nlb);
> > > > + range[n].slba = cpu_to_le64(slba);
> > > > + }
> > > > + n++;
> > > > }
> > > > - n++;
> > > > }
> > > > if (WARN_ON_ONCE(n != segments)) {
> > >
> > >
> > > Maybe just set segments to min(blk_rq_nr_discard_segments(req),
> > > queue_max_discard_segments(req->q)) and let the existing code do
> > > its thing?
> >
> > What is the existing code for applying min()?
>
> Was referring to this:
> --
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 3345f866178e..dbc402587431 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -781,6 +781,7 @@ static blk_status_t nvme_setup_discard(struct nvme_ns
> *ns, struct request *req,
> range = page_address(ns->ctrl->discard_page);
> }
>
> + segments = min(segments, queue_max_discard_segments(req->q));
That can't work.
In case of queue_max_discard_segments(req->q) == 1, the request can
still have more than one bio, since the normal contiguity merge is
applied to discard IOs. The per-bio loop would then still count every
bio, so n could end up larger than the clamped segments value and trip
the WARN_ON_ONCE anyway.
Thanks,
Ming
* Re: [PATCH] nvme: fix handling single range discard request
2023-03-07 12:14 ` Ming Lei
@ 2023-03-07 12:31 ` Sagi Grimberg
2023-03-07 14:24 ` Ming Lei
2023-03-08 5:36 ` Chaitanya Kulkarni
0 siblings, 2 replies; 15+ messages in thread
From: Sagi Grimberg @ 2023-03-07 12:31 UTC (permalink / raw)
To: Ming Lei; +Cc: Christoph Hellwig, linux-nvme, linux-block
On 3/7/23 14:14, Ming Lei wrote:
> On Tue, Mar 07, 2023 at 01:39:27PM +0200, Sagi Grimberg wrote:
>>
>>
>> On 3/6/23 23:49, Ming Lei wrote:
>>> On Mon, Mar 06, 2023 at 04:21:08PM +0200, Sagi Grimberg wrote:
>>>>
>>>>
>>>> On 3/4/23 01:13, Ming Lei wrote:
>>>>> When investigating one customer report on warning in nvme_setup_discard,
>>>>> we observed the controller(nvme/tcp) actually exposes
>>>>> queue_max_discard_segments(req->q) == 1.
>>>>>
>>>>> Obviously the current code can't handle this situation, since contiguity
>>>>> merge like normal RW request is taken.
>>>>>
>>>>> Fix the issue by building range from request sector/nr_sectors directly.
>>>>>
>>>>> Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
>>>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>>>> ---
>>>>> drivers/nvme/host/core.c | 28 +++++++++++++++++++---------
>>>>> 1 file changed, 19 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>>>> index c2730b116dc6..d4be525f8100 100644
>>>>> --- a/drivers/nvme/host/core.c
>>>>> +++ b/drivers/nvme/host/core.c
>>>>> @@ -781,16 +781,26 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
>>>>> range = page_address(ns->ctrl->discard_page);
>>>>> }
>>>>> - __rq_for_each_bio(bio, req) {
>>>>> - u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
>>>>> - u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
>>>>> -
>>>>> - if (n < segments) {
>>>>> - range[n].cattr = cpu_to_le32(0);
>>>>> - range[n].nlb = cpu_to_le32(nlb);
>>>>> - range[n].slba = cpu_to_le64(slba);
>>>>> + if (queue_max_discard_segments(req->q) == 1) {
>>>>> + u64 slba = nvme_sect_to_lba(ns, blk_rq_pos(req));
>>>>> + u32 nlb = blk_rq_sectors(req) >> (ns->lba_shift - 9);
>>>>> +
>>>>> + range[0].cattr = cpu_to_le32(0);
>>>>> + range[0].nlb = cpu_to_le32(nlb);
>>>>> + range[0].slba = cpu_to_le64(slba);
>>>>> + n = 1;
>>>>> + } else {
>>>>> + __rq_for_each_bio(bio, req) {
>>>>> + u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
>>>>> + u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
>>>>> +
>>>>> + if (n < segments) {
>>>>> + range[n].cattr = cpu_to_le32(0);
>>>>> + range[n].nlb = cpu_to_le32(nlb);
>>>>> + range[n].slba = cpu_to_le64(slba);
>>>>> + }
>>>>> + n++;
>>>>> }
>>>>> - n++;
>>>>> }
>>>>> if (WARN_ON_ONCE(n != segments)) {
>>>>
>>>>
>>>> Maybe just set segments to min(blk_rq_nr_discard_segments(req),
>>>> queue_max_discard_segments(req->q)) and let the existing code do
>>>> its thing?
>>>
>>> What is the existing code for applying min()?
>>
>> Was referring to this:
>> --
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 3345f866178e..dbc402587431 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -781,6 +781,7 @@ static blk_status_t nvme_setup_discard(struct nvme_ns
>> *ns, struct request *req,
>> range = page_address(ns->ctrl->discard_page);
>> }
>>
>> + segments = min(segments, queue_max_discard_segments(req->q));
>
> That can't work.
>
> In case of queue_max_discard_segments(req->q) == 1, the request still
> can have more than one bios since the normal merge is taken for discard
> IOs.
Ah, I see, the bios are contiguous though, right?
We could add a contiguity check in the loop and conditionally
increment n, but that would probably be more complicated...
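Something along these lines, just as an illustration (hypothetical and
untested sketch; prev_end is an invented local and the n-vs-segments
overflow accounting is glossed over):

--
	u64 prev_end = 0;

	__rq_for_each_bio(bio, req) {
		u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
		u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;

		if (n && slba == prev_end) {
			/* contiguous with the previous range: extend it */
			range[n - 1].nlb =
				cpu_to_le32(le32_to_cpu(range[n - 1].nlb) + nlb);
		} else if (n < segments) {
			range[n].cattr = cpu_to_le32(0);
			range[n].nlb = cpu_to_le32(nlb);
			range[n].slba = cpu_to_le64(slba);
			n++;
		}
		prev_end = slba + nlb;
	}
--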
* Re: [PATCH] nvme: fix handling single range discard request
2023-03-07 12:31 ` Sagi Grimberg
@ 2023-03-07 14:24 ` Ming Lei
2023-03-08 5:42 ` Chaitanya Kulkarni
2023-03-08 5:36 ` Chaitanya Kulkarni
1 sibling, 1 reply; 15+ messages in thread
From: Ming Lei @ 2023-03-07 14:24 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: Christoph Hellwig, linux-nvme, linux-block
On Tue, Mar 07, 2023 at 02:31:48PM +0200, Sagi Grimberg wrote:
>
>
> On 3/7/23 14:14, Ming Lei wrote:
> > On Tue, Mar 07, 2023 at 01:39:27PM +0200, Sagi Grimberg wrote:
> > >
> > >
> > > On 3/6/23 23:49, Ming Lei wrote:
> > > > On Mon, Mar 06, 2023 at 04:21:08PM +0200, Sagi Grimberg wrote:
> > > > >
> > > > >
> > > > > On 3/4/23 01:13, Ming Lei wrote:
> > > > > > When investigating one customer report on warning in nvme_setup_discard,
> > > > > > we observed the controller(nvme/tcp) actually exposes
> > > > > > queue_max_discard_segments(req->q) == 1.
> > > > > >
> > > > > > Obviously the current code can't handle this situation, since contiguity
> > > > > > merge like normal RW request is taken.
> > > > > >
> > > > > > Fix the issue by building range from request sector/nr_sectors directly.
> > > > > >
> > > > > > Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
> > > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > > > ---
> > > > > > drivers/nvme/host/core.c | 28 +++++++++++++++++++---------
> > > > > > 1 file changed, 19 insertions(+), 9 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > > > > > index c2730b116dc6..d4be525f8100 100644
> > > > > > --- a/drivers/nvme/host/core.c
> > > > > > +++ b/drivers/nvme/host/core.c
> > > > > > @@ -781,16 +781,26 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
> > > > > > range = page_address(ns->ctrl->discard_page);
> > > > > > }
> > > > > > - __rq_for_each_bio(bio, req) {
> > > > > > - u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
> > > > > > - u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
> > > > > > -
> > > > > > - if (n < segments) {
> > > > > > - range[n].cattr = cpu_to_le32(0);
> > > > > > - range[n].nlb = cpu_to_le32(nlb);
> > > > > > - range[n].slba = cpu_to_le64(slba);
> > > > > > + if (queue_max_discard_segments(req->q) == 1) {
> > > > > > + u64 slba = nvme_sect_to_lba(ns, blk_rq_pos(req));
> > > > > > + u32 nlb = blk_rq_sectors(req) >> (ns->lba_shift - 9);
> > > > > > +
> > > > > > + range[0].cattr = cpu_to_le32(0);
> > > > > > + range[0].nlb = cpu_to_le32(nlb);
> > > > > > + range[0].slba = cpu_to_le64(slba);
> > > > > > + n = 1;
> > > > > > + } else {
> > > > > > + __rq_for_each_bio(bio, req) {
> > > > > > + u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector);
> > > > > > + u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;
> > > > > > +
> > > > > > + if (n < segments) {
> > > > > > + range[n].cattr = cpu_to_le32(0);
> > > > > > + range[n].nlb = cpu_to_le32(nlb);
> > > > > > + range[n].slba = cpu_to_le64(slba);
> > > > > > + }
> > > > > > + n++;
> > > > > > }
> > > > > > - n++;
> > > > > > }
> > > > > > if (WARN_ON_ONCE(n != segments)) {
> > > > >
> > > > >
> > > > > Maybe just set segments to min(blk_rq_nr_discard_segments(req),
> > > > > queue_max_discard_segments(req->q)) and let the existing code do
> > > > > its thing?
> > > >
> > > > What is the existing code for applying min()?
> > >
> > > Was referring to this:
> > > --
> > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > > index 3345f866178e..dbc402587431 100644
> > > --- a/drivers/nvme/host/core.c
> > > +++ b/drivers/nvme/host/core.c
> > > @@ -781,6 +781,7 @@ static blk_status_t nvme_setup_discard(struct nvme_ns
> > > *ns, struct request *req,
> > > range = page_address(ns->ctrl->discard_page);
> > > }
> > >
> > > + segments = min(segments, queue_max_discard_segments(req->q));
> >
> > That can't work.
> >
> > In case of queue_max_discard_segments(req->q) == 1, the request still
> > can have more than one bios since the normal merge is taken for discard
> > IOs.
>
> Ah, I see, the bios are contiguous though right?
Yes, the merge is just like normal RW.
> We could add a contiguity check in the loop and conditionally
> increment n, but maybe that would probably be more complicated...
That is more complicated than this patch, and the same pattern
has already been applied in virtio-blk.
Thanks,
Ming
* Re: [PATCH] nvme: fix handling single range discard request
2023-03-07 12:31 ` Sagi Grimberg
2023-03-07 14:24 ` Ming Lei
@ 2023-03-08 5:36 ` Chaitanya Kulkarni
1 sibling, 0 replies; 15+ messages in thread
From: Chaitanya Kulkarni @ 2023-03-08 5:36 UTC (permalink / raw)
To: Sagi Grimberg, Ming Lei
Cc: Christoph Hellwig, linux-nvme@lists.infradead.org,
linux-block@vger.kernel.org
>>> Was referring to this:
>>> --
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index 3345f866178e..dbc402587431 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -781,6 +781,7 @@ static blk_status_t nvme_setup_discard(struct
>>> nvme_ns
>>> *ns, struct request *req,
>>> range = page_address(ns->ctrl->discard_page);
>>> }
>>>
>>> + segments = min(segments, queue_max_discard_segments(req->q));
>>
>> That can't work.
>>
>> In case of queue_max_discard_segments(req->q) == 1, the request still
>> can have more than one bios since the normal merge is taken for discard
>> IOs.
>
> Ah, I see, the bios are contiguous though right?
> We could add a contiguity check in the loop and conditionally
> increment n, but maybe that would probably be more complicated...
It would be great if we could avoid the above-mentioned complicated pattern...
-ck
* Re: [PATCH] nvme: fix handling single range discard request
2023-03-03 23:13 [PATCH] nvme: fix handling single range discard request Ming Lei
2023-03-04 8:00 ` Hannes Reinecke
2023-03-06 14:21 ` Sagi Grimberg
@ 2023-03-08 5:38 ` Chaitanya Kulkarni
2023-03-09 9:41 ` Christoph Hellwig
3 siblings, 0 replies; 15+ messages in thread
From: Chaitanya Kulkarni @ 2023-03-08 5:38 UTC (permalink / raw)
To: Ming Lei
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
Sagi Grimberg, Christoph Hellwig
On 3/3/2023 3:13 PM, Ming Lei wrote:
> When investigating one customer report on warning in nvme_setup_discard,
> we observed the controller(nvme/tcp) actually exposes
> queue_max_discard_segments(req->q) == 1.
>
> Obviously the current code can't handle this situation, since contiguity
> merge like normal RW request is taken.
>
> Fix the issue by building range from request sector/nr_sectors directly.
>
> Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
* Re: [PATCH] nvme: fix handling single range discard request
2023-03-07 14:24 ` Ming Lei
@ 2023-03-08 5:42 ` Chaitanya Kulkarni
0 siblings, 0 replies; 15+ messages in thread
From: Chaitanya Kulkarni @ 2023-03-08 5:42 UTC (permalink / raw)
To: Ming Lei, Sagi Grimberg
Cc: Christoph Hellwig, linux-nvme@lists.infradead.org,
linux-block@vger.kernel.org
>>>> Was referring to this:
>>>> --
>>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>>> index 3345f866178e..dbc402587431 100644
>>>> --- a/drivers/nvme/host/core.c
>>>> +++ b/drivers/nvme/host/core.c
>>>> @@ -781,6 +781,7 @@ static blk_status_t nvme_setup_discard(struct nvme_ns
>>>> *ns, struct request *req,
>>>> range = page_address(ns->ctrl->discard_page);
>>>> }
>>>>
>>>> + segments = min(segments, queue_max_discard_segments(req->q));
>>>
>>> That can't work.
>>>
>>> In case of queue_max_discard_segments(req->q) == 1, the request still
>>> can have more than one bios since the normal merge is taken for discard
>>> IOs.
>>
>> Ah, I see, the bios are contiguous though right?
>
> Yes, the merge is just like normal RW.
>
>> We could add a contiguity check in the loop and conditionally
>> increment n, but maybe that would probably be more complicated...
>
> That is more complicated than this patch, and the same pattern
> has been applied on virtio-blk.
>
I'd very much prefer keeping the virtio-blk pattern in nvme; it is easier
to read and simpler than a conditional increment of n, unless there
is a strong reason for not doing that, which I have failed to
understand ...
-ck
* Re: [PATCH] nvme: fix handling single range discard request
2023-03-03 23:13 [PATCH] nvme: fix handling single range discard request Ming Lei
` (2 preceding siblings ...)
2023-03-08 5:38 ` Chaitanya Kulkarni
@ 2023-03-09 9:41 ` Christoph Hellwig
3 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2023-03-09 9:41 UTC (permalink / raw)
To: Ming Lei; +Cc: Christoph Hellwig, Sagi Grimberg, linux-nvme, linux-block
Thanks,
applied to nvme-6.3.
Thread overview: 15+ messages
2023-03-03 23:13 [PATCH] nvme: fix handling single range discard request Ming Lei
2023-03-04 8:00 ` Hannes Reinecke
2023-03-04 10:22 ` Ming Lei
2023-03-04 11:14 ` Hannes Reinecke
2023-03-04 12:02 ` Ming Lei
2023-03-06 14:21 ` Sagi Grimberg
2023-03-06 21:49 ` Ming Lei
2023-03-07 11:39 ` Sagi Grimberg
2023-03-07 12:14 ` Ming Lei
2023-03-07 12:31 ` Sagi Grimberg
2023-03-07 14:24 ` Ming Lei
2023-03-08 5:42 ` Chaitanya Kulkarni
2023-03-08 5:36 ` Chaitanya Kulkarni
2023-03-08 5:38 ` Chaitanya Kulkarni
2023-03-09 9:41 ` Christoph Hellwig