* [PATCH] nvmet: fix KATO offset in Set Features
@ 2016-12-07 18:39 Daniel Verkamp
2016-12-07 19:09 ` Daniel Verkamp
2016-12-09 19:59 ` [PATCH v2] nvmet: fix KATO offset in Set Features Daniel Verkamp
0 siblings, 2 replies; 12+ messages in thread
From: Daniel Verkamp @ 2016-12-07 18:39 UTC (permalink / raw)
The Set Features implementation for Keep Alive Timer was using the wrong
structure when retrieving the KATO value; it was treating the Set
Features command as a Property Set command.
The NVMe spec defines the Keep Alive Timer feature as having one input
in CDW11 (4 bytes at offset 44 in the command) whereas the code was
reading 8 bytes at offset 48.
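For reference, a sketch of where the two fields live (assuming the struct
layouts in include/linux/nvme.h at the time; these asserts are illustrative
only and not part of this patch):
	BUILD_BUG_ON(offsetof(struct nvme_common_command, cdw10[1]) != 44);	/* CDW11 */
	BUILD_BUG_ON(offsetof(struct nvmf_property_set_command, value) != 48);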
Since the Linux NVMe over Fabrics host never sets this feature, this
code has presumably never been tested.
Signed-off-by: Daniel Verkamp <daniel.verkamp at intel.com>
---
drivers/nvme/target/admin-cmd.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 6fe4c48..bdae9ef 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -391,8 +391,7 @@ static void nvmet_execute_set_features(struct nvmet_req *req)
(subsys->max_qid - 1) | ((subsys->max_qid - 1) << 16));
break;
case NVME_FEAT_KATO:
- val = le64_to_cpu(req->cmd->prop_set.value);
- val32 = val & 0xffff;
+ val32 = le32_to_cpu(req->cmd->common.cdw10[1]);
req->sq->ctrl->kato = DIV_ROUND_UP(val32, 1000);
nvmet_set_result(req, req->sq->ctrl->kato);
break;
--
2.9.3
* [PATCH] nvmet: fix KATO offset in Set Features
2016-12-07 18:39 [PATCH] nvmet: fix KATO offset in Set Features Daniel Verkamp
@ 2016-12-07 19:09 ` Daniel Verkamp
2016-12-09 18:28 ` Christoph Hellwig
2016-12-09 19:59 ` [PATCH v2] nvmet: fix KATO offset in Set Features Daniel Verkamp
1 sibling, 1 reply; 12+ messages in thread
From: Daniel Verkamp @ 2016-12-07 19:09 UTC (permalink / raw)
On 12/07/2016 11:39 AM, Daniel Verkamp wrote:
> The Set Features implementation for Keep Alive Timer was using the wrong
> structure when retrieving the KATO value; it was treating the Set
> Features command as a Property Set command.
>
> The NVMe spec defines the Keep Alive Timer feature as having one input
> in CDW11 (4 bytes at offset 44 in the command) whereas the code was
> reading 8 bytes at offset 48.
>
> Since the Linux NVMe over Fabrics host never sets this feature, this
> code has presumably never been tested.
>
> Signed-off-by: Daniel Verkamp <daniel.verkamp at intel.com>
> ---
> drivers/nvme/target/admin-cmd.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
> index 6fe4c48..bdae9ef 100644
> --- a/drivers/nvme/target/admin-cmd.c
> +++ b/drivers/nvme/target/admin-cmd.c
> @@ -391,8 +391,7 @@ static void nvmet_execute_set_features(struct nvmet_req *req)
> (subsys->max_qid - 1) | ((subsys->max_qid - 1) << 16));
> break;
> case NVME_FEAT_KATO:
> - val = le64_to_cpu(req->cmd->prop_set.value);
> - val32 = val & 0xffff;
> + val32 = le32_to_cpu(req->cmd->common.cdw10[1]);
> req->sq->ctrl->kato = DIV_ROUND_UP(val32, 1000);
> nvmet_set_result(req, req->sq->ctrl->kato);
> break;
>
I just noticed that this makes val unused, introducing a warning.
I'll fix that in the next version.
However, I also see that this Set Features handler sends the KATO
value back in the result (completion Dword 0). Returning the chosen
(rounded) value from Set Features would make sense to me, but it's not
what the spec says - it only defines Dword 0 of the completion for
Get Features, not for Set Features.
If we do want to return the KATO value, it should presumably be the
value in milliseconds, not seconds (currently Get Features returns
ctrl->kato * 1000, while Set Features returns just ctrl->kato).
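If we kept it, the units fix would be something like this (untested sketch):
	/* hypothetical: report the rounded-up timeout back in milliseconds */
	nvmet_set_result(req, req->sq->ctrl->kato * 1000);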
Should I remove the nvmet_set_result() or fix it to at least return
the correct units?
Thanks,
-- Daniel
* [PATCH] nvmet: fix KATO offset in Set Features
2016-12-07 19:09 ` Daniel Verkamp
@ 2016-12-09 18:28 ` Christoph Hellwig
2016-12-09 20:04 ` Daniel Verkamp
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2016-12-09 18:28 UTC (permalink / raw)
On Wed, Dec 07, 2016 at 12:09:40PM -0700, Daniel Verkamp wrote:
> Should I remove the nvmet_set_result() or fix it to at least return
> the correct units?
Please remove it, as the spec does not define the result. Otherwise
the patch looks fine to me.
Thanks for doing this work! Btw, did you catch this using a test
suite?
* [PATCH v2] nvmet: fix KATO offset in Set Features
2016-12-07 18:39 [PATCH] nvmet: fix KATO offset in Set Features Daniel Verkamp
2016-12-07 19:09 ` Daniel Verkamp
@ 2016-12-09 19:59 ` Daniel Verkamp
1 sibling, 0 replies; 12+ messages in thread
From: Daniel Verkamp @ 2016-12-09 19:59 UTC (permalink / raw)
The Set Features implementation for Keep Alive Timer was using the wrong
structure when retrieving the KATO value; it was treating the Set
Features command as a Property Set command.
The NVMe spec defines the Keep Alive Timer feature as having one input
in CDW11 (4 bytes at offset 44 in the command) whereas the code was
reading 8 bytes at offset 48.
Since the Linux NVMe over Fabrics host never sets this feature, this
code has presumably never been tested.
Signed-off-by: Daniel Verkamp <daniel.verkamp at intel.com>
---
v2: remove unused val variable
drivers/nvme/target/admin-cmd.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 6fe4c48..f791d46 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -381,7 +381,6 @@ static void nvmet_execute_set_features(struct nvmet_req *req)
{
struct nvmet_subsys *subsys = req->sq->ctrl->subsys;
u32 cdw10 = le32_to_cpu(req->cmd->common.cdw10[0]);
- u64 val;
u32 val32;
u16 status = 0;
@@ -391,8 +390,7 @@ static void nvmet_execute_set_features(struct nvmet_req *req)
(subsys->max_qid - 1) | ((subsys->max_qid - 1) << 16));
break;
case NVME_FEAT_KATO:
- val = le64_to_cpu(req->cmd->prop_set.value);
- val32 = val & 0xffff;
+ val32 = le32_to_cpu(req->cmd->common.cdw10[1]);
req->sq->ctrl->kato = DIV_ROUND_UP(val32, 1000);
nvmet_set_result(req, req->sq->ctrl->kato);
break;
--
2.9.3
* [PATCH] nvmet: fix KATO offset in Set Features
2016-12-09 18:28 ` Christoph Hellwig
@ 2016-12-09 20:04 ` Daniel Verkamp
2016-12-21 10:27 ` Christoph Hellwig
0 siblings, 1 reply; 12+ messages in thread
From: Daniel Verkamp @ 2016-12-09 20:04 UTC (permalink / raw)
On 12/09/2016 11:28 AM, Christoph Hellwig wrote:
> On Wed, Dec 07, 2016 at 12:09:40PM -0700, Daniel Verkamp wrote:
>> Should I remove the nvmet_set_result() or fix it to at least return
>> the correct units?
>
> Please remove it, as the spec does not define the result. Otherwise
> the patch looks fine to me.
I sent a new version of the patch that just fixes the unused variable.
I'll send another one to remove the result. I was going to remove it
in this patch, but I couldn't find out where (if anywhere) the result
is initialized to 0 in the common command handling. Is it OK to just
remove the call to nvmet_set_result()? Is the whole request zeroed
out somewhere earlier? Or should I just change it to call
nvmet_set_result(req, 0) for clarity?
> Thanks for doing this work! Btw, did you catch this using a test
> suite?
I caught this when testing the SPDK NVMe over Fabrics host code, which
was using the Set Features command to set up KATO. However, I later
realized that since we already set KATO during Connect, it's pretty
useless to configure it again with a Set Features - so I removed the
code that was hitting this bug.
An NVMe over Fabrics test suite would be a great idea, though...
Thanks,
-- Daniel
* [PATCH] nvmet: fix KATO offset in Set Features
2016-12-09 20:04 ` Daniel Verkamp
@ 2016-12-21 10:27 ` Christoph Hellwig
[not found] ` <830460a5-ce31-b882-8f06-4a906763eb50@broadcom.com>
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2016-12-21 10:27 UTC (permalink / raw)
On Fri, Dec 09, 2016 at 01:04:49PM -0700, Daniel Verkamp wrote:
> On 12/09/2016 11:28 AM, Christoph Hellwig wrote:
> > On Wed, Dec 07, 2016 at 12:09:40PM -0700, Daniel Verkamp wrote:
> >> Should I remove the nvmet_set_result() or fix it to at least return
> >> the correct units?
> >
> > Please remove it, as the spec does not define the result. Otherwise
> > the patch looks fine to me.
>
> I sent a new version of the patch that just fixes the unused variable.
> I'll send another one to remove the result. I was going to remove it
> in this patch, but I couldn't find out where (if anywhere) the result
> is initialized to 0 in the common command handling. Is it OK to just
> remove the call to nvmet_set_result()? Is the whole request zeroed
> out somewhere earlier? Or should I just change it to call
> nvmet_set_result(req, 0) for clarity?
We currently don't touch ->result unless the spec specifies a
value for it, although it might be a good idea to just always
initialize it to make sure nothing is leaked.
So if you feel like sending patches, I'd send one patch to initialize
the 64-bit result field to 0 in nvmet_req_init next to where ->status
is initialized, and then just remove the nvmet_set_result() call here.
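Roughly something like this in nvmet_req_init (an untested sketch; the
exact spelling of the result field depends on whether the nvme_result
union rework is in your tree):
	/* sketch: clear the whole 64-bit result next to the existing status init */
	req->rsp->status = 0;
	req->rsp->result.u64 = 0;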
* Q: nvme_rdma and reconnect
[not found] ` <830460a5-ce31-b882-8f06-4a906763eb50@broadcom.com>
@ 2016-12-22 6:41 ` Christoph Hellwig
2016-12-22 12:08 ` Sagi Grimberg
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2016-12-22 6:41 UTC (permalink / raw)
On Wed, Dec 21, 2016 at 04:56:41PM -0800, James Smart wrote:
> Sagi, Christoph,
>
> Can you explain what the difference is between the "reset" path and the
> "error/reconnect" path in the rdma driver? From my point of view, it
> would seem that both, relative to the fabric side of the transport, are
> terminating the controller and reconnecting to a new controller to recover.
> So why wouldn't they be the same (single) reset flow?
They should use the same flow. A couple of months ago I had a prototype
for that, but never got it to work fully.
* Q: nvme_rdma and reconnect
2016-12-22 6:41 ` Q: nvme_rdma and reconnect Christoph Hellwig
@ 2016-12-22 12:08 ` Sagi Grimberg
2016-12-22 12:10 ` Christoph Hellwig
2016-12-22 12:46 ` Hannes Reinecke
0 siblings, 2 replies; 12+ messages in thread
From: Sagi Grimberg @ 2016-12-22 12:08 UTC (permalink / raw)
>> Sagi, Christoph,
>>
>> Can you explain what the difference is between the "reset" path and the
>> "error/reconnect" path in the rdma driver? From my point of view, it
>> would seem that both, relative to the fabric side of the transport, are
>> terminating the controller and reconnecting to a new controller to recover.
>> So why wouldn't they be the same (single) reset flow?
>
> They should use the same flow. A couple of months ago I had a prototype
> for that, but never got it to work fully.
One more distinction is that reconnect failures will retry periodically,
while a reset failure will remove the device (which aligns with the pci
driver behavior).
We can go via the same flow and condition on the state for the
differences, but I'm not sure it's easier to understand than two
distinct routines (although they share a lot of code).
* Q: nvme_rdma and reconnect
2016-12-22 12:08 ` Sagi Grimberg
@ 2016-12-22 12:10 ` Christoph Hellwig
2016-12-22 12:52 ` Sagi Grimberg
2016-12-22 12:46 ` Hannes Reinecke
1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2016-12-22 12:10 UTC (permalink / raw)
On Thu, Dec 22, 2016 at 02:08:44PM +0200, Sagi Grimberg wrote:
> We can go via the same flow and condition on the state for the
> differences, but I'm not sure it's easier to understand than two
> distinct routines (although they share a lot of code).
FYI, my attempt back then was to have a shutdown and a connect
method that these two paths call, with the hope of folding the initial
connect and the final shutdown into them as well, similar to how the
PCI code works.
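Very roughly the shape I had in mind (names made up for illustration,
not lifted from an actual branch):
	/* illustrative skeleton only */
	static void nvme_rdma_teardown_ctrl(struct nvme_rdma_ctrl *ctrl)
	{
		/* stop keep-alive, quiesce and tear down I/O queues, then the admin queue */
	}

	static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl)
	{
		/* (re)connect the admin queue, re-read the controller, create I/O queues */
		return 0;
	}

	/* probe, reset and error recovery would all become teardown + setup */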
* Q: nvme_rdma and reconnect
2016-12-22 12:08 ` Sagi Grimberg
2016-12-22 12:10 ` Christoph Hellwig
@ 2016-12-22 12:46 ` Hannes Reinecke
2016-12-22 12:50 ` Christoph Hellwig
1 sibling, 1 reply; 12+ messages in thread
From: Hannes Reinecke @ 2016-12-22 12:46 UTC (permalink / raw)
On 12/22/2016 01:08 PM, Sagi Grimberg wrote:
>
>>> Sagi, Christoph,
>>>
>>> Can you explain what the difference is between the "reset" path and the
>>> "error/reconnect" path in the rdma driver? From my point of view, it
>>> would seem that both, relative to the fabric side of the transport, are
>>> terminating the controller and reconnecting to a new controller to
>>> recover.
>>> So why wouldn't they be the same (single) reset flow?
>>
>> They should use the same flow. A couple of months ago I had a prototype
>> for that, but never got it to work fully.
>
> One more distinction is that reconnect failures will retry periodically,
> while a reset failure will remove the device (which aligns with the pci
> driver behavior).
>
> We can go via the same flow and condition on the state for the
> differences, but I'm not sure it's easier to understand than two
> distinct routines (although they share a lot of code).
>
And keep in mind that the reset path will be a killer for any
prospective multipath scenario: if you need to remove the device to
reset, you are guaranteed to _never_ get it back under memory pressure.
So please do not enforce a reset for all cases.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare at suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
* Q: nvme_rdma and reconnect
2016-12-22 12:46 ` Hannes Reinecke
@ 2016-12-22 12:50 ` Christoph Hellwig
0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2016-12-22 12:50 UTC (permalink / raw)
On Thu, Dec 22, 2016 at 01:46:10PM +0100, Hannes Reinecke wrote:
> And keep in mind that the reset path will be a killer for any prospective
> multipath scenario: if you need to remove the device to reset, you are
> guaranteed to _never_ get it back under memory pressure.
No one talked about removing the device, just shutting down the
instance of the controller. We need a clean slate to reconnect, as all
our RDMA QPs are toast once an error happens, by design of IB Verbs.
* Q: nvme_rdma and reconnect
2016-12-22 12:10 ` Christoph Hellwig
@ 2016-12-22 12:52 ` Sagi Grimberg
0 siblings, 0 replies; 12+ messages in thread
From: Sagi Grimberg @ 2016-12-22 12:52 UTC (permalink / raw)
> FYI, my attempt back then was to have a shutdown and a connect
> method that these two path call, with the hope of folding the initial
> connect and final shutdown into it as well, similar to how the PCI
> code work.
I remember that. Actually, I think you tried to make create_ctrl()
schedule reset_ctrl() like pci probe does. I had a prototype to split
them and align pci to the fabrics code, because I sorta prefer the
distinction :)
But I'm fine either way...