public inbox for linux-nvme@lists.infradead.org
 help / color / mirror / Atom feed
* ioccsz and iorcsz check failing
@ 2023-12-15 14:48 Daniel Wagner
  2023-12-15 18:08 ` Keith Busch
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Wagner @ 2023-12-15 14:48 UTC (permalink / raw)
  To: linux-nvme@lists.infradead.org
  Cc: Christoph Hellwig, Sagi Grimberg, Keith Busch

Hi,

Since commit 2fcd3ab39826 ("nvme-fabrics: check ioccsz and iorcsz") my
testing fails these checks when trying to connect to a remote target
(Linux nvmet) via nvme-tcp. Looking at the TCP frame, I see these values
are 0 on the wire.

When running blktests with nvme-tcp via the loopback device all is
good.

When running blktests with nvme-fc via the loopback device the first
check fails because ioccsz is 0. I've added a bunch of debug prints:

  nvme nvme0: I/O queue command capsule supported size 0 < 4

  nvmet: nvmet_execute_identify:687 cns 1
  nvmet: nvmet_execute_identify_ctrl:469 ioccsz 4 iorcsz 1
  nvmet: nvmet_copy_to_sgl:98 buf ffff8881348a6000 len 4096

So this part looks good to my eyes, and I'm not sure where the problem
could be. Maybe someone can spot it, as I can't really make sense of it
at the moment.

Thanks,
Daniel


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ioccsz and iorcsz check failing
  2023-12-15 14:48 ioccsz and iorcsz check failing Daniel Wagner
@ 2023-12-15 18:08 ` Keith Busch
  2023-12-18  9:59   ` Sagi Grimberg
  0 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2023-12-15 18:08 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme@lists.infradead.org, Christoph Hellwig, Sagi Grimberg

On Fri, Dec 15, 2023 at 03:48:04PM +0100, Daniel Wagner wrote:
> Since commit 2fcd3ab39826 ("nvme-fabrics: check ioccsz and iorcsz") my
> testing fails with these checks when trying to connect a remote target
> (Linux nvmet) via nvme-tcp. Looking at the TCP frame I see these values
> are 0 on the wire.
> 
> When running blktests with nvme-tcp via the loop back device all is
> good.
> 
> When running blktest with nvme-fc via the loop back device the first
> check fails because ioccsz is 0. I've added a bunch of debug prints:
> 
>   nvme nvme0: I/O queue command capsule supported size 0 < 4
> 
>   nvmet: nvmet_execute_identify:687 cns 1
>   nvmet: nvmet_execute_identify_ctrl:469 ioccsz 4 iorcsz 1
>   nvmet: nvmet_copy_to_sgl:98 buf ffff8881348a6000 len 4096
> 
> So this part looks good to my eyes. Not sure where the problem could be.
> In case someone spots the problem as I can't really make sense of it at
> the moment.

Weird. I optimistically thought I'd find a problem, but nope, I am
confused. The code looks like it's doing the right thing, and your
target side prints appear to confirm that, but the host sees a different
result. Is anything else in the identify wrong, or is it just these
fabrics fields?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ioccsz and iorcsz check failing
  2023-12-15 18:08 ` Keith Busch
@ 2023-12-18  9:59   ` Sagi Grimberg
  2023-12-18 13:57     ` Daniel Wagner
  0 siblings, 1 reply; 14+ messages in thread
From: Sagi Grimberg @ 2023-12-18  9:59 UTC (permalink / raw)
  To: Keith Busch, Daniel Wagner
  Cc: linux-nvme@lists.infradead.org, Christoph Hellwig



On 12/15/23 20:08, Keith Busch wrote:
> On Fri, Dec 15, 2023 at 03:48:04PM +0100, Daniel Wagner wrote:
>> Since commit 2fcd3ab39826 ("nvme-fabrics: check ioccsz and iorcsz") my
>> testing fails with these checks when trying to connect a remote target
>> (Linux nvmet) via nvme-tcp. Looking at the TCP frame I see these values
>> are 0 on the wire.
>>
>> When running blktests with nvme-tcp via the loop back device all is
>> good.
>>
>> When running blktest with nvme-fc via the loop back device the first
>> check fails because ioccsz is 0. I've added a bunch of debug prints:
>>
>>    nvme nvme0: I/O queue command capsule supported size 0 < 4
>>
>>    nvmet: nvmet_execute_identify:687 cns 1
>>    nvmet: nvmet_execute_identify_ctrl:469 ioccsz 4 iorcsz 1
>>    nvmet: nvmet_copy_to_sgl:98 buf ffff8881348a6000 len 4096
>>
>> So this part looks good to my eyes. Not sure where the problem could be.
>> In case someone spots the problem as I can't really make sense of it at
>> the moment.
> 
> Weird. I optimistically thought I'd find a problem, but nope, I am
> confused. The code looks like it's doing the right thing, and your
> target side prints appear to confirm that, but host sees a different
> result. Is anything else in the identify wrong, or is it just these
> fabrics fields?

I don't see any issue in the code either.

I'm wondering if this has something to do with MSG_SPLICE_PAGES and is
only now being uncovered?

Daniel, can you try to clear MSG_SPLICE_PAGES from nvmet_try_send_data?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ioccsz and iorcsz check failing
  2023-12-18  9:59   ` Sagi Grimberg
@ 2023-12-18 13:57     ` Daniel Wagner
  2023-12-18 16:04       ` Sagi Grimberg
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Wagner @ 2023-12-18 13:57 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Keith Busch, linux-nvme@lists.infradead.org, Christoph Hellwig

On Mon, Dec 18, 2023 at 11:59:32AM +0200, Sagi Grimberg wrote:
> Is anything else in the identify wrong, or is it just these
> fabrics fields?

So what I am seeing on the wire are a few fabrics commands (connect, get
property) followed by the nvme id ctrl command (opcode 0x6).

 nvmet: nvmet_req_init:962
 nvmet: nvmet_parse_admin_cmd:1011
 nvmet: nvmet_parse_discovery_cmd:359 opcode 6

This then calls nvmet_execute_disc_identify, so adding

--- a/drivers/nvme/target/discovery.c
+++ b/drivers/nvme/target/discovery.c
@@ -249,6 +249,7 @@ static void nvmet_execute_disc_identify(struct nvmet_req *req)
 {
        struct nvmet_ctrl *ctrl = req->sq->ctrl;
        struct nvme_id_ctrl *id;
+       u32 cmd_capsule_size;
        u16 status = 0;

        if (!nvmet_check_transfer_len(req, NVME_IDENTIFY_DATA_SIZE))
@@ -289,6 +290,17 @@ static void nvmet_execute_disc_identify(struct nvmet_req *req)
                id->sgls |= cpu_to_le32(1 << 2);
        if (req->port->inline_data_size)
                id->sgls |= cpu_to_le32(1 << 20);
+       /*
+        * Max command capsule size is sqe + in-capsule data size.
+        * Disable in-capsule data for Metadata capable controllers.
+        */
+       cmd_capsule_size = sizeof(struct nvme_command);
+       if (!ctrl->pi_support)
+               cmd_capsule_size += req->port->inline_data_size;
+       id->ioccsz = cpu_to_le32(cmd_capsule_size / 16);
+
+       /* Max response capsule size is cqe */
+       id->iorcsz = cpu_to_le32(sizeof(struct nvme_completion) / 16);

        id->oaes = cpu_to_le32(NVMET_DISC_AEN_CFG_OPTIONAL);


seems to do the trick for nvme-tcp and nvme-fc. Is this the right fix?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ioccsz and iorcsz check failing
  2023-12-18 13:57     ` Daniel Wagner
@ 2023-12-18 16:04       ` Sagi Grimberg
  2023-12-19 17:24         ` Max Gurtovoy
  2023-12-19 17:28         ` Daniel Wagner
  0 siblings, 2 replies; 14+ messages in thread
From: Sagi Grimberg @ 2023-12-18 16:04 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Keith Busch, linux-nvme@lists.infradead.org, Christoph Hellwig


>> Is anything else in the identify wrong, or is it just these
>> fabrics fields?
> 
> So what I am seeing on the wire are a few fabrics commands (connect, get
> property) followed by the nvme id ctrl command (opcode 0x6).
> 
>   nvmet: nvmet_req_init:962
>   nvmet: nvmet_parse_admin_cmd:1011
>   nvmet: nvmet_parse_discovery_cmd:359 opcode 6
> 
> This calls then nvmet_execute_disc_identify, so adding
> 
> --- a/drivers/nvme/target/discovery.c
> +++ b/drivers/nvme/target/discovery.c
> @@ -249,6 +249,7 @@ static void nvmet_execute_disc_identify(struct nvmet_req *req)
>   {
>          struct nvmet_ctrl *ctrl = req->sq->ctrl;
>          struct nvme_id_ctrl *id;
> +       u32 cmd_capsule_size;
>          u16 status = 0;
> 
>          if (!nvmet_check_transfer_len(req, NVME_IDENTIFY_DATA_SIZE))
> @@ -289,6 +290,17 @@ static void nvmet_execute_disc_identify(struct nvmet_req *req)
>                  id->sgls |= cpu_to_le32(1 << 2);
>          if (req->port->inline_data_size)
>                  id->sgls |= cpu_to_le32(1 << 20);
> +       /*
> +        * Max command capsule size is sqe + in-capsule data size.
> +        * Disable in-capsule data for Metadata capable controllers.
> +        */
> +       cmd_capsule_size = sizeof(struct nvme_command);
> +       if (!ctrl->pi_support)
> +               cmd_capsule_size += req->port->inline_data_size;
> +       id->ioccsz = cpu_to_le32(cmd_capsule_size / 16);

Yes, this is the culprit. Nice that it exposed a bug.

There is no in-capsule data for discovery controllers afaict.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ioccsz and iorcsz check failing
  2023-12-18 16:04       ` Sagi Grimberg
@ 2023-12-19 17:24         ` Max Gurtovoy
  2023-12-19 19:59           ` Sagi Grimberg
  2023-12-19 17:28         ` Daniel Wagner
  1 sibling, 1 reply; 14+ messages in thread
From: Max Gurtovoy @ 2023-12-19 17:24 UTC (permalink / raw)
  To: Sagi Grimberg, Daniel Wagner
  Cc: Keith Busch, linux-nvme@lists.infradead.org, Christoph Hellwig



On 18/12/2023 18:04, Sagi Grimberg wrote:
> 
>>> Is anything else in the identify wrong, or is it just these
>>> fabrics fields?
>>
>> So what I am seeing on the wire are a few fabrics commands (connect, get
>> property) followed by the nvme id ctrl command (opcode 0x6).
>>
>>   nvmet: nvmet_req_init:962
>>   nvmet: nvmet_parse_admin_cmd:1011
>>   nvmet: nvmet_parse_discovery_cmd:359 opcode 6
>>
>> This calls then nvmet_execute_disc_identify, so adding
>>
>> --- a/drivers/nvme/target/discovery.c
>> +++ b/drivers/nvme/target/discovery.c
>> @@ -249,6 +249,7 @@ static void nvmet_execute_disc_identify(struct 
>> nvmet_req *req)
>>   {
>>          struct nvmet_ctrl *ctrl = req->sq->ctrl;
>>          struct nvme_id_ctrl *id;
>> +       u32 cmd_capsule_size;
>>          u16 status = 0;
>>
>>          if (!nvmet_check_transfer_len(req, NVME_IDENTIFY_DATA_SIZE))
>> @@ -289,6 +290,17 @@ static void nvmet_execute_disc_identify(struct 
>> nvmet_req *req)
>>                  id->sgls |= cpu_to_le32(1 << 2);
>>          if (req->port->inline_data_size)
>>                  id->sgls |= cpu_to_le32(1 << 20);
>> +       /*
>> +        * Max command capsule size is sqe + in-capsule data size.
>> +        * Disable in-capsule data for Metadata capable controllers.
>> +        */
>> +       cmd_capsule_size = sizeof(struct nvme_command);
>> +       if (!ctrl->pi_support)
>> +               cmd_capsule_size += req->port->inline_data_size;
>> +       id->ioccsz = cpu_to_le32(cmd_capsule_size / 16);
> 
> Yes, this is the culprit. Nice that it exposed a bug.
> 
> There is no in-capsule data for discovery controllers afaict.

Also, the discovery controllers can't support PI AFAIK.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ioccsz and iorcsz check failing
  2023-12-18 16:04       ` Sagi Grimberg
  2023-12-19 17:24         ` Max Gurtovoy
@ 2023-12-19 17:28         ` Daniel Wagner
  2023-12-19 18:54           ` Caleb Sander
  1 sibling, 1 reply; 14+ messages in thread
From: Daniel Wagner @ 2023-12-19 17:28 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Keith Busch, linux-nvme@lists.infradead.org, Christoph Hellwig

On Mon, Dec 18, 2023 at 06:04:46PM +0200, Sagi Grimberg wrote:
> 
> > > Is anything else in the identify wrong, or is it just these
> > > fabrics fields?
> > 
> > So what I am seeing on the wire are a few fabrics commands (connect, get
> > property) followed by the nvme id ctrl command (opcode 0x6).
> > 
> >   nvmet: nvmet_req_init:962
> >   nvmet: nvmet_parse_admin_cmd:1011
> >   nvmet: nvmet_parse_discovery_cmd:359 opcode 6
> > 
> > This calls then nvmet_execute_disc_identify, so adding
> > 
> > --- a/drivers/nvme/target/discovery.c
> > +++ b/drivers/nvme/target/discovery.c
> > @@ -249,6 +249,7 @@ static void nvmet_execute_disc_identify(struct nvmet_req *req)
> >   {
> >          struct nvmet_ctrl *ctrl = req->sq->ctrl;
> >          struct nvme_id_ctrl *id;
> > +       u32 cmd_capsule_size;
> >          u16 status = 0;
> > 
> >          if (!nvmet_check_transfer_len(req, NVME_IDENTIFY_DATA_SIZE))
> > @@ -289,6 +290,17 @@ static void nvmet_execute_disc_identify(struct nvmet_req *req)
> >                  id->sgls |= cpu_to_le32(1 << 2);
> >          if (req->port->inline_data_size)
> >                  id->sgls |= cpu_to_le32(1 << 20);
> > +       /*
> > +        * Max command capsule size is sqe + in-capsule data size.
> > +        * Disable in-capsule data for Metadata capable controllers.
> > +        */
> > +       cmd_capsule_size = sizeof(struct nvme_command);
> > +       if (!ctrl->pi_support)
> > +               cmd_capsule_size += req->port->inline_data_size;
> > +       id->ioccsz = cpu_to_le32(cmd_capsule_size / 16);
> 
> Yes, this is the culprit. Nice that it exposed a bug.
> 
> There is no in-capsule data for discovery controllers afaict.

Okay, I have no idea. I stole this from nvmet_execute_identify_ctrl().

I tried to read up on it but couldn't really find a proper answer. And I
am a bit confused about the TCP transport definition. If I got this
right, a minimum size is only defined for either fabrics commands or
responses, depending on whether in-capsule data is supported or not;
both values are never defined at the same time.

RDMA is a bit clearer on this topic and says these values are fixed for
admin commands, which is in line with your statement.

The PI check is kind of useless, as we only deal with admin commands
anyway.

Anyway, I am fine with just dropping the inline part, as it doesn't seem
to impact the functionality of the discovery controller. But I suppose
it would be good to know why this is the correct thing to do.


  3.3.2.1 Capsules and Data Transfers

  The capsule size for the Admin Queue commands and responses is fixed
  and defined in the NVMe Transport binding specification.


  7 I/O Commands

  The user data format and any end-to-end protection information is I/O
  Command Set specific. Refer to each I/O Command Set specification for
  applicability and additional details, if any. Refer to the referenced
  I/O Command Set specification for all I/O Command Set specific
  commands described in Figure 390.


  7.2  Transport Capsule and Data Binding: Fibre Channel

  [ no public documents ]


  7.3.2 Capsules and SGLs (RDMA)

  The capsule size for Fabrics commands are fixed in size regardless of
  whether commands are submitted on an Admin Queue or an I/O Queue. The
  command capsule size is 64 bytes and the response capsule size is 16
  bytes.

  The capsule sizes for the Admin Queue are fixed in size. The command
  capsule size is 64 bytes and the response capsule size is 16 bytes.
  In-capsule data is not supported for the Admin Queue.


  7.4.3 Capsules (TCP)

  The NVMe/TCP transport supports a message model. Data transfers are
  supported via a transport specific data transfer mechanism, described
  in section 7.4.5, and optionally via in-capsule data. NVMe/TCP capsule
  sizes are summarized in Figure 62. The size of capsules is variable
  when in-capsule data is supported and fixed when in-capsule data is
  not supported

  Figure 64 (size in bytes):
                               !in-capsule    in-capsule
  Fabrics and Admin Commands:  n/a            64 - 8256
  Fabrics and Admin Responses: 16             n/a


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ioccsz and iorcsz check failing
  2023-12-19 17:28         ` Daniel Wagner
@ 2023-12-19 18:54           ` Caleb Sander
  2023-12-19 19:58             ` Sagi Grimberg
  0 siblings, 1 reply; 14+ messages in thread
From: Caleb Sander @ 2023-12-19 18:54 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Sagi Grimberg, Keith Busch, linux-nvme@lists.infradead.org,
	Christoph Hellwig

I'm not sure IOCCSZ and IORCSZ even make sense for discovery
controllers. These fields are "I/O Queue Command (Response) Capsule
Supported Size", but discovery controllers don't have I/O queues.
Presumably the host should ignore the values in these fields for
admin-only controllers. But the base spec doesn't seem to exempt admin
controllers from the requirement that "The minimum value that shall be
indicated is 4 (1) corresponding to 64 (16) bytes." Perhaps this is a
better question for the NVMe technical workgroups?


On Tue, Dec 19, 2023 at 9:31 AM Daniel Wagner <dwagner@suse.de> wrote:
>
> On Mon, Dec 18, 2023 at 06:04:46PM +0200, Sagi Grimberg wrote:
> >
> > > > Is anything else in the identify wrong, or is it just these
> > > > fabrics fields?
> > >
> > > So what I am seeing on the wire are a few fabrics commands (connect, get
> > > property) followed by the nvme id ctrl command (opcode 0x6).
> > >
> > >   nvmet: nvmet_req_init:962
> > >   nvmet: nvmet_parse_admin_cmd:1011
> > >   nvmet: nvmet_parse_discovery_cmd:359 opcode 6
> > >
> > > This calls then nvmet_execute_disc_identify, so adding
> > >
> > > --- a/drivers/nvme/target/discovery.c
> > > +++ b/drivers/nvme/target/discovery.c
> > > @@ -249,6 +249,7 @@ static void nvmet_execute_disc_identify(struct nvmet_req *req)
> > >   {
> > >          struct nvmet_ctrl *ctrl = req->sq->ctrl;
> > >          struct nvme_id_ctrl *id;
> > > +       u32 cmd_capsule_size;
> > >          u16 status = 0;
> > >
> > >          if (!nvmet_check_transfer_len(req, NVME_IDENTIFY_DATA_SIZE))
> > > @@ -289,6 +290,17 @@ static void nvmet_execute_disc_identify(struct nvmet_req *req)
> > >                  id->sgls |= cpu_to_le32(1 << 2);
> > >          if (req->port->inline_data_size)
> > >                  id->sgls |= cpu_to_le32(1 << 20);
> > > +       /*
> > > +        * Max command capsule size is sqe + in-capsule data size.
> > > +        * Disable in-capsule data for Metadata capable controllers.
> > > +        */
> > > +       cmd_capsule_size = sizeof(struct nvme_command);
> > > +       if (!ctrl->pi_support)
> > > +               cmd_capsule_size += req->port->inline_data_size;
> > > +       id->ioccsz = cpu_to_le32(cmd_capsule_size / 16);
> >
> > Yes, this is the culprit. Nice that it exposed a bug.
> >
> > There is no in-capsule data for discovery controllers afaict.
>
> Okay, I have no idea. I stole this from nvmet_execute_identify_ctrl().
>
> I tried to read up but couldn't really find a proper answer. And I am a
> bit confused about the TCP transport definition. If I got this right
> then a min size is not defined either for fabrics commands or responses
> depending if it is in-capsule or !in-capsule supported. But never are
> both values are defined at the same time.
>
> RDMA is a bit more clearer on this topic and says for admin commands
> these value are fixed is in line with your statement
>
> The PI check is kind of useless as we only deal with admin commands
> anyway.
>
> Anyway I am fine by just dropping the inline part as it doesn't seem to
> impact the functionality for the discovery controller. But I suppose it
> would be good to know why this is the correct thing to do.
>
>
>   3.3.2.1 Capsules and Data Transfers
>
>   The capsule size for the Admin Queue commands and responses is fixed
>   and defined in the NVMe Transport binding specification.
>
>
>   7 I/O Commands
>
>   The user data format and any end-to-end protection information is I/O
>   Command Set specific. Refer to each I/O Command Set specification for
>   applicability and additional details, if any. Refer to the referenced
>   I/O Command Set specification for all I/O Command Set specific
>   commands described in Figure 390.
>
>
>   7.2  Transport Capsule and Data Binding: Fibre Channel
>
>   [ no public documents ]
>
>
>   7.3.2 Capsules and SGLs (RDMA)
>
>   The capsule size for Fabrics commands are fixed in size regardless of
>   whether commands are submitted on an Admin Queue or an I/O Queue. The
>   command capsule size is 64 bytes and the response capsule size is 16
>   bytes.
>
>   The capsule sizes for the Admin Queue are fixed in size. The command
>   capsule size is 64 bytes and the response capsule size is 16 bytes.
>   In-capsule data is not supported for the Admin Queue.
>
>
>   7.4.3 Capsules (TCP)
>
>   The NVMe/TCP transport supports a message model. Data transfers are
>   supported via a transport specific data transfer mechanism, described
>   in section 7.4.5, and optionally via in-capsule data. NVMe/TCP capsule
>   sizes are summarized in Figure 62. The size of capsules is variable
>   when in-capsule data is supported and fixed when in-capsule data is
>   not supported
>
>   Figure 64 (size in bytes):
>                                !in-capsule    in-capsule
>   Fabrics and Admin Commands:  n/a            64 - 8256
>   Fabrics and Admin Responses: 16             n/a
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ioccsz and iorcsz check failing
  2023-12-19 18:54           ` Caleb Sander
@ 2023-12-19 19:58             ` Sagi Grimberg
  0 siblings, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2023-12-19 19:58 UTC (permalink / raw)
  To: Caleb Sander, Daniel Wagner
  Cc: Keith Busch, linux-nvme@lists.infradead.org, Christoph Hellwig



On 12/19/23 20:54, Caleb Sander wrote:
> I'm not sure IOCCSZ and IORCSZ even make sense for discovery
> controllers. These fields are "I/O Queue Command (Response) Capsule
> Supported Size", but discovery controllers don't have I/O queues.
> Presumably the host should ignore the values in these fields for
> admin-only controllers. But the base spec doesn't seem to exempt admin
> controllers from the requirement that "The minimum value that shall be
> indicated is 4 (1) corresponding to 64 (16) bytes." Perhaps this is a
> better question for the NVMe technical workgroups?

Perhaps. I'm not sure the intention is for I/O commands only,
as it is also defined for admin commands.

I think we should have the discovery controller simply place some
sensible numbers there and later debate if the host should ignore
them or not.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ioccsz and iorcsz check failing
  2023-12-19 17:24         ` Max Gurtovoy
@ 2023-12-19 19:59           ` Sagi Grimberg
  2023-12-19 23:55             ` Max Gurtovoy
  0 siblings, 1 reply; 14+ messages in thread
From: Sagi Grimberg @ 2023-12-19 19:59 UTC (permalink / raw)
  To: Max Gurtovoy, Daniel Wagner
  Cc: Keith Busch, linux-nvme@lists.infradead.org, Christoph Hellwig



On 12/19/23 19:24, Max Gurtovoy wrote:
>
>
> On 18/12/2023 18:04, Sagi Grimberg wrote:
>>
>>>> Is anything else in the identify wrong, or is it just these
>>>> fabrics fields?
>>>
>>> So what I am seeing on the wire are a few fabrics commands (connect, 
>>> get
>>> property) followed by the nvme id ctrl command (opcode 0x6).
>>>
>>>   nvmet: nvmet_req_init:962
>>>   nvmet: nvmet_parse_admin_cmd:1011
>>>   nvmet: nvmet_parse_discovery_cmd:359 opcode 6
>>>
>>> This calls then nvmet_execute_disc_identify, so adding
>>>
>>> --- a/drivers/nvme/target/discovery.c
>>> +++ b/drivers/nvme/target/discovery.c
>>> @@ -249,6 +249,7 @@ static void nvmet_execute_disc_identify(struct 
>>> nvmet_req *req)
>>>   {
>>>          struct nvmet_ctrl *ctrl = req->sq->ctrl;
>>>          struct nvme_id_ctrl *id;
>>> +       u32 cmd_capsule_size;
>>>          u16 status = 0;
>>>
>>>          if (!nvmet_check_transfer_len(req, NVME_IDENTIFY_DATA_SIZE))
>>> @@ -289,6 +290,17 @@ static void nvmet_execute_disc_identify(struct 
>>> nvmet_req *req)
>>>                  id->sgls |= cpu_to_le32(1 << 2);
>>>          if (req->port->inline_data_size)
>>>                  id->sgls |= cpu_to_le32(1 << 20);
>>> +       /*
>>> +        * Max command capsule size is sqe + in-capsule data size.
>>> +        * Disable in-capsule data for Metadata capable controllers.
>>> +        */
>>> +       cmd_capsule_size = sizeof(struct nvme_command);
>>> +       if (!ctrl->pi_support)
>>> +               cmd_capsule_size += req->port->inline_data_size;
>>> +       id->ioccsz = cpu_to_le32(cmd_capsule_size / 16);
>>
>> Yes, this is the culprit. Nice that it exposed a bug.
>>
>> There is no in-capsule data for discovery controllers afaict.
>
> Also, the discovery controllers can't support PI AFAIK.

Indeed, we should simply set it to sizeof(struct nvme_command) / 16.

It is true that for the tcp transport we should add an additional 8192
bytes, which is mandatory for nvme-tcp. But that can be done in an
incremental patch.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ioccsz and iorcsz check failing
  2023-12-19 19:59           ` Sagi Grimberg
@ 2023-12-19 23:55             ` Max Gurtovoy
  2023-12-20  0:10               ` Max Gurtovoy
  0 siblings, 1 reply; 14+ messages in thread
From: Max Gurtovoy @ 2023-12-19 23:55 UTC (permalink / raw)
  To: Sagi Grimberg, Daniel Wagner
  Cc: Keith Busch, linux-nvme@lists.infradead.org, Christoph Hellwig



On 19/12/2023 21:59, Sagi Grimberg wrote:
> 
> 
> On 12/19/23 19:24, Max Gurtovoy wrote:
>>
>>
>> On 18/12/2023 18:04, Sagi Grimberg wrote:
>>>
>>>>> Is anything else in the identify wrong, or is it just these
>>>>> fabrics fields?
>>>>
>>>> So what I am seeing on the wire are a few fabrics commands (connect,
>>>> get
>>>> property) followed by the nvme id ctrl command (opcode 0x6).
>>>>
>>>>    nvmet: nvmet_req_init:962
>>>>    nvmet: nvmet_parse_admin_cmd:1011
>>>>    nvmet: nvmet_parse_discovery_cmd:359 opcode 6
>>>>
>>>> This calls then nvmet_execute_disc_identify, so adding
>>>>
>>>> --- a/drivers/nvme/target/discovery.c
>>>> +++ b/drivers/nvme/target/discovery.c
>>>> @@ -249,6 +249,7 @@ static void nvmet_execute_disc_identify(struct
>>>> nvmet_req *req)
>>>>    {
>>>>           struct nvmet_ctrl *ctrl = req->sq->ctrl;
>>>>           struct nvme_id_ctrl *id;
>>>> +       u32 cmd_capsule_size;
>>>>           u16 status = 0;
>>>>
>>>>           if (!nvmet_check_transfer_len(req, NVME_IDENTIFY_DATA_SIZE))
>>>> @@ -289,6 +290,17 @@ static void nvmet_execute_disc_identify(struct
>>>> nvmet_req *req)
>>>>                   id->sgls |= cpu_to_le32(1 << 2);
>>>>           if (req->port->inline_data_size)
>>>>                   id->sgls |= cpu_to_le32(1 << 20);
>>>> +       /*
>>>> +        * Max command capsule size is sqe + in-capsule data size.
>>>> +        * Disable in-capsule data for Metadata capable controllers.
>>>> +        */
>>>> +       cmd_capsule_size = sizeof(struct nvme_command);
>>>> +       if (!ctrl->pi_support)
>>>> +               cmd_capsule_size += req->port->inline_data_size;
>>>> +       id->ioccsz = cpu_to_le32(cmd_capsule_size / 16);
>>>
>>> Yes, this is the culprit. Nice that it exposed a bug.
>>>
>>> There is no in-capsule data for discovery controllers afaict.
>>
>> Also, the discovery controllers can't support PI AFAIK.
> 
> Indeed, we should simply set it to sizeof(struct nvme_command) / 16.
> 
> It is true that for tcp transport we should add additional 8192 which
> is mandatory for nvme-tcp. But that can be done in an incremental patch.

I'm not aware of an additional 8k of mandatory size for nvme-tcp
discovery controllers. What is this needed for?

I was re-thinking about this issue and I think the original patch that
caused the problem Daniel found is wrong from a spec perspective and
also breaks compatibility with the Linux target for no reason. And maybe
with other targets as well.

A discovery controller, according to the spec, does not implement I/O
queues or expose namespaces. So why do we even check IOCCSZ and IORCSZ
for these controllers? It is also mentioned explicitly that these two
fields are reserved for discovery controllers.

SPEC:
"
I/O Queue Command Capsule Supported Size (IOCCSZ): This field defines 
the maximum I/O command capsule size in 16 byte units. The minimum value 
that shall be indicated is 4 corresponding to 64 bytes.

I/O Queue Response Capsule Supported Size (IORCSZ): This field defines 
the maximum I/O response capsule size in 16 byte units. The minimum 
value that shall be indicated is 1 corresponding to 16 bytes.
"

I suggest the following untested code to fix *only* the host driver side:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ee2e4c49892d..89e01e6b5fe4 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3068,14 +3068,14 @@ static int nvme_check_ctrl_fabric_info(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
                 return -EINVAL;
         }

-       if (ctrl->ioccsz < 4) {
+       if (!nvme_discovery_ctrl(ctrl) && ctrl->ioccsz < 4) {
                 dev_err(ctrl->device,
                         "I/O queue command capsule supported size %d < 4\n",
                         ctrl->ioccsz);
                 return -EINVAL;
         }

-       if (ctrl->iorcsz < 1) {
+       if (!nvme_discovery_ctrl(ctrl) && ctrl->iorcsz < 1) {
                 dev_err(ctrl->device,
                         "I/O queue response capsule supported size %d < 1\n",
                         ctrl->iorcsz);


Daniel, can you please test the above? If it works, I'll send a fix with
your Tested-by tag. We must have it merged for the next 6.7 rc.

-Max.


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: ioccsz and iorcsz check failing
  2023-12-19 23:55             ` Max Gurtovoy
@ 2023-12-20  0:10               ` Max Gurtovoy
  2023-12-20  9:02                 ` Daniel Wagner
  0 siblings, 1 reply; 14+ messages in thread
From: Max Gurtovoy @ 2023-12-20  0:10 UTC (permalink / raw)
  To: Sagi Grimberg, Daniel Wagner
  Cc: Keith Busch, linux-nvme@lists.infradead.org, Christoph Hellwig



On 20/12/2023 1:55, Max Gurtovoy wrote:
> 
> 
> On 19/12/2023 21:59, Sagi Grimberg wrote:
>>
>>
>> On 12/19/23 19:24, Max Gurtovoy wrote:
>>>
>>>
>>> On 18/12/2023 18:04, Sagi Grimberg wrote:
>>>>
>>>>>> Is anything else in the identify wrong, or is it just these
>>>>>> fabrics fields?
>>>>>
>>>>> So what I am seeing on the wire are a few fabrics commands (connect,
>>>>> get
>>>>> property) followed by the nvme id ctrl command (opcode 0x6).
>>>>>
>>>>>    nvmet: nvmet_req_init:962
>>>>>    nvmet: nvmet_parse_admin_cmd:1011
>>>>>    nvmet: nvmet_parse_discovery_cmd:359 opcode 6
>>>>>
>>>>> This calls then nvmet_execute_disc_identify, so adding
>>>>>
>>>>> --- a/drivers/nvme/target/discovery.c
>>>>> +++ b/drivers/nvme/target/discovery.c
>>>>> @@ -249,6 +249,7 @@ static void nvmet_execute_disc_identify(struct
>>>>> nvmet_req *req)
>>>>>    {
>>>>>           struct nvmet_ctrl *ctrl = req->sq->ctrl;
>>>>>           struct nvme_id_ctrl *id;
>>>>> +       u32 cmd_capsule_size;
>>>>>           u16 status = 0;
>>>>>
>>>>>           if (!nvmet_check_transfer_len(req, NVME_IDENTIFY_DATA_SIZE))
>>>>> @@ -289,6 +290,17 @@ static void nvmet_execute_disc_identify(struct
>>>>> nvmet_req *req)
>>>>>                   id->sgls |= cpu_to_le32(1 << 2);
>>>>>           if (req->port->inline_data_size)
>>>>>                   id->sgls |= cpu_to_le32(1 << 20);
>>>>> +       /*
>>>>> +        * Max command capsule size is sqe + in-capsule data size.
>>>>> +        * Disable in-capsule data for Metadata capable controllers.
>>>>> +        */
>>>>> +       cmd_capsule_size = sizeof(struct nvme_command);
>>>>> +       if (!ctrl->pi_support)
>>>>> +               cmd_capsule_size += req->port->inline_data_size;
>>>>> +       id->ioccsz = cpu_to_le32(cmd_capsule_size / 16);
>>>>
>>>> Yes, this is the culprit. Nice that it exposed a bug.
>>>>
>>>> There is no in-capsule data for discovery controllers afaict.
>>>
>>> Also, the discovery controllers can't support PI AFAIK.
>>
>> Indeed, we should simply set it to sizeof(struct nvme_command) / 16.
>>
>> It is true that for tcp transport we should add additional 8192 which
>> is mandatory for nvme-tcp. But that can be done in an incremental patch.
> 
> I'm not aware of an additional 8k of mandatory size for nvme-tcp
> discovery controllers. What is this needed for?
> 
> I was rethinking this issue and I think the original patch that
> caused the problem Daniel found is wrong from a spec perspective and
> also breaks compatibility with the Linux target for no reason. And
> maybe with other targets as well.
> 
> A discovery controller, according to the spec, does not implement I/O
> queues or expose namespaces. So why do we even check IOCCSZ and
> IORCSZ for these controllers?
> The spec also mentions explicitly that these two fields are reserved
> for discovery controllers.
> 
> SPEC:
> "
> I/O Queue Command Capsule Supported Size (IOCCSZ): This field defines 
> the maximum I/O command capsule size in 16 byte units. The minimum value 
> that shall be indicated is 4 corresponding to 64 bytes.
> 
> I/O Queue Response Capsule Supported Size (IORCSZ): This field defines 
> the maximum I/O response capsule size in 16 byte units. The minimum 
> value that shall be indicated is 1 corresponding to 16 bytes.
> "
> 
> I suggest the following untested code to fix *only* the host driver side:
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index ee2e4c49892d..89e01e6b5fe4 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3068,14 +3068,14 @@ static int nvme_check_ctrl_fabric_info(struct nvme_ctrl *ctrl, struct nvme_id_ct
>                 return -EINVAL;
>         }
> 
> -       if (ctrl->ioccsz < 4) {
> +       if (nvme_discovery_ctrl(ctrl) && ctrl->ioccsz < 4) {
>                 dev_err(ctrl->device,
>                         "I/O queue command capsule supported size %d < 4\n",
>                         ctrl->ioccsz);
>                 return -EINVAL;
>         }
> 
> -       if (ctrl->iorcsz < 1) {
> +       if (nvme_discovery_ctrl(ctrl) && ctrl->iorcsz < 1) {
>                 dev_err(ctrl->device,
>                         "I/O queue response capsule supported size %d < 1\n",
>                         ctrl->iorcsz);
> 
> Daniel,
> Can you please test the above? If it works I'll send a fix with your
> Tested-by tag. We must get it merged into the next 6.7 rc.
> 
> -Max.
> 

Daniel,
I just saw that Caleb also noticed that it shouldn't be relevant for 
discovery controllers. The mail got lost in the mailbox.

Anyway, I believe the initiator/host code should be updated for 6.7 and 
must ignore these fields, as the spec explicitly says they are reserved.
For the target side, I prefer to keep reserved fields at 0, since this is 
the convention in all the specifications that I'm aware of.

Sagi/Christoph,
WDYT?

I have a small bug for above code so please use:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ee2e4c49892d..b218ac88fcf8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3068,14 +3068,14 @@ static int nvme_check_ctrl_fabric_info(struct nvme_ctrl *ctrl, struct nvme_id_ct
                return -EINVAL;
        }

-       if (ctrl->ioccsz < 4) {
+       if (!nvme_discovery_ctrl(ctrl) && ctrl->ioccsz < 4) {
                dev_err(ctrl->device,
                        "I/O queue command capsule supported size %d < 4\n",
                        ctrl->ioccsz);
                return -EINVAL;
        }

-       if (ctrl->iorcsz < 1) {
+       if (!nvme_discovery_ctrl(ctrl) && ctrl->iorcsz < 1) {
                dev_err(ctrl->device,
                        "I/O queue response capsule supported size %d < 1\n",
                        ctrl->iorcsz);





* Re: ioccsz and iorcsz check failing
  2023-12-20  0:10               ` Max Gurtovoy
@ 2023-12-20  9:02                 ` Daniel Wagner
  2023-12-20  9:13                   ` Sagi Grimberg
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Wagner @ 2023-12-20  9:02 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Sagi Grimberg, Keith Busch, linux-nvme@lists.infradead.org,
	Christoph Hellwig

On Wed, Dec 20, 2023 at 02:10:42AM +0200, Max Gurtovoy wrote:
> I just saw that Caleb also noticed that it shouldn't be relevant for
> discovery controllers. The mail got lost in the mailbox.

I found it also in the spam folder.

> Anyway, I believe the initiator/host code should be updated for 6.7 and
> must ignore these fields, as the spec explicitly says they are reserved.
> For the target side, I prefer to keep reserved fields at 0, since this is
> the convention in all the specifications that I'm aware of.

I've tried to figure out what our testing storage servers report (NetApp
AF-700 and a Dell Powerstore). Though I can't access our equipment right
now due to recabling activities in the datacenter.

> I have a small bug for above code so please use:
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index ee2e4c49892d..b218ac88fcf8 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3068,14 +3068,14 @@ static int nvme_check_ctrl_fabric_info(struct nvme_ctrl *ctrl, struct nvme_id_ct
>                 return -EINVAL;
>         }
> 
> -       if (ctrl->ioccsz < 4) {
> +       if (!nvme_discovery_ctrl(ctrl) && ctrl->ioccsz < 4) {
>                 dev_err(ctrl->device,
>                         "I/O queue command capsule supported size %d < 4\n",
>                         ctrl->ioccsz);
>                 return -EINVAL;
>         }
> 
> -       if (ctrl->iorcsz < 1) {
> +       if (!nvme_discovery_ctrl(ctrl) && ctrl->iorcsz < 1) {
>                 dev_err(ctrl->device,
>                         "I/O queue response capsule supported size %d < 1\n",
>                         ctrl->iorcsz);
> 

With this blktests is happy and connecting to a remote nvme-tcp target
(Linux 6.1) works fine. So if you spin a proper patch feel free to add

Tested-by: Daniel Wagner <dwagner@suse.de>



* Re: ioccsz and iorcsz check failing
  2023-12-20  9:02                 ` Daniel Wagner
@ 2023-12-20  9:13                   ` Sagi Grimberg
  0 siblings, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2023-12-20  9:13 UTC (permalink / raw)
  To: Daniel Wagner, Max Gurtovoy
  Cc: Keith Busch, linux-nvme@lists.infradead.org, Christoph Hellwig



On 12/20/23 11:02, Daniel Wagner wrote:
> On Wed, Dec 20, 2023 at 02:10:42AM +0200, Max Gurtovoy wrote:
>> I just saw that Caleb also noticed that it shouldn't be relevant for
>> discovery controllers. The mail got lost in the mailbox.
> 
> I found it also in the spam folder.
> 
>> Anyway, I believe the initiator/host code should be updated for 6.7 and
>> must ignore these fields, as the spec explicitly says they are reserved.
>> For the target side, I prefer to keep reserved fields at 0, since this is
>> the convention in all the specifications that I'm aware of.
> 
> I've tried to figure out what our testing storage servers report (NetApp
> AF-700 and a Dell Powerstore). Though I can't access our equipment right
> now due to recabling activities in the datacenter.
> 
>> I have a small bug for above code so please use:
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index ee2e4c49892d..b218ac88fcf8 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -3068,14 +3068,14 @@ static int nvme_check_ctrl_fabric_info(struct nvme_ctrl *ctrl, struct nvme_id_ct
>>                  return -EINVAL;
>>          }
>>
>> -       if (ctrl->ioccsz < 4) {
>> +       if (!nvme_discovery_ctrl(ctrl) && ctrl->ioccsz < 4) {
>>                  dev_err(ctrl->device,
>>                          "I/O queue command capsule supported size %d < 4\n",
>>                          ctrl->ioccsz);
>>                  return -EINVAL;
>>          }
>>
>> -       if (ctrl->iorcsz < 1) {
>> +       if (!nvme_discovery_ctrl(ctrl) && ctrl->iorcsz < 1) {
>>                  dev_err(ctrl->device,
>>                          "I/O queue response capsule supported size %d < 1\n",
>>                          ctrl->iorcsz);
>>
> 
> With this blktests is happy and connecting to a remote nvme-tcp target
> (Linux 6.1) works fine. So if you spin a proper patch feel free to add
> 
> Tested-by: Daniel Wagner <dwagner@suse.de>

I think nvmet should set these values as you suggested, Daniel. At the
same time we can relax the nvme host to issue a warning if it sees a
discovery controller that clears these values.



end of thread, other threads:[~2023-12-20  9:13 UTC | newest]

Thread overview: 14+ messages
2023-12-15 14:48 ioccsz and iorcsz check failing Daniel Wagner
2023-12-15 18:08 ` Keith Busch
2023-12-18  9:59   ` Sagi Grimberg
2023-12-18 13:57     ` Daniel Wagner
2023-12-18 16:04       ` Sagi Grimberg
2023-12-19 17:24         ` Max Gurtovoy
2023-12-19 19:59           ` Sagi Grimberg
2023-12-19 23:55             ` Max Gurtovoy
2023-12-20  0:10               ` Max Gurtovoy
2023-12-20  9:02                 ` Daniel Wagner
2023-12-20  9:13                   ` Sagi Grimberg
2023-12-19 17:28         ` Daniel Wagner
2023-12-19 18:54           ` Caleb Sander
2023-12-19 19:58             ` Sagi Grimberg
