All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dmitry Bogdanov <d.bogdanov@yadro.com>
To: Guixin Liu <kanie@linux.alibaba.com>
Cc: <hch@lst.de>, <sagi@grimberg.me>, <kch@nvidia.com>,
	<linux-nvme@lists.infradead.org>
Subject: Re: [PATCH v11 1/1] nvmet: support reservation feature
Date: Tue, 8 Oct 2024 11:45:38 +0300	[thread overview]
Message-ID: <20241008084537.GK22571@yadro.com> (raw)
In-Reply-To: <fea312d9-5b53-48b9-9002-2b68053a22a0@linux.alibaba.com>

On Tue, Oct 08, 2024 at 03:27:46PM +0800, Guixin Liu wrote:
> 在 2024/10/7 16:08, Dmitry Bogdanov 写道:
> > On Sat, Oct 05, 2024 at 10:26:58PM +0800, Guixin Liu wrote:
> > > 在 2024/9/30 20:53, Dmitry Bogdanov 写道:
> > > > On Sun, Sep 29, 2024 at 11:14:10AM +0800, Guixin Liu wrote:
> > > > > This patch implements the reservation feature, includes:
> > > > > 1. reservation register(register, unregister and replace).
> > > > > 2. reservation acquire(acquire, preempt, preempt and abort).
> > > > > 3. reservation release(release and clear).
> > > > > 4. reservation report.
> > > > > 5. set feature and get feature of reservation notify mask.
> > > > > 6. get log page of reservation event.
> > > > > 
> > > > > And also make reservation configurable, one can set ns to support
> > > > > reservation before enable ns. The default of resv_enable is false.
> > > > > 
> > > > > Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
> > > > > ---
> > > > > +
> > > > > +static void nvmet_execute_pr_report(struct nvmet_req *req)
> > > > > +{
> > > > > +       u32 cdw11 = le32_to_cpu(req->cmd->common.cdw11);
> > > > > +       u32 cdw10 = le32_to_cpu(req->cmd->common.cdw10);
> > > > > +       u32 num_bytes = 4 * (cdw10 + 1); /* cdw10 is number of dwords */
> > > > > +       u8 eds = cdw11 & 1; /* Extended data structure, bit 00 */
> > > > > +       struct nvme_registered_ctrl_ext *ctrl_eds;
> > > > > +       struct nvme_reservation_status_ext *data;
> > > > > +       struct nvmet_pr *pr = &req->ns->pr;
> > > > > +       struct nvmet_pr_registrant *holder;
> > > > > +       struct nvmet_pr_registrant *reg;
> > > > > +       u16 num_ctrls = 0;
> > > > > +       u16 status;
> > > > > +       u8 rtype;
> > > > > +
> > > > > +       /* nvmet hostid(uuid_t) is 128 bit. */
> > > > > +       if (!eds) {
> > > > > +               req->error_loc = offsetof(struct nvme_common_command, cdw11);
> > > > > +               status = NVME_SC_HOST_ID_INCONSIST | NVME_SC_DNR;
> > > > > +               goto out;
> > > > > +       }
> > > > > +
> > > > > +       if (num_bytes < sizeof(struct nvme_reservation_status_ext)) {
> > > > > +               req->error_loc = offsetof(struct nvme_common_command, cdw10);
> > > > > +               status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
> > > > > +               goto out;
> > > > > +       }
> > > > > +
> > > > > +       data = kmalloc(num_bytes, GFP_KERNEL);
> > > > > +       if (!data) {
> > > > > +               status = NVME_SC_INTERNAL;
> > > > > +               goto out;
> > > > > +       }
> > > > > +       memset(data, 0, num_bytes);
> > > > > +       data->gen = cpu_to_le32(atomic_read(&pr->generation));
> > > > > +       data->ptpls = 0;
> > > > > +       ctrl_eds = data->regctl_eds;
> > > > > +
> > > > > +       rcu_read_lock();
> > > > > +       holder = rcu_dereference(pr->holder);
> > > > > +       rtype = holder ? holder->rtype : 0;
> > > > > +       data->rtype = rtype;
> > > > > +
> > > > > +       list_for_each_entry_rcu(reg, &pr->registrant_list, entry) {
> > > > > +               if ((void *)ctrl_eds >= (void *)(data + num_bytes))
> > > > > +                       break;
> > > > > +               /*
> > > > > +                * Dynamic controller, set cntlid to 0xffff.
> > > > > +                */
> > > > > +               ctrl_eds->cntlid = 0xffff;
> > > > > +               if (rtype == NVME_PR_WRITE_EXCLUSIVE_ALL_REGS ||
> > > > > +                   rtype == NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS)
> > > > > +                       ctrl_eds->rcsts = 1;
> > > > > +               if (holder && uuid_equal(&reg->hostid, &holder->hostid))
> > > > is just (reg == holder) not enough here?
> > > Yes.
> > > > > +                       ctrl_eds->rcsts = 1;
> > > > > +               uuid_copy((uuid_t *)&ctrl_eds->hostid, &reg->hostid);
> > > > > +               ctrl_eds->rkey = cpu_to_le64(reg->rkey);
> > > > > +               ctrl_eds++;
> > > > > +               num_ctrls++;
> > > > > +       }
> > > > > +       rcu_read_unlock();
> > > > > +
> > > > > +       put_unaligned_le16(num_ctrls, data->regctl);
> > > > Here you should report the number of all registered registrants in ns,
> > > > not the only reported. It is used by the host to understand that it got
> > > > not a full response.
> > > > NUMD field (cdw11) is about transfering the response. But the structure itsef in
> > > > the reposnse has to be complete.
> > > The NVMe spec says that:
> > > 
> > >      "This indicates the number of Registered Controller data structures
> > > and/or Registered Controller extended data structures contained in this
> > > data structure."
> > You read not the latest Spec again. Spec v2.1 has another statement:
> >    Number of Registrants (REGSTRNT): This field indicates the number of registrants of the
> >    namespace. This indicates the number of Registrant data structures or Registrant Extended data
> >    structures contained in this data structure.
> 
> Well, still, "contained in this data structure" means that we should
> report how many registrants we reported.

No, vice-versa :), that means that "this data structure" shall contain "the number
of registrants of the namespace".

> 
> I do a research on nvme-cli and SPDK:
> 
> 1. nvme-cli: In json_nvme_resv_report, use reported regctls to traverse
> reported registrants.

I saw, and that is a security violation - to trust a data from a network.
Try to set a small NUMD and ask ReservationRepost from SPDK nvme target -
you will get a coredump of nvme-cli due to aouut-of-bound access of its
buffer (NUMD sized).

Instead, it shall send ReservationReport twice - the first with small
buffer to get just REGSTRNT, and the second with a full buffer to get
the whole list. That is a common handling of a variable length response
in SCSI, and NVME too.

> 
> 2. SPDK: In nvmf_ns_reservation_report, set regctl to the number of
> reported registrants.

Looks like you see again not the last version of the reference.
The latest SPDK has a correct implementation - It reports a whole
registrants list since Feb 13, 2023.
Please, follow it.

> Best Regards,
> 
> Guixin Liu
> 
> > > It means that host use this filed to obtain the reported registrants number.
> > > 
> > Think from the host point of view - how are you going to understand
> > how much registrants are there in the namespace?

BR,
 Dmitry


  reply	other threads:[~2024-10-08  8:48 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-29  3:14 [PATCH v11 0/1] Implement the NVMe reservation feature Guixin Liu
2024-09-29  3:14 ` [PATCH v11 1/1] nvmet: support " Guixin Liu
2024-09-30 12:53   ` Dmitry Bogdanov
2024-10-05 14:26     ` Guixin Liu
2024-10-07  8:08       ` Dmitry Bogdanov
2024-10-08  7:27         ` Guixin Liu
2024-10-08  8:45           ` Dmitry Bogdanov [this message]
2024-10-08  9:25             ` Guixin Liu
2024-10-02  8:09   ` Christoph Hellwig
2024-10-05 15:09     ` Guixin Liu
2024-10-07  6:36       ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241008084537.GK22571@yadro.com \
    --to=d.bogdanov@yadro.com \
    --cc=hch@lst.de \
    --cc=kanie@linux.alibaba.com \
    --cc=kch@nvidia.com \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.