From: Dmitry Bogdanov <d.bogdanov@yadro.com>
To: Guixin Liu <kanie@linux.alibaba.com>
Cc: <hch@lst.de>, <sagi@grimberg.me>, <kch@nvidia.com>,
<linux-nvme@lists.infradead.org>
Subject: Re: [PATCH v9 1/1] nvmet: support reservation feature
Date: Tue, 24 Sep 2024 11:24:25 +0300 [thread overview]
Message-ID: <20240924082425.GF22571@yadro.com> (raw)
In-Reply-To: <0cddf902-9e60-43ca-b138-630c5485efa0@linux.alibaba.com>
On Tue, Sep 24, 2024 at 02:38:40PM +0800, Guixin Liu wrote:
>
> 在 2024/9/24 14:18, Guixin Liu 写道:
> >
> > 在 2024/9/24 13:54, Dmitry Bogdanov 写道:
> > > On Tue, Sep 24, 2024 at 10:49:58AM +0800, Guixin Liu wrote:
> > > > 在 2024/9/24 00:32, Dmitry Bogdanov 写道:
> > > > > > This patch implements the reservation feature, includes:
> > > > > > 1. reservation register(register, unregister and replace).
> > > > > > 2. reservation acquire(acquire, preempt, preempt and abort).
> > > > > > 3. reservation release(release and clear).
> > > > > > 4. reservation report.
> > > > > > 5. set feature and get feature of reservation notify mask.
> > > > > > 6. get log page of reservation event.
> > > > > >
> > > > > > And also make reservation configurable, one can set ns to support
> > > > > > reservation before enable ns. The default of resv_enable is false.
> > > > > >
> > > > > > Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
> > > > > > ---
> > > > > > drivers/nvme/target/Makefile | 2 +-
> > > > > > drivers/nvme/target/admin-cmd.c | 24 +-
> > > > > > drivers/nvme/target/configfs.c | 27 +
> > > > > > drivers/nvme/target/core.c | 58 +-
> > > > > > drivers/nvme/target/nvmet.h | 49 ++
> > > > > > drivers/nvme/target/pr.c | 1214
> > > > > > +++++++++++++++++++++++++++++++
> > > > > > include/linux/nvme.h | 54 ++
> > > > > > 7 files changed, 1419 insertions(+), 9 deletions(-)
> > > > > > create mode 100644 drivers/nvme/target/pr.c
> > > > > ...
> > > > > > +
> > > > > > +static void nvmet_pr_unregister_one(struct nvmet_pr *pr,
> > > > > > + struct nvmet_pr_registrant *reg)
> > > > > > +{
> > > > > > + struct nvmet_pr_registrant *first_reg;
> > > > > > + struct nvmet_pr_registrant *holder;
> > > > > > + u8 original_rtype;
> > > > > > +
> > > > > > + lockdep_assert_held(&pr->pr_lock);
> > > > > > + list_del_rcu(®->entry);
> > > > > > +
> > > > > > + holder = rcu_dereference_protected(pr->holder,
> > > > > > + lockdep_is_held(&pr->pr_lock));
> > > > > > + if (reg != holder)
> > > > > > + goto out;
> > > > > > +
> > > > > > + original_rtype = holder->rtype;
> > > > > > + if (original_rtype == NVME_PR_WRITE_EXCLUSIVE_ALL_REGS ||
> > > > > > + original_rtype == NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS) {
> > > > > > + first_reg =
> > > > > > list_first_or_null_rcu(&pr->registrant_list,
> > > > > > + struct nvmet_pr_registrant, entry);
> > > > > > + if (first_reg)
> > > > > > + first_reg->rtype = original_rtype;
> > > > > > + rcu_assign_pointer(pr->holder, first_reg);
> > > > > > + } else {
> > > > > > + rcu_assign_pointer(pr->holder, NULL);
> > > > > > +
> > > > > > + if (original_rtype ==
> > > > > > NVME_PR_WRITE_EXCLUSIVE_REG_ONLY ||
> > > > > > + original_rtype ==
> > > > > > NVME_PR_WRITE_EXCLUSIVE_REG_ONLY)
> > > > > copy&past error?
> > > > > The second check should be against NVME_PR_EXCLUSIVE_ACCESS_REG_ONLY?
> > > > Indeed, my mistake, it will be fixed in v10.
> > > > > > + nvmet_pr_resv_released(pr, ®->hostid);
> > > > > > + }
> > > > > > +out:
> > > > > > + synchronize_rcu();
> > > > > > + kfree(reg);
> > > > > > +}
> > > > > > +
> > > > > > ......
> > > > > > +
> > > > > > +static u16 nvmet_pr_preempt(struct nvmet_req *req,
> > > > > > + struct nvmet_pr_registrant *reg,
> > > > > > + u8 rtype,
> > > > > > + struct nvmet_pr_acquire_data *d,
> > > > > > + bool abort)
> > > > > > +{
> > > > > > + struct nvmet_ctrl *ctrl = req->sq->ctrl;
> > > > > > + struct nvmet_pr *pr = &req->ns->pr;
> > > > > > + struct nvmet_pr_registrant *holder;
> > > > > > + enum nvme_pr_type original_rtype;
> > > > > > + u64 prkey = le64_to_cpu(d->prkey);
> > > > > > + u16 status;
> > > > > > +
> > > > > > + lockdep_assert_held(&pr->pr_lock);
> > > > > > + holder = rcu_dereference_protected(pr->holder,
> > > > > > + lockdep_is_held(&pr->pr_lock));
> > > > > > + if (!holder)
> > > > > > + return nvmet_pr_unreg_all_host_by_prkey(req, prkey,
> > > > > > + &ctrl->hostid, abort);
> > > > > > +
> > > > > > + original_rtype = holder->rtype;
> > > > > > + if (original_rtype == NVME_PR_WRITE_EXCLUSIVE_ALL_REGS ||
> > > > > > + original_rtype == NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS) {
> > > > > > + if (!prkey) {
> > > > > > + nvmet_pr_unreg_all_others(req,
> > > > > > &ctrl->hostid, abort);
> > > > > > + nvmet_pr_set_new_holder(pr, rtype, reg);
> > > > > You didnot address the reservation atomicity here. You still remove an
> > > > > old holder and then after a some time you set the new holder.
> > > > > In this time gap an incomming WRITE command from a non holder will
> > > > > write
> > > > > to the media.
> > > > Right, I will change it in v10.
> > > > > > + return NVME_SC_SUCCESS;
> > > > > > + }
> > > > > > + return nvmet_pr_unreg_all_host_by_prkey(req, prkey,
> > > > > > + &ctrl->hostid, abort);
> > > > > > + }
> > > > > > +
> > > > > > + if (holder == reg) {
> > > > > > + status = nvmet_pr_update_reg_attr(pr, holder,
> > > > > > + nvmet_pr_update_holder_rtype, &rtype);
> > > > > > + if (!status && original_rtype != rtype)
> > > > > > + nvmet_pr_resv_released(pr, ®->hostid);
> > > > > > + return status;
> > > > > > + }
> > > > > > +
> > > > > > + if (prkey == holder->rkey) {
> > > > > > + status = nvmet_pr_unreg_all_others_by_prkey(req,
> > > > > > prkey,
> > > > > > + &ctrl->hostid, abort);
> > > > > And here too that timegap with released reservation.
> > > > I will fix this too, and also the
> > > >
> > > > nvmet_pr_unreg_all_others_by_prkey() does not need a return value,
> > > >
> > > > I remove it.
> > > >
> > > > > > + if (status)
> > > > > > + return status;
> > > > > > +
> > > > > > + nvmet_pr_set_new_holder(pr, rtype, reg);
> > > > > > + if (original_rtype != rtype)
> > > > > > + nvmet_pr_resv_released(pr, ®->hostid);
> > > > > This function is(may be) already invoked in nvmet_pr_unregister_one.
> > > > >
> > > > Not, here the original_rtype is not NVME_PR_WRITE_EXCLUSIVE_REG_ONLY or
> > > > NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS, so the nvmet_pr_unregister_one will
> > > > not call nvmet_pr_resv_released.
> > > nvmet_pr_unregister_one sends an event for original_rtype == *_REG_ONLY
> > > Here you have only original_rtype != ALL_REGS, so _REG_ONLY is possible.
> > My mistake, it will be removed in v10, thanks.
>
> I take a look again, if we set self new holder before call
> nvmet_pr_unreg_all_others_by_prkey(), the
> nvmet_pr_unreg_all_others_by_prkey() will
>
> not unregister self, so this will not goto nvmet_pr_unregister_one()'s
> calling nvmet_pr_resv_released().
Yes, and this is a reason not to try to fix non-atomicity (anothter my
comment) by setting new holder before unregistering.
Regarding this place, here nvmet_pr_resv_released should be called for
original_rtype !=*_REG_ONLY with a note that _REG_ONLY handled in nvmet_pr_unregister_one.
Please, do not take my suggestions "how to fix" as a direct order, it's
just suggestion.
> > > > > > + return status;
> > > > > > + }
> > > > > > +
> > > > > > + if (prkey)
> > > > > > + return nvmet_pr_unreg_all_host_by_prkey(req, prkey,
> > > > > > + &ctrl->hostid, abort);
> > > > > > + return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
> > > > > > +}
> > > > > > +
> > > > > > +static void nvmet_pr_confirm_ns_pc_ref(struct percpu_ref *ref)
> > > > > > +{
> > > > > > + struct nvmet_pr_per_ctrl_ref *pc_ref =
> > > > > > + container_of(ref, struct nvmet_pr_per_ctrl_ref, ref);
> > > > > > +
> > > > > > + complete(&pc_ref->confirm_done);
> > > > > > +}
> > > > > > +
next prev parent reply other threads:[~2024-09-24 8:25 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-23 9:47 [PATCH v9 0/1] Implement the NVMe reservation feature Guixin Liu
2024-09-23 9:47 ` [PATCH v9 1/1] nvmet: support " Guixin Liu
2024-09-23 16:32 ` Dmitry Bogdanov
2024-09-24 2:49 ` Guixin Liu
2024-09-24 5:54 ` Dmitry Bogdanov
2024-09-24 6:18 ` Guixin Liu
2024-09-24 6:38 ` Guixin Liu
2024-09-24 8:24 ` Dmitry Bogdanov [this message]
2024-09-24 9:45 ` Guixin Liu
2024-09-24 12:57 ` Dmitry Bogdanov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240924082425.GF22571@yadro.com \
--to=d.bogdanov@yadro.com \
--cc=hch@lst.de \
--cc=kanie@linux.alibaba.com \
--cc=kch@nvidia.com \
--cc=linux-nvme@lists.infradead.org \
--cc=sagi@grimberg.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.