[PATCH v8 0/1] Implement the NVMe reservation feature

Linux-NVME Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v8 0/1] Implement the NVMe reservation feature
@ 2024-02-29  3:12 Guixin Liu
  2024-02-29  3:12 ` [PATCH v8 1/1] nvmet: support " Guixin Liu
  2024-03-05  2:32 ` [PATCH v8 0/1] Implement the NVMe " Guixin Liu
  0 siblings, 2 replies; 12+ messages in thread
From: Guixin Liu @ 2024-02-29  3:12 UTC (permalink / raw)
  To: hch, sagi, kch; +Cc: linux-nvme

Hi guys,
    I've implemented the NVMe reservation feature. Please review it, all 
comments are welcome as usual.

Changes from v7 to v8:
- Add me as the new file pr.c's maintainer.

Changes from v6 to v7:
- Handle "reservation notification mask" feature command to mask reservation
log.

- Add all the registrants that need to be freed to a temporary list fist,
and then after calling synchronize_rcu(), release all the registrants on the
temporary list.

- Fix the resv log page is random when there is no resv log page.

- Change nvmet_is_host_still_connected() to nvmet_is_host_connected().

- Remove nvmet_pr_set_rtype_and_holder() and change nvmet_pr_create_new_resv()
to nvmet_pr_create_new_reservation().

- Change nvmet_pr_find_registrant_by_hostid() to nvmet_pr_find_registrant().

- Change nvmet_pr_send_resv_released() to nvmet_pr_resv_released().

- Change __nvmet_pr_unregister_one() to nvmet_pr_unregister_one().

- In nvmet_pr_unreg_by_prkey(), nvmet_pr_unreg_by_prkey_except_hostid() and
nvmet_pr_unreg_except_hostid(), first do unregistering and then do event sending.


Changes from v5 to v6:
- Use synchronize_rcu() and kfree() to free registrant instead of kfree_rcu().

- Remove nvmet_pr_register_check_rkey(), put the check into pr_lock warp.
And refactor the nvmet_pr_register().

- Add the print fmt to the head.

- Add lockdep_is_held(&pr->pr_lock) condition to list_for_each_entry_rcu.

- Fix the bug in nvmet_pr_update_reg_attr(), when the change_attr hook
return fail, we should not replace the holder. 

Changes from v4 to v5:

- Use rculist macros to handle registration_list instead of list macros
regardless of in mutex lock or not.

- Use goto statement instead of return in nvmet_is_host_still_connected 
and __nvmet_pr_unregister_one.

- Add lockdep_assert_held and rcu_read_lock_held assert to many functions,
if it's necessary.

- Add a comment to nvmet_execute_get_log_page_resv to explain how lost_count
works.

- In nvmet_pr_clear, we should set holder to NULL first, I fixed this.

- Unify nvmet_pr_update_holder_rtype and __nvmet_pr_do_replace to 
nvmet_pr_update_reg_attr.

- Fix wrong nr_pages in nvmet_execute_get_log_page_resv.

- Fix the deadlock issue of nvmet_pr_exit_ns, put it out of the subsys lock.


Changes from v3 to v4:
- Use kfifo to handle resv log page instead of list, and also limit the
resv log queue to 64.

- Change the function calling alignment style to:
    nvmet_pr_send_event_by_hostid(pr, hostid,
            NVME_PR_LOG_RESERVATOPM_PREEMPTED); 

- Put kmalloc out of rcu_read_lock in nvmet_execute_pr_report().

- Remove the goto in __nvmet_pr_unregister_one().

- Change generation to atomic_t, and remove nvmet_pr_inc_generation().

- In addtion, the number2 patch "nvmet: unify aer type enum" is not
relate with this patch, so I will send it separately.


Changes from v2 to v3:
- Use rcu instead of rwlock to make IO path run faster, and put the rtype
into the struct nvmet_pr_registrant.

- Limit the resv_log_list to 128.

- Change generation to atomic64.

- Put register rkey check to a warpper.

- Change nr_avl_pages to nr_pages.

- Use NVME_SC_SUCCESS instead of 0.

- Change kmalloc param to let it not sleep in mutex lock.


Changes from v1 to v2:
- Implement the reservation notification report, includes registration
preempted, reservation released and reservation preempted.
  And also handle the reservation log page available event and send get
reservation log page command to clear log page at host.

- Put the reservation check access after validate opcode. And remove
opcodes which nvmet not implement yet check.
  Now there is no admin opcode nvmet implemented needs reservation check,
so I dont add reservation check to admin command path.
  Next we need to do reservation check includes the situation of nsid is
0xffffffff at each admin command path, if it is needed.

- Add reservation commands support in nvmet_get_cmd_effects_nvm().

- From Chaitanya, change the local variable tree style to make it cleaner,
and add some comments about NVMe spec.
  And also change others advice from chaitanya.

- Put the nvmet_pr_check_cmd_access and nvmet_parse_pr_cmd into reservation
enable check warp.

- Remove kmem_cache instead to use kmalloc and kfree.

- Change others advice from Sagi.

- Add a blktest test case, this patch will be sent before these series of
patches.


Guixin Liu (1):
  nvmet: support reservation feature

 MAINTAINERS                     |    6 +
 drivers/nvme/target/Makefile    |    2 +-
 drivers/nvme/target/admin-cmd.c |   20 +-
 drivers/nvme/target/configfs.c  |   27 +
 drivers/nvme/target/core.c      |   51 +-
 drivers/nvme/target/nvmet.h     |   36 ++
 drivers/nvme/target/pr.c        | 1041 +++++++++++++++++++++++++++++++
 include/linux/nvme.h            |   54 ++
 8 files changed, 1230 insertions(+), 7 deletions(-)
 create mode 100644 drivers/nvme/target/pr.c

-- 
2.43.0



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v8 1/1] nvmet: support reservation feature
  2024-02-29  3:12 [PATCH v8 0/1] Implement the NVMe reservation feature Guixin Liu
@ 2024-02-29  3:12 ` Guixin Liu
  2024-09-17  8:43   ` Dmitry Bogdanov
  2024-03-05  2:32 ` [PATCH v8 0/1] Implement the NVMe " Guixin Liu
  1 sibling, 1 reply; 12+ messages in thread
From: Guixin Liu @ 2024-02-29  3:12 UTC (permalink / raw)
  To: hch, sagi, kch; +Cc: linux-nvme

This patch implements the reservation feature, includes:
1. reservation register(register, unregister and replace).
2. reservation acquire(acquire, preempt, preempt and abort).
3. reservation release(release and clear).
4. reservation report.

And also make reservation configurable, one can set ns to support
reservation before enable ns. The default of resv_enable is false.

Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
---
 MAINTAINERS                     |    6 +
 drivers/nvme/target/Makefile    |    2 +-
 drivers/nvme/target/admin-cmd.c |   20 +-
 drivers/nvme/target/configfs.c  |   27 +
 drivers/nvme/target/core.c      |   51 +-
 drivers/nvme/target/nvmet.h     |   36 ++
 drivers/nvme/target/pr.c        | 1041 +++++++++++++++++++++++++++++++
 include/linux/nvme.h            |   54 ++
 8 files changed, 1230 insertions(+), 7 deletions(-)
 create mode 100644 drivers/nvme/target/pr.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 8999497011a2..8e8162736233 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15660,6 +15660,12 @@ W:	http://git.infradead.org/nvme.git
 T:	git git://git.infradead.org/nvme.git
 F:	drivers/nvme/target/
 
+NVM EXPRESS TARGET PERSISTENT RESERVATION
+M:	Guixin Liu <kanie@linux.alibaba.com>
+L:	linux-nvme@lists.infradead.org
+S:	Supported
+F:	drivers/nvme/target/pr.c
+
 NVMEM FRAMEWORK
 M:	Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
 S:	Maintained
diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
index c66820102493..f9bfc904a5b3 100644
--- a/drivers/nvme/target/Makefile
+++ b/drivers/nvme/target/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_NVME_TARGET_FCLOOP)	+= nvme-fcloop.o
 obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
 
 nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
-			discovery.o io-cmd-file.o io-cmd-bdev.o
+			discovery.o io-cmd-file.o io-cmd-bdev.o pr.o
 nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
 nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
 nvmet-$(CONFIG_NVME_TARGET_AUTH)	+= fabrics-cmd-auth.o auth.o
diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 39cb570f833d..eb1d85701441 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -176,6 +176,10 @@ static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log)
 	log->iocs[nvme_cmd_read] =
 	log->iocs[nvme_cmd_flush] =
 	log->iocs[nvme_cmd_dsm]	=
+	log->iocs[nvme_cmd_resv_acquire] =
+	log->iocs[nvme_cmd_resv_register] =
+	log->iocs[nvme_cmd_resv_release] =
+	log->iocs[nvme_cmd_resv_report] =
 		cpu_to_le32(NVME_CMD_EFFECTS_CSUPP);
 	log->iocs[nvme_cmd_write] =
 	log->iocs[nvme_cmd_write_zeroes] =
@@ -340,6 +344,8 @@ static void nvmet_execute_get_log_page(struct nvmet_req *req)
 		return nvmet_execute_get_log_cmd_effects_ns(req);
 	case NVME_LOG_ANA:
 		return nvmet_execute_get_log_page_ana(req);
+	case NVME_LOG_RESERVATION:
+		return nvmet_execute_get_log_page_resv(req);
 	}
 	pr_debug("unhandled lid %d on qid %d\n",
 	       req->cmd->get_log_page.lid, req->sq->qid);
@@ -550,7 +556,13 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
 	 */
 	id->nmic = NVME_NS_NMIC_SHARED;
 	id->anagrpid = cpu_to_le32(req->ns->anagrpid);
-
+	id->rescap = NVME_PR_SUPPORT_WRITE_EXCLUSIVE |
+		     NVME_PR_SUPPORT_EXCLUSIVE_ACCESS |
+		     NVME_PR_SUPPORT_WRITE_EXCLUSIVE_REG_ONLY |
+		     NVME_PR_SUPPORT_EXCLUSIVE_ACCESS_REG_ONLY |
+		     NVME_PR_SUPPORT_WRITE_EXCLUSIVE_ALL_REGS |
+		     NVME_PR_SUPPORT_EXCLUSIVE_ACCESS_ALL_REGS |
+		     NVME_PR_SUPPORT_IEKEY_VER_1_3_DEF;
 	memcpy(&id->nguid, &req->ns->nguid, sizeof(id->nguid));
 
 	id->lbaf[0].ds = req->ns->blksize_shift;
@@ -851,6 +863,9 @@ void nvmet_execute_set_features(struct nvmet_req *req)
 	case NVME_FEAT_WRITE_PROTECT:
 		status = nvmet_set_feat_write_protect(req);
 		break;
+	case NVME_FEAT_RESV_MASK:
+		status = nvmet_set_feat_resv_notif_mask(req, cdw11);
+		break;
 	default:
 		req->error_loc = offsetof(struct nvme_common_command, cdw10);
 		status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
@@ -949,6 +964,9 @@ void nvmet_execute_get_features(struct nvmet_req *req)
 	case NVME_FEAT_WRITE_PROTECT:
 		status = nvmet_get_feat_write_protect(req);
 		break;
+	case NVME_FEAT_RESV_MASK:
+		status = nvmet_get_feat_resv_notif_mask(req);
+		break;
 	default:
 		req->error_loc =
 			offsetof(struct nvme_common_command, cdw10);
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 2482a0db2504..56aead89d21a 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -714,6 +714,32 @@ static ssize_t nvmet_ns_revalidate_size_store(struct config_item *item,
 
 CONFIGFS_ATTR_WO(nvmet_ns_, revalidate_size);
 
+static ssize_t nvmet_ns_resv_enable_show(struct config_item *item, char *page)
+{
+	return sprintf(page, "%d\n", to_nvmet_ns(item)->pr.enable);
+}
+
+static ssize_t nvmet_ns_resv_enable_store(struct config_item *item,
+					const char *page, size_t count)
+{
+	struct nvmet_ns *ns = to_nvmet_ns(item);
+	bool val;
+
+	if (kstrtobool(page, &val))
+		return -EINVAL;
+
+	mutex_lock(&ns->subsys->lock);
+	if (ns->enabled) {
+		pr_err("the ns:%d is already enabled.\n", ns->nsid);
+		mutex_unlock(&ns->subsys->lock);
+		return -EINVAL;
+	}
+	ns->pr.enable = val;
+	mutex_unlock(&ns->subsys->lock);
+	return count;
+}
+CONFIGFS_ATTR(nvmet_ns_, resv_enable);
+
 static struct configfs_attribute *nvmet_ns_attrs[] = {
 	&nvmet_ns_attr_device_path,
 	&nvmet_ns_attr_device_nguid,
@@ -722,6 +748,7 @@ static struct configfs_attribute *nvmet_ns_attrs[] = {
 	&nvmet_ns_attr_enable,
 	&nvmet_ns_attr_buffered_io,
 	&nvmet_ns_attr_revalidate_size,
+	&nvmet_ns_attr_resv_enable,
 #ifdef CONFIG_PCI_P2PDMA
 	&nvmet_ns_attr_p2pmem,
 #endif
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index d26aa30f8702..5bdb7001b45a 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -591,6 +591,8 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 	if (ns->nsid > subsys->max_nsid)
 		subsys->max_nsid = ns->nsid;
 
+	nvmet_pr_init_ns(ns);
+
 	ret = xa_insert(&subsys->namespaces, ns->nsid, ns, GFP_KERNEL);
 	if (ret)
 		goto out_restore_subsys_maxnsid;
@@ -646,6 +648,7 @@ void nvmet_ns_disable(struct nvmet_ns *ns)
 	synchronize_rcu();
 	wait_for_completion(&ns->disable_done);
 	percpu_ref_exit(&ns->ref);
+	nvmet_pr_exit_ns(ns);
 
 	mutex_lock(&subsys->lock);
 
@@ -904,18 +907,34 @@ static u16 nvmet_parse_io_cmd(struct nvmet_req *req)
 		return ret;
 	}
 
+	if (req->ns->pr.enable) {
+		ret = nvmet_parse_pr_cmd(req);
+		if (!ret)
+			return ret;
+	}
+
 	switch (req->ns->csi) {
 	case NVME_CSI_NVM:
 		if (req->ns->file)
-			return nvmet_file_parse_io_cmd(req);
-		return nvmet_bdev_parse_io_cmd(req);
+			ret = nvmet_file_parse_io_cmd(req);
+		else
+			ret = nvmet_bdev_parse_io_cmd(req);
+		break;
 	case NVME_CSI_ZNS:
 		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
-			return nvmet_bdev_zns_parse_io_cmd(req);
-		return NVME_SC_INVALID_IO_CMD_SET;
+			ret = nvmet_bdev_zns_parse_io_cmd(req);
+		else
+			ret = NVME_SC_INVALID_IO_CMD_SET;
+		break;
 	default:
-		return NVME_SC_INVALID_IO_CMD_SET;
+		ret = NVME_SC_INVALID_IO_CMD_SET;
 	}
+	if (ret)
+		return ret;
+
+	if (req->ns->pr.enable)
+		ret = nvmet_pr_check_cmd_access(req);
+	return ret;
 }
 
 bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
@@ -1231,6 +1250,22 @@ static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
 		nvmet_passthrough_override_cap(ctrl);
 }
 
+bool nvmet_is_host_connected(uuid_t *hostid, struct nvmet_subsys *subsys)
+{
+	struct nvmet_ctrl *ctrl;
+	bool ret = false;
+
+	mutex_lock(&subsys->lock);
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		if (uuid_equal(hostid, &ctrl->hostid)) {
+			ret = true;
+			break;
+		}
+	}
+	mutex_unlock(&subsys->lock);
+	return ret;
+}
+
 struct nvmet_ctrl *nvmet_ctrl_find_get(const char *subsysnqn,
 				       const char *hostnqn, u16 cntlid,
 				       struct nvmet_req *req)
@@ -1447,6 +1482,11 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 	ctrl->err_counter = 0;
 	spin_lock_init(&ctrl->error_lock);
 
+	ctrl->pr_log_mgr.counter = 0;
+	ctrl->pr_log_mgr.lost_count = 0;
+	mutex_init(&ctrl->pr_log_mgr.lock);
+	INIT_KFIFO(ctrl->pr_log_mgr.log_queue);
+
 	nvmet_start_keep_alive_timer(ctrl);
 
 	mutex_lock(&subsys->lock);
@@ -1485,6 +1525,7 @@ static void nvmet_ctrl_free(struct kref *ref)
 	cancel_work_sync(&ctrl->fatal_err_work);
 
 	nvmet_destroy_auth(ctrl);
+	nvmet_ctrl_destroy_pr(ctrl);
 
 	ida_free(&cntlid_ida, ctrl->cntlid);
 
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 6c8acebe1a1a..4f8416e7353d 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -20,6 +20,7 @@
 #include <linux/blkdev.h>
 #include <linux/radix-tree.h>
 #include <linux/t10-pi.h>
+#include <linux/kfifo.h>
 
 #define NVMET_DEFAULT_VS		NVME_VS(1, 3, 0)
 
@@ -30,6 +31,7 @@
 #define NVMET_MN_MAX_SIZE		40
 #define NVMET_SN_MAX_SIZE		20
 #define NVMET_FR_MAX_SIZE		8
+#define NVMET_PR_LOG_QUEUE_SIZE		64
 
 /*
  * Supported optional AENs:
@@ -56,6 +58,22 @@
 #define IPO_IATTR_CONNECT_SQE(x)	\
 	(cpu_to_le32(offsetof(struct nvmf_connect_command, x)))
 
+struct nvmet_pr_registrant {
+	u64			rkey;
+	uuid_t			hostid;
+	enum nvme_pr_type	rtype;
+	struct list_head	entry;
+};
+
+struct nvmet_pr {
+	bool			enable;
+	unsigned long		notify_mask;
+	atomic_t		generation;
+	struct nvmet_pr_registrant __rcu *holder;
+	struct mutex		pr_lock;
+	struct list_head	registrant_list;
+};
+
 struct nvmet_ns {
 	struct percpu_ref	ref;
 	struct bdev_handle	*bdev_handle;
@@ -85,6 +103,7 @@ struct nvmet_ns {
 	int			pi_type;
 	int			metadata_size;
 	u8			csi;
+	struct nvmet_pr		pr;
 };
 
 static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
@@ -190,6 +209,13 @@ static inline bool nvmet_port_secure_channel_required(struct nvmet_port *port)
     return nvmet_port_disc_addr_treq_secure_channel(port) == NVMF_TREQ_REQUIRED;
 }
 
+struct nvmet_pr_log_mgr {
+	struct mutex		lock;
+	u64			lost_count;
+	u64			counter;
+	DECLARE_KFIFO(log_queue, struct nvme_pr_log, NVMET_PR_LOG_QUEUE_SIZE);
+};
+
 struct nvmet_ctrl {
 	struct nvmet_subsys	*subsys;
 	struct nvmet_sq		**sqs;
@@ -243,6 +269,7 @@ struct nvmet_ctrl {
 	u8			*dh_key;
 	size_t			dh_keysize;
 #endif
+	struct nvmet_pr_log_mgr pr_log_mgr;
 };
 
 struct nvmet_subsys {
@@ -750,4 +777,13 @@ static inline bool nvmet_has_auth(struct nvmet_ctrl *ctrl)
 static inline const char *nvmet_dhchap_dhgroup_name(u8 dhgid) { return NULL; }
 #endif
 
+void nvmet_pr_init_ns(struct nvmet_ns *ns);
+u16 nvmet_parse_pr_cmd(struct nvmet_req *req);
+u16 nvmet_pr_check_cmd_access(struct nvmet_req *req);
+void nvmet_ctrl_destroy_pr(struct nvmet_ctrl *ctrl);
+void nvmet_pr_exit_ns(struct nvmet_ns *ns);
+bool nvmet_is_host_connected(uuid_t *hostid, struct nvmet_subsys *subsys);
+void nvmet_execute_get_log_page_resv(struct nvmet_req *req);
+u16 nvmet_set_feat_resv_notif_mask(struct nvmet_req *req, u32 mask);
+u16 nvmet_get_feat_resv_notif_mask(struct nvmet_req *req);
 #endif /* _NVMET_H */
diff --git a/drivers/nvme/target/pr.c b/drivers/nvme/target/pr.c
new file mode 100644
index 000000000000..d3a25ff40365
--- /dev/null
+++ b/drivers/nvme/target/pr.c
@@ -0,0 +1,1041 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVMe over Fabrics Persist Reservation.
+ * Copyright (c) 2024 Guixin Liu, Alibaba Group.
+ * All rights reserved.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include "nvmet.h"
+#include <linux/slab.h>
+#include <asm/unaligned.h>
+#include <linux/lockdep.h>
+
+#define NVMET_PR_NOTIFI_MASK_ALL \
+		(1 << NVME_PR_NOTIFY_BIT_REG_PREEMPTED | \
+		 1 << NVME_PR_NOTIFY_BIT_RESV_RELEASED | \
+		 1 << NVME_PR_NOTIFY_BIT_RESV_PREEMPTED)
+
+struct nvmet_pr_register_data {
+	__le64	crkey;
+	__le64	nrkey;
+};
+
+struct nvmet_pr_acquire_data {
+	__le64	crkey;
+	__le64	prkey;
+};
+
+struct nvmet_pr_release_data {
+	__le64	crkey;
+};
+
+static inline struct nvmet_ns *nvmet_pr_to_ns(struct nvmet_pr *pr)
+{
+	return container_of(pr, struct nvmet_ns, pr);
+}
+
+static u16 nvmet_pr_validate_rtype(u8 rtype)
+{
+	if (rtype < NVME_PR_WRITE_EXCLUSIVE ||
+	    rtype > NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS)
+		return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+
+	return NVME_SC_SUCCESS;
+}
+
+static struct nvmet_pr_registrant *
+nvmet_pr_find_registrant(struct nvmet_pr *pr, uuid_t *hostid)
+{
+	struct nvmet_pr_registrant *reg;
+
+	WARN_ON_ONCE(!rcu_read_lock_held() &&
+		     !lockdep_is_held(&pr->pr_lock));
+
+	list_for_each_entry_rcu(reg, &pr->registrant_list, entry,
+				rcu_read_lock_held() ||
+				lockdep_is_held(&pr->pr_lock)) {
+		if (uuid_equal(&reg->hostid, hostid))
+			return reg;
+	}
+	return NULL;
+}
+
+u16 nvmet_set_feat_resv_notif_mask(struct nvmet_req *req, u32 mask)
+{
+	u32 nsid = le32_to_cpu(req->cmd->common.nsid);
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvmet_ns *ns;
+	unsigned long idx;
+	u16 status;
+
+	if (mask & ~(NVMET_PR_NOTIFI_MASK_ALL)) {
+		req->error_loc = offsetof(struct nvme_common_command, cdw11);
+		return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+	}
+
+	if (nsid != U32_MAX) {
+		status = nvmet_req_find_ns(req);
+		if (status)
+			return status;
+		if (!req->ns->pr.enable)
+			return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+
+		WRITE_ONCE(req->ns->pr.notify_mask, mask);
+		goto success;
+	}
+
+	xa_for_each(&ctrl->subsys->namespaces, idx, ns) {
+		if (ns->pr.enable)
+			WRITE_ONCE(ns->pr.notify_mask, mask);
+	}
+
+success:
+	nvmet_set_result(req, mask);
+	return NVME_SC_SUCCESS;
+}
+
+u16 nvmet_get_feat_resv_notif_mask(struct nvmet_req *req)
+{
+	u16 status;
+
+	status = nvmet_req_find_ns(req);
+	if (status)
+		return status;
+
+	if (!req->ns->pr.enable)
+		return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+
+	nvmet_set_result(req, READ_ONCE(req->ns->pr.notify_mask));
+	return status;
+}
+
+void nvmet_execute_get_log_page_resv(struct nvmet_req *req)
+{
+	struct nvmet_pr_log_mgr *log_mgr = &req->sq->ctrl->pr_log_mgr;
+	struct nvme_pr_log next_log = {0};
+	struct nvme_pr_log log = {0};
+	u16 status = NVME_SC_SUCCESS;
+	u64 lost_count;
+	u64 cur_count;
+	u64 next_count;
+
+	mutex_lock(&log_mgr->lock);
+	if (!kfifo_get(&log_mgr->log_queue, &log))
+		goto out;
+
+	/*
+	 * We can't get the last in kfifo.
+	 * Utilize the current count and the count from the next log to
+	 * calculate the number of lost logs, while also addressing cases
+	 * of overflow. If there is no subsequent log, the number of lost
+	 * logs is equal to the lost_count within the nvmet_pr_log_mgr.
+	 */
+	cur_count = le64_to_cpu(log.count);
+	if (kfifo_peek(&log_mgr->log_queue, &next_log)) {
+		next_count = le64_to_cpu(next_log.count);
+		if (next_count > cur_count)
+			lost_count = next_count - cur_count - 1;
+		else
+			lost_count = U64_MAX - cur_count + next_count - 1;
+	} else {
+		lost_count = log_mgr->lost_count;
+	}
+
+	log.count = cpu_to_le64(cur_count + lost_count);
+	log_mgr->lost_count -= lost_count;
+
+	log.nr_pages = kfifo_len(&log_mgr->log_queue);
+
+out:
+	status = nvmet_copy_to_sgl(req, 0, &log, sizeof(log));
+	mutex_unlock(&log_mgr->lock);
+	nvmet_req_complete(req, status);
+}
+
+static void nvmet_pr_add_resv_log(struct nvmet_ctrl *ctrl, u8 log_type, u32 nsid)
+{
+	struct nvmet_pr_log_mgr *log_mgr = &ctrl->pr_log_mgr;
+	struct nvme_pr_log log = {0};
+
+	mutex_lock(&log_mgr->lock);
+	log_mgr->counter++;
+	if (log_mgr->counter == 0)
+		log_mgr->counter = 1;
+
+	log.count = cpu_to_le64(log_mgr->counter);
+	log.type = log_type;
+	log.nsid = cpu_to_le32(nsid);
+
+	if (!kfifo_put(&log_mgr->log_queue, log)) {
+		pr_warn("a reservation log lost, cntlid:%d, log_type:%d, nsid:%d\n",
+			ctrl->cntlid, log_type, nsid);
+		log_mgr->lost_count++;
+	}
+
+	mutex_unlock(&log_mgr->lock);
+}
+
+static void nvmet_pr_resv_released(struct nvmet_pr *pr, uuid_t *hostid)
+{
+	struct nvmet_ns *ns = nvmet_pr_to_ns(pr);
+	struct nvmet_subsys *subsys = ns->subsys;
+	struct nvmet_ctrl *ctrl;
+
+	if (test_bit(NVME_PR_NOTIFY_BIT_RESV_RELEASED, &pr->notify_mask))
+		return;
+
+	mutex_lock(&subsys->lock);
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		if (!uuid_equal(&ctrl->hostid, hostid) &&
+		    nvmet_pr_find_registrant(pr, &ctrl->hostid)) {
+			nvmet_pr_add_resv_log(ctrl,
+				NVME_PR_LOG_RESERVATION_RELEASED, ns->nsid);
+			nvmet_add_async_event(ctrl, NVME_AER_CSS,
+				NVME_AEN_RESV_LOG_PAGE_AVALIABLE,
+				NVME_LOG_RESERVATION);
+		}
+	}
+	mutex_unlock(&subsys->lock);
+}
+
+static void nvmet_pr_send_event_by_hostid(struct nvmet_pr *pr, uuid_t *hostid,
+					  u8 log_type)
+{
+	struct nvmet_ns *ns = nvmet_pr_to_ns(pr);
+	struct nvmet_subsys *subsys = ns->subsys;
+	struct nvmet_ctrl *ctrl;
+
+	mutex_lock(&subsys->lock);
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		if (uuid_equal(hostid, &ctrl->hostid)) {
+			nvmet_pr_add_resv_log(ctrl, log_type, ns->nsid);
+			nvmet_add_async_event(ctrl, NVME_AER_CSS,
+				NVME_AEN_RESV_LOG_PAGE_AVALIABLE,
+				NVME_LOG_RESERVATION);
+		}
+	}
+	mutex_unlock(&subsys->lock);
+}
+
+static void nvmet_pr_resv_preempted(struct nvmet_pr *pr, uuid_t *hostid)
+{
+	if (test_bit(NVME_PR_NOTIFY_BIT_RESV_PREEMPTED, &pr->notify_mask))
+		return;
+
+	nvmet_pr_send_event_by_hostid(pr, hostid,
+		NVME_PR_LOG_RESERVATOPM_PREEMPTED);
+}
+
+static void nvmet_pr_registration_preempted(struct nvmet_pr *pr,
+					    uuid_t *hostid)
+{
+	if (test_bit(NVME_PR_NOTIFY_BIT_REG_PREEMPTED, &pr->notify_mask))
+		return;
+
+	nvmet_pr_send_event_by_hostid(pr, hostid,
+		NVME_PR_LOG_REGISTRATION_PREEMPTED);
+}
+
+static u16 nvmet_pr_create_new_reservation(struct nvmet_pr *pr, u8 new_rtype,
+					   struct nvmet_pr_registrant *reg)
+{
+	u16 status;
+
+	status = nvmet_pr_validate_rtype(new_rtype);
+	if (!status) {
+		reg->rtype = new_rtype;
+		rcu_assign_pointer(pr->holder, reg);
+	}
+	return status;
+}
+
+static u16 nvmet_pr_register(struct nvmet_req *req,
+			     struct nvmet_pr_register_data *d)
+{
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvmet_pr_registrant *new, *reg;
+	struct nvmet_pr *pr = &req->ns->pr;
+	u16 status = NVME_SC_SUCCESS;
+
+	new = kmalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return NVME_SC_INTERNAL;
+
+	mutex_lock(&pr->pr_lock);
+	reg = nvmet_pr_find_registrant(pr, &ctrl->hostid);
+	if (reg) {
+		if (reg->rkey != le64_to_cpu(d->nrkey))
+			status = NVME_SC_RESERVATION_CONFLICT | NVME_SC_DNR;
+		kfree(new);
+		goto out;
+	}
+
+	memset(new, 0, sizeof(*new));
+	INIT_LIST_HEAD(&new->entry);
+	new->rkey = le64_to_cpu(d->nrkey);
+	uuid_copy(&new->hostid, &ctrl->hostid);
+	list_add_tail_rcu(&new->entry, &pr->registrant_list);
+
+out:
+	mutex_unlock(&pr->pr_lock);
+	return status;
+}
+
+static void nvmet_pr_unregister_one(struct nvmet_pr *pr,
+				    struct nvmet_pr_registrant *reg)
+{
+	struct nvmet_pr_registrant *first_reg;
+	struct nvmet_pr_registrant *holder;
+	u8 original_rtype;
+
+	lockdep_assert_held(&pr->pr_lock);
+	list_del_rcu(&reg->entry);
+
+	holder = rcu_dereference_protected(pr->holder,
+			lockdep_is_held(&pr->pr_lock));
+	if (reg != holder)
+		goto out;
+
+	original_rtype = holder->rtype;
+	if (original_rtype == NVME_PR_WRITE_EXCLUSIVE_ALL_REGS ||
+	    original_rtype == NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS) {
+		first_reg = list_first_or_null_rcu(&pr->registrant_list,
+				struct nvmet_pr_registrant, entry);
+		if (first_reg)
+			first_reg->rtype = original_rtype;
+		rcu_assign_pointer(pr->holder, first_reg);
+	} else {
+		rcu_assign_pointer(pr->holder, NULL);
+
+		if (original_rtype == NVME_PR_WRITE_EXCLUSIVE_REG_ONLY ||
+		    original_rtype == NVME_PR_WRITE_EXCLUSIVE_REG_ONLY)
+			nvmet_pr_resv_released(pr, &reg->hostid);
+	}
+out:
+	synchronize_rcu();
+	kfree(reg);
+}
+
+static u16 nvmet_pr_unregister(struct nvmet_req *req,
+			       struct nvmet_pr_register_data *d,
+			       bool ignore_key)
+{
+	u16 status = NVME_SC_RESERVATION_CONFLICT | NVME_SC_DNR;
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvmet_pr *pr = &req->ns->pr;
+	struct nvmet_pr_registrant *reg;
+
+	mutex_lock(&pr->pr_lock);
+	list_for_each_entry_rcu(reg, &pr->registrant_list, entry,
+				lockdep_is_held(&pr->pr_lock)) {
+		if (uuid_equal(&reg->hostid, &ctrl->hostid)) {
+			if (ignore_key || reg->rkey == le64_to_cpu(d->crkey)) {
+				status = NVME_SC_SUCCESS;
+				nvmet_pr_unregister_one(pr, reg);
+			}
+			break;
+		}
+	}
+	mutex_unlock(&pr->pr_lock);
+
+	return status;
+}
+
+static u16 nvmet_pr_update_reg_rkey(struct nvmet_pr_registrant *reg,
+				    void *attr)
+{
+	reg->rkey = *(u64 *)attr;
+	return NVME_SC_SUCCESS;
+}
+
+static u16 nvmet_pr_update_reg_attr(struct nvmet_pr *pr,
+				    struct nvmet_pr_registrant *reg,
+				    u16 (*change_attr)(struct nvmet_pr_registrant *reg, void *attr),
+				    void *attr)
+{
+	struct nvmet_pr_registrant *holder;
+	struct nvmet_pr_registrant *new;
+	u16 status;
+
+	lockdep_assert_held(&pr->pr_lock);
+	holder = rcu_dereference_protected(pr->holder,
+			lockdep_is_held(&pr->pr_lock));
+	if (reg != holder)
+		return change_attr(reg, attr);
+
+	new = kmalloc(sizeof(*new), GFP_ATOMIC);
+	if (!new)
+		return NVME_SC_INTERNAL;
+
+	new->rkey = holder->rkey;
+	new->rtype = holder->rtype;
+	uuid_copy(&new->hostid, &holder->hostid);
+	INIT_LIST_HEAD(&new->entry);
+
+	status = change_attr(new, attr);
+	if (!status) {
+		list_replace_rcu(&holder->entry, &new->entry);
+		rcu_assign_pointer(pr->holder, new);
+		synchronize_rcu();
+		kfree(holder);
+	} else {
+		kfree(new);
+	}
+
+	return status;
+}
+
+static u16 nvmet_pr_replace(struct nvmet_req *req,
+			    struct nvmet_pr_register_data *d,
+			    bool ignore_key)
+{
+	u16 status = NVME_SC_RESERVATION_CONFLICT | NVME_SC_DNR;
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvmet_pr *pr = &req->ns->pr;
+	struct nvmet_pr_registrant *reg;
+	u64 nrkey = le64_to_cpu(d->nrkey);
+
+	mutex_lock(&pr->pr_lock);
+	list_for_each_entry_rcu(reg, &pr->registrant_list, entry,
+				lockdep_is_held(&pr->pr_lock)) {
+		if (uuid_equal(&reg->hostid, &ctrl->hostid)) {
+			if (ignore_key || reg->rkey == le64_to_cpu(d->crkey))
+				status = nvmet_pr_update_reg_attr(pr, reg,
+						nvmet_pr_update_reg_rkey,
+						&nrkey);
+			break;
+		}
+	}
+	mutex_unlock(&pr->pr_lock);
+	return status;
+}
+
+static void nvmet_execute_pr_register(struct nvmet_req *req)
+{
+	u32 cdw10 = le32_to_cpu(req->cmd->common.cdw10);
+	bool ignore_key = (bool)((cdw10 >> 3) & 1); /* Ignore existing key, bit 03 */
+	struct nvmet_pr_register_data *d;
+	u8 reg_act = cdw10 & 0x07; /* Reservation Register Action, bit 02:00 */
+	u16 status;
+
+	d = kmalloc(sizeof(*d), GFP_KERNEL);
+	if (!d) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	status = nvmet_copy_from_sgl(req, 0, d, sizeof(*d));
+	if (status)
+		goto free_data;
+
+	switch (reg_act) {
+	case NVME_PR_REGISTER_ACT_REG:
+		status = nvmet_pr_register(req, d);
+		break;
+	case NVME_PR_REGISTER_ACT_UNREG:
+		status = nvmet_pr_unregister(req, d, ignore_key);
+		break;
+	case NVME_PR_REGISTER_ACT_REPLACE:
+		status = nvmet_pr_replace(req, d, ignore_key);
+		break;
+	default:
+		req->error_loc = offsetof(struct nvme_common_command, cdw10);
+		status = NVME_SC_INVALID_OPCODE | NVME_SC_DNR;
+		break;
+	}
+free_data:
+	kfree(d);
+out:
+	if (!status)
+		atomic_inc(&req->ns->pr.generation);
+	nvmet_req_complete(req, status);
+}
+
+static u16 nvmet_pr_acquire(struct nvmet_req *req,
+			    struct nvmet_pr_registrant *reg,
+			    u8 rtype)
+{
+	struct nvmet_pr *pr = &req->ns->pr;
+	struct nvmet_pr_registrant *holder;
+
+	lockdep_assert_held(&pr->pr_lock);
+	holder = rcu_dereference_protected(pr->holder,
+			lockdep_is_held(&pr->pr_lock));
+	if (holder && reg != holder)
+		return  NVME_SC_RESERVATION_CONFLICT | NVME_SC_DNR;
+	if (holder && reg == holder) {
+		if (holder->rtype == rtype)
+			return NVME_SC_SUCCESS;
+		return NVME_SC_RESERVATION_CONFLICT | NVME_SC_DNR;
+	}
+
+	return nvmet_pr_create_new_reservation(pr, rtype, reg);
+}
+
+static u16 nvmet_pr_unreg_by_prkey(struct nvmet_pr *pr, u64 prkey,
+				   uuid_t *send_hostid)
+{
+	u16 status = NVME_SC_RESERVATION_CONFLICT | NVME_SC_DNR;
+	struct nvmet_pr_registrant *reg, *tmp;
+	uuid_t hostid;
+
+	lockdep_assert_held(&pr->pr_lock);
+
+	list_for_each_entry_safe(reg, tmp, &pr->registrant_list, entry) {
+		if (reg->rkey == prkey) {
+			status = NVME_SC_SUCCESS;
+			uuid_copy(&hostid, &reg->hostid);
+			nvmet_pr_unregister_one(pr, reg);
+			if (!uuid_equal(&reg->hostid, send_hostid))
+				nvmet_pr_registration_preempted(pr, &hostid);
+		}
+	}
+	return status;
+}
+
+static u16 nvmet_pr_unreg_by_prkey_except_hostid(struct nvmet_pr *pr, u64 prkey,
+						 uuid_t *send_hostid)
+{
+	u16 status = NVME_SC_RESERVATION_CONFLICT | NVME_SC_DNR;
+	struct nvmet_pr_registrant *reg, *tmp;
+	uuid_t hostid;
+
+	lockdep_assert_held(&pr->pr_lock);
+
+	list_for_each_entry_safe(reg, tmp, &pr->registrant_list, entry) {
+		if (reg->rkey == prkey &&
+		    !uuid_equal(&reg->hostid, send_hostid)) {
+			status = NVME_SC_SUCCESS;
+			uuid_copy(&hostid, &reg->hostid);
+			nvmet_pr_unregister_one(pr, reg);
+			nvmet_pr_registration_preempted(pr, &hostid);
+		}
+	}
+	return status;
+}
+
+static void nvmet_pr_unreg_except_hostid(struct nvmet_pr *pr,
+					 uuid_t *send_hostid)
+{
+	struct nvmet_pr_registrant *reg, *tmp;
+	uuid_t hostid;
+
+	lockdep_assert_held(&pr->pr_lock);
+
+	list_for_each_entry_safe(reg, tmp, &pr->registrant_list, entry) {
+		if (!uuid_equal(&reg->hostid, send_hostid)) {
+			uuid_copy(&hostid, &reg->hostid);
+			nvmet_pr_unregister_one(pr, reg);
+			nvmet_pr_registration_preempted(pr, &hostid);
+		}
+	}
+}
+
+static u16 nvmet_pr_update_holder_rtype(struct nvmet_pr_registrant *reg,
+					void *attr)
+{
+	u8 new_rtype = *(u8 *)attr;
+	u16 status;
+
+	status = nvmet_pr_validate_rtype(new_rtype);
+	if (status)
+		return status;
+
+	reg->rtype = new_rtype;
+	return NVME_SC_SUCCESS;
+}
+
+static u16 nvmet_pr_preempt(struct nvmet_req *req,
+			    struct nvmet_pr_registrant *reg,
+			    u8 rtype,
+			    struct nvmet_pr_acquire_data *d)
+{
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvmet_pr *pr = &req->ns->pr;
+	struct nvmet_pr_registrant *holder;
+	enum nvme_pr_type original_rtype;
+	u16 status;
+
+	lockdep_assert_held(&pr->pr_lock);
+	holder = rcu_dereference_protected(pr->holder,
+			lockdep_is_held(&pr->pr_lock));
+	if (!holder)
+		return nvmet_pr_unreg_by_prkey(pr, le64_to_cpu(d->prkey),
+				&ctrl->hostid);
+
+	original_rtype = holder->rtype;
+	if (original_rtype == NVME_PR_WRITE_EXCLUSIVE_ALL_REGS ||
+	    original_rtype == NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS) {
+		if (!le64_to_cpu(d->prkey)) {
+			nvmet_pr_unreg_except_hostid(pr, &ctrl->hostid);
+			return nvmet_pr_create_new_reservation(pr,
+					rtype, reg);
+		}
+		return nvmet_pr_unreg_by_prkey(pr, le64_to_cpu(d->prkey),
+				&ctrl->hostid);
+	}
+
+	if (holder == reg) {
+		status = nvmet_pr_update_reg_attr(pr, holder,
+				nvmet_pr_update_holder_rtype, &rtype);
+		if (!status && original_rtype != rtype)
+			nvmet_pr_resv_released(pr, &reg->hostid);
+		return status;
+	}
+
+	if (le64_to_cpu(d->prkey) == holder->rkey) {
+		status = nvmet_pr_unreg_by_prkey_except_hostid(pr,
+				le64_to_cpu(d->prkey), &ctrl->hostid);
+		if (status)
+			return status;
+		status = nvmet_pr_create_new_reservation(pr, rtype, reg);
+		if (!status && original_rtype != rtype)
+			nvmet_pr_resv_released(pr, &reg->hostid);
+		return status;
+	}
+
+	if (le64_to_cpu(d->prkey))
+		return nvmet_pr_unreg_by_prkey(pr, le64_to_cpu(d->prkey),
+				&ctrl->hostid);
+	return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+}
+
+static u16 __nvmet_execute_pr_acquire(struct nvmet_req *req,
+				      struct nvmet_pr_registrant *reg,
+				      u8 acquire_act,
+				      u8 rtype,
+				      struct nvmet_pr_acquire_data *d)
+{
+	u16 status;
+
+	switch (acquire_act) {
+	case NVME_PR_ACQUIRE_ACT_ACQUIRE:
+		status = nvmet_pr_acquire(req, reg, rtype);
+		goto out;
+	case NVME_PR_ACQUIRE_ACT_PREEMPT:
+		status = nvmet_pr_preempt(req, reg, rtype, d);
+		goto inc_gen;
+	case NVME_PR_ACQUIRE_ACT_PREEMPT_AND_ABORT:
+		status = nvmet_pr_preempt(req, reg, rtype, d);
+		goto inc_gen;
+	default:
+		req->error_loc = offsetof(struct nvme_common_command, cdw10);
+		status = NVME_SC_INVALID_OPCODE | NVME_SC_DNR;
+		goto out;
+	}
+inc_gen:
+	if (!status)
+		atomic_inc(&req->ns->pr.generation);
+out:
+	return status;
+}
+
+static void nvmet_execute_pr_acquire(struct nvmet_req *req)
+{
+	u32 cdw10 = le32_to_cpu(req->cmd->common.cdw10);
+	bool ignore_key = (bool)((cdw10 >> 3) & 1); /* Ignore existing key, bit 03 */
+	u8 rtype = (u8)((cdw10 >> 8) & 0xff); /* Reservation type, bit 15:08 */
+	u8 acquire_act = cdw10 & 0x07; /* Reservation acquire action, bit 02:00 */
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvmet_pr_acquire_data *d = NULL;
+	struct nvmet_pr *pr = &req->ns->pr;
+	struct nvmet_pr_registrant *reg;
+	u16 status = NVME_SC_SUCCESS;
+
+	if (ignore_key) {
+		status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+		goto out;
+	}
+
+	d = kmalloc(sizeof(*d), GFP_KERNEL);
+	if (!d) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	status = nvmet_copy_from_sgl(req, 0, d, sizeof(*d));
+	if (status)
+		goto free_data;
+
+	status = NVME_SC_RESERVATION_CONFLICT | NVME_SC_DNR;
+	mutex_lock(&pr->pr_lock);
+	list_for_each_entry_rcu(reg, &pr->registrant_list, entry,
+				lockdep_is_held(&pr->pr_lock)) {
+		if (uuid_equal(&reg->hostid, &ctrl->hostid) &&
+		    reg->rkey == le64_to_cpu(d->crkey)) {
+			status = __nvmet_execute_pr_acquire(req, reg,
+					acquire_act, rtype, d);
+			break;
+		}
+	}
+	mutex_unlock(&pr->pr_lock);
+
+free_data:
+	kfree(d);
+out:
+	nvmet_req_complete(req, status);
+}
+
+static u16 nvmet_pr_release(struct nvmet_req *req,
+			    struct nvmet_pr_registrant *reg,
+			    u8 rtype)
+{
+	struct nvmet_pr *pr = &req->ns->pr;
+	struct nvmet_pr_registrant *holder;
+	u8 original_rtype;
+
+	lockdep_assert_held(&pr->pr_lock);
+	holder = rcu_dereference_protected(pr->holder,
+			lockdep_is_held(&pr->pr_lock));
+	if (!holder || reg != holder)
+		return NVME_SC_SUCCESS;
+
+	original_rtype = holder->rtype;
+	if (original_rtype != rtype)
+		return NVME_SC_RESERVATION_CONFLICT | NVME_SC_DNR;
+
+	rcu_assign_pointer(pr->holder, NULL);
+
+	if (original_rtype != NVME_PR_WRITE_EXCLUSIVE &&
+	    original_rtype != NVME_PR_EXCLUSIVE_ACCESS)
+		nvmet_pr_resv_released(pr, &reg->hostid);
+
+	return NVME_SC_SUCCESS;
+}
+
+static void nvmet_pr_clear(struct nvmet_req *req)
+{
+	struct nvmet_pr_registrant *reg, *tmp;
+	struct nvmet_pr *pr = &req->ns->pr;
+	LIST_HEAD(free_list);
+
+	lockdep_assert_held(&pr->pr_lock);
+
+	rcu_assign_pointer(pr->holder, NULL);
+
+	list_for_each_entry_safe(reg, tmp, &pr->registrant_list, entry) {
+		list_del_rcu(&reg->entry);
+		if (!uuid_equal(&req->sq->ctrl->hostid, &reg->hostid))
+			nvmet_pr_resv_preempted(pr, &reg->hostid);
+		list_add(&reg->entry, &free_list);
+	}
+	synchronize_rcu();
+	list_for_each_entry_safe(reg, tmp, &free_list, entry) {
+		kfree(reg);
+	}
+
+	atomic_inc(&pr->generation);
+}
+
+static u16 __nvmet_execute_pr_release(struct nvmet_req *req,
+				      struct nvmet_pr_registrant *reg,
+				      u8 release_act, u8 rtype)
+{
+	switch (release_act) {
+	case NVME_PR_RELEASE_ACT_RELEASE:
+		return nvmet_pr_release(req, reg, rtype);
+	case NVME_PR_RELEASE_ACT_CLEAR:
+		nvmet_pr_clear(req);
+		return NVME_SC_SUCCESS;
+	default:
+		req->error_loc = offsetof(struct nvme_common_command, cdw10);
+		return NVME_SC_INVALID_OPCODE | NVME_SC_DNR;
+	}
+}
+
+static void nvmet_execute_pr_release(struct nvmet_req *req)
+{
+	u32 cdw10 = le32_to_cpu(req->cmd->common.cdw10);
+	bool ignore_key = (bool)((cdw10 >> 3) & 1); /* Ignore existing key, bit 03 */
+	u8 rtype = (u8)((cdw10 >> 8) & 0xff); /* Reservation type, bit 15:08 */
+	u8 release_act = cdw10 & 0x07; /* Reservation release action, bit 02:00 */
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvmet_pr *pr = &req->ns->pr;
+	struct nvmet_pr_release_data *d;
+	struct nvmet_pr_registrant *reg;
+	u16 status;
+
+	if (ignore_key) {
+		status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+		goto out;
+	}
+
+	d = kmalloc(sizeof(*d), GFP_KERNEL);
+	if (!d) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	status = nvmet_copy_from_sgl(req, 0, d, sizeof(*d));
+	if (status)
+		goto free_data;
+
+	status = NVME_SC_RESERVATION_CONFLICT | NVME_SC_DNR;
+	mutex_lock(&pr->pr_lock);
+	list_for_each_entry_rcu(reg, &pr->registrant_list, entry,
+				lockdep_is_held(&pr->pr_lock)) {
+		if (uuid_equal(&reg->hostid, &ctrl->hostid) &&
+		    reg->rkey == le64_to_cpu(d->crkey)) {
+			status = __nvmet_execute_pr_release(req, reg,
+					release_act, rtype);
+			break;
+		}
+	}
+	mutex_unlock(&pr->pr_lock);
+free_data:
+	kfree(d);
+out:
+	nvmet_req_complete(req, status);
+}
+
+static void nvmet_execute_pr_report(struct nvmet_req *req)
+{
+	u32 cdw11 = le32_to_cpu(req->cmd->common.cdw11);
+	u32 cdw10 = le32_to_cpu(req->cmd->common.cdw10);
+	u32 num_bytes = 4 * (cdw10 + 1); /* cdw10 is number of dwords */
+	u8 eds = cdw11 & 1; /* Extended data structure, bit 00 */
+	struct nvmet_subsys *subsys = req->sq->ctrl->subsys;
+	struct nvme_registered_ctrl_ext *ctrl_eds;
+	struct nvme_reservation_status_ext *data;
+	struct nvmet_pr *pr = &req->ns->pr;
+	struct nvmet_pr_registrant *holder;
+	struct nvmet_pr_registrant *reg;
+	struct nvmet_ctrl *ctrl;
+	u16 num_ctrls = 0;
+	u16 status;
+	u8 rtype;
+
+	/* nvmet hostid(uuid_t) is 128 bit. */
+	if (!eds) {
+		req->error_loc = offsetof(struct nvme_common_command, cdw11);
+		status = NVME_SC_HOST_ID_INCONSIST | NVME_SC_DNR;
+		goto out;
+	}
+
+	if (num_bytes < sizeof(struct nvme_reservation_status_ext)) {
+		req->error_loc = offsetof(struct nvme_common_command, cdw10);
+		status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+		goto out;
+	}
+
+	mutex_lock(&subsys->lock);
+	num_ctrls = list_count_nodes(&subsys->ctrls);
+	mutex_unlock(&subsys->lock);
+
+	num_bytes = min(num_bytes,
+			sizeof(struct nvme_reservation_status_ext) +
+			sizeof(struct nvme_registered_ctrl_ext) * num_ctrls);
+
+	data = kmalloc(num_bytes, GFP_KERNEL);
+	if (!data) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+	memset(data, 0, num_bytes);
+	data->gen = cpu_to_le32(atomic_read(&pr->generation));
+	data->ptpls = 0;
+	ctrl_eds = data->regctl_eds;
+
+	mutex_lock(&subsys->lock);
+	rcu_read_lock();
+	holder = rcu_dereference(pr->holder);
+	rtype = holder ? holder->rtype : 0;
+	data->rtype = rtype;
+	num_ctrls = 0;
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		if ((void *)ctrl_eds >= (void *)(data + num_bytes))
+			break;
+		reg = nvmet_pr_find_registrant(pr, &ctrl->hostid);
+		if (!reg)
+			continue;
+		ctrl_eds->cntlid = cpu_to_le16(ctrl->cntlid);
+		if (rtype == NVME_PR_WRITE_EXCLUSIVE_ALL_REGS ||
+		    rtype == NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS)
+			ctrl_eds->rcsts = 1;
+		if (holder && uuid_equal(&ctrl->hostid, &holder->hostid))
+			ctrl_eds->rcsts = 1;
+		uuid_copy((uuid_t *)&ctrl_eds->hostid, &ctrl->hostid);
+		ctrl_eds->rkey = cpu_to_le64(reg->rkey);
+		ctrl_eds++;
+		num_ctrls++;
+	}
+	rcu_read_unlock();
+	mutex_unlock(&subsys->lock);
+
+	put_unaligned_le16(num_ctrls, data->regctl);
+	num_bytes = sizeof(struct nvme_reservation_status_ext) +
+			sizeof(struct nvme_registered_ctrl_ext) * num_ctrls;
+	status = nvmet_copy_to_sgl(req, 0, data, num_bytes);
+	kfree(data);
+out:
+	nvmet_req_complete(req, status);
+}
+
+u16 nvmet_parse_pr_cmd(struct nvmet_req *req)
+{
+	struct nvme_command *cmd = req->cmd;
+
+	switch (cmd->common.opcode) {
+	case nvme_cmd_resv_register:
+		req->execute = nvmet_execute_pr_register;
+		break;
+	case nvme_cmd_resv_acquire:
+		req->execute = nvmet_execute_pr_acquire;
+		break;
+	case nvme_cmd_resv_release:
+		req->execute = nvmet_execute_pr_release;
+		break;
+	case nvme_cmd_resv_report:
+		req->execute = nvmet_execute_pr_report;
+		break;
+	default:
+		return 1;
+	}
+	return NVME_SC_SUCCESS;
+}
+
+static bool nvmet_is_req_write_cmd_group(struct nvmet_req *req)
+{
+	u8 opcode = req->cmd->common.opcode;
+
+	if (req->sq->qid) {
+		switch (opcode) {
+		case nvme_cmd_flush:
+		case nvme_cmd_write:
+		case nvme_cmd_write_zeroes:
+		case nvme_cmd_dsm:
+		case nvme_cmd_zone_append:
+		case nvme_cmd_zone_mgmt_send:
+			return true;
+		default:
+			return false;
+		}
+	}
+	return false;
+}
+
+static bool nvmet_is_req_read_cmd_group(struct nvmet_req *req)
+{
+	u8 opcode = req->cmd->common.opcode;
+
+	if (req->sq->qid) {
+		switch (opcode) {
+		case nvme_cmd_read:
+		case nvme_cmd_zone_mgmt_recv:
+			return true;
+		default:
+			return false;
+		}
+	}
+	return false;
+}
+
+u16 nvmet_pr_check_cmd_access(struct nvmet_req *req)
+{
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvmet_pr_registrant *holder;
+	struct nvmet_ns *ns = req->ns;
+	struct nvmet_pr *pr = &ns->pr;
+	u16 status = NVME_SC_SUCCESS;
+
+	rcu_read_lock();
+	holder = rcu_dereference(pr->holder);
+	if (!holder)
+		goto unlock;
+	if (uuid_equal(&ctrl->hostid, &holder->hostid))
+		goto unlock;
+
+	/*
+	 * The Reservation command group is checked in executing,
+	 * allow it here.
+	 */
+	switch (holder->rtype) {
+	case NVME_PR_WRITE_EXCLUSIVE:
+		if (nvmet_is_req_write_cmd_group(req))
+			status = NVME_SC_RESERVATION_CONFLICT | NVME_SC_DNR;
+		break;
+	case NVME_PR_EXCLUSIVE_ACCESS:
+		if (nvmet_is_req_read_cmd_group(req) ||
+		    nvmet_is_req_write_cmd_group(req))
+			status = NVME_SC_RESERVATION_CONFLICT | NVME_SC_DNR;
+		break;
+	case NVME_PR_WRITE_EXCLUSIVE_REG_ONLY:
+	case NVME_PR_WRITE_EXCLUSIVE_ALL_REGS:
+		if ((nvmet_is_req_write_cmd_group(req)) &&
+		    !nvmet_pr_find_registrant(pr, &ctrl->hostid))
+			status = NVME_SC_RESERVATION_CONFLICT | NVME_SC_DNR;
+		break;
+	case NVME_PR_EXCLUSIVE_ACCESS_REG_ONLY:
+	case NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS:
+		if ((nvmet_is_req_read_cmd_group(req) ||
+		    nvmet_is_req_write_cmd_group(req)) &&
+		    !nvmet_pr_find_registrant(pr, &ctrl->hostid))
+			status = NVME_SC_RESERVATION_CONFLICT | NVME_SC_DNR;
+		break;
+	default:
+		pr_warn("the reservation type is set wrong, type:%d\n",
+			holder->rtype);
+		break;
+	}
+
+unlock:
+	rcu_read_unlock();
+	if (status)
+		req->error_loc = offsetof(struct nvme_common_command, opcode);
+	return status;
+}
+
+void nvmet_ctrl_destroy_pr(struct nvmet_ctrl *ctrl)
+{
+	struct nvmet_pr_registrant *reg;
+	struct nvmet_ns *ns;
+	unsigned long idx;
+
+	kfifo_free(&ctrl->pr_log_mgr.log_queue);
+	mutex_destroy(&ctrl->pr_log_mgr.lock);
+
+	if (nvmet_is_host_connected(&ctrl->hostid, ctrl->subsys))
+		return;
+
+	xa_for_each(&ctrl->subsys->namespaces, idx, ns) {
+		mutex_lock(&ns->pr.pr_lock);
+		list_for_each_entry_rcu(reg, &ns->pr.registrant_list, entry,
+					lockdep_is_held(&ns->pr.pr_lock)) {
+			if (uuid_equal(&reg->hostid, &ctrl->hostid)) {
+				nvmet_pr_unregister_one(&ns->pr, reg);
+				break;
+			}
+		}
+		mutex_unlock(&ns->pr.pr_lock);
+	}
+}
+
+void nvmet_pr_exit_ns(struct nvmet_ns *ns)
+{
+	struct nvmet_pr_registrant *reg, *tmp;
+	struct nvmet_pr *pr = &ns->pr;
+	LIST_HEAD(free_list);
+
+	mutex_lock(&pr->pr_lock);
+	rcu_assign_pointer(pr->holder, NULL);
+	list_for_each_entry_safe(reg, tmp, &pr->registrant_list, entry) {
+		list_del_rcu(&reg->entry);
+		list_add(&reg->entry, &free_list);
+	}
+	mutex_unlock(&pr->pr_lock);
+	synchronize_rcu();
+	list_for_each_entry_safe(reg, tmp, &free_list, entry) {
+		kfree(reg);
+	}
+
+	mutex_destroy(&pr->pr_lock);
+}
+
+void nvmet_pr_init_ns(struct nvmet_ns *ns)
+{
+	ns->pr.holder = NULL;
+	atomic_set(&ns->pr.generation, 0);
+	mutex_init(&ns->pr.pr_lock);
+	INIT_LIST_HEAD(&ns->pr.registrant_list);
+	ns->pr.notify_mask = 0;
+}
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 462c21e0e417..f683d1b4ffad 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -745,6 +745,32 @@ enum {
 	NVME_AEN_CFG_DISC_CHANGE	= 1 << NVME_AEN_BIT_DISC_CHANGE,
 };
 
+enum {
+	NVME_AEN_RESV_LOG_PAGE_AVALIABLE	= 0x00,
+};
+
+enum {
+	NVME_PR_LOG_EMPTY_LOG_PAGE			= 0x00,
+	NVME_PR_LOG_REGISTRATION_PREEMPTED		= 0x01,
+	NVME_PR_LOG_RESERVATION_RELEASED		= 0x02,
+	NVME_PR_LOG_RESERVATOPM_PREEMPTED		= 0x03,
+};
+
+enum {
+	NVME_PR_NOTIFY_BIT_REG_PREEMPTED		= 1,
+	NVME_PR_NOTIFY_BIT_RESV_RELEASED		= 2,
+	NVME_PR_NOTIFY_BIT_RESV_PREEMPTED		= 3,
+};
+
+struct nvme_pr_log {
+	__le64			count;
+	__u8			type;
+	__u8			nr_pages;
+	__u8			rsvd1[2];
+	__le32			nsid;
+	__u8			rsvd2[48];
+};
+
 struct nvme_lba_range_type {
 	__u8			type;
 	__u8			attributes;
@@ -765,6 +791,17 @@ enum {
 	NVME_LBART_ATTRIB_HIDE	= 1 << 1,
 };
 
+enum nvme_pr_capabilities {
+	NVME_PR_SUPPORT_PTPL				= 1,
+	NVME_PR_SUPPORT_WRITE_EXCLUSIVE			= 1 << 1,
+	NVME_PR_SUPPORT_EXCLUSIVE_ACCESS		= 1 << 2,
+	NVME_PR_SUPPORT_WRITE_EXCLUSIVE_REG_ONLY	= 1 << 3,
+	NVME_PR_SUPPORT_EXCLUSIVE_ACCESS_REG_ONLY	= 1 << 4,
+	NVME_PR_SUPPORT_WRITE_EXCLUSIVE_ALL_REGS	= 1 << 5,
+	NVME_PR_SUPPORT_EXCLUSIVE_ACCESS_ALL_REGS	= 1 << 6,
+	NVME_PR_SUPPORT_IEKEY_VER_1_3_DEF		= 1 << 7,
+};
+
 enum nvme_pr_type {
 	NVME_PR_WRITE_EXCLUSIVE			= 1,
 	NVME_PR_EXCLUSIVE_ACCESS		= 2,
@@ -774,6 +811,23 @@ enum nvme_pr_type {
 	NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS	= 6,
 };
 
+enum nvme_pr_register_action {
+	NVME_PR_REGISTER_ACT_REG		= 0,
+	NVME_PR_REGISTER_ACT_UNREG		= 1,
+	NVME_PR_REGISTER_ACT_REPLACE		= 1 << 1,
+};
+
+enum nvme_pr_acquire_action {
+	NVME_PR_ACQUIRE_ACT_ACQUIRE		= 0,
+	NVME_PR_ACQUIRE_ACT_PREEMPT		= 1,
+	NVME_PR_ACQUIRE_ACT_PREEMPT_AND_ABORT	= 1 << 1,
+};
+
+enum nvme_pr_release_action {
+	NVME_PR_RELEASE_ACT_RELEASE		= 0,
+	NVME_PR_RELEASE_ACT_CLEAR		= 1,
+};
+
 enum nvme_eds {
 	NVME_EXTENDED_DATA_STRUCT	= 0x1,
 };
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v8 0/1] Implement the NVMe reservation feature
  2024-02-29  3:12 [PATCH v8 0/1] Implement the NVMe reservation feature Guixin Liu
  2024-02-29  3:12 ` [PATCH v8 1/1] nvmet: support " Guixin Liu
@ 2024-03-05  2:32 ` Guixin Liu
  2024-03-07  8:18   ` Sagi Grimberg
  1 sibling, 1 reply; 12+ messages in thread
From: Guixin Liu @ 2024-03-05  2:32 UTC (permalink / raw)
  To: hch, sagi, kch; +Cc: linux-nvme

Hi Sagi,

     Friendly ping. If you have no further concerns with this patch, I 
would be very grateful

if you could provide a "Reviewed-by" tag. Such an endorsement would be 
immensely encouraging

for my work in this area.

     Thank you very much for reviewing this patch and for the valuable 
suggestions you have provided.

Best regards,

Guixin Liu

在 2024/2/29 11:12, Guixin Liu 写道:
> Hi guys,
>      I've implemented the NVMe reservation feature. Please review it, all
> comments are welcome as usual.
>
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v8 0/1] Implement the NVMe reservation feature
  2024-03-05  2:32 ` [PATCH v8 0/1] Implement the NVMe " Guixin Liu
@ 2024-03-07  8:18   ` Sagi Grimberg
  2024-03-12  0:50     ` Chaitanya Kulkarni
  0 siblings, 1 reply; 12+ messages in thread
From: Sagi Grimberg @ 2024-03-07  8:18 UTC (permalink / raw)
  To: Guixin Liu, hch, kch; +Cc: linux-nvme



On 05/03/2024 4:32, Guixin Liu wrote:
> Hi Sagi,
>
>     Friendly ping. If you have no further concerns with this patch, I 
> would be very grateful
>
> if you could provide a "Reviewed-by" tag. Such an endorsement would be 
> immensely encouraging
>
> for my work in this area.
>
>     Thank you very much for reviewing this patch and for the valuable 
> suggestions you have provided.

Sure, I do think we need at least one more reviewer.

Plus, I don't think a target-side reservations maintainers entry is 
justified here, it is a part of nvmet.
So please drop it from your next submission.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v8 0/1] Implement the NVMe reservation feature
  2024-03-07  8:18   ` Sagi Grimberg
@ 2024-03-12  0:50     ` Chaitanya Kulkarni
  2024-03-12  3:39       ` Guixin Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Chaitanya Kulkarni @ 2024-03-12  0:50 UTC (permalink / raw)
  To: Guixin Liu
  Cc: linux-nvme@lists.infradead.org, hch@lst.de, Chaitanya Kulkarni,
	Sagi Grimberg

On 3/7/24 00:18, Sagi Grimberg wrote:
>
>
> On 05/03/2024 4:32, Guixin Liu wrote:
>> Hi Sagi,
>>
>>     Friendly ping. If you have no further concerns with this patch, I 
>> would be very grateful
>>
>> if you could provide a "Reviewed-by" tag. Such an endorsement would 
>> be immensely encouraging
>>
>> for my work in this area.
>>
>>     Thank you very much for reviewing this patch and for the valuable 
>> suggestions you have provided.
>
> Sure, I do think we need at least one more reviewer.
>
> Plus, I don't think a target-side reservations maintainers entry is 
> justified here, it is a part of nvmet.
> So please drop it from your next submission.

Can you please send a next version with Sagi's comments ? also can you
please send blktests if you have any so I can run some tests once I finish
the review on next version ?

-ck



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v8 0/1] Implement the NVMe reservation feature
  2024-03-12  0:50     ` Chaitanya Kulkarni
@ 2024-03-12  3:39       ` Guixin Liu
  2024-07-05 16:45         ` hch
  0 siblings, 1 reply; 12+ messages in thread
From: Guixin Liu @ 2024-03-12  3:39 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-nvme@lists.infradead.org, hch@lst.de, Sagi Grimberg


>> Sure, I do think we need at least one more reviewer.
>>
>> Plus, I don't think a target-side reservations maintainers entry is
>> justified here, it is a part of nvmet.
>> So please drop it from your next submission.
> Can you please send a next version with Sagi's comments ? also can you
> please send blktests if you have any so I can run some tests once I finish
> the review on next version ?
>
> -ck

Sure，but there is still a discussion going on in v7, I may need to change

the code, I will send a v9 after the discussion and with the blktests patch.

Best regards,

Guixin Liu

>
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v8 0/1] Implement the NVMe reservation feature
  2024-03-12  3:39       ` Guixin Liu
@ 2024-07-05 16:45         ` hch
  2024-09-18  8:30           ` Guixin Liu
  0 siblings, 1 reply; 12+ messages in thread
From: hch @ 2024-07-05 16:45 UTC (permalink / raw)
  To: Guixin Liu
  Cc: Chaitanya Kulkarni, linux-nvme@lists.infradead.org, hch@lst.de,
	Sagi Grimberg

Hi Liu,

did you manage to get back to this?  I just played around with the
feature to test my RFC 9561 implementatiomn and it would be great to
get it merged.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v8 1/1] nvmet: support reservation feature
  2024-02-29  3:12 ` [PATCH v8 1/1] nvmet: support " Guixin Liu
@ 2024-09-17  8:43   ` Dmitry Bogdanov
  2024-09-18  8:26     ` Guixin Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Bogdanov @ 2024-09-17  8:43 UTC (permalink / raw)
  To: Guixin Liu; +Cc: hch, sagi, kch, linux-nvme

Hi,

Waiting for the final solution half an year we taken this patch as is.
Here is my comments on it. Hope, this will bring the life to the patch.

On Thu, Feb 29, 2024 at 11:12:41AM +0800, Guixin Liu wrote:
> 
> This patch implements the reservation feature, includes:
> 1. reservation register(register, unregister and replace).
> 2. reservation acquire(acquire, preempt, preempt and abort).
> 3. reservation release(release and clear).
> 4. reservation report.
> 
> And also make reservation configurable, one can set ns to support
> reservation before enable ns. The default of resv_enable is false.
> 
> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
> ---
> @@ -550,7 +556,13 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
>          */
>         id->nmic = NVME_NS_NMIC_SHARED;
>         id->anagrpid = cpu_to_le32(req->ns->anagrpid);
> -
> +       id->rescap = NVME_PR_SUPPORT_WRITE_EXCLUSIVE |
> +                    NVME_PR_SUPPORT_EXCLUSIVE_ACCESS |
> +                    NVME_PR_SUPPORT_WRITE_EXCLUSIVE_REG_ONLY |
> +                    NVME_PR_SUPPORT_EXCLUSIVE_ACCESS_REG_ONLY |
> +                    NVME_PR_SUPPORT_WRITE_EXCLUSIVE_ALL_REGS |
> +                    NVME_PR_SUPPORT_EXCLUSIVE_ACCESS_ALL_REGS |
> +                    NVME_PR_SUPPORT_IEKEY_VER_1_3_DEF;

You have a flag enabling PR on a namespace, so report rescap only if it
is enabled.

Also I didnot find ONCS, is it not necessarily?

> +void nvmet_execute_get_log_page_resv(struct nvmet_req *req)
> +{
> +       struct nvmet_pr_log_mgr *log_mgr = &req->sq->ctrl->pr_log_mgr;
> +       struct nvme_pr_log next_log = {0};
> +       struct nvme_pr_log log = {0};
> +       u16 status = NVME_SC_SUCCESS;
> +       u64 lost_count;
> +       u64 cur_count;
> +       u64 next_count;
> +
> +       mutex_lock(&log_mgr->lock);
> +       if (!kfifo_get(&log_mgr->log_queue, &log))
> +               goto out;
> +
> +       /*
> +        * We can't get the last in kfifo.
> +        * Utilize the current count and the count from the next log to
> +        * calculate the number of lost logs, while also addressing cases
> +        * of overflow. If there is no subsequent log, the number of lost
> +        * logs is equal to the lost_count within the nvmet_pr_log_mgr.
> +        */
> +       cur_count = le64_to_cpu(log.count);
> +       if (kfifo_peek(&log_mgr->log_queue, &next_log)) {
> +               next_count = le64_to_cpu(next_log.count);
> +               if (next_count > cur_count)
> +                       lost_count = next_count - cur_count - 1;
> +               else
> +                       lost_count = U64_MAX - cur_count + next_count - 1;
> +       } else {
> +               lost_count = log_mgr->lost_count;
> +       }
> +
> +       log.count = cpu_to_le64(cur_count + lost_count);

This value should be wrapped by U64_MAX too but in reverse direction
(count is next_count-1). If == 0 then = -1.

> +static void nvmet_pr_add_resv_log(struct nvmet_ctrl *ctrl, u8 log_type, u32 nsid)
> +{
> +       struct nvmet_pr_log_mgr *log_mgr = &ctrl->pr_log_mgr;
> +       struct nvme_pr_log log = {0};
> +
> +       mutex_lock(&log_mgr->lock);
> +       log_mgr->counter++;
> +       if (log_mgr->counter == 0)
> +               log_mgr->counter = 1;
> +
> +       log.count = cpu_to_le64(log_mgr->counter);
> +       log.type = log_type;
> +       log.nsid = cpu_to_le32(nsid);
> +
> +       if (!kfifo_put(&log_mgr->log_queue, log)) {
> +               pr_warn("a reservation log lost, cntlid:%d, log_type:%d, nsid:%d\n",
> +                       ctrl->cntlid, log_type, nsid);

I think this log should be Info level, because it is not an error
situation. Size of this queue is fixed and it is obvious that it can be
overflowed.

> +static u16 nvmet_pr_create_new_reservation(struct nvmet_pr *pr, u8 new_rtype,
> +                                          struct nvmet_pr_registrant *reg)
> +{
> +       u16 status;
> +
> +       status = nvmet_pr_validate_rtype(new_rtype);

You shall check the validity of the command in the very beginning of
command handling, not at the very end.

> +       if (!status) {
> +               reg->rtype = new_rtype;
> +               rcu_assign_pointer(pr->holder, reg);
> +       }
> +       return status;
> +}
> +

...

> +static u16 nvmet_pr_unreg_by_prkey(struct nvmet_pr *pr, u64 prkey,
> +                                  uuid_t *send_hostid)
> +{
> +       u16 status = NVME_SC_RESERVATION_CONFLICT | NVME_SC_DNR;
> +       struct nvmet_pr_registrant *reg, *tmp;
> +       uuid_t hostid;
> +
> +       lockdep_assert_held(&pr->pr_lock);
> +
> +       list_for_each_entry_safe(reg, tmp, &pr->registrant_list, entry) {
> +               if (reg->rkey == prkey) {
> +                       status = NVME_SC_SUCCESS;
> +                       uuid_copy(&hostid, &reg->hostid);
> +                       nvmet_pr_unregister_one(pr, reg);
> +                       if (!uuid_equal(&reg->hostid, send_hostid))

reg is free already, use just hostid as line below:

> +                               nvmet_pr_registration_preempted(pr, &hostid);
> +               }
> +       }
> +       return status;
> +}
> +

...

> +static u16 nvmet_pr_preempt(struct nvmet_req *req,
> +                           struct nvmet_pr_registrant *reg,
> +                           u8 rtype,
> +                           struct nvmet_pr_acquire_data *d)
> +{
> +       struct nvmet_ctrl *ctrl = req->sq->ctrl;
> +       struct nvmet_pr *pr = &req->ns->pr;
> +       struct nvmet_pr_registrant *holder;
> +       enum nvme_pr_type original_rtype;
> +       u16 status;
> +
> +       lockdep_assert_held(&pr->pr_lock);
> +       holder = rcu_dereference_protected(pr->holder,
> +                       lockdep_is_held(&pr->pr_lock));
> +       if (!holder)
> +               return nvmet_pr_unreg_by_prkey(pr, le64_to_cpu(d->prkey),
> +                               &ctrl->hostid);
> +
> +       original_rtype = holder->rtype;
> +       if (original_rtype == NVME_PR_WRITE_EXCLUSIVE_ALL_REGS ||
> +           original_rtype == NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS) {
> +               if (!le64_to_cpu(d->prkey)) {
> +                       nvmet_pr_unreg_except_hostid(pr, &ctrl->hostid);
> +                       return nvmet_pr_create_new_reservation(pr,
> +                                       rtype, reg);

There is not an atomic change of the reservation here. Probably if you swap
these lines with each other, it will fix that.

> +               }
> +               return nvmet_pr_unreg_by_prkey(pr, le64_to_cpu(d->prkey),
> +                               &ctrl->hostid);
> +       }
> +
> +       if (holder == reg) {
> +               status = nvmet_pr_update_reg_attr(pr, holder,
> +                               nvmet_pr_update_holder_rtype, &rtype);
> +               if (!status && original_rtype != rtype)
> +                       nvmet_pr_resv_released(pr, &reg->hostid);
> +               return status;
> +       }
> +
> +       if (le64_to_cpu(d->prkey) == holder->rkey) {
> +               status = nvmet_pr_unreg_by_prkey_except_hostid(pr,
> +                               le64_to_cpu(d->prkey), &ctrl->hostid);
> +               if (status)
> +                       return status;
> +               status = nvmet_pr_create_new_reservation(pr, rtype, reg);

Here that time gap too.

> +               if (!status && original_rtype != rtype)
> +                       nvmet_pr_resv_released(pr, &reg->hostid);
> +               return status;
> +       }
> +
> +       if (le64_to_cpu(d->prkey))
> +               return nvmet_pr_unreg_by_prkey(pr, le64_to_cpu(d->prkey),
> +                               &ctrl->hostid);
> +       return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
> +}
> +

...

> +void nvmet_ctrl_destroy_pr(struct nvmet_ctrl *ctrl)
> +{
> +       struct nvmet_pr_registrant *reg;
> +       struct nvmet_ns *ns;
> +       unsigned long idx;
> +
> +       kfifo_free(&ctrl->pr_log_mgr.log_queue);
> +       mutex_destroy(&ctrl->pr_log_mgr.lock);
> +

From here and until the end of the function it is against the standard.

> +       if (nvmet_is_host_connected(&ctrl->hostid, ctrl->subsys))
> +               return;
> +
> +       xa_for_each(&ctrl->subsys->namespaces, idx, ns) {
> +               mutex_lock(&ns->pr.pr_lock);
> +               list_for_each_entry_rcu(reg, &ns->pr.registrant_list, entry,
> +                                       lockdep_is_held(&ns->pr.pr_lock)) {
> +                       if (uuid_equal(&reg->hostid, &ctrl->hostid)) {
> +                               nvmet_pr_unregister_one(&ns->pr, reg);
> +                               break;
> +                       }
> +               }
> +               mutex_unlock(&ns->pr.pr_lock);
> +       }

You shall not modify Reservation due to host connectivity.

> +}
> +
> +void nvmet_pr_exit_ns(struct nvmet_ns *ns)
> +{
> +       struct nvmet_pr_registrant *reg, *tmp;
> +       struct nvmet_pr *pr = &ns->pr;
> +       LIST_HEAD(free_list);
> +
> +       mutex_lock(&pr->pr_lock);
> +       rcu_assign_pointer(pr->holder, NULL);
> +       list_for_each_entry_safe(reg, tmp, &pr->registrant_list, entry) {
> +               list_del_rcu(&reg->entry);
> +               list_add(&reg->entry, &free_list);
> +       }
> +       mutex_unlock(&pr->pr_lock);
> +       synchronize_rcu();

IMHO, there is no need in two lists and synchronize_rcu here. At this
moment there shall be no other users of ns.

> +       list_for_each_entry_safe(reg, tmp, &free_list, entry) {
> +               kfree(reg);
> +       }
> +
> +       mutex_destroy(&pr->pr_lock);
> +}


BR,
 Dmitry
 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v8 1/1] nvmet: support reservation feature
  2024-09-17  8:43   ` Dmitry Bogdanov
@ 2024-09-18  8:26     ` Guixin Liu
  2024-09-18 13:41       ` Dmitry Bogdanov
  0 siblings, 1 reply; 12+ messages in thread
From: Guixin Liu @ 2024-09-18  8:26 UTC (permalink / raw)
  To: Dmitry Bogdanov; +Cc: hch, sagi, kch, linux-nvme

Hi Dmitry,

     My thanks for your advice, I will work on this back recently.


在 2024/9/17 16:43, Dmitry Bogdanov 写道:
> Hi,
>
> Waiting for the final solution half an year we taken this patch as is.
> Here is my comments on it. Hope, this will bring the life to the patch.
>
> On Thu, Feb 29, 2024 at 11:12:41AM +0800, Guixin Liu wrote:
>> This patch implements the reservation feature, includes:
>> 1. reservation register(register, unregister and replace).
>> 2. reservation acquire(acquire, preempt, preempt and abort).
>> 3. reservation release(release and clear).
>> 4. reservation report.
>>
>> And also make reservation configurable, one can set ns to support
>> reservation before enable ns. The default of resv_enable is false.
>>
>> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
>> ---
>> @@ -550,7 +556,13 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
>>           */
>>          id->nmic = NVME_NS_NMIC_SHARED;
>>          id->anagrpid = cpu_to_le32(req->ns->anagrpid);
>> -
>> +       id->rescap = NVME_PR_SUPPORT_WRITE_EXCLUSIVE |
>> +                    NVME_PR_SUPPORT_EXCLUSIVE_ACCESS |
>> +                    NVME_PR_SUPPORT_WRITE_EXCLUSIVE_REG_ONLY |
>> +                    NVME_PR_SUPPORT_EXCLUSIVE_ACCESS_REG_ONLY |
>> +                    NVME_PR_SUPPORT_WRITE_EXCLUSIVE_ALL_REGS |
>> +                    NVME_PR_SUPPORT_EXCLUSIVE_ACCESS_ALL_REGS |
>> +                    NVME_PR_SUPPORT_IEKEY_VER_1_3_DEF;
> You have a flag enabling PR on a namespace, so report rescap only if it
> is enabled.
Sure, I will add a "if" statement.
>
> Also I didnot find ONCS, is it not necessarily?
Oh, I missed this, thanks, I will change oncs.
>> +void nvmet_execute_get_log_page_resv(struct nvmet_req *req)
>> +{
>> +       struct nvmet_pr_log_mgr *log_mgr = &req->sq->ctrl->pr_log_mgr;
>> +       struct nvme_pr_log next_log = {0};
>> +       struct nvme_pr_log log = {0};
>> +       u16 status = NVME_SC_SUCCESS;
>> +       u64 lost_count;
>> +       u64 cur_count;
>> +       u64 next_count;
>> +
>> +       mutex_lock(&log_mgr->lock);
>> +       if (!kfifo_get(&log_mgr->log_queue, &log))
>> +               goto out;
>> +
>> +       /*
>> +        * We can't get the last in kfifo.
>> +        * Utilize the current count and the count from the next log to
>> +        * calculate the number of lost logs, while also addressing cases
>> +        * of overflow. If there is no subsequent log, the number of lost
>> +        * logs is equal to the lost_count within the nvmet_pr_log_mgr.
>> +        */
>> +       cur_count = le64_to_cpu(log.count);
>> +       if (kfifo_peek(&log_mgr->log_queue, &next_log)) {
>> +               next_count = le64_to_cpu(next_log.count);
>> +               if (next_count > cur_count)
>> +                       lost_count = next_count - cur_count - 1;
>> +               else
>> +                       lost_count = U64_MAX - cur_count + next_count - 1;
>> +       } else {
>> +               lost_count = log_mgr->lost_count;
>> +       }
>> +
>> +       log.count = cpu_to_le64(cur_count + lost_count);
> This value should be wrapped by U64_MAX too but in reverse direction
> (count is next_count-1). If == 0 then = -1.
>
The next_count will never be 0, you can see in nvmet_pr_add_resv_log, when

the log_mgr->counter == 0, I set the log_mgr->counter to 1.

>> +static void nvmet_pr_add_resv_log(struct nvmet_ctrl *ctrl, u8 log_type, u32 nsid)
>> +{
>> +       struct nvmet_pr_log_mgr *log_mgr = &ctrl->pr_log_mgr;
>> +       struct nvme_pr_log log = {0};
>> +
>> +       mutex_lock(&log_mgr->lock);
>> +       log_mgr->counter++;
>> +       if (log_mgr->counter == 0)
>> +               log_mgr->counter = 1;
>> +
>> +       log.count = cpu_to_le64(log_mgr->counter);
>> +       log.type = log_type;
>> +       log.nsid = cpu_to_le32(nsid);
>> +
>> +       if (!kfifo_put(&log_mgr->log_queue, log)) {
>> +               pr_warn("a reservation log lost, cntlid:%d, log_type:%d, nsid:%d\n",
>> +                       ctrl->cntlid, log_type, nsid);
> I think this log should be Info level, because it is not an error
> situation. Size of this queue is fixed and it is obvious that it can be
> overflowed.
Agree.
>> +static u16 nvmet_pr_create_new_reservation(struct nvmet_pr *pr, u8 new_rtype,
>> +                                          struct nvmet_pr_registrant *reg)
>> +{
>> +       u16 status;
>> +
>> +       status = nvmet_pr_validate_rtype(new_rtype);
> You shall check the validity of the command in the very beginning of
> command handling, not at the very end.

In some situations, we dont care the value of rtype, for example

using preempt to unregister other host.

So I only check it if needed.

>> +       if (!status) {
>> +               reg->rtype = new_rtype;
>> +               rcu_assign_pointer(pr->holder, reg);
>> +       }
>> +       return status;
>> +}
>> +
> ...
>
>> +static u16 nvmet_pr_unreg_by_prkey(struct nvmet_pr *pr, u64 prkey,
>> +                                  uuid_t *send_hostid)
>> +{
>> +       u16 status = NVME_SC_RESERVATION_CONFLICT | NVME_SC_DNR;
>> +       struct nvmet_pr_registrant *reg, *tmp;
>> +       uuid_t hostid;
>> +
>> +       lockdep_assert_held(&pr->pr_lock);
>> +
>> +       list_for_each_entry_safe(reg, tmp, &pr->registrant_list, entry) {
>> +               if (reg->rkey == prkey) {
>> +                       status = NVME_SC_SUCCESS;
>> +                       uuid_copy(&hostid, &reg->hostid);
>> +                       nvmet_pr_unregister_one(pr, reg);
>> +                       if (!uuid_equal(&reg->hostid, send_hostid))
> reg is free already, use just hostid as line below:
Yeah sure, thanks a lot.
>> +                               nvmet_pr_registration_preempted(pr, &hostid);
>> +               }
>> +       }
>> +       return status;
>> +}
>> +
> ...
>
>> +static u16 nvmet_pr_preempt(struct nvmet_req *req,
>> +                           struct nvmet_pr_registrant *reg,
>> +                           u8 rtype,
>> +                           struct nvmet_pr_acquire_data *d)
>> +{
>> +       struct nvmet_ctrl *ctrl = req->sq->ctrl;
>> +       struct nvmet_pr *pr = &req->ns->pr;
>> +       struct nvmet_pr_registrant *holder;
>> +       enum nvme_pr_type original_rtype;
>> +       u16 status;
>> +
>> +       lockdep_assert_held(&pr->pr_lock);
>> +       holder = rcu_dereference_protected(pr->holder,
>> +                       lockdep_is_held(&pr->pr_lock));
>> +       if (!holder)
>> +               return nvmet_pr_unreg_by_prkey(pr, le64_to_cpu(d->prkey),
>> +                               &ctrl->hostid);
>> +
>> +       original_rtype = holder->rtype;
>> +       if (original_rtype == NVME_PR_WRITE_EXCLUSIVE_ALL_REGS ||
>> +           original_rtype == NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS) {
>> +               if (!le64_to_cpu(d->prkey)) {
>> +                       nvmet_pr_unreg_except_hostid(pr, &ctrl->hostid);
>> +                       return nvmet_pr_create_new_reservation(pr,
>> +                                       rtype, reg);
> There is not an atomic change of the reservation here. Probably if you swap
> these lines with each other, it will fix that.
>
You are right, we need create new reservation first.
>> +               }
>> +               return nvmet_pr_unreg_by_prkey(pr, le64_to_cpu(d->prkey),
>> +                               &ctrl->hostid);
>> +       }
>> +
>> +       if (holder == reg) {
>> +               status = nvmet_pr_update_reg_attr(pr, holder,
>> +                               nvmet_pr_update_holder_rtype, &rtype);
>> +               if (!status && original_rtype != rtype)
>> +                       nvmet_pr_resv_released(pr, &reg->hostid);
>> +               return status;
>> +       }
>> +
>> +       if (le64_to_cpu(d->prkey) == holder->rkey) {
>> +               status = nvmet_pr_unreg_by_prkey_except_hostid(pr,
>> +                               le64_to_cpu(d->prkey), &ctrl->hostid);
>> +               if (status)
>> +                       return status;
>> +               status = nvmet_pr_create_new_reservation(pr, rtype, reg);
> Here that time gap too.
Changed too.
>
>> +               if (!status && original_rtype != rtype)
>> +                       nvmet_pr_resv_released(pr, &reg->hostid);
>> +               return status;
>> +       }
>> +
>> +       if (le64_to_cpu(d->prkey))
>> +               return nvmet_pr_unreg_by_prkey(pr, le64_to_cpu(d->prkey),
>> +                               &ctrl->hostid);
>> +       return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
>> +}
>> +
> ...
>
>> +void nvmet_ctrl_destroy_pr(struct nvmet_ctrl *ctrl)
>> +{
>> +       struct nvmet_pr_registrant *reg;
>> +       struct nvmet_ns *ns;
>> +       unsigned long idx;
>> +
>> +       kfifo_free(&ctrl->pr_log_mgr.log_queue);
>> +       mutex_destroy(&ctrl->pr_log_mgr.lock);
>> +
>  From here and until the end of the function it is against the standard.
>
>> +       if (nvmet_is_host_connected(&ctrl->hostid, ctrl->subsys))
>> +               return;
>> +
>> +       xa_for_each(&ctrl->subsys->namespaces, idx, ns) {
>> +               mutex_lock(&ns->pr.pr_lock);
>> +               list_for_each_entry_rcu(reg, &ns->pr.registrant_list, entry,
>> +                                       lockdep_is_held(&ns->pr.pr_lock)) {
>> +                       if (uuid_equal(&reg->hostid, &ctrl->hostid)) {
>> +                               nvmet_pr_unregister_one(&ns->pr, reg);
>> +                               break;
>> +                       }
>> +               }
>> +               mutex_unlock(&ns->pr.pr_lock);
>> +       }
> You shall not modify Reservation due to host connectivity.

You are right, we should not unregister the host when disconnect, we

should keep it.

>
>> +}
>> +
>> +void nvmet_pr_exit_ns(struct nvmet_ns *ns)
>> +{
>> +       struct nvmet_pr_registrant *reg, *tmp;
>> +       struct nvmet_pr *pr = &ns->pr;
>> +       LIST_HEAD(free_list);
>> +
>> +       mutex_lock(&pr->pr_lock);
>> +       rcu_assign_pointer(pr->holder, NULL);
>> +       list_for_each_entry_safe(reg, tmp, &pr->registrant_list, entry) {
>> +               list_del_rcu(&reg->entry);
>> +               list_add(&reg->entry, &free_list);
>> +       }
>> +       mutex_unlock(&pr->pr_lock);
>> +       synchronize_rcu();
> IMHO, there is no need in two lists and synchronize_rcu here. At this
> moment there shall be no other users of ns.
Yeah, we can just free all regs simply here.
>
>> +       list_for_each_entry_safe(reg, tmp, &free_list, entry) {
>> +               kfree(reg);
>> +       }
>> +
>> +       mutex_destroy(&pr->pr_lock);
>> +}
>
> BR,
>   Dmitry
>   

Best Regards,

Guixin Liu



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v8 0/1] Implement the NVMe reservation feature
  2024-07-05 16:45         ` hch
@ 2024-09-18  8:30           ` Guixin Liu
  0 siblings, 0 replies; 12+ messages in thread
From: Guixin Liu @ 2024-09-18  8:30 UTC (permalink / raw)
  To: hch@lst.de
  Cc: Chaitanya Kulkarni, linux-nvme@lists.infradead.org, Sagi Grimberg

Hi,

     My apologize for not being able to continue my modifications 
recently, I have been assigned to other work over past half year, I will 
resume my modifications soon to get this patch merged.

Best Regards,

Guixin Liu

在 2024/7/6 00:45, hch@lst.de 写道:
> Hi Liu,
>
> did you manage to get back to this?  I just played around with the
> feature to test my RFC 9561 implementatiomn and it would be great to
> get it merged.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v8 1/1] nvmet: support reservation feature
  2024-09-18  8:26     ` Guixin Liu
@ 2024-09-18 13:41       ` Dmitry Bogdanov
  2024-09-19 11:42         ` Guixin Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Bogdanov @ 2024-09-18 13:41 UTC (permalink / raw)
  To: Guixin Liu; +Cc: hch, sagi, kch, linux-nvme

On Wed, Sep 18, 2024 at 04:26:10PM +0800, Guixin Liu wrote:
> 
> Hi Dmitry,
> 
>     My thanks for your advice, I will work on this back recently.
> 
> 
> 在 2024/9/17 16:43, Dmitry Bogdanov 写道:
> > Hi,
> > 
> > Waiting for the final solution half an year we taken this patch as is.
> > Here is my comments on it. Hope, this will bring the life to the patch.
> > 
> > > +void nvmet_execute_get_log_page_resv(struct nvmet_req *req)
> > > +{
> > > +       struct nvmet_pr_log_mgr *log_mgr = &req->sq->ctrl->pr_log_mgr;
> > > +       struct nvme_pr_log next_log = {0};
> > > +       struct nvme_pr_log log = {0};
> > > +       u16 status = NVME_SC_SUCCESS;
> > > +       u64 lost_count;
> > > +       u64 cur_count;
> > > +       u64 next_count;
> > > +
> > > +       mutex_lock(&log_mgr->lock);
> > > +       if (!kfifo_get(&log_mgr->log_queue, &log))
> > > +               goto out;
> > > +
> > > +       /*
> > > +        * We can't get the last in kfifo.
> > > +        * Utilize the current count and the count from the next log to
> > > +        * calculate the number of lost logs, while also addressing cases
> > > +        * of overflow. If there is no subsequent log, the number of lost
> > > +        * logs is equal to the lost_count within the nvmet_pr_log_mgr.
> > > +        */
> > > +       cur_count = le64_to_cpu(log.count);
> > > +       if (kfifo_peek(&log_mgr->log_queue, &next_log)) {
> > > +               next_count = le64_to_cpu(next_log.count);
> > > +               if (next_count > cur_count)
> > > +                       lost_count = next_count - cur_count - 1;
> > > +               else
> > > +                       lost_count = U64_MAX - cur_count + next_count - 1;
> > > +       } else {
> > > +               lost_count = log_mgr->lost_count;
> > > +       }
> > > +
> > > +       log.count = cpu_to_le64(cur_count + lost_count);
> > This value should be wrapped by U64_MAX too but in reverse direction
> > (count is next_count-1). If == 0 then = -1.
> > 
> The next_count will never be 0, you can see in nvmet_pr_add_resv_log, when
> 
> the log_mgr->counter == 0, I set the log_mgr->counter to 1.
I mean that herer tou should add the same logic as in nvmet_pr_add_resv_log.
I am talking about this case:
next_count = 2
cur_count  = 0xffffffff
lost_count = U64_MAX - cur_count + next_count - 1 =  0xffffffff - 0xffffffff + 2 - 1 = 1
log.count = cpu_to_le64(cur_count + lost_count) = 0xffffffff + 1 = 0

0 in this case shall became 1 (not -1 as in my original comment)

if (log.count == 0)
	log.count = 1;

would fix that.

> > > +static u16 nvmet_pr_create_new_reservation(struct nvmet_pr *pr, u8 new_rtype,
> > > +                                          struct nvmet_pr_registrant *reg)
> > > +{
> > > +       u16 status;
> > > +
> > > +       status = nvmet_pr_validate_rtype(new_rtype);
> > You shall check the validity of the command in the very beginning of
> > command handling, not at the very end.
> 
> In some situations, we dont care the value of rtype, for example
> 
> using preempt to unregister other host.
> 
> So I only check it if needed.

I have two arguemnts for:
1. It is more convenient to check a validity of a command in the beginning -
there will be less changes to rollback. Now you do some unregistrations
and then may fail the command due to invalid rtype, but Reservation data
you does not change back.

2. Yes, the spec does not state that RTYPE shall be valid in Reservation Acquire Action (RACQA) field to 001b (Preempt) 
when reservation is not going to be changed. But it doesnot state that it
might be ignored too :). In some places I see that such a statement there is.
Also, there is such general statement for reserved values
	1.4.1.6 reserved
	Receipt of reserved coded values in defined fields in
	commands shall be reported as an error.

> > > +       if (!status) {
> > > +               reg->rtype = new_rtype;
> > > +               rcu_assign_pointer(pr->holder, reg);
> > > +       }
> > > +       return status;
> > > +}
> > > +

BR,
 Dmitry


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v8 1/1] nvmet: support reservation feature
  2024-09-18 13:41       ` Dmitry Bogdanov
@ 2024-09-19 11:42         ` Guixin Liu
  0 siblings, 0 replies; 12+ messages in thread
From: Guixin Liu @ 2024-09-19 11:42 UTC (permalink / raw)
  To: Dmitry Bogdanov; +Cc: hch, sagi, kch, linux-nvme


在 2024/9/18 21:41, Dmitry Bogdanov 写道:
> On Wed, Sep 18, 2024 at 04:26:10PM +0800, Guixin Liu wrote:
>> Hi Dmitry,
>>
>>      My thanks for your advice, I will work on this back recently.
>>
>>
>> 在 2024/9/17 16:43, Dmitry Bogdanov 写道:
>>> Hi,
>>>
>>> Waiting for the final solution half an year we taken this patch as is.
>>> Here is my comments on it. Hope, this will bring the life to the patch.
>>>
>>>> +void nvmet_execute_get_log_page_resv(struct nvmet_req *req)
>>>> +{
>>>> +       struct nvmet_pr_log_mgr *log_mgr = &req->sq->ctrl->pr_log_mgr;
>>>> +       struct nvme_pr_log next_log = {0};
>>>> +       struct nvme_pr_log log = {0};
>>>> +       u16 status = NVME_SC_SUCCESS;
>>>> +       u64 lost_count;
>>>> +       u64 cur_count;
>>>> +       u64 next_count;
>>>> +
>>>> +       mutex_lock(&log_mgr->lock);
>>>> +       if (!kfifo_get(&log_mgr->log_queue, &log))
>>>> +               goto out;
>>>> +
>>>> +       /*
>>>> +        * We can't get the last in kfifo.
>>>> +        * Utilize the current count and the count from the next log to
>>>> +        * calculate the number of lost logs, while also addressing cases
>>>> +        * of overflow. If there is no subsequent log, the number of lost
>>>> +        * logs is equal to the lost_count within the nvmet_pr_log_mgr.
>>>> +        */
>>>> +       cur_count = le64_to_cpu(log.count);
>>>> +       if (kfifo_peek(&log_mgr->log_queue, &next_log)) {
>>>> +               next_count = le64_to_cpu(next_log.count);
>>>> +               if (next_count > cur_count)
>>>> +                       lost_count = next_count - cur_count - 1;
>>>> +               else
>>>> +                       lost_count = U64_MAX - cur_count + next_count - 1;
>>>> +       } else {
>>>> +               lost_count = log_mgr->lost_count;
>>>> +       }
>>>> +
>>>> +       log.count = cpu_to_le64(cur_count + lost_count);
>>> This value should be wrapped by U64_MAX too but in reverse direction
>>> (count is next_count-1). If == 0 then = -1.
>>>
>> The next_count will never be 0, you can see in nvmet_pr_add_resv_log, when
>>
>> the log_mgr->counter == 0, I set the log_mgr->counter to 1.
> I mean that herer tou should add the same logic as in nvmet_pr_add_resv_log.
> I am talking about this case:
> next_count = 2
> cur_count  = 0xffffffff
> lost_count = U64_MAX - cur_count + next_count - 1 =  0xffffffff - 0xffffffff + 2 - 1 = 1
> log.count = cpu_to_le64(cur_count + lost_count) = 0xffffffff + 1 = 0
>
> 0 in this case shall became 1 (not -1 as in my original comment)
>
> if (log.count == 0)
> 	log.count = 1;
>
> would fix that.
Your right, it will be changed in v9.
>>>> +static u16 nvmet_pr_create_new_reservation(struct nvmet_pr *pr, u8 new_rtype,
>>>> +                                          struct nvmet_pr_registrant *reg)
>>>> +{
>>>> +       u16 status;
>>>> +
>>>> +       status = nvmet_pr_validate_rtype(new_rtype);
>>> You shall check the validity of the command in the very beginning of
>>> command handling, not at the very end.
>> In some situations, we dont care the value of rtype, for example
>>
>> using preempt to unregister other host.
>>
>> So I only check it if needed.
> I have two arguemnts for:
> 1. It is more convenient to check a validity of a command in the beginning -
> there will be less changes to rollback. Now you do some unregistrations
> and then may fail the command due to invalid rtype, but Reservation data
> you does not change back.
>
> 2. Yes, the spec does not state that RTYPE shall be valid in Reservation Acquire Action (RACQA) field to 001b (Preempt)
> when reservation is not going to be changed. But it doesnot state that it
> might be ignored too :). In some places I see that such a statement there is.
> Also, there is such general statement for reserved values
> 	1.4.1.6 reserved
> 	Receipt of reserved coded values in defined fields in
> 	commands shall be reported as an error.
I changed rtype check to the beginning of acquire to see maintainer`s 
opinion.
>>>> +       if (!status) {
>>>> +               reg->rtype = new_rtype;
>>>> +               rcu_assign_pointer(pr->holder, reg);
>>>> +       }
>>>> +       return status;
>>>> +}
>>>> +
> BR,
>   Dmitry

Best Regards,

Guixin Liu



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-09-19 11:42 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-29  3:12 [PATCH v8 0/1] Implement the NVMe reservation feature Guixin Liu
2024-02-29  3:12 ` [PATCH v8 1/1] nvmet: support " Guixin Liu
2024-09-17  8:43   ` Dmitry Bogdanov
2024-09-18  8:26     ` Guixin Liu
2024-09-18 13:41       ` Dmitry Bogdanov
2024-09-19 11:42         ` Guixin Liu
2024-03-05  2:32 ` [PATCH v8 0/1] Implement the NVMe " Guixin Liu
2024-03-07  8:18   ` Sagi Grimberg
2024-03-12  0:50     ` Chaitanya Kulkarni
2024-03-12  3:39       ` Guixin Liu
2024-07-05 16:45         ` hch
2024-09-18  8:30           ` Guixin Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox