* Make NVME shutdown two-pass - Version 4
@ 2024-01-03 21:04 Jeremy Allison
2024-01-03 21:04 ` [PATCH 1/5] driver core: Support two-pass driver shutdown Jeremy Allison
` (6 more replies)
0 siblings, 7 replies; 20+ messages in thread
From: Jeremy Allison @ 2024-01-03 21:04 UTC (permalink / raw)
To: jallison, jra, tansuresh, hch, gregkh, rafael, bhelgaas, sagi; +Cc: linux-nvme
This is version 4 of a patchset originally written by
Tanjore Suresh <tansuresh@google.com> to make shutdown
of nvme devices two-pass.
Changes from version 3:
1). Removed duplicate setting of ctrl->ctrl_config in
nvme completion function. Noticed by Sagi Grimberg <sagi@grimberg.me>
2). Removed intermediate function nvme_wait_for_shutdown_cmpl(),
folded this code directly into nvme_shutdown_wait() by
exporting nvme_wait_ready() from drivers/nvme/host/core.c.
Requested by Sagi Grimberg <sagi@grimberg.me>
-------------------------------------------------------------
Currently the Linux nvme driver shutdown code steps
through each connected drive, sets the NVME_CC_SHN_NORMAL
(normal shutdown) flag and then polls the given drive
waiting for the response NVME_CSTS_SHST_CMPLT flag
(shutdown complete).
Each drive is taking around 13 seconds to respond to this.
The customer has 20+ drives on the box so this time adds
up on shutdown when the nvme driver is being shut down.
This patchset changes shutdown to proceed in parallel,
so the NVME_CC_SHN_NORMAL (normal shutdown) flag is
sent to all drives first, and then it polls waiting
for the NVME_CSTS_SHST_CMPLT flag (shutdown complete)
for all drives.
In the specific customer case it reduces the NVME
shutdown time from over 300 seconds to around 15
seconds.
-------------------------------------------------------------
Thanks for your consideration,
Jeremy Allison.
CIQ / Samba Team.
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 1/5] driver core: Support two-pass driver shutdown
2024-01-03 21:04 Make NVME shutdown two-pass - Version 4 Jeremy Allison
@ 2024-01-03 21:04 ` Jeremy Allison
2024-01-04 13:12 ` Sagi Grimberg
2024-01-03 21:04 ` [PATCH 2/5] PCI: Support two-pass shutdown Jeremy Allison
` (5 subsequent siblings)
6 siblings, 1 reply; 20+ messages in thread
From: Jeremy Allison @ 2024-01-03 21:04 UTC (permalink / raw)
To: jallison, jra, tansuresh, hch, gregkh, rafael, bhelgaas, sagi; +Cc: linux-nvme
From: Tanjore Suresh <tansuresh@google.com>
This changes the bus driver interface with an additional entry point
to enable devices to implement two-pass shutdown. The existing
synchronous interface to shutdown is called, and if a shutdown_wait
method is defined the device is moved to an alternate list.
Once the shutdown method is called for all devices, the
shutdown_wait method is then called synchronously for
all devices on the alternate list.
Signed-off-by: Tanjore Suresh <tansuresh@google.com>
Signed-off-by: Jeremy Allison <jallison@ciq.com>
---
drivers/base/core.c | 37 +++++++++++++++++++++++++++++++++++++
include/linux/device/bus.h | 6 +++++-
2 files changed, 42 insertions(+), 1 deletion(-)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 67ba592afc77..e1f4c54de3e1 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -4725,6 +4725,7 @@ EXPORT_SYMBOL_GPL(device_change_owner);
void device_shutdown(void)
{
struct device *dev, *parent;
+ LIST_HEAD(shutdown_wait_list);
wait_for_device_probe();
device_block_probing();
@@ -4769,10 +4770,17 @@ void device_shutdown(void)
dev_info(dev, "shutdown_pre\n");
dev->class->shutdown_pre(dev);
}
+
if (dev->bus && dev->bus->shutdown) {
if (initcall_debug)
dev_info(dev, "shutdown\n");
dev->bus->shutdown(dev);
+ /*
+ * Only put the device on the shutdown_wait_list
+ * if a shutdown_wait() method is also defined.
+ */
+ if (dev->bus->shutdown_wait)
+ list_add(&dev->kobj.entry, &shutdown_wait_list);
} else if (dev->driver && dev->driver->shutdown) {
if (initcall_debug)
dev_info(dev, "shutdown\n");
@@ -4789,6 +4797,35 @@ void device_shutdown(void)
spin_lock(&devices_kset->list_lock);
}
spin_unlock(&devices_kset->list_lock);
+
+ /*
+ * Second pass only for devices that have configured
+ * a shutdown_wait() method.
+ */
+ while (!list_empty(&shutdown_wait_list)) {
+ dev = list_entry(shutdown_wait_list.next, struct device,
+ kobj.entry);
+ parent = get_device(dev->parent);
+ get_device(dev);
+ /*
+ * Make sure the device is off the list
+ */
+ list_del_init(&dev->kobj.entry);
+ if (parent)
+ device_lock(parent);
+ device_lock(dev);
+ if (dev->bus && dev->bus->shutdown_wait) {
+ if (initcall_debug)
+ dev_info(dev,
+ "shutdown_wait called\n");
+ dev->bus->shutdown_wait(dev);
+ }
+ device_unlock(dev);
+ if (parent)
+ device_unlock(parent);
+ put_device(dev);
+ put_device(parent);
+ }
}
/*
diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
index ae10c4322754..5a36af80cabe 100644
--- a/include/linux/device/bus.h
+++ b/include/linux/device/bus.h
@@ -48,7 +48,10 @@ struct fwnode_handle;
* will never get called until they do.
* @remove: Called when a device removed from this bus.
* @shutdown: Called at shut-down time to quiesce the device.
- *
+ * @shutdown_wait: If this method exists, devices are stored on a separate
+ * list after shutdown() has been called and
+ * shutdown_wait() is called synchronously on each device
+ * in this list in turn.
* @online: Called to put the device back online (after offlining it).
* @offline: Called to put the device offline for hot-removal. May fail.
*
@@ -90,6 +93,7 @@ struct bus_type {
void (*sync_state)(struct device *dev);
void (*remove)(struct device *dev);
void (*shutdown)(struct device *dev);
+ void (*shutdown_wait)(struct device *dev);
int (*online)(struct device *dev);
int (*offline)(struct device *dev);
--
2.39.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 2/5] PCI: Support two-pass shutdown
2024-01-03 21:04 Make NVME shutdown two-pass - Version 4 Jeremy Allison
2024-01-03 21:04 ` [PATCH 1/5] driver core: Support two-pass driver shutdown Jeremy Allison
@ 2024-01-03 21:04 ` Jeremy Allison
2024-01-04 18:55 ` Bjorn Helgaas
2024-01-03 21:04 ` [PATCH 3/5] nvme: Change 'bool shutdown' into an enum shutdown_type Jeremy Allison
` (4 subsequent siblings)
6 siblings, 1 reply; 20+ messages in thread
From: Jeremy Allison @ 2024-01-03 21:04 UTC (permalink / raw)
To: jallison, jra, tansuresh, hch, gregkh, rafael, bhelgaas, sagi; +Cc: linux-nvme
From: Tanjore Suresh <tansuresh@google.com>
Enhances the base PCI driver to add support for two-pass
shutdown. Adds shutdown_wait() method.
Assume a device takes n secs to shutdown. If a machine has been
populated with M such devices, the total time spent in shutting down
all the devices will be M * n secs if the shutdown is done
synchronously. For example, if NVMe PCI Controllers take 5 secs
to shutdown and if there are 16 such NVMe controllers in a system,
system will spend a total of 80 secs to shutdown all
NVMe devices in that system.
In order to speed up the shutdown time, a two-pass interface to
shutdown has been implemented. The caller calls the shutdown method
for each device in turn to allow a shutdown request to be sent,
then the caller walks the list of devices and calls shutdown_wait()
to synchronously wait for the shutdown to complete.
In the NVMe case above, all 16 devices will process the shutdown
in parallel taking the total time to shutdown down to the time
for one NVMe PCI Controller to shut down.
This will significantly reduce the machine reboot time.
Signed-off-by: Tanjore Suresh <tansuresh@google.com>
Signed-off-by: Jeremy Allison <jallison@ciq.com>
---
drivers/pci/pci-driver.c | 9 +++++++++
include/linux/pci.h | 2 ++
2 files changed, 11 insertions(+)
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 51ec9e7e784f..257bbb04c806 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -547,6 +547,14 @@ static int pci_restore_standard_config(struct pci_dev *pci_dev)
}
#endif /* CONFIG_PM_SLEEP */
+static void pci_device_shutdown_wait(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+
+ if (drv && drv->shutdown_wait)
+ drv->shutdown_wait(pci_dev);
+}
#ifdef CONFIG_PM
/* Auxiliary functions used for system resume and run-time resume */
@@ -1682,6 +1690,7 @@ struct bus_type pci_bus_type = {
.probe = pci_device_probe,
.remove = pci_device_remove,
.shutdown = pci_device_shutdown,
+ .shutdown_wait = pci_device_shutdown_wait,
.dev_groups = pci_dev_groups,
.bus_groups = pci_bus_groups,
.drv_groups = pci_drv_groups,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 60ca768bc867..e3ba7043c7de 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -917,6 +917,7 @@ struct module;
* Useful for enabling wake-on-lan (NIC) or changing
* the power state of a device before reboot.
* e.g. drivers/net/e100.c.
+ * @shutdown_wait: Optional driver callback to allow two-pass shutdown.
* @sriov_configure: Optional driver callback to allow configuration of
* number of VFs to enable via sysfs "sriov_numvfs" file.
* @sriov_set_msix_vec_count: PF Driver callback to change number of MSI-X
@@ -948,6 +949,7 @@ struct pci_driver {
int (*suspend)(struct pci_dev *dev, pm_message_t state); /* Device suspended */
int (*resume)(struct pci_dev *dev); /* Device woken up */
void (*shutdown)(struct pci_dev *dev);
+ void (*shutdown_wait)(struct pci_dev *dev);
int (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
int (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
u32 (*sriov_get_vf_total_msix)(struct pci_dev *pf);
--
2.39.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 3/5] nvme: Change 'bool shutdown' into an enum shutdown_type.
2024-01-03 21:04 Make NVME shutdown two-pass - Version 4 Jeremy Allison
2024-01-03 21:04 ` [PATCH 1/5] driver core: Support two-pass driver shutdown Jeremy Allison
2024-01-03 21:04 ` [PATCH 2/5] PCI: Support two-pass shutdown Jeremy Allison
@ 2024-01-03 21:04 ` Jeremy Allison
2024-01-04 13:26 ` Sagi Grimberg
2024-01-03 21:04 ` [PATCH 4/5] nvme: Export nvme_wait_ready() Jeremy Allison
` (3 subsequent siblings)
6 siblings, 1 reply; 20+ messages in thread
From: Jeremy Allison @ 2024-01-03 21:04 UTC (permalink / raw)
To: jallison, jra, tansuresh, hch, gregkh, rafael, bhelgaas, sagi; +Cc: linux-nvme
Convert nvme_disable_ctrl() and nvme_dev_disable()
inside drivers/nvme/host/pci.c to use this:
bool shutdown = false == NVME_DISABLE_RESET
bool shutdown = true == NVME_DISABLE_SHUTDOWN_SYNC.
This will make it easier to add a third request later:
NVME_DISABLE_SHUTDOWN_ASNYC
As nvme_disable_ctrl() is used outside of drivers/nvme/host/pci.c,
convert the callers of nvme_disable_ctrl() to this convention too.
Signed-off-by: Jeremy Allison <jallison@ciq.com>
---
drivers/nvme/host/apple.c | 4 ++--
drivers/nvme/host/core.c | 6 +++---
drivers/nvme/host/nvme.h | 7 +++++-
drivers/nvme/host/pci.c | 44 +++++++++++++++++++-------------------
drivers/nvme/host/rdma.c | 3 ++-
drivers/nvme/host/tcp.c | 3 ++-
drivers/nvme/target/loop.c | 2 +-
7 files changed, 38 insertions(+), 31 deletions(-)
diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index 596bb11eeba5..764639ede41d 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -844,8 +844,8 @@ static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown)
* NVMe disabled state, after a clean shutdown).
*/
if (shutdown)
- nvme_disable_ctrl(&anv->ctrl, shutdown);
- nvme_disable_ctrl(&anv->ctrl, false);
+ nvme_disable_ctrl(&anv->ctrl, NVME_DISABLE_SHUTDOWN_SYNC);
+ nvme_disable_ctrl(&anv->ctrl, NVME_DISABLE_RESET);
}
WRITE_ONCE(anv->ioq.enabled, false);
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 50818dbcfa1a..e1b2facb7d6a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2219,12 +2219,12 @@ static int nvme_wait_ready(struct nvme_ctrl *ctrl, u32 mask, u32 val,
return ret;
}
-int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
+int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type)
{
int ret;
ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
- if (shutdown)
+ if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC)
ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;
else
ctrl->ctrl_config &= ~NVME_CC_ENABLE;
@@ -2233,7 +2233,7 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
if (ret)
return ret;
- if (shutdown) {
+ if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC) {
return nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK,
NVME_CSTS_SHST_CMPLT,
ctrl->shutdown_timeout, "shutdown");
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 6092cc361837..1a748640f2fb 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -187,6 +187,11 @@ enum {
NVME_MPATH_IO_STATS = (1 << 2),
};
+enum shutdown_type {
+ NVME_DISABLE_RESET = 0,
+ NVME_DISABLE_SHUTDOWN_SYNC = 1
+};
+
static inline struct nvme_request *nvme_req(struct request *req)
{
return blk_mq_rq_to_pdu(req);
@@ -749,7 +754,7 @@ void nvme_cancel_tagset(struct nvme_ctrl *ctrl);
void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl);
bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
enum nvme_ctrl_state new_state);
-int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown);
+int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type);
int nvme_enable_ctrl(struct nvme_ctrl *ctrl);
int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
const struct nvme_ctrl_ops *ops, unsigned long quirks);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f27202680741..367e322dc818 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -108,7 +108,7 @@ MODULE_PARM_DESC(noacpi, "disable acpi bios quirks");
struct nvme_dev;
struct nvme_queue;
-static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
+static void nvme_dev_disable(struct nvme_dev *dev, enum shutdown_type shutdown_type);
static void nvme_delete_io_queues(struct nvme_dev *dev);
static void nvme_update_attrs(struct nvme_dev *dev);
@@ -1330,7 +1330,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
"I/O %d QID %d timeout, disable controller\n",
req->tag, nvmeq->qid);
nvme_req(req)->flags |= NVME_REQ_CANCELLED;
- nvme_dev_disable(dev, true);
+ nvme_dev_disable(dev, NVME_DISABLE_SHUTDOWN_SYNC);
return BLK_EH_DONE;
case NVME_CTRL_RESETTING:
return BLK_EH_RESET_TIMER;
@@ -1390,7 +1390,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING))
return BLK_EH_DONE;
- nvme_dev_disable(dev, false);
+ nvme_dev_disable(dev, NVME_DISABLE_RESET);
if (nvme_try_sched_reset(&dev->ctrl))
nvme_unquiesce_io_queues(&dev->ctrl);
return BLK_EH_DONE;
@@ -1736,7 +1736,7 @@ static int nvme_pci_configure_admin_queue(struct nvme_dev *dev)
* commands to the admin queue ... and we don't know what memory that
* might be pointing at!
*/
- result = nvme_disable_ctrl(&dev->ctrl, false);
+ result = nvme_disable_ctrl(&dev->ctrl, NVME_DISABLE_RESET);
if (result < 0)
return result;
@@ -2571,7 +2571,7 @@ static bool nvme_pci_ctrl_is_dead(struct nvme_dev *dev)
return (csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY);
}
-static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
+static void nvme_dev_disable(struct nvme_dev *dev, enum shutdown_type shutdown_type)
{
struct pci_dev *pdev = to_pci_dev(dev->dev);
bool dead;
@@ -2586,7 +2586,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
* Give the controller a chance to complete all entered requests
* if doing a safe shutdown.
*/
- if (!dead && shutdown)
+ if (!dead && (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC))
nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
}
@@ -2594,7 +2594,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
if (!dead && dev->ctrl.queue_count > 0) {
nvme_delete_io_queues(dev);
- nvme_disable_ctrl(&dev->ctrl, shutdown);
+ nvme_disable_ctrl(&dev->ctrl, shutdown_type);
nvme_poll_irqdisable(&dev->queues[0]);
}
nvme_suspend_io_queues(dev);
@@ -2612,7 +2612,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
* must flush all entered requests to their failed completion to avoid
* deadlocking blk-mq hot-cpu notifier.
*/
- if (shutdown) {
+ if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC) {
nvme_unquiesce_io_queues(&dev->ctrl);
if (dev->ctrl.admin_q && !blk_queue_dying(dev->ctrl.admin_q))
nvme_unquiesce_admin_queue(&dev->ctrl);
@@ -2620,11 +2620,11 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
mutex_unlock(&dev->shutdown_lock);
}
-static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown)
+static int nvme_disable_prepare_reset(struct nvme_dev *dev, enum shutdown_type shutdown_type)
{
if (!nvme_wait_reset(&dev->ctrl))
return -EBUSY;
- nvme_dev_disable(dev, shutdown);
+ nvme_dev_disable(dev, shutdown_type);
return 0;
}
@@ -2702,7 +2702,7 @@ static void nvme_reset_work(struct work_struct *work)
* moving on.
*/
if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
- nvme_dev_disable(dev, false);
+ nvme_dev_disable(dev, NVME_DISABLE_RESET);
nvme_sync_queues(&dev->ctrl);
mutex_lock(&dev->shutdown_lock);
@@ -2780,7 +2780,7 @@ static void nvme_reset_work(struct work_struct *work)
dev_warn(dev->ctrl.device, "Disabling device after reset failure: %d\n",
result);
nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
- nvme_dev_disable(dev, true);
+ nvme_dev_disable(dev, NVME_DISABLE_SHUTDOWN_SYNC);
nvme_sync_queues(&dev->ctrl);
nvme_mark_namespaces_dead(&dev->ctrl);
nvme_unquiesce_io_queues(&dev->ctrl);
@@ -3058,7 +3058,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
out_disable:
nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
- nvme_dev_disable(dev, true);
+ nvme_dev_disable(dev, NVME_DISABLE_SHUTDOWN_SYNC);
nvme_free_host_mem(dev);
nvme_dev_remove_admin(dev);
nvme_dbbuf_dma_free(dev);
@@ -3084,7 +3084,7 @@ static void nvme_reset_prepare(struct pci_dev *pdev)
* state as pci_dev device lock is held, making it impossible to race
* with ->remove().
*/
- nvme_disable_prepare_reset(dev, false);
+ nvme_disable_prepare_reset(dev, NVME_DISABLE_RESET);
nvme_sync_queues(&dev->ctrl);
}
@@ -3100,7 +3100,7 @@ static void nvme_shutdown(struct pci_dev *pdev)
{
struct nvme_dev *dev = pci_get_drvdata(pdev);
- nvme_disable_prepare_reset(dev, true);
+ nvme_disable_prepare_reset(dev, NVME_DISABLE_SHUTDOWN_SYNC);
}
/*
@@ -3117,13 +3117,13 @@ static void nvme_remove(struct pci_dev *pdev)
if (!pci_device_is_present(pdev)) {
nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DEAD);
- nvme_dev_disable(dev, true);
+ nvme_dev_disable(dev, NVME_DISABLE_SHUTDOWN_SYNC);
}
flush_work(&dev->ctrl.reset_work);
nvme_stop_ctrl(&dev->ctrl);
nvme_remove_namespaces(&dev->ctrl);
- nvme_dev_disable(dev, true);
+ nvme_dev_disable(dev, NVME_DISABLE_SHUTDOWN_SYNC);
nvme_free_host_mem(dev);
nvme_dev_remove_admin(dev);
nvme_dbbuf_dma_free(dev);
@@ -3186,7 +3186,7 @@ static int nvme_suspend(struct device *dev)
if (pm_suspend_via_firmware() || !ctrl->npss ||
!pcie_aspm_enabled(pdev) ||
(ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
- return nvme_disable_prepare_reset(ndev, true);
+ return nvme_disable_prepare_reset(ndev, NVME_DISABLE_SHUTDOWN_SYNC);
nvme_start_freeze(ctrl);
nvme_wait_freeze(ctrl);
@@ -3229,7 +3229,7 @@ static int nvme_suspend(struct device *dev)
* Clearing npss forces a controller reset on resume. The
* correct value will be rediscovered then.
*/
- ret = nvme_disable_prepare_reset(ndev, true);
+ ret = nvme_disable_prepare_reset(ndev, NVME_DISABLE_SHUTDOWN_SYNC);
ctrl->npss = 0;
}
unfreeze:
@@ -3241,7 +3241,7 @@ static int nvme_simple_suspend(struct device *dev)
{
struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
- return nvme_disable_prepare_reset(ndev, true);
+ return nvme_disable_prepare_reset(ndev, NVME_DISABLE_SHUTDOWN_SYNC);
}
static int nvme_simple_resume(struct device *dev)
@@ -3279,10 +3279,10 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
dev_warn(dev->ctrl.device,
"frozen state error detected, reset controller\n");
if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
- nvme_dev_disable(dev, true);
+ nvme_dev_disable(dev, NVME_DISABLE_SHUTDOWN_SYNC);
return PCI_ERS_RESULT_DISCONNECT;
}
- nvme_dev_disable(dev, false);
+ nvme_dev_disable(dev, NVME_DISABLE_RESET);
return PCI_ERS_RESULT_NEED_RESET;
case pci_channel_io_perm_failure:
dev_warn(dev->ctrl.device,
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index bc90ec3c51b0..b969ab23a55b 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2136,7 +2136,8 @@ static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
{
nvme_rdma_teardown_io_queues(ctrl, shutdown);
nvme_quiesce_admin_queue(&ctrl->ctrl);
- nvme_disable_ctrl(&ctrl->ctrl, shutdown);
+ nvme_disable_ctrl(&ctrl->ctrl, shutdown ?
+ NVME_DISABLE_SHUTDOWN_SYNC : NVME_DISABLE_RESET);
nvme_rdma_teardown_admin_queue(ctrl, shutdown);
}
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 5056bcae2f39..de5937f786b8 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2291,7 +2291,8 @@ static void nvme_tcp_teardown_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
{
nvme_tcp_teardown_io_queues(ctrl, shutdown);
nvme_quiesce_admin_queue(ctrl);
- nvme_disable_ctrl(ctrl, shutdown);
+ nvme_disable_ctrl(ctrl, shutdown ?
+ NVME_DISABLE_SHUTDOWN_SYNC : NVME_DISABLE_RESET);
nvme_tcp_teardown_admin_queue(ctrl, shutdown);
}
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 9cb434c58075..6cb6e7c6bdd1 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -401,7 +401,7 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
nvme_quiesce_admin_queue(&ctrl->ctrl);
if (ctrl->ctrl.state == NVME_CTRL_LIVE)
- nvme_disable_ctrl(&ctrl->ctrl, true);
+ nvme_disable_ctrl(&ctrl->ctrl, NVME_DISABLE_SHUTDOWN_SYNC);
nvme_cancel_admin_tagset(&ctrl->ctrl);
nvme_loop_destroy_admin_queue(ctrl);
--
2.39.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 4/5] nvme: Export nvme_wait_ready().
2024-01-03 21:04 Make NVME shutdown two-pass - Version 4 Jeremy Allison
` (2 preceding siblings ...)
2024-01-03 21:04 ` [PATCH 3/5] nvme: Change 'bool shutdown' into an enum shutdown_type Jeremy Allison
@ 2024-01-03 21:04 ` Jeremy Allison
2024-01-03 21:04 ` [PATCH 5/5] nvme: Add two-pass shutdown support Jeremy Allison
` (2 subsequent siblings)
6 siblings, 0 replies; 20+ messages in thread
From: Jeremy Allison @ 2024-01-03 21:04 UTC (permalink / raw)
To: jallison, jra, tansuresh, hch, gregkh, rafael, bhelgaas, sagi; +Cc: linux-nvme
We will be calling this from drivers/nvme/host/pci.c
in the next commit.
Signed-off-by: Jeremy Allison <jallison@ciq.com>
---
drivers/nvme/host/core.c | 3 ++-
drivers/nvme/host/nvme.h | 2 ++
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e1b2facb7d6a..c7d448f186fb 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2192,7 +2192,7 @@ const struct block_device_operations nvme_bdev_ops = {
.pr_ops = &nvme_pr_ops,
};
-static int nvme_wait_ready(struct nvme_ctrl *ctrl, u32 mask, u32 val,
+int nvme_wait_ready(struct nvme_ctrl *ctrl, u32 mask, u32 val,
u32 timeout, const char *op)
{
unsigned long timeout_jiffies = jiffies + timeout * HZ;
@@ -2218,6 +2218,7 @@ static int nvme_wait_ready(struct nvme_ctrl *ctrl, u32 mask, u32 val,
return ret;
}
+EXPORT_SYMBOL_GPL(nvme_wait_ready);
int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type)
{
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1a748640f2fb..8c30f9856621 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -754,6 +754,8 @@ void nvme_cancel_tagset(struct nvme_ctrl *ctrl);
void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl);
bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
enum nvme_ctrl_state new_state);
+int nvme_wait_ready(struct nvme_ctrl *ctrl, u32 mask, u32 val,
+ u32 timeout, const char *op);
int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type);
int nvme_enable_ctrl(struct nvme_ctrl *ctrl);
int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
--
2.39.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 5/5] nvme: Add two-pass shutdown support.
2024-01-03 21:04 Make NVME shutdown two-pass - Version 4 Jeremy Allison
` (3 preceding siblings ...)
2024-01-03 21:04 ` [PATCH 4/5] nvme: Export nvme_wait_ready() Jeremy Allison
@ 2024-01-03 21:04 ` Jeremy Allison
2024-01-04 13:14 ` Sagi Grimberg
2024-01-04 4:48 ` Make NVME shutdown two-pass - Version 4 Chaitanya Kulkarni
2024-01-04 19:00 ` Keith Busch
6 siblings, 1 reply; 20+ messages in thread
From: Jeremy Allison @ 2024-01-03 21:04 UTC (permalink / raw)
To: jallison, jra, tansuresh, hch, gregkh, rafael, bhelgaas, sagi; +Cc: linux-nvme
This works with the two-pass shutdown mechanism setup for the PCI
drivers and participates to provide the shutdown_wait
method at the pci_driver structure level.
Adds the new NVME_DISABLE_SHUTDOWN_ASYNC to enum shutdown_type.
Changes the nvme shutdown() method to set the
NVME_CC_SHN_NORMAL bit and then return to the caller when
requested by NVME_DISABLE_SHUTDOWN_ASYNC.
nvme_shutdown_wait() is added to synchronously wait for
the NVME_CSTS_SHST_CMPLT bit.
This change speeds up the shutdown in a system which hosts
many controllers.
Signed-off-by: Jeremy Allison <jallison@ciq.com>
Signed-off-by: Tanjore Suresh <tansuresh@google.com>
---
drivers/nvme/host/core.c | 18 ++++++++++++++++--
drivers/nvme/host/nvme.h | 3 ++-
drivers/nvme/host/pci.c | 30 ++++++++++++++++++++++++++++--
3 files changed, 46 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c7d448f186fb..7000adea1e41 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2225,7 +2225,7 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type)
int ret;
ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
- if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC)
+ if (shutdown_type != NVME_DISABLE_RESET)
ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;
else
ctrl->ctrl_config &= ~NVME_CC_ENABLE;
@@ -2234,10 +2234,24 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type)
if (ret)
return ret;
- if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC) {
+ switch (shutdown_type) {
+ case NVME_DISABLE_SHUTDOWN_ASYNC:
+ /*
+ * nvme_shutdown_wait() will read the reply for this.
+ */
+ return ret;
+ case NVME_DISABLE_SHUTDOWN_SYNC:
+ /*
+ * Spin on the read of the control register.
+ */
return nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK,
NVME_CSTS_SHST_CMPLT,
ctrl->shutdown_timeout, "shutdown");
+ case NVME_DISABLE_RESET:
+ /*
+ * Doing a reset here. Handle below.
+ */
+ break;
}
if (ctrl->quirks & NVME_QUIRK_DELAY_BEFORE_CHK_RDY)
msleep(NVME_QUIRK_DELAY_AMOUNT);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 8c30f9856621..43667d7a471a 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -189,7 +189,8 @@ enum {
enum shutdown_type {
NVME_DISABLE_RESET = 0,
- NVME_DISABLE_SHUTDOWN_SYNC = 1
+ NVME_DISABLE_SHUTDOWN_SYNC = 1,
+ NVME_DISABLE_SHUTDOWN_ASYNC = 2
};
static inline struct nvme_request *nvme_req(struct request *req)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 367e322dc818..9052dcf0f70c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2586,7 +2586,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, enum shutdown_type shutdown_t
* Give the controller a chance to complete all entered requests
* if doing a safe shutdown.
*/
- if (!dead && (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC))
+ if (!dead && (shutdown_type != NVME_DISABLE_RESET))
nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
}
@@ -3100,7 +3100,32 @@ static void nvme_shutdown(struct pci_dev *pdev)
{
struct nvme_dev *dev = pci_get_drvdata(pdev);
- nvme_disable_prepare_reset(dev, NVME_DISABLE_SHUTDOWN_SYNC);
+ nvme_disable_prepare_reset(dev, NVME_DISABLE_SHUTDOWN_ASYNC);
+}
+
+static void nvme_shutdown_wait(struct pci_dev *pdev)
+{
+ struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+ mutex_lock(&dev->shutdown_lock);
+ /*
+ * Finish waiting for the shutdown request
+ * initiated in nvme_shutdown() above using
+ * NVME_DISABLE_SHUTDOWN_ASYNC.
+ */
+ nvme_wait_ready(&dev->ctrl, NVME_CSTS_SHST_MASK,
+ NVME_CSTS_SHST_CMPLT,
+ dev->ctrl.shutdown_timeout, "shutdown");
+ /*
+ * The driver will not be starting up queues again if shutting down so
+ * must flush all entered requests to their failed completion to avoid
+ * deadlocking blk-mq hot-cpu notifier.
+ */
+ nvme_unquiesce_io_queues(&dev->ctrl);
+ if (dev->ctrl.admin_q && !blk_queue_dying(dev->ctrl.admin_q))
+ nvme_unquiesce_admin_queue(&dev->ctrl);
+
+ mutex_unlock(&dev->shutdown_lock);
}
/*
@@ -3494,6 +3519,7 @@ static struct pci_driver nvme_driver = {
.probe = nvme_probe,
.remove = nvme_remove,
.shutdown = nvme_shutdown,
+ .shutdown_wait = nvme_shutdown_wait,
.driver = {
.probe_type = PROBE_PREFER_ASYNCHRONOUS,
#ifdef CONFIG_PM_SLEEP
--
2.39.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: Make NVME shutdown two-pass - Version 4
2024-01-03 21:04 Make NVME shutdown two-pass - Version 4 Jeremy Allison
` (4 preceding siblings ...)
2024-01-03 21:04 ` [PATCH 5/5] nvme: Add two-pass shutdown support Jeremy Allison
@ 2024-01-04 4:48 ` Chaitanya Kulkarni
2024-01-04 6:38 ` Jeremy Allison
2024-01-04 19:00 ` Keith Busch
6 siblings, 1 reply; 20+ messages in thread
From: Chaitanya Kulkarni @ 2024-01-04 4:48 UTC (permalink / raw)
To: Jeremy Allison, jra@samba.org, tansuresh@google.com, hch@lst.de,
gregkh@linuxfoundation.org, rafael@kernel.org,
bhelgaas@google.com, sagi@grimberg.me
Cc: linux-nvme@lists.infradead.org
On 1/3/24 13:04, Jeremy Allison wrote:
> This is version 4 of a patchset originally written by
> Tanjore Suresh <tansuresh@google.com> to make shutdown
> of nvme devices two-pass.
>
> Changes from version 3:
>
> 1). Removed duplicate setting of ctrl->ctrl_config in
> nvme completion function. Noticed by Sagi Grimberg <sagi@grimberg.me>
>
> 2). Removed intermediate function nvme_wait_for_shutdown_cmpl(),
> folded this code directly into nvme_shutdown_wait() by
> exporting nvme_wait_ready() from drivers/nvme/host/core.c.
> Requested by Sagi Grimberg <sagi@grimberg.me>
>
> -------------------------------------------------------------
> Currently the Linux nvme driver shutdown code steps
> through each connected drive, sets the NVME_CC_SHN_NORMAL
> (normal shutdown) flag and then polls the given drive
> waiting for the response NVME_CSTS_SHST_CMPLT flag
> (shutdown complete).
>
> Each drive is taking around 13 seconds to respond to this.
>
> The customer has 20+ drives on the box so this time adds
> up on shutdown when the nvme driver is being shut down.
>
> This patchset changes shutdown to proceed in parallel,
> so the NVME_CC_SHN_NORMAL (normal shutdown) flag is
> sent to all drives first, and then it polls waiting
> for the NVME_CSTS_SHST_CMPLT flag (shutdown complete)
> for all drives.
>
> In the specific customer case it reduces the NVME
> shutdown time from over 300 seconds to around 15
> seconds.
> -------------------------------------------------------------
>
> Thanks for your consideration,
>
> Jeremy Allison.
> CIQ / Samba Team.
>
>
>
Is there any plan to add a blktests for this or this needs a special setup
for testing that cannot be done with blktest ?
-ck
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Make NVME shutdown two-pass - Version 4
2024-01-04 4:48 ` Make NVME shutdown two-pass - Version 4 Chaitanya Kulkarni
@ 2024-01-04 6:38 ` Jeremy Allison
0 siblings, 0 replies; 20+ messages in thread
From: Jeremy Allison @ 2024-01-04 6:38 UTC (permalink / raw)
To: Chaitanya Kulkarni
Cc: jra@samba.org, tansuresh@google.com, hch@lst.de,
gregkh@linuxfoundation.org, rafael@kernel.org,
bhelgaas@google.com, sagi@grimberg.me,
linux-nvme@lists.infradead.org, Jeremy Allison
This code is only exercised when the PCI shutdown function is called -
shutting down the VM is
how I've been testing it). I don't see a way of invoking that in blktest.
On Wed, Jan 3, 2024 at 8:48 PM Chaitanya Kulkarni <chaitanyak@nvidia.com> wrote:
>
> On 1/3/24 13:04, Jeremy Allison wrote:
> > This is version 4 of a patchset originally written by
> > Tanjore Suresh <tansuresh@google.com> to make shutdown
> > of nvme devices two-pass.
> >
> > Changes from version 3:
> >
> > 1). Removed duplicate setting of ctrl->ctrl_config in
> > nvme completion function. Noticed by Sagi Grimberg <sagi@grimberg.me>
> >
> > 2). Removed intermediate function nvme_wait_for_shutdown_cmpl(),
> > folded this code directly into nvme_shutdown_wait() by
> > exporting nvme_wait_ready() from drivers/nvme/host/core.c.
> > Requested by Sagi Grimberg <sagi@grimberg.me>
> >
> > -------------------------------------------------------------
> > Currently the Linux nvme driver shutdown code steps
> > through each connected drive, sets the NVME_CC_SHN_NORMAL
> > (normal shutdown) flag and then polls the given drive
> > waiting for the response NVME_CSTS_SHST_CMPLT flag
> > (shutdown complete).
> >
> > Each drive is taking around 13 seconds to respond to this.
> >
> > The customer has 20+ drives on the box so this time adds
> > up on shutdown when the nvme driver is being shut down.
> >
> > This patchset changes shutdown to proceed in parallel,
> > so the NVME_CC_SHN_NORMAL (normal shutdown) flag is
> > sent to all drives first, and then it polls waiting
> > for the NVME_CSTS_SHST_CMPLT flag (shutdown complete)
> > for all drives.
> >
> > In the specific customer case it reduces the NVME
> > shutdown time from over 300 seconds to around 15
> > seconds.
> > -------------------------------------------------------------
> >
> > Thanks for your consideration,
> >
> > Jeremy Allison.
> > CIQ / Samba Team.
> >
> >
> >
>
> Is there any plan to add a blktests for this or this needs a special setup
> for testing that cannot be done with blktest ?
>
> -ck
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/5] driver core: Support two-pass driver shutdown
2024-01-03 21:04 ` [PATCH 1/5] driver core: Support two-pass driver shutdown Jeremy Allison
@ 2024-01-04 13:12 ` Sagi Grimberg
2024-01-04 17:27 ` Jeremy Allison
0 siblings, 1 reply; 20+ messages in thread
From: Sagi Grimberg @ 2024-01-04 13:12 UTC (permalink / raw)
To: Jeremy Allison, jra, tansuresh, hch, gregkh, rafael, bhelgaas; +Cc: linux-nvme
On 1/3/24 23:04, Jeremy Allison wrote:
> From: Tanjore Suresh <tansuresh@google.com>
>
> This changes the bus driver interface with an additional entry point
> to enable devices to implement two-pass shutdown. The existing
> synchronous interface to shutdown is called, and if a shutdown_wait
> method is defined the device is moved to an alternate list.
>
> Once the shutdown method is called for all devices, the
> shutdown_wait method is then called synchronously for
> all devices on the alternate list.
>
> Signed-off-by: Tanjore Suresh <tansuresh@google.com>
> Signed-off-by: Jeremy Allison <jallison@ciq.com>
> ---
> drivers/base/core.c | 37 +++++++++++++++++++++++++++++++++++++
> include/linux/device/bus.h | 6 +++++-
> 2 files changed, 42 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 67ba592afc77..e1f4c54de3e1 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -4725,6 +4725,7 @@ EXPORT_SYMBOL_GPL(device_change_owner);
> void device_shutdown(void)
> {
> struct device *dev, *parent;
> + LIST_HEAD(shutdown_wait_list);
>
> wait_for_device_probe();
> device_block_probing();
> @@ -4769,10 +4770,17 @@ void device_shutdown(void)
> dev_info(dev, "shutdown_pre\n");
> dev->class->shutdown_pre(dev);
> }
> +
> if (dev->bus && dev->bus->shutdown) {
> if (initcall_debug)
> dev_info(dev, "shutdown\n");
> dev->bus->shutdown(dev);
> + /*
> + * Only put the device on the shutdown_wait_list
> + * if a shutdown_wait() method is also defined.
> + */
> + if (dev->bus->shutdown_wait)
> + list_add(&dev->kobj.entry, &shutdown_wait_list);
> } else if (dev->driver && dev->driver->shutdown) {
> if (initcall_debug)
> dev_info(dev, "shutdown\n");
> @@ -4789,6 +4797,35 @@ void device_shutdown(void)
> spin_lock(&devices_kset->list_lock);
> }
> spin_unlock(&devices_kset->list_lock);
> +
> + /*
> + * Second pass only for devices that have configured
> + * a shutdown_wait() method.
> + */
> + while (!list_empty(&shutdown_wait_list)) {
> + dev = list_entry(shutdown_wait_list.next, struct device,
> + kobj.entry);
> + parent = get_device(dev->parent);
> + get_device(dev);
> + /*
> + * Make sure the device is off the list
> + */
> + list_del_init(&dev->kobj.entry);
> + if (parent)
> + device_lock(parent);
> + device_lock(dev);
> + if (dev->bus && dev->bus->shutdown_wait) {
How can dev->bus or dev->bus->shutdown_wait be null at this
point?
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 5/5] nvme: Add two-pass shutdown support.
2024-01-03 21:04 ` [PATCH 5/5] nvme: Add two-pass shutdown support Jeremy Allison
@ 2024-01-04 13:14 ` Sagi Grimberg
2024-01-04 17:30 ` Jeremy Allison
0 siblings, 1 reply; 20+ messages in thread
From: Sagi Grimberg @ 2024-01-04 13:14 UTC (permalink / raw)
To: Jeremy Allison, jra, tansuresh, hch, gregkh, rafael, bhelgaas; +Cc: linux-nvme
On 1/3/24 23:04, Jeremy Allison wrote:
> This works with the two-pass shutdown mechanism setup for the PCI
> drivers and participates to provide the shutdown_wait
> method at the pci_driver structure level.
>
> Adds the new NVME_DISABLE_SHUTDOWN_ASYNC to enum shutdown_type.
> Changes the nvme shutdown() method to set the
> NVME_CC_SHN_NORMAL bit and then return to the caller when
> requested by NVME_DISABLE_SHUTDOWN_ASYNC.
>
> nvme_shutdown_wait() is added to synchronously wait for
> the NVME_CSTS_SHST_CMPLT bit.
>
> This change speeds up the shutdown in a system which hosts
> many controllers.
I think that patch 4 can be folded in here.
>
> Signed-off-by: Jeremy Allison <jallison@ciq.com>
> Signed-off-by: Tanjore Suresh <tansuresh@google.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/5] nvme: Change 'bool shutdown' into an enum shutdown_type.
2024-01-03 21:04 ` [PATCH 3/5] nvme: Change 'bool shutdown' into an enum shutdown_type Jeremy Allison
@ 2024-01-04 13:26 ` Sagi Grimberg
2024-01-04 17:43 ` Jeremy Allison
0 siblings, 1 reply; 20+ messages in thread
From: Sagi Grimberg @ 2024-01-04 13:26 UTC (permalink / raw)
To: Jeremy Allison, jra, tansuresh, hch, gregkh, rafael, bhelgaas; +Cc: linux-nvme
On 1/3/24 23:04, Jeremy Allison wrote:
> Convert nvme_disable_ctrl() and nvme_dev_disable()
> inside drivers/nvme/host/pci.c to use this:
>
> bool shutdown = false == NVME_DISABLE_RESET
> bool shutdown = true == NVME_DISABLE_SHUTDOWN_SYNC.
>
> This will make it easier to add a third request later:
> NVME_DISABLE_SHUTDOWN_ASNYC
>
> As nvme_disable_ctrl() is used outside of drivers/nvme/host/pci.c,
> convert the callers of nvme_disable_ctrl() to this convention too.
>
> Signed-off-by: Jeremy Allison <jallison@ciq.com>
> ---
> drivers/nvme/host/apple.c | 4 ++--
> drivers/nvme/host/core.c | 6 +++---
> drivers/nvme/host/nvme.h | 7 +++++-
> drivers/nvme/host/pci.c | 44 +++++++++++++++++++-------------------
> drivers/nvme/host/rdma.c | 3 ++-
> drivers/nvme/host/tcp.c | 3 ++-
> drivers/nvme/target/loop.c | 2 +-
> 7 files changed, 38 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
> index 596bb11eeba5..764639ede41d 100644
> --- a/drivers/nvme/host/apple.c
> +++ b/drivers/nvme/host/apple.c
> @@ -844,8 +844,8 @@ static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown)
> * NVMe disabled state, after a clean shutdown).
> */
> if (shutdown)
> - nvme_disable_ctrl(&anv->ctrl, shutdown);
> - nvme_disable_ctrl(&anv->ctrl, false);
> + nvme_disable_ctrl(&anv->ctrl, NVME_DISABLE_SHUTDOWN_SYNC);
> + nvme_disable_ctrl(&anv->ctrl, NVME_DISABLE_RESET);
> }
>
> WRITE_ONCE(anv->ioq.enabled, false);
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 50818dbcfa1a..e1b2facb7d6a 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2219,12 +2219,12 @@ static int nvme_wait_ready(struct nvme_ctrl *ctrl, u32 mask, u32 val,
> return ret;
> }
>
> -int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
> +int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type)
> {
> int ret;
>
> ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
> - if (shutdown)
> + if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC)
> ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;
> else
> ctrl->ctrl_config &= ~NVME_CC_ENABLE;
> @@ -2233,7 +2233,7 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
> if (ret)
> return ret;
>
> - if (shutdown) {
> + if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC) {
> return nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK,
> NVME_CSTS_SHST_CMPLT,
> ctrl->shutdown_timeout, "shutdown");
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 6092cc361837..1a748640f2fb 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -187,6 +187,11 @@ enum {
> NVME_MPATH_IO_STATS = (1 << 2),
> };
>
> +enum shutdown_type {
> + NVME_DISABLE_RESET = 0,
> + NVME_DISABLE_SHUTDOWN_SYNC = 1
> +};
I also don't find this a bit awkward especially when another
enumeration value is added later on.
An alternative approach would be to stick with nvme_disable_ctrl that
accepts shutdown bool, and add nvme_disable_ctrl_nowait() wrapper for it
that call-sites can use. Or even better, because we only have a nowait
variant for shutdown, perhaps we can add nvme_shutdown_ctrl_nowait()
wrapper that accepts only the ctrl instead (this will require splitting
nvme_disable_ctrl to the disabling phase and the waiting phase, but that
is hidden in the core as static functions).
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/5] driver core: Support two-pass driver shutdown
2024-01-04 13:12 ` Sagi Grimberg
@ 2024-01-04 17:27 ` Jeremy Allison
0 siblings, 0 replies; 20+ messages in thread
From: Jeremy Allison @ 2024-01-04 17:27 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Jeremy Allison, tansuresh, hch, gregkh, rafael, bhelgaas,
linux-nvme, jra
On Thu, Jan 04, 2024 at 03:12:11PM +0200, Sagi Grimberg wrote:
>
>
>On 1/3/24 23:04, Jeremy Allison wrote:
>>From: Tanjore Suresh <tansuresh@google.com>
>>
>>This changes the bus driver interface with an additional entry point
>>to enable devices to implement two-pass shutdown. The existing
>>synchronous interface to shutdown is called, and if a shutdown_wait
>>method is defined the device is moved to an alternate list.
>>
>>Once the shutdown method is called for all devices, the
>>shutdown_wait method is then called synchronously for
>>all devices on the alternate list.
>>
>>Signed-off-by: Tanjore Suresh <tansuresh@google.com>
>>Signed-off-by: Jeremy Allison <jallison@ciq.com>
>>---
>> drivers/base/core.c | 37 +++++++++++++++++++++++++++++++++++++
>> include/linux/device/bus.h | 6 +++++-
>> 2 files changed, 42 insertions(+), 1 deletion(-)
>>
>>diff --git a/drivers/base/core.c b/drivers/base/core.c
>>index 67ba592afc77..e1f4c54de3e1 100644
>>--- a/drivers/base/core.c
>>+++ b/drivers/base/core.c
>>@@ -4725,6 +4725,7 @@ EXPORT_SYMBOL_GPL(device_change_owner);
>> void device_shutdown(void)
>> {
>> struct device *dev, *parent;
>>+ LIST_HEAD(shutdown_wait_list);
>> wait_for_device_probe();
>> device_block_probing();
>>@@ -4769,10 +4770,17 @@ void device_shutdown(void)
>> dev_info(dev, "shutdown_pre\n");
>> dev->class->shutdown_pre(dev);
>> }
>>+
>> if (dev->bus && dev->bus->shutdown) {
>> if (initcall_debug)
>> dev_info(dev, "shutdown\n");
>> dev->bus->shutdown(dev);
>>+ /*
>>+ * Only put the device on the shutdown_wait_list
>>+ * if a shutdown_wait() method is also defined.
>>+ */
>>+ if (dev->bus->shutdown_wait)
>>+ list_add(&dev->kobj.entry, &shutdown_wait_list);
>> } else if (dev->driver && dev->driver->shutdown) {
>> if (initcall_debug)
>> dev_info(dev, "shutdown\n");
>>@@ -4789,6 +4797,35 @@ void device_shutdown(void)
>> spin_lock(&devices_kset->list_lock);
>> }
>> spin_unlock(&devices_kset->list_lock);
>>+
>>+ /*
>>+ * Second pass only for devices that have configured
>>+ * a shutdown_wait() method.
>>+ */
>>+ while (!list_empty(&shutdown_wait_list)) {
>>+ dev = list_entry(shutdown_wait_list.next, struct device,
>>+ kobj.entry);
>>+ parent = get_device(dev->parent);
>>+ get_device(dev);
>>+ /*
>>+ * Make sure the device is off the list
>>+ */
>>+ list_del_init(&dev->kobj.entry);
>>+ if (parent)
>>+ device_lock(parent);
>>+ device_lock(dev);
>>+ if (dev->bus && dev->bus->shutdown_wait) {
>
>How can dev->bus or dev->bus->shutdown_wait be null at this
>point?
It's a fair cop :-). That's just paranoia. I'll remove :-).
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 5/5] nvme: Add two-pass shutdown support.
2024-01-04 13:14 ` Sagi Grimberg
@ 2024-01-04 17:30 ` Jeremy Allison
0 siblings, 0 replies; 20+ messages in thread
From: Jeremy Allison @ 2024-01-04 17:30 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Jeremy Allison, tansuresh, hch, gregkh, rafael, bhelgaas,
linux-nvme, jra
On Thu, Jan 04, 2024 at 03:14:16PM +0200, Sagi Grimberg wrote:
>
>On 1/3/24 23:04, Jeremy Allison wrote:
>>This works with the two-pass shutdown mechanism setup for the PCI
>>drivers and participates to provide the shutdown_wait
>>method at the pci_driver structure level.
>>
>>Adds the new NVME_DISABLE_SHUTDOWN_ASYNC to enum shutdown_type.
>>Changes the nvme shutdown() method to set the
>>NVME_CC_SHN_NORMAL bit and then return to the caller when
>>requested by NVME_DISABLE_SHUTDOWN_ASYNC.
>>
>>nvme_shutdown_wait() is added to synchronously wait for
>>the NVME_CSTS_SHST_CMPLT bit.
>>
>>This change speeds up the shutdown in a system which hosts
>>many controllers.
>
>I think that patch 4 can be folded in here.
OK, I thought it was cleaner to have the export as a
separate patch to make it easier to review, but as
you wish.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/5] nvme: Change 'bool shutdown' into an enum shutdown_type.
2024-01-04 13:26 ` Sagi Grimberg
@ 2024-01-04 17:43 ` Jeremy Allison
2024-01-04 18:44 ` Jeremy Allison
0 siblings, 1 reply; 20+ messages in thread
From: Jeremy Allison @ 2024-01-04 17:43 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Jeremy Allison, tansuresh, hch, gregkh, rafael, bhelgaas,
linux-nvme, jra
On Thu, Jan 04, 2024 at 03:26:11PM +0200, Sagi Grimberg wrote:
>
>
>On 1/3/24 23:04, Jeremy Allison wrote:
>>Convert nvme_disable_ctrl() and nvme_dev_disable()
>>inside drivers/nvme/host/pci.c to use this:
>>
>>bool shutdown = false == NVME_DISABLE_RESET
>>bool shutdown = true == NVME_DISABLE_SHUTDOWN_SYNC.
>>
>>This will make it easier to add a third request later:
>>NVME_DISABLE_SHUTDOWN_ASNYC
>>
>>As nvme_disable_ctrl() is used outside of drivers/nvme/host/pci.c,
>>convert the callers of nvme_disable_ctrl() to this convention too.
>>
>>Signed-off-by: Jeremy Allison <jallison@ciq.com>
>>---
>> drivers/nvme/host/apple.c | 4 ++--
>> drivers/nvme/host/core.c | 6 +++---
>> drivers/nvme/host/nvme.h | 7 +++++-
>> drivers/nvme/host/pci.c | 44 +++++++++++++++++++-------------------
>> drivers/nvme/host/rdma.c | 3 ++-
>> drivers/nvme/host/tcp.c | 3 ++-
>> drivers/nvme/target/loop.c | 2 +-
>> 7 files changed, 38 insertions(+), 31 deletions(-)
>>
>>diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
>>index 596bb11eeba5..764639ede41d 100644
>>--- a/drivers/nvme/host/apple.c
>>+++ b/drivers/nvme/host/apple.c
>>@@ -844,8 +844,8 @@ static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown)
>> * NVMe disabled state, after a clean shutdown).
>> */
>> if (shutdown)
>>- nvme_disable_ctrl(&anv->ctrl, shutdown);
>>- nvme_disable_ctrl(&anv->ctrl, false);
>>+ nvme_disable_ctrl(&anv->ctrl, NVME_DISABLE_SHUTDOWN_SYNC);
>>+ nvme_disable_ctrl(&anv->ctrl, NVME_DISABLE_RESET);
>> }
>> WRITE_ONCE(anv->ioq.enabled, false);
>>diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>index 50818dbcfa1a..e1b2facb7d6a 100644
>>--- a/drivers/nvme/host/core.c
>>+++ b/drivers/nvme/host/core.c
>>@@ -2219,12 +2219,12 @@ static int nvme_wait_ready(struct nvme_ctrl *ctrl, u32 mask, u32 val,
>> return ret;
>> }
>>-int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
>>+int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type)
>> {
>> int ret;
>> ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
>>- if (shutdown)
>>+ if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC)
>> ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;
>> else
>> ctrl->ctrl_config &= ~NVME_CC_ENABLE;
>>@@ -2233,7 +2233,7 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
>> if (ret)
>> return ret;
>>- if (shutdown) {
>>+ if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC) {
>> return nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK,
>> NVME_CSTS_SHST_CMPLT,
>> ctrl->shutdown_timeout, "shutdown");
>>diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
>>index 6092cc361837..1a748640f2fb 100644
>>--- a/drivers/nvme/host/nvme.h
>>+++ b/drivers/nvme/host/nvme.h
>>@@ -187,6 +187,11 @@ enum {
>> NVME_MPATH_IO_STATS = (1 << 2),
>> };
>>+enum shutdown_type {
>>+ NVME_DISABLE_RESET = 0,
>>+ NVME_DISABLE_SHUTDOWN_SYNC = 1
>>+};
>
>I also don't find this a bit awkward especially when another
>enumeration value is added later on.
>
>An alternative approach would be to stick with nvme_disable_ctrl that
>accepts shutdown bool, and add nvme_disable_ctrl_nowait() wrapper for it
>that call-sites can use. Or even better, because we only have a nowait
>variant for shutdown, perhaps we can add nvme_shutdown_ctrl_nowait()
>wrapper that accepts only the ctrl instead (this will require splitting
>nvme_disable_ctrl to the disabling phase and the waiting phase, but that
>is hidden in the core as static functions).
OK. Let me take a look at this. It would certainly remove the
changes in unrelated files such as drivers/nvme/host/apple.c,
drivers/nvme/host/tcp.c and drivers/nvme/target/loop.c.
This change might take me a little longer to get right, so
please bear with me.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/5] nvme: Change 'bool shutdown' into an enum shutdown_type.
2024-01-04 17:43 ` Jeremy Allison
@ 2024-01-04 18:44 ` Jeremy Allison
2024-01-08 17:42 ` Sagi Grimberg
0 siblings, 1 reply; 20+ messages in thread
From: Jeremy Allison @ 2024-01-04 18:44 UTC (permalink / raw)
To: Sagi Grimberg, Jeremy Allison, tansuresh, hch, gregkh, rafael,
bhelgaas, linux-nvme, jra
On Thu, Jan 04, 2024 at 09:43:30AM -0800, Jeremy Allison wrote:
>On Thu, Jan 04, 2024 at 03:26:11PM +0200, Sagi Grimberg wrote:
>>
>>
>>On 1/3/24 23:04, Jeremy Allison wrote:
>>>Convert nvme_disable_ctrl() and nvme_dev_disable()
>>>inside drivers/nvme/host/pci.c to use this:
>>>
>>>bool shutdown = false == NVME_DISABLE_RESET
>>>bool shutdown = true == NVME_DISABLE_SHUTDOWN_SYNC.
>>>
>>>This will make it easier to add a third request later:
>>>NVME_DISABLE_SHUTDOWN_ASNYC
>>>
>>>As nvme_disable_ctrl() is used outside of drivers/nvme/host/pci.c,
>>>convert the callers of nvme_disable_ctrl() to this convention too.
>>>
>>>Signed-off-by: Jeremy Allison <jallison@ciq.com>
>>>---
>>> drivers/nvme/host/apple.c | 4 ++--
>>> drivers/nvme/host/core.c | 6 +++---
>>> drivers/nvme/host/nvme.h | 7 +++++-
>>> drivers/nvme/host/pci.c | 44 +++++++++++++++++++-------------------
>>> drivers/nvme/host/rdma.c | 3 ++-
>>> drivers/nvme/host/tcp.c | 3 ++-
>>> drivers/nvme/target/loop.c | 2 +-
>>> 7 files changed, 38 insertions(+), 31 deletions(-)
>>>
>>>diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
>>>index 596bb11eeba5..764639ede41d 100644
>>>--- a/drivers/nvme/host/apple.c
>>>+++ b/drivers/nvme/host/apple.c
>>>@@ -844,8 +844,8 @@ static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown)
>>> * NVMe disabled state, after a clean shutdown).
>>> */
>>> if (shutdown)
>>>- nvme_disable_ctrl(&anv->ctrl, shutdown);
>>>- nvme_disable_ctrl(&anv->ctrl, false);
>>>+ nvme_disable_ctrl(&anv->ctrl, NVME_DISABLE_SHUTDOWN_SYNC);
>>>+ nvme_disable_ctrl(&anv->ctrl, NVME_DISABLE_RESET);
>>> }
>>> WRITE_ONCE(anv->ioq.enabled, false);
>>>diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>>index 50818dbcfa1a..e1b2facb7d6a 100644
>>>--- a/drivers/nvme/host/core.c
>>>+++ b/drivers/nvme/host/core.c
>>>@@ -2219,12 +2219,12 @@ static int nvme_wait_ready(struct nvme_ctrl *ctrl, u32 mask, u32 val,
>>> return ret;
>>> }
>>>-int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
>>>+int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type)
>>> {
>>> int ret;
>>> ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
>>>- if (shutdown)
>>>+ if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC)
>>> ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;
>>> else
>>> ctrl->ctrl_config &= ~NVME_CC_ENABLE;
>>>@@ -2233,7 +2233,7 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
>>> if (ret)
>>> return ret;
>>>- if (shutdown) {
>>>+ if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC) {
>>> return nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK,
>>> NVME_CSTS_SHST_CMPLT,
>>> ctrl->shutdown_timeout, "shutdown");
>>>diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
>>>index 6092cc361837..1a748640f2fb 100644
>>>--- a/drivers/nvme/host/nvme.h
>>>+++ b/drivers/nvme/host/nvme.h
>>>@@ -187,6 +187,11 @@ enum {
>>> NVME_MPATH_IO_STATS = (1 << 2),
>>> };
>>>+enum shutdown_type {
>>>+ NVME_DISABLE_RESET = 0,
>>>+ NVME_DISABLE_SHUTDOWN_SYNC = 1
>>>+};
>>
>>I also don't find this a bit awkward especially when another
>>enumeration value is added later on.
>>
>>An alternative approach would be to stick with nvme_disable_ctrl that
>>accepts shutdown bool, and add nvme_disable_ctrl_nowait() wrapper for it
>>that call-sites can use. Or even better, because we only have a nowait
>>variant for shutdown, perhaps we can add nvme_shutdown_ctrl_nowait()
>>wrapper that accepts only the ctrl instead (this will require splitting
>>nvme_disable_ctrl to the disabling phase and the waiting phase, but that
>>is hidden in the core as static functions).
>
>OK. Let me take a look at this. It would certainly remove the
>changes in unrelated files such as drivers/nvme/host/apple.c,
>drivers/nvme/host/tcp.c and drivers/nvme/target/loop.c.
>
>This change might take me a little longer to get right, so
>please bear with me.
Yeah, the tricky part is that call chain for nvme_shutdown()
looks like (inside drivers/nvme/host/pci.c):
nvme_shutdown() ->
nvme_disable_prepare_reset() ->
nvme_dev_disable() ->
nvme_disable_ctrl()
Adding a new function nvme_shutdown_ctrl_nowait()
that sets the bit without waiting is trivial, but
the problem is nvme_dev_disable() does a lot of
things before calling nvme_disable_ctrl() which
obviously should not be duplicated in a separate
"twopass" code path.
There needs to be some way to pass a use
"two-pass" flag down from nvme_shutdown()
into nvme_dev_disable().
I guess I could add a second bool flag passed
down from nvme_shutdown() all the way to
nvme_dev_disable(), which then calls the
nvme_shutdown_ctrl_nowait() varient instead
of nvme_disable_ctrl() if both "bool shutdown"
and "bool twopass" are set but that seems kind
of ugly and confusing to me.
How about keeping the enum internal to
drivers/nvme/host/pci.c
and making it:
enum shutdown_type {
NVME_DISABLE_RESET,
NVME_DISABLE_SHUTDOWN,
NVME_DISABLE_SHUTDOWN_TWOPASS
};
and changing 'bool shutdown' to enum shutdown_type
inside the static functions nvme_disable_prepare_reset()
and nvme_dev_disable() only ?
That way we don't have to have the confusing
double bool:
static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown, bool twopass);
static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown, bool twopass);
instead we have:
static int nvme_disable_prepare_reset(struct nvme_dev *dev, enum shutdown_type shutdown_type);
static void nvme_dev_disable(struct nvme_dev *dev, enum shutdown_type shutdown_type);
function signatures and the use of the enum is
contained solely inside drivers/nvme/host/pci.c.
IMHO this is easier to understand, but again I'm
a newbie at this code so I'm happy to take your
preferences here.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/5] PCI: Support two-pass shutdown
2024-01-03 21:04 ` [PATCH 2/5] PCI: Support two-pass shutdown Jeremy Allison
@ 2024-01-04 18:55 ` Bjorn Helgaas
2024-01-04 19:34 ` Jeremy Allison
0 siblings, 1 reply; 20+ messages in thread
From: Bjorn Helgaas @ 2024-01-04 18:55 UTC (permalink / raw)
To: Jeremy Allison
Cc: jra, tansuresh, hch, gregkh, rafael, bhelgaas, sagi, linux-nvme
On Wed, Jan 03, 2024 at 01:04:02PM -0800, Jeremy Allison wrote:
> From: Tanjore Suresh <tansuresh@google.com>
>
> Enhances the base PCI driver to add support for two-pass
> shutdown. Adds shutdown_wait() method.
>
> Assume a device takes n secs to shutdown. If a machine has been
> populated with M such devices, the total time spent in shutting down
> all the devices will be M * n secs if the shutdown is done
> synchronously. For example, if NVMe PCI Controllers take 5 secs
> to shutdown and if there are 16 such NVMe controllers in a system,
> system will spend a total of 80 secs to shutdown all
> NVMe devices in that system.
>
> In order to speed up the shutdown time, a two-pass interface to
> shutdown has been implemented. The caller calls the shutdown method
> for each device in turn to allow a shutdown request to be sent,
> then the caller walks the list of devices and calls shutdown_wait()
> to synchronously wait for the shutdown to complete.
>
> In the NVMe case above, all 16 devices will process the shutdown
> in parallel taking the total time to shutdown down to the time
> for one NVMe PCI Controller to shut down.
>
> This will significantly reduce the machine reboot time.
>
> Signed-off-by: Tanjore Suresh <tansuresh@google.com>
> Signed-off-by: Jeremy Allison <jallison@ciq.com>
Rewrap commit log to fill 75 columns. Use present tense imperative,
e.g., "Enhance ..." and "Add ..."
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
> drivers/pci/pci-driver.c | 9 +++++++++
> include/linux/pci.h | 2 ++
> 2 files changed, 11 insertions(+)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 51ec9e7e784f..257bbb04c806 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -547,6 +547,14 @@ static int pci_restore_standard_config(struct pci_dev *pci_dev)
> }
> #endif /* CONFIG_PM_SLEEP */
>
> +static void pci_device_shutdown_wait(struct device *dev)
> +{
> + struct pci_dev *pci_dev = to_pci_dev(dev);
> + struct pci_driver *drv = pci_dev->driver;
> +
> + if (drv && drv->shutdown_wait)
> + drv->shutdown_wait(pci_dev);
> +}
> #ifdef CONFIG_PM
>
> /* Auxiliary functions used for system resume and run-time resume */
> @@ -1682,6 +1690,7 @@ struct bus_type pci_bus_type = {
> .probe = pci_device_probe,
> .remove = pci_device_remove,
> .shutdown = pci_device_shutdown,
> + .shutdown_wait = pci_device_shutdown_wait,
> .dev_groups = pci_dev_groups,
> .bus_groups = pci_bus_groups,
> .drv_groups = pci_drv_groups,
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 60ca768bc867..e3ba7043c7de 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -917,6 +917,7 @@ struct module;
> * Useful for enabling wake-on-lan (NIC) or changing
> * the power state of a device before reboot.
> * e.g. drivers/net/e100.c.
> + * @shutdown_wait: Optional driver callback to allow two-pass shutdown.
> * @sriov_configure: Optional driver callback to allow configuration of
> * number of VFs to enable via sysfs "sriov_numvfs" file.
> * @sriov_set_msix_vec_count: PF Driver callback to change number of MSI-X
> @@ -948,6 +949,7 @@ struct pci_driver {
> int (*suspend)(struct pci_dev *dev, pm_message_t state); /* Device suspended */
> int (*resume)(struct pci_dev *dev); /* Device woken up */
> void (*shutdown)(struct pci_dev *dev);
> + void (*shutdown_wait)(struct pci_dev *dev);
> int (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
> int (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
> u32 (*sriov_get_vf_total_msix)(struct pci_dev *pf);
> --
> 2.39.3
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Make NVME shutdown two-pass - Version 4
2024-01-03 21:04 Make NVME shutdown two-pass - Version 4 Jeremy Allison
` (5 preceding siblings ...)
2024-01-04 4:48 ` Make NVME shutdown two-pass - Version 4 Chaitanya Kulkarni
@ 2024-01-04 19:00 ` Keith Busch
6 siblings, 0 replies; 20+ messages in thread
From: Keith Busch @ 2024-01-04 19:00 UTC (permalink / raw)
To: Jeremy Allison
Cc: jra, tansuresh, hch, gregkh, rafael, bhelgaas, sagi, linux-nvme
On Wed, Jan 03, 2024 at 01:04:00PM -0800, Jeremy Allison wrote:
> This is version 4 of a patchset originally written by
> Tanjore Suresh <tansuresh@google.com> to make shutdown
> of nvme devices two-pass.
I've tested the series on a few systems with a variety of nvme devices,
and this noticably improves shutdown time, and no problems observed. I
haven't looked at the new code details yet, but it works!
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/5] PCI: Support two-pass shutdown
2024-01-04 18:55 ` Bjorn Helgaas
@ 2024-01-04 19:34 ` Jeremy Allison
0 siblings, 0 replies; 20+ messages in thread
From: Jeremy Allison @ 2024-01-04 19:34 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Jeremy Allison, tansuresh, hch, gregkh, rafael, bhelgaas, sagi,
linux-nvme, jra
On Thu, Jan 04, 2024 at 12:55:24PM -0600, Bjorn Helgaas wrote:
>On Wed, Jan 03, 2024 at 01:04:02PM -0800, Jeremy Allison wrote:
>> From: Tanjore Suresh <tansuresh@google.com>
>>
>> Enhances the base PCI driver to add support for two-pass
>> shutdown. Adds shutdown_wait() method.
>>
>> Assume a device takes n secs to shutdown. If a machine has been
>> populated with M such devices, the total time spent in shutting down
>> all the devices will be M * n secs if the shutdown is done
>> synchronously. For example, if NVMe PCI Controllers take 5 secs
>> to shutdown and if there are 16 such NVMe controllers in a system,
>> system will spend a total of 80 secs to shutdown all
>> NVMe devices in that system.
>>
>> In order to speed up the shutdown time, a two-pass interface to
>> shutdown has been implemented. The caller calls the shutdown method
>> for each device in turn to allow a shutdown request to be sent,
>> then the caller walks the list of devices and calls shutdown_wait()
>> to synchronously wait for the shutdown to complete.
>>
>> In the NVMe case above, all 16 devices will process the shutdown
>> in parallel taking the total time to shutdown down to the time
>> for one NVMe PCI Controller to shut down.
>>
>> This will significantly reduce the machine reboot time.
>>
>> Signed-off-by: Tanjore Suresh <tansuresh@google.com>
>> Signed-off-by: Jeremy Allison <jallison@ciq.com>
>
>Rewrap commit log to fill 75 columns. Use present tense imperative,
>e.g., "Enhance ..." and "Add ..."
>
>Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Done for next rev. Thanks !
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/5] nvme: Change 'bool shutdown' into an enum shutdown_type.
2024-01-04 18:44 ` Jeremy Allison
@ 2024-01-08 17:42 ` Sagi Grimberg
2024-01-08 18:41 ` Jeremy Allison
0 siblings, 1 reply; 20+ messages in thread
From: Sagi Grimberg @ 2024-01-08 17:42 UTC (permalink / raw)
To: Jeremy Allison, Jeremy Allison, tansuresh, hch, gregkh, rafael,
bhelgaas, linux-nvme
On 1/4/24 20:44, Jeremy Allison wrote:
> On Thu, Jan 04, 2024 at 09:43:30AM -0800, Jeremy Allison wrote:
>> On Thu, Jan 04, 2024 at 03:26:11PM +0200, Sagi Grimberg wrote:
>>>
>>>
>>> On 1/3/24 23:04, Jeremy Allison wrote:
>>>> Convert nvme_disable_ctrl() and nvme_dev_disable()
>>>> inside drivers/nvme/host/pci.c to use this:
>>>>
>>>> bool shutdown = false == NVME_DISABLE_RESET
>>>> bool shutdown = true == NVME_DISABLE_SHUTDOWN_SYNC.
>>>>
>>>> This will make it easier to add a third request later:
>>>> NVME_DISABLE_SHUTDOWN_ASNYC
>>>>
>>>> As nvme_disable_ctrl() is used outside of drivers/nvme/host/pci.c,
>>>> convert the callers of nvme_disable_ctrl() to this convention too.
>>>>
>>>> Signed-off-by: Jeremy Allison <jallison@ciq.com>
>>>> ---
>>>> drivers/nvme/host/apple.c | 4 ++--
>>>> drivers/nvme/host/core.c | 6 +++---
>>>> drivers/nvme/host/nvme.h | 7 +++++-
>>>> drivers/nvme/host/pci.c | 44 +++++++++++++++++++-------------------
>>>> drivers/nvme/host/rdma.c | 3 ++-
>>>> drivers/nvme/host/tcp.c | 3 ++-
>>>> drivers/nvme/target/loop.c | 2 +-
>>>> 7 files changed, 38 insertions(+), 31 deletions(-)
>>>>
>>>> diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
>>>> index 596bb11eeba5..764639ede41d 100644
>>>> --- a/drivers/nvme/host/apple.c
>>>> +++ b/drivers/nvme/host/apple.c
>>>> @@ -844,8 +844,8 @@ static void apple_nvme_disable(struct apple_nvme
>>>> *anv, bool shutdown)
>>>> * NVMe disabled state, after a clean shutdown).
>>>> */
>>>> if (shutdown)
>>>> - nvme_disable_ctrl(&anv->ctrl, shutdown);
>>>> - nvme_disable_ctrl(&anv->ctrl, false);
>>>> + nvme_disable_ctrl(&anv->ctrl, NVME_DISABLE_SHUTDOWN_SYNC);
>>>> + nvme_disable_ctrl(&anv->ctrl, NVME_DISABLE_RESET);
>>>> }
>>>> WRITE_ONCE(anv->ioq.enabled, false);
>>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>>> index 50818dbcfa1a..e1b2facb7d6a 100644
>>>> --- a/drivers/nvme/host/core.c
>>>> +++ b/drivers/nvme/host/core.c
>>>> @@ -2219,12 +2219,12 @@ static int nvme_wait_ready(struct nvme_ctrl
>>>> *ctrl, u32 mask, u32 val,
>>>> return ret;
>>>> }
>>>> -int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
>>>> +int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type
>>>> shutdown_type)
>>>> {
>>>> int ret;
>>>> ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
>>>> - if (shutdown)
>>>> + if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC)
>>>> ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;
>>>> else
>>>> ctrl->ctrl_config &= ~NVME_CC_ENABLE;
>>>> @@ -2233,7 +2233,7 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl,
>>>> bool shutdown)
>>>> if (ret)
>>>> return ret;
>>>> - if (shutdown) {
>>>> + if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC) {
>>>> return nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK,
>>>> NVME_CSTS_SHST_CMPLT,
>>>> ctrl->shutdown_timeout, "shutdown");
>>>> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
>>>> index 6092cc361837..1a748640f2fb 100644
>>>> --- a/drivers/nvme/host/nvme.h
>>>> +++ b/drivers/nvme/host/nvme.h
>>>> @@ -187,6 +187,11 @@ enum {
>>>> NVME_MPATH_IO_STATS = (1 << 2),
>>>> };
>>>> +enum shutdown_type {
>>>> + NVME_DISABLE_RESET = 0,
>>>> + NVME_DISABLE_SHUTDOWN_SYNC = 1
>>>> +};
>>>
>>> I also don't find this a bit awkward especially when another
>>> enumeration value is added later on.
>>>
>>> An alternative approach would be to stick with nvme_disable_ctrl that
>>> accepts shutdown bool, and add nvme_disable_ctrl_nowait() wrapper for it
>>> that call-sites can use. Or even better, because we only have a nowait
>>> variant for shutdown, perhaps we can add nvme_shutdown_ctrl_nowait()
>>> wrapper that accepts only the ctrl instead (this will require splitting
>>> nvme_disable_ctrl to the disabling phase and the waiting phase, but that
>>> is hidden in the core as static functions).
>>
>> OK. Let me take a look at this. It would certainly remove the
>> changes in unrelated files such as drivers/nvme/host/apple.c,
>> drivers/nvme/host/tcp.c and drivers/nvme/target/loop.c.
>>
>> This change might take me a little longer to get right, so
>> please bear with me.
>
> Yeah, the tricky part is that call chain for nvme_shutdown()
> looks like (inside drivers/nvme/host/pci.c):
>
> nvme_shutdown() ->
> nvme_disable_prepare_reset() ->
> nvme_dev_disable() ->
> nvme_disable_ctrl()
>
> Adding a new function nvme_shutdown_ctrl_nowait()
> that sets the bit without waiting is trivial, but
> the problem is nvme_dev_disable() does a lot of
> things before calling nvme_disable_ctrl() which
> obviously should not be duplicated in a separate
> "twopass" code path.
Ugh.
>
> There needs to be some way to pass a use
> "two-pass" flag down from nvme_shutdown()
> into nvme_dev_disable()
Ideally nvme_dev_disable can be broken in a way that
nvme_shutdown can have 2-3 helpers and the rest of the
code can remain intact calling nvme_dev_disable (internally
calling these helpers).
>
> I guess I could add a second bool flag passed
> down from nvme_shutdown() all the way to
> nvme_dev_disable(), which then calls the
> nvme_shutdown_ctrl_nowait() varient instead
> of nvme_disable_ctrl() if both "bool shutdown"
> and "bool twopass" are set but that seems kind
> of ugly and confusing to me.
Probably.. Maybe call it dontwait (bool).
And yes, it is ugly, but better than moving the
ugliness to the core.
>
> How about keeping the enum internal to
> drivers/nvme/host/pci.c
> and making it:
>
> enum shutdown_type {
> NVME_DISABLE_RESET,
> NVME_DISABLE_SHUTDOWN,
> NVME_DISABLE_SHUTDOWN_TWOPASS
> };
NVME_PCI_
>
> and changing 'bool shutdown' to enum shutdown_type
> inside the static functions nvme_disable_prepare_reset()
> and nvme_dev_disable() only ?
>
> That way we don't have to have the confusing
> double bool:
Neither are great. I'll defer to Keith and Christoph on this one.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/5] nvme: Change 'bool shutdown' into an enum shutdown_type.
2024-01-08 17:42 ` Sagi Grimberg
@ 2024-01-08 18:41 ` Jeremy Allison
0 siblings, 0 replies; 20+ messages in thread
From: Jeremy Allison @ 2024-01-08 18:41 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Jeremy Allison, tansuresh, hch, gregkh, rafael, bhelgaas,
linux-nvme, jra
On Mon, Jan 08, 2024 at 07:42:09PM +0200, Sagi Grimberg wrote:
>On 1/4/24 20:44, Jeremy Allison wrote:
>>
>>There needs to be some way to pass a use
>>"two-pass" flag down from nvme_shutdown()
>>into nvme_dev_disable()
>Ideally nvme_dev_disable can be broken in a way that
>nvme_shutdown can have 2-3 helpers and the rest of the
>code can remain intact calling nvme_dev_disable (internally
>calling these helpers).
I think the safest way to do this is to leave
nvme_dev_disable() alone as much as possible,
and just have it call a nvme_shutdown_ctrl_nowait()
function when the bool/enum tells it it's in a
two-pass situation.
I have code that does this, I'm just waiting
for the 'bool or enum' decision.
>>I guess I could add a second bool flag passed
>>down from nvme_shutdown() all the way to
>>nvme_dev_disable(), which then calls the
>>nvme_shutdown_ctrl_nowait() varient instead
>>of nvme_disable_ctrl() if both "bool shutdown"
>>and "bool twopass" are set but that seems kind
>>of ugly and confusing to me.
>
>Probably.. Maybe call it dontwait (bool).
>And yes, it is ugly, but better than moving the
>ugliness to the core.
No arguments from me :-).
>>How about keeping the enum internal to
>>drivers/nvme/host/pci.c
>>and making it:
>>
>>enum shutdown_type {
>> NVME_DISABLE_RESET,
>> NVME_DISABLE_SHUTDOWN,
>> NVME_DISABLE_SHUTDOWN_TWOPASS
>>};
>
>NVME_PCI_
Got it !
>>and changing 'bool shutdown' to enum shutdown_type
>>inside the static functions nvme_disable_prepare_reset()
>>and nvme_dev_disable() only ?
>>
>>That way we don't have to have the confusing
>>double bool:
>
>Neither are great. I'll defer to Keith and Christoph on this one.
OK, I'll use whatever Keith and Christoph agree on.
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2024-01-08 18:41 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-03 21:04 Make NVME shutdown two-pass - Version 4 Jeremy Allison
2024-01-03 21:04 ` [PATCH 1/5] driver core: Support two-pass driver shutdown Jeremy Allison
2024-01-04 13:12 ` Sagi Grimberg
2024-01-04 17:27 ` Jeremy Allison
2024-01-03 21:04 ` [PATCH 2/5] PCI: Support two-pass shutdown Jeremy Allison
2024-01-04 18:55 ` Bjorn Helgaas
2024-01-04 19:34 ` Jeremy Allison
2024-01-03 21:04 ` [PATCH 3/5] nvme: Change 'bool shutdown' into an enum shutdown_type Jeremy Allison
2024-01-04 13:26 ` Sagi Grimberg
2024-01-04 17:43 ` Jeremy Allison
2024-01-04 18:44 ` Jeremy Allison
2024-01-08 17:42 ` Sagi Grimberg
2024-01-08 18:41 ` Jeremy Allison
2024-01-03 21:04 ` [PATCH 4/5] nvme: Export nvme_wait_ready() Jeremy Allison
2024-01-03 21:04 ` [PATCH 5/5] nvme: Add two-pass shutdown support Jeremy Allison
2024-01-04 13:14 ` Sagi Grimberg
2024-01-04 17:30 ` Jeremy Allison
2024-01-04 4:48 ` Make NVME shutdown two-pass - Version 4 Chaitanya Kulkarni
2024-01-04 6:38 ` Jeremy Allison
2024-01-04 19:00 ` Keith Busch
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox