* [PATCH v3 1/4] nvme_fc: remove reinit_request routine
2018-06-12 23:50 [PATCH v3 0/4] nvme_fc: asynchronous controller create and simple discovery James Smart
@ 2018-06-12 23:50 ` James Smart
2018-06-13 11:28 ` Christoph Hellwig
2018-06-12 23:50 ` [PATCH v3 2/4] nvme_fc: change controllers first connect to use reconnect path James Smart
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: James Smart @ 2018-06-12 23:50 UTC (permalink / raw)
The reinit_request routine is not necessary. Remove support for the
op callback.
As all that nvme_reinit_tagset() does is iterate over the tagset and
call the reinit routine, it too has no purpose. Remove the call.
Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
---
A separate effort will look at other areas in nvme that may be
able to remove the reinit interface as well.
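For reference, the IU fields the removed routine re-wrote are already
populated once when the request pdu is first set up, which is why a
reinit pass adds nothing. A paraphrased sketch of that one-time setup
(as in __nvme_fc_init_request(); field names taken from the hunk below):

	struct nvme_fc_fcp_op *op = blk_mq_rq_to_pdu(rq);
	struct nvme_fc_cmd_iu *cmdiu = &op->cmd_iu;

	memset(op, 0, sizeof(*op));		/* zeroes cmd_iu and rsp_iu once */
	cmdiu->scsi_id = NVME_CMD_SCSI_ID;	/* constant across reconnects */
	cmdiu->fc_id = NVME_CMD_FC_ID;
	cmdiu->iu_len = cpu_to_be16(sizeof(*cmdiu) / sizeof(u32));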
drivers/nvme/host/fc.c | 20 --------------------
1 file changed, 20 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 0bad65803271..a4e508d2800b 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1470,21 +1470,6 @@ nvme_fc_xmt_disconnect_assoc(struct nvme_fc_ctrl *ctrl)
static void nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg);
-static int
-nvme_fc_reinit_request(void *data, struct request *rq)
-{
- struct nvme_fc_fcp_op *op = blk_mq_rq_to_pdu(rq);
- struct nvme_fc_cmd_iu *cmdiu = &op->cmd_iu;
-
- memset(cmdiu, 0, sizeof(*cmdiu));
- cmdiu->scsi_id = NVME_CMD_SCSI_ID;
- cmdiu->fc_id = NVME_CMD_FC_ID;
- cmdiu->iu_len = cpu_to_be16(sizeof(*cmdiu) / sizeof(u32));
- memset(&op->rsp_iu, 0, sizeof(op->rsp_iu));
-
- return 0;
-}
-
static void
__nvme_fc_exit_request(struct nvme_fc_ctrl *ctrl,
struct nvme_fc_fcp_op *op)
@@ -2502,10 +2487,6 @@ nvme_fc_reinit_io_queues(struct nvme_fc_ctrl *ctrl)
nvme_fc_init_io_queues(ctrl);
- ret = nvme_reinit_tagset(&ctrl->ctrl, ctrl->ctrl.tagset);
- if (ret)
- goto out_free_io_queues;
-
ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
if (ret)
goto out_free_io_queues;
@@ -2917,7 +2898,6 @@ static const struct nvme_ctrl_ops nvme_fc_ctrl_ops = {
.submit_async_event = nvme_fc_submit_async_event,
.delete_ctrl = nvme_fc_delete_ctrl,
.get_address = nvmf_get_address,
- .reinit_request = nvme_fc_reinit_request,
};
static void
--
2.13.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v3 2/4] nvme_fc: change controllers first connect to use reconnect path
2018-06-12 23:50 [PATCH v3 0/4] nvme_fc: asynchronous controller create and simple discovery James Smart
2018-06-12 23:50 ` [PATCH v3 1/4] nvme_fc: remove reinit_request routine James Smart
@ 2018-06-12 23:50 ` James Smart
2018-06-13 11:31 ` Christoph Hellwig
2018-06-12 23:50 ` [PATCH v3 3/4] nvme_fc: fix nulling of queue data on reconnect James Smart
2018-06-12 23:50 ` [PATCH v3 4/4] nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device James Smart
3 siblings, 1 reply; 11+ messages in thread
From: James Smart @ 2018-06-12 23:50 UTC (permalink / raw)
Current code follows the framework that has been in the transports
from the beginning, where the initial link-side controller connect occurs
as part of "creating the controller". Thus that first connect fully
talks to the controller and obtains values that can then be used
for blk-mq setup, etc. It also means that everything about the
controller is fully known before the "create controller" call returns.
This has several weaknesses:
- The initial create_ctrl call made by the cli will block for a long
time as wire transactions are performed synchronously. This delay
becomes longer if errors occur or connectivity is lost and retries
need to be performed.
- Code-wise, it means there is a separate connect path for the initial
controller connect vs the (same) steps used in the reconnect path.
- And as there are separate paths, there is separate error
handling and retry logic. It also plays havoc with the NEW state
(which should be transitioned out of after a successful initial connect)
vs the RESETTING and CONNECTING (reconnect) states that want to be
transitioned to on error.
- As there are separate paths, recovering from errors and disruptions
requires separate recovery/retry paths as well, which can severely
convolute the controller state.
- Given the duration of the initial create_ctrl() call, automation
of dynamic discovery can't use simplistic udev rules. Instead,
more convoluted use of systemd is required.
This patch reworks the fc transport to use the same connect paths
for the initial connection as it uses for reconnect. This makes a
single path for error recovery and handling. Given that the initial
connect becomes asynchronous on a background work thread, the
create_ctrl() call now returns very quickly, allowing the simplified
auto-connect scripting.
This patch:
- Removes the driving of the initial connect and replaces it with
a state transition to CONNECTING and scheduling of the reconnect
work (see the sketch after this list). A dummy state transition
through RESETTING had to be traversed, as a direct transition of
NEW->CONNECTING is not allowed. Given that the controller is "new",
the RESETTING transition is a simple no-op. Once in the reconnect
work, the normal behaviors of ctrl_loss_tmo (max_retries *
connect_delay) and dev_loss_tmo will apply before the controller
is torn down.
- Only if the state transitions can't be traversed or the reconnect
work can't be scheduled will the controller be torn down while
still in create_ctrl.
- The prior code used the controller state of NEW to indicate
whether request queues had been initialized or not. For the admin
queue, the request queue is always created, so there's no need to
check a state. For IO queues, change to tracking whether a successful
io request queue create has occurred (i.e. the 1st successful connect).
- The initial controller id is initialized to the dynamic controller
id used in the initial connect message. It will be overwritten by
the real controller id once the controller is connected on the wire.
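A condensed sketch of the resulting create-time flow (mirroring the
hunk below; error handling collapsed to a single label):

	/* NEW -> RESETTING -> CONNECTING; the RESETTING hop is a no-op
	 * dummy, as NEW -> CONNECTING is not a legal direct transition */
	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING) ||
	    !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING))
		goto fail_ctrl;

	nvme_get_ctrl(&ctrl->ctrl);

	/* the initial connect now runs from the reconnect work;
	 * create_ctrl() returns without waiting for the association */
	if (!queue_delayed_work(nvme_wq, &ctrl->connect_work, 0)) {
		nvme_put_ctrl(&ctrl->ctrl);
		goto fail_ctrl;
	}

	return &ctrl->ctrl;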
Note: this patch is dependent on these changes that have been
added to the nvme-cli utility:
- nvme-cli: Wait for device file if not present after successful add_ctrl
- nvme-cli: Add ioctl retry support for "connect-all"
With this patch, a simplistic udev rule, such as one of the options
below, may now be used to support auto-connectivity:
option A:
ACTION=="change", SUBSYSTEM=="fc", ENV{FC_EVENT}=="nvmediscovery", \
ENV{NVMEFC_HOST_TRADDR}=="*", ENV{NVMEFC_TRADDR}=="*", \
RUN+="/usr/sbin/nvme connect-all --transport=fc --host-traddr=$env{NVMEFC_HOST_TRADDR} --traddr=$env{NVMEFC_TRADDR}"
option B:
SUBSYSTEM!="fc", GOTO="nvme_autoconnect_end"
ACTION!="change", GOTO="nvme_autoconnect_end"
ENV{FC_EVENT}!="nvmediscovery", GOTO="nvme_autoconnect_end"
ENV{NVMEFC_HOST_TRADDR}!="?*", GOTO="nvme_autoconnect_end"
ENV{NVMEFC_TRADDR}!="?*", GOTO="nvme_autoconnect_end"
RUN+="/usr/sbin/nvme connect-all -t fc --host-traddr=$env{NVMEFC_HOST_TRADDR} -a $env{NVMEFC_TRADDR}"
LABEL="nvme_autoconnect_end"
Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
---
drivers/nvme/host/fc.c | 102 ++++++++++++++++++++++---------------------------
1 file changed, 45 insertions(+), 57 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index a4e508d2800b..7407b464800f 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -142,6 +142,7 @@ struct nvme_fc_ctrl {
struct nvme_fc_rport *rport;
u32 cnum;
+ bool ioq_live;
bool assoc_active;
u64 association_id;
@@ -2448,6 +2449,8 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
if (ret)
goto out_delete_hw_queues;
+ ctrl->ioq_live = true;
+
return 0;
out_delete_hw_queues:
@@ -2596,8 +2599,7 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
if (ret)
goto out_delete_hw_queue;
- if (ctrl->ctrl.state != NVME_CTRL_NEW)
- blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+ blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
ret = nvmf_connect_admin_queue(&ctrl->ctrl);
if (ret)
@@ -2670,7 +2672,7 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
*/
if (ctrl->ctrl.queue_count > 1) {
- if (ctrl->ctrl.state == NVME_CTRL_NEW)
+ if (!ctrl->ioq_live)
ret = nvme_fc_create_io_queues(ctrl);
else
ret = nvme_fc_reinit_io_queues(ctrl);
@@ -2757,8 +2759,7 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
* use blk_mq_tagset_busy_itr() and the transport routine to
* terminate the exchanges.
*/
- if (ctrl->ctrl.state != NVME_CTRL_NEW)
- blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+ blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
nvme_fc_terminate_exchange, &ctrl->ctrl);
@@ -2914,7 +2915,7 @@ nvme_fc_connect_ctrl_work(struct work_struct *work)
nvme_fc_reconnect_or_delete(ctrl, ret);
else
dev_info(ctrl->ctrl.device,
- "NVME-FC{%d}: controller reconnect complete\n",
+ "NVME-FC{%d}: controller connect complete\n",
ctrl->cnum);
}
@@ -2962,7 +2963,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
{
struct nvme_fc_ctrl *ctrl;
unsigned long flags;
- int ret, idx, retry;
+ int ret, idx;
if (!(rport->remoteport.port_role &
(FC_PORT_ROLE_NVME_DISCOVERY | FC_PORT_ROLE_NVME_TARGET))) {
@@ -2989,11 +2990,13 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
}
ctrl->ctrl.opts = opts;
+ ctrl->ctrl.nr_reconnects = 0;
INIT_LIST_HEAD(&ctrl->ctrl_list);
ctrl->lport = lport;
ctrl->rport = rport;
ctrl->dev = lport->dev;
ctrl->cnum = idx;
+ ctrl->ioq_live = false;
ctrl->assoc_active = false;
init_waitqueue_head(&ctrl->ioabort_wait);
@@ -3012,6 +3015,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
ctrl->ctrl.sqsize = opts->queue_size - 1;
ctrl->ctrl.kato = opts->kato;
+ ctrl->ctrl.cntlid = 0xffff;
ret = -ENOMEM;
ctrl->queues = kcalloc(ctrl->ctrl.queue_count,
@@ -3061,68 +3065,52 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
list_add_tail(&ctrl->ctrl_list, &rport->ctrl_list);
spin_unlock_irqrestore(&rport->lock, flags);
- /*
- * It's possible that transactions used to create the association
- * may fail. Examples: CreateAssociation LS or CreateIOConnection
- * LS gets dropped/corrupted/fails; or a frame gets dropped or a
- * command times out for one of the actions to init the controller
- * (Connect, Get/Set_Property, Set_Features, etc). Many of these
- * transport errors (frame drop, LS failure) inherently must kill
- * the association. The transport is coded so that any command used
- * to create the association (prior to a LIVE state transition
- * while NEW or CONNECTING) will fail if it completes in error or
- * times out.
- *
- * As such: as the connect request was mostly likely due to a
- * udev event that discovered the remote port, meaning there is
- * not an admin or script there to restart if the connect
- * request fails, retry the initial connection creation up to
- * three times before giving up and declaring failure.
- */
- for (retry = 0; retry < 3; retry++) {
- ret = nvme_fc_create_association(ctrl);
- if (!ret)
- break;
+ if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING) ||
+ !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
+ dev_err(ctrl->ctrl.device,
+ "NVME-FC{%d}: failed to init ctrl state\n", ctrl->cnum);
+ goto fail_ctrl;
}
- if (ret) {
- nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING);
- cancel_work_sync(&ctrl->ctrl.reset_work);
- cancel_delayed_work_sync(&ctrl->connect_work);
+ nvme_get_ctrl(&ctrl->ctrl);
- /* couldn't schedule retry - fail out */
+ if (!queue_delayed_work(nvme_wq, &ctrl->connect_work, 0)) {
+ nvme_put_ctrl(&ctrl->ctrl);
dev_err(ctrl->ctrl.device,
- "NVME-FC{%d}: Connect retry failed\n", ctrl->cnum);
+ "NVME-FC{%d}: failed to schedule initial connect\n",
+ ctrl->cnum);
+ goto fail_ctrl;
+ }
- ctrl->ctrl.opts = NULL;
+ dev_info(ctrl->ctrl.device,
+ "NVME-FC{%d}: new ctrl: NQN \"%s\"\n",
+ ctrl->cnum, ctrl->ctrl.opts->subsysnqn);
- /* initiate nvme ctrl ref counting teardown */
- nvme_uninit_ctrl(&ctrl->ctrl);
+ return &ctrl->ctrl;
- /* Remove core ctrl ref. */
- nvme_put_ctrl(&ctrl->ctrl);
+fail_ctrl:
+ nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING);
+ cancel_work_sync(&ctrl->ctrl.reset_work);
+ cancel_delayed_work_sync(&ctrl->connect_work);
- /* as we're past the point where we transition to the ref
- * counting teardown path, if we return a bad pointer here,
- * the calling routine, thinking it's prior to the
- * transition, will do an rport put. Since the teardown
- * path also does a rport put, we do an extra get here to
- * so proper order/teardown happens.
- */
- nvme_fc_rport_get(rport);
+ ctrl->ctrl.opts = NULL;
- if (ret > 0)
- ret = -EIO;
- return ERR_PTR(ret);
- }
+ /* initiate nvme ctrl ref counting teardown */
+ nvme_uninit_ctrl(&ctrl->ctrl);
- nvme_get_ctrl(&ctrl->ctrl);
+ /* Remove core ctrl ref. */
+ nvme_put_ctrl(&ctrl->ctrl);
- dev_info(ctrl->ctrl.device,
- "NVME-FC{%d}: new ctrl: NQN \"%s\"\n",
- ctrl->cnum, ctrl->ctrl.opts->subsysnqn);
+ /* as we're past the point where we transition to the ref
+ * counting teardown path, if we return a bad pointer here,
+ * the calling routine, thinking it's prior to the
+ * transition, will do an rport put. Since the teardown
+ * path also does a rport put, we do an extra get here to
+ * so proper order/teardown happens.
+ */
+ nvme_fc_rport_get(rport);
- return &ctrl->ctrl;
+ return ERR_PTR(-EIO);
out_cleanup_admin_q:
blk_cleanup_queue(ctrl->ctrl.admin_q);
--
2.13.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v3 2/4] nvme_fc: change controllers first connect to use reconnect path
2018-06-12 23:50 ` [PATCH v3 2/4] nvme_fc: change controllers first connect to use reconnect path James Smart
@ 2018-06-13 11:31 ` Christoph Hellwig
2018-06-13 15:15 ` James Smart
0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2018-06-13 11:31 UTC (permalink / raw)
> Note: this patch is dependent on these changes that have been
> added to the nvme-cli utility:
> - nvme-cli: Wait for device file if not present after successful add_ctrl
> - nvme-cli: Add ioctl retry support for "connect-all"
Which means it breaks backwards compatibility. We can't do that.
IFF we actually end up wanting async connect it needs to be an
explicit opt-in with a new option.
In the meantime I really like this code in general, so please resubmit
it with a flush_delayed_work to make it synchronous.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 2/4] nvme_fc: change controllers first connect to use reconnect path
2018-06-13 11:31 ` Christoph Hellwig
@ 2018-06-13 15:15 ` James Smart
0 siblings, 0 replies; 11+ messages in thread
From: James Smart @ 2018-06-13 15:15 UTC (permalink / raw)
On 6/13/2018 4:31 AM, Christoph Hellwig wrote:
>> Note: this patch is dependent on these changes that have been
>> added to the nvme-cli utility:
>> - nvme-cli: Wait for device file if not present after successful add_ctrl
>> - nvme-cli: Add ioctl retry support for "connect-all"
>
> Which means it breaks backwards compatibility. We can't do that.
I don't know what compatibility you are talking about. Both of those
issues could be hit even with the non-async implementation.
The cli has already been updated to address them.
>
> IFF we actually end up wanting async connect it needs to be an
> explicit opt-it with a new option.
>
> In the meantime I really like this code in general, so please resubmit
> it with a flush_delayed_work to make it synchronus.
I disagree. No user is aware of the async vs sync nature. It's the
non-delay of the create thread that allows the simplistic udev scripts.
-- james
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 3/4] nvme_fc: fix nulling of queue data on reconnect
2018-06-12 23:50 [PATCH v3 0/4] nvme_fc: asynchronous controller create and simple discovery James Smart
2018-06-12 23:50 ` [PATCH v3 1/4] nvme_fc: remove reinit_request routine James Smart
2018-06-12 23:50 ` [PATCH v3 2/4] nvme_fc: change controllers first connect to use reconnect path James Smart
@ 2018-06-12 23:50 ` James Smart
2018-06-13 11:32 ` Christoph Hellwig
2018-06-12 23:50 ` [PATCH v3 4/4] nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device James Smart
3 siblings, 1 reply; 11+ messages in thread
From: James Smart @ 2018-06-12 23:50 UTC (permalink / raw)
The reconnect path is calling the init routines to clear a queue
structure. But the queue structure has state that needs to persist
as long as the controller is live.
Remove the nvme_fc_init_queue() calls on reconnect.
The nvme_fc_free_queue() calls will clear state bits and reset
any relevant queue state for a new connection.
Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
---
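The net effect is that per-queue state splits into two lifetimes,
roughly as follows (a sketch of the intent; the csn reset comes from
the hunk below):

	/* controller-lifetime state: set once in nvme_fc_init_queue(),
	 * now called only at controller init / first io queue create */
	queue->ctrl = ctrl;
	queue->qnum = idx;

	/* association-lifetime state: reset in nvme_fc_free_queue()
	 * so each new connection starts clean */
	queue->connection_id = 0;
	atomic_set(&queue->csn, 1);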
drivers/nvme/host/fc.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 7407b464800f..0b9cc767a157 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1879,6 +1879,7 @@ nvme_fc_free_queue(struct nvme_fc_queue *queue)
*/
queue->connection_id = 0;
+ atomic_set(&queue->csn, 1);
}
static void
@@ -2468,7 +2469,7 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
}
static int
-nvme_fc_reinit_io_queues(struct nvme_fc_ctrl *ctrl)
+nvme_fc_recreate_io_queues(struct nvme_fc_ctrl *ctrl)
{
struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
unsigned int nr_io_queues;
@@ -2488,8 +2489,6 @@ nvme_fc_reinit_io_queues(struct nvme_fc_ctrl *ctrl)
if (ctrl->ctrl.queue_count == 1)
return 0;
- nvme_fc_init_io_queues(ctrl);
-
ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
if (ret)
goto out_free_io_queues;
@@ -2587,8 +2586,6 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
* Create the admin queue
*/
- nvme_fc_init_queue(ctrl, 0);
-
ret = __nvme_fc_create_hw_queue(ctrl, &ctrl->queues[0], 0,
NVME_AQ_DEPTH);
if (ret)
@@ -2675,7 +2672,7 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
if (!ctrl->ioq_live)
ret = nvme_fc_create_io_queues(ctrl);
else
- ret = nvme_fc_reinit_io_queues(ctrl);
+ ret = nvme_fc_recreate_io_queues(ctrl);
if (ret)
goto out_term_aen_ops;
}
@@ -3023,6 +3020,8 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
if (!ctrl->queues)
goto out_free_ida;
+ nvme_fc_init_queue(ctrl, 0);
+
memset(&ctrl->admin_tag_set, 0, sizeof(ctrl->admin_tag_set));
ctrl->admin_tag_set.ops = &nvme_fc_admin_mq_ops;
ctrl->admin_tag_set.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
--
2.13.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v3 3/4] nvme_fc: fix nulling of queue data on reconnect
2018-06-12 23:50 ` [PATCH v3 3/4] nvme_fc: fix nulling of queue data on reconnect James Smart
@ 2018-06-13 11:32 ` Christoph Hellwig
0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2018-06-13 11:32 UTC (permalink / raw)
On Tue, Jun 12, 2018 at 04:50:27PM -0700, James Smart wrote:
> The reconnect path is calling the init routines to clear a queue
> structure. But the queue structure has state that perhaps needs
> to persist as long as the controller is live.
>
> Remove the nvme_fc_init_queue() calls on reconnect.
> The nvme_fc_free_queue() calls will clear state bits and reset
> any relevant queue state for a new connection.
>
> Signed-off-by: James Smart <james.smart at broadcom.com>
> Reviewed-by: Hannes Reinecke <hare at suse.com>
Looks fine to me.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 4/4] nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device
2018-06-12 23:50 [PATCH v3 0/4] nvme_fc: asynchronous controller create and simple discovery James Smart
` (2 preceding siblings ...)
2018-06-12 23:50 ` [PATCH v3 3/4] nvme_fc: fix nulling of queue data on reconnect James Smart
@ 2018-06-12 23:50 ` James Smart
2018-06-13 11:33 ` Christoph Hellwig
3 siblings, 1 reply; 11+ messages in thread
From: James Smart @ 2018-06-12 23:50 UTC (permalink / raw)
The fc transport device should allow for a rediscovery, as userspace
might have lost the events. An example is udev events not handled
during system startup.
This patch adds a sysfs entry, 'nvme_discovery', on the fc class
device to have it replay all udev discovery events for all local
port/remote port address pairs.
Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
Cc: Ewan Milne <emilne at redhat.com>
Cc: Johannes Thumshirn <jthumshirn at suse.com>
---
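For reference, with the attribute attached to the fc class, a
rediscovery can be requested from userspace with a simple write such
as the following (path assumes the fc_udev_device node created by
this module; the written value is ignored, any write triggers the
replay):

	echo 1 > /sys/class/fc/fc_udev_device/nvme_discovery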
drivers/nvme/host/fc.c | 114 +++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 105 insertions(+), 9 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 0b9cc767a157..6e8b4a0a130e 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -122,6 +122,7 @@ struct nvme_fc_rport {
struct list_head endp_list; /* for lport->endp_list */
struct list_head ctrl_list;
struct list_head ls_req_list;
+ struct list_head disc_list;
struct device *dev; /* physical device for dma */
struct nvme_fc_lport *lport;
spinlock_t lock;
@@ -210,7 +211,6 @@ static DEFINE_IDA(nvme_fc_ctrl_cnt);
* These items are short-term. They will eventually be moved into
* a generic FC class. See comments in module init.
*/
-static struct class *fc_class;
static struct device *fc_udev_device;
@@ -507,6 +507,7 @@ nvme_fc_free_rport(struct kref *ref)
list_del(&rport->endp_list);
spin_unlock_irqrestore(&nvme_fc_lock, flags);
+ WARN_ON(!list_empty(&rport->disc_list));
ida_simple_remove(&lport->endp_cnt, rport->remoteport.port_num);
kfree(rport);
@@ -694,6 +695,7 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport,
INIT_LIST_HEAD(&newrec->endp_list);
INIT_LIST_HEAD(&newrec->ctrl_list);
INIT_LIST_HEAD(&newrec->ls_req_list);
+ INIT_LIST_HEAD(&newrec->disc_list);
kref_init(&newrec->ref);
atomic_set(&newrec->act_ctrl_cnt, 0);
spin_lock_init(&newrec->lock);
@@ -3252,6 +3254,100 @@ static struct nvmf_transport_ops nvme_fc_transport = {
.create_ctrl = nvme_fc_create_ctrl,
};
+static void nvme_fc_discovery_unwind(struct list_head *lheadp)
+{
+ unsigned long flags;
+ struct nvme_fc_lport *lport;
+ struct nvme_fc_rport *rport, *tmp_rport;
+
+ list_for_each_entry_safe(rport, tmp_rport,
+ lheadp, disc_list) {
+ spin_lock_irqsave(&nvme_fc_lock, flags);
+ list_del_init(&rport->disc_list);
+ spin_unlock_irqrestore(&nvme_fc_lock, flags);
+ lport = rport->lport;
+ /* signal discovery. Won't hurt if it repeats */
+ nvme_fc_signal_discovery_scan(lport, rport);
+ nvme_fc_rport_put(rport);
+ nvme_fc_lport_put(lport);
+ }
+}
+
+/* Arbitrary successive failures max. With lots of subsystems could be high */
+#define DISCOVERY_MAX_FAIL 20
+
+static ssize_t nvme_fc_nvme_discovery_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ unsigned long flags;
+ LIST_HEAD(nvme_fc_disc_list);
+ struct nvme_fc_lport *lport;
+ struct nvme_fc_rport *rport, *tmp_rport;
+ int failcnt = 0;
+
+restart:
+ spin_lock_irqsave(&nvme_fc_lock, flags);
+ list_for_each_entry(lport, &nvme_fc_lport_list, port_list) {
+ list_for_each_entry(rport, &lport->endp_list, endp_list) {
+ if (!nvme_fc_lport_get(lport))
+ continue;
+ if (!nvme_fc_rport_get(rport)) {
+ /*
+ * This is a temporary condition, so upon
+ * restart this node will be gone from the
+ * list.
+ */
+ spin_unlock_irqrestore(&nvme_fc_lock, flags);
+ nvme_fc_lport_put(lport);
+ nvme_fc_discovery_unwind(&nvme_fc_disc_list);
+ if (failcnt++ < DISCOVERY_MAX_FAIL)
+ goto restart;
+ pr_err("nvme_discovery: too many reference "
+ "failures\n");
+ return 0;
+ }
+ if (list_empty(&rport->disc_list))
+ list_add_tail(&rport->disc_list,
+ &nvme_fc_disc_list);
+ }
+ }
+ spin_unlock_irqrestore(&nvme_fc_lock, flags);
+
+ list_for_each_entry_safe(rport, tmp_rport,
+ &nvme_fc_disc_list, disc_list) {
+ spin_lock_irqsave(&nvme_fc_lock, flags);
+ list_del_init(&rport->disc_list);
+ spin_unlock_irqrestore(&nvme_fc_lock, flags);
+ lport = rport->lport;
+ nvme_fc_signal_discovery_scan(lport, rport);
+ nvme_fc_rport_put(rport);
+ nvme_fc_lport_put(lport);
+ }
+ return count;
+}
+static DEVICE_ATTR(nvme_discovery, 0200, NULL, nvme_fc_nvme_discovery_store);
+
+static struct attribute *nvme_fc_attrs[] = {
+ &dev_attr_nvme_discovery.attr,
+ NULL
+};
+
+static struct attribute_group nvme_fc_attr_group = {
+ .attrs = nvme_fc_attrs,
+};
+
+static const struct attribute_group *nvme_fc_attr_groups[] = {
+ &nvme_fc_attr_group,
+ NULL
+};
+
+static struct class fc_class = {
+ .name = "fc",
+ .dev_groups = nvme_fc_attr_groups,
+ .owner = THIS_MODULE,
+};
+
static int __init nvme_fc_init_module(void)
{
int ret;
@@ -3270,16 +3366,16 @@ static int __init nvme_fc_init_module(void)
* put in place, this code will move to a more generic
* location for the class.
*/
- fc_class = class_create(THIS_MODULE, "fc");
- if (IS_ERR(fc_class)) {
+ ret = class_register(&fc_class);
+ if (ret) {
pr_err("couldn't register class fc\n");
- return PTR_ERR(fc_class);
+ return ret;
}
/*
* Create a device for the FC-centric udev events
*/
- fc_udev_device = device_create(fc_class, NULL, MKDEV(0, 0), NULL,
+ fc_udev_device = device_create(&fc_class, NULL, MKDEV(0, 0), NULL,
"fc_udev_device");
if (IS_ERR(fc_udev_device)) {
pr_err("couldn't create fc_udev device!\n");
@@ -3294,9 +3390,9 @@ static int __init nvme_fc_init_module(void)
return 0;
out_destroy_device:
- device_destroy(fc_class, MKDEV(0, 0));
+ device_destroy(&fc_class, MKDEV(0, 0));
out_destroy_class:
- class_destroy(fc_class);
+ class_unregister(&fc_class);
return ret;
}
@@ -3311,8 +3407,8 @@ static void __exit nvme_fc_exit_module(void)
ida_destroy(&nvme_fc_local_port_cnt);
ida_destroy(&nvme_fc_ctrl_cnt);
- device_destroy(fc_class, MKDEV(0, 0));
- class_destroy(fc_class);
+ device_destroy(&fc_class, MKDEV(0, 0));
+ class_unregister(&fc_class);
}
module_init(nvme_fc_init_module);
--
2.13.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v3 4/4] nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device
2018-06-12 23:50 ` [PATCH v3 4/4] nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device James Smart
@ 2018-06-13 11:33 ` Christoph Hellwig
2018-06-13 15:29 ` James Smart
0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2018-06-13 11:33 UTC (permalink / raw)
On Tue, Jun 12, 2018 at 04:50:28PM -0700, James Smart wrote:
> The fc transport device should allow for a rediscovery, as userspace
> might have lost the events. Example is udev events not handled during
> system startup.
Please stop adding hacks for your non-standard discovery methods.
> +static void nvme_fc_discovery_unwind(struct list_head *lheadp)
> +{
> + unsigned long flags;
> + struct nvme_fc_lport *lport;
> + struct nvme_fc_rport *rport, *tmp_rport;
> +
> + list_for_each_entry_safe(rport, tmp_rport,
> + lheadp, disc_list) {
> + spin_lock_irqsave(&nvme_fc_lock, flags);
> + list_del_init(&rport->disc_list);
> + spin_unlock_irqrestore(&nvme_fc_lock, flags);
Without the lock taken around the list iteration this is racy.
^ permalink raw reply [flat|nested] 11+ messages in thread* [PATCH v3 4/4] nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device
2018-06-13 11:33 ` Christoph Hellwig
@ 2018-06-13 15:29 ` James Smart
0 siblings, 0 replies; 11+ messages in thread
From: James Smart @ 2018-06-13 15:29 UTC (permalink / raw)
On 6/13/2018 4:33 AM, Christoph Hellwig wrote:
> On Tue, Jun 12, 2018 at 04:50:28PM -0700, James Smart wrote:
>> The fc transport device should allow for a rediscovery, as userspace
>> might have lost the events. Example is udev events not handled during
>> system startup.
>
> Please stop adding hacks for your non-standard discovery methods.
Please stop these boorish anti-fc comments. There is nothing
non-standard or hackish about fc's discovery methods.
This patch simply addresses an issue that arises because nvme discovery
isn't in-kernel and thus has a relationship to the boot process.
>
>> +static void nvme_fc_discovery_unwind(struct list_head *lheadp)
>> +{
>> + unsigned long flags;
>> + struct nvme_fc_lport *lport;
>> + struct nvme_fc_rport *rport, *tmp_rport;
>> +
>> + list_for_each_entry_safe(rport, tmp_rport,
>> + lheadp, disc_list) {
>> + spin_lock_irqsave(&nvme_fc_lock, flags);
>> + list_del_init(&rport->disc_list);
>> + spin_unlock_irqrestore(&nvme_fc_lock, flags);
>
> Without the lock taken around the list iteration this is racy.
>
How so?
This list was built up ahead of time solely for this routine, gets were
done on all objects as they were added to the list, the list isn't
changed except for this routine deleting elements from it, and
a port can't get on this list twice. I don't see the race.
-- james
^ permalink raw reply [flat|nested] 11+ messages in thread