From: jsmart2021@gmail.com (James Smart)
Subject: [PATCH v2 5/5] nvme_fc: add controller reset support
Date: Mon, 24 Apr 2017 12:42:23 -0700 [thread overview]
Message-ID: <39cbb355-9def-c1ad-4e97-7e3d0ea74bfb@gmail.com> (raw)
In-Reply-To: <ab68b3c3-7cce-336e-108d-062e4076c449@grimberg.me>
On 4/20/2017 5:56 AM, Sagi Grimberg wrote:
>
>> From: James Smart <jsmart2021 at gmail.com>
>>
> It would be very helpful to split this patch into several pieces:
> 1. fixes and cleanups
> 2. preps for teardown and establishment
> 3. delete support
> 4. reset support
> 5. reconnect support
>
> As is, its very hard to review this patch...
Understood. As Christoph has merged it as is into nvme-4.12, I'll
address the issues that have been pointed out.
>> +#define NVME_FC_MAX_CONNECT_ATTEMPTS 1
>> +
>
> Why 1?
Agreed, pretty weak. Needs to be tied into your new
nvmef_should_reconnect() logic.
>> +static void
>> +nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
>> +{
>> + dev_warn(ctrl->ctrl.device,
>> + "NVME-FC{%d}: transport association error detected: %s\n",
>> + ctrl->cnum, errmsg);
>> + dev_info(ctrl->ctrl.device,
>> + "NVME-FC{%d}: resetting controller\n", ctrl->cnum);
>>
>> - nvme_fc_destroy_admin_queue(ctrl);
>> + if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING)) {
>> + dev_err(ctrl->ctrl.device,
>> + "NVME-FC{%d}: error_recovery: Couldn't change state "
>> + "to RECONNECTING\n", ctrl->cnum);
>> + return;
>> }
>>
>> - nvme_fc_ctrl_put(ctrl);
>> + if (!queue_work(nvme_fc_wq, &ctrl->reset_work))
>> + dev_err(ctrl->ctrl.device,
>> + "NVME-FC{%d}: error_recovery: Failed to schedule "
>> + "reset work\n", ctrl->cnum);
>> }
>
> Don't you want to stop the queues and fail inflight I/O?
Those are the first actions as part of the reset work. Failing
inflight i/o - no, as that's a rather large amount of work for FC.
Stopping the queues - perhaps. I'll look. In general, shouldn't really
matter as doing it now or elongating the time window to reset work
doesn't change the race. it lowers the number of other ios that may
report error, but does not eliminate it. So if some will hit it, it
doesn't matter.
>> + /* wait for all io that had to be aborted */
>> + spin_lock_irqsave(&ctrl->lock, flags);
>> + while (ctrl->iocnt) {
>> + spin_unlock_irqrestore(&ctrl->lock, flags);
>> + msleep(1000);
>> + spin_lock_irqsave(&ctrl->lock, flags);
>> + }
>
> struct completion for this?
Agreed.
>> +static void
>> +nvme_fc_delete_ctrl_work(struct work_struct *work)
>> +{
>> + struct nvme_fc_ctrl *ctrl =
>> + container_of(work, struct nvme_fc_ctrl, delete_work);
>> +
>> + cancel_work_sync(&ctrl->reset_work);
>> + cancel_delayed_work_sync(&ctrl->connect_work);
>> +
>> + /*
>> + * kill the association on the link side. this will block
>> + * waiting for io to terminate
>> + */
>> + nvme_fc_delete_association(ctrl);
>> +
>> + /*
>> + * tear down the controller
>> + * This will result in the last reference on the nvme ctrl to
>> + * expire, calling the transport nvme_fc_nvme_ctrl_freed() callback.
>> + * From there, the transport will tear down it's logical queues and
>> + * association.
>> + */
>
> Aren't you taking an extra ref in nvme_fc_del_nvme_ctrl?
I think you're right.
>> +/*
>> + * called by the nvme core layer, for sysfs interface that requests
>> + * a reset of the nvme controller
>> + */
>> +static int
>> +nvme_fc_reset_nvme_ctrl(struct nvme_ctrl *nctrl)
>> +{
>> + struct nvme_fc_ctrl *ctrl = to_fc_ctrl(nctrl);
>> +
>> + dev_warn(ctrl->ctrl.device,
>> + "NVME-FC{%d}: admin requested controller reset\n", ctrl->cnum);
>
> Isn't this warn to chatty?
I don't think so. Resets are a lot like scsi device/target/host resets.
They are destructive enough to link-side io and transactions on the link
that you want to know when occur. You also want to know why they
occurred. I think it's significant to know that a reset was induced by
an admin request, not a transport-detected event. Admin requests should
be rare so this shouldn't be an issue.
-- james
prev parent reply other threads:[~2017-04-24 19:42 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-11 18:35 [PATCH v2 0/5] nvme_fc: add ctlr reset support and abort fixes jsmart2021
2017-04-11 18:35 ` [PATCH v2 1/5] nvme_fc: Move LS's to rport jsmart2021
2017-04-20 12:37 ` Sagi Grimberg
2017-04-11 18:35 ` [PATCH v2 2/5] nvme_fc: Add ls aborts on remote port teardown jsmart2021
2017-04-20 12:38 ` Sagi Grimberg
2017-04-11 18:35 ` [PATCH v2 3/5] nvme_fc: fix command id check jsmart2021
2017-04-20 12:38 ` Sagi Grimberg
2017-04-11 18:35 ` [PATCH v2 4/5] nvme_fc: add aen abort to teardown jsmart2021
2017-04-20 12:43 ` Sagi Grimberg
2017-04-11 18:35 ` [PATCH v2 5/5] nvme_fc: add controller reset support jsmart2021
2017-04-20 12:56 ` Sagi Grimberg
2017-04-24 7:18 ` Christoph Hellwig
2017-04-24 19:42 ` James Smart [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=39cbb355-9def-c1ad-4e97-7e3d0ea74bfb@gmail.com \
--to=jsmart2021@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).