From: Hannes Reinecke <hare@suse.de>
To: Sagi Grimberg <sagi@grimberg.me>, Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <keith.busch@wdc.com>, linux-nvme@lists.infradead.org
Subject: Re: [PATCHv3] nvme: generate uevent once a multipath namespace is operational again
Date: Tue, 18 May 2021 08:59:09 +0200
Message-ID: <2174fba2-9c43-b92c-ea73-da59cd91d3ca@suse.de>
In-Reply-To: <32bda760-9d71-c063-565e-e3a79b8c3135@grimberg.me>
On 5/17/21 7:49 PM, Sagi Grimberg wrote:
>
>> When fast_io_fail_tmo is set, I/O will be aborted while recovery is
>> still ongoing. This causes MD to mark the namespace as failed, and
>> no further I/O will be submitted to that namespace.
>>
>> However, once the recovery succeeds and the namespace becomes
>> operational again the NVMe subsystem doesn't send a notification,
>> so MD cannot automatically reinstate operation and requires
>> manual interaction.
>>
>> This patch will send a KOBJ_CHANGE uevent per multipathed namespace
>> once the underlying controller transitions to LIVE, allowing an automatic
>> MD reassembly with these udev rules:
>>
>> /etc/udev/rules.d/65-md-auto-re-add.rules:
>> SUBSYSTEM!="block", GOTO="md_end"
>>
>> ACTION!="change", GOTO="md_end"
>> ENV{ID_FS_TYPE}!="linux_raid_member", GOTO="md_end"
>> PROGRAM="/sbin/md_raid_auto_readd.sh $devnode"
>> LABEL="md_end"
>>
>> /sbin/md_raid_auto_readd.sh:
>>
>> #!/bin/bash
>> MDADM=/sbin/mdadm
>> DEVNAME=$1
>>
>> export $(${MDADM} --examine --export ${DEVNAME})
>>
>> if [ -z "${MD_UUID}" ]; then
>> exit 1
>> fi
>>
>> UUID_LINK=$(readlink /dev/disk/by-id/md-uuid-${MD_UUID})
>> MD_DEVNAME=${UUID_LINK##*/}
>> export $(${MDADM} --detail --export /dev/${MD_DEVNAME})
>> if [ -z "${MD_METADATA}" ] ; then
>> exit 1
>> fi
>> if [ "$(cat /sys/block/${MD_DEVNAME}/md/degraded)" = 0 ]; then
>> echo "${MD_DEVNAME}: array not degraded, nothing to do"
>> exit 0
>> fi
>> MD_STATE=$(cat /sys/block/${MD_DEVNAME}/md/array_state)
>> if [ "${MD_STATE}" != "clean" ] ; then
>> echo "${MD_DEVNAME}: array state ${MD_STATE}, cannot re-add"
>> exit 1
>> fi
>> MD_VARNAME="MD_DEVICE_dev_${DEVNAME##*/}_ROLE"
>> if [ "${!MD_VARNAME}" = "spare" ] ; then
>> ${MDADM} --manage /dev/${MD_DEVNAME} --re-add ${DEVNAME}
>> fi
>
> Is this auto-readd stuff going to util-linux?
>
>>
>> Changes to v2:
>> - Add udev rules example to description
>> Changes to v1:
>> - use disk_uevent() as suggested by hch
>
> This belongs after the '---' separator.
>
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>> drivers/nvme/host/multipath.c | 7 +++++--
>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>> index 0551796517e6..ecc99bd5f8ad 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -100,8 +100,11 @@ void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
>>  	down_read(&ctrl->namespaces_rwsem);
>>  	list_for_each_entry(ns, &ctrl->namespaces, list) {
>> -		if (ns->head->disk)
>> -			kblockd_schedule_work(&ns->head->requeue_work);
>> +		if (!ns->head->disk)
>> +			continue;
>> +		kblockd_schedule_work(&ns->head->requeue_work);
>> +		if (ctrl->state == NVME_CTRL_LIVE)
>> +			disk_uevent(ns->head->disk, KOBJ_CHANGE);
>>  	}
>
> I asked this on v1, is this only needed for mpath devices?
Yes; we need to send the KOBJ_CHANGE event on the mpath device as it's
not backed by hardware. The only non-multipathed devices I've seen so
far are PCI devices where events are generated by the PCI device itself.
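
For illustration only (a sketch, not part of the patch): the KOBJ_CHANGE event raised by disk_uevent() on the multipath head disk reaches userspace on a NETLINK_KOBJECT_UEVENT socket as an "action@devpath" header followed by NUL-separated KEY=value strings. The helpers uevent_get() and uevent_is_block_change() are hypothetical names; they apply the same SUBSYSTEM=="block" / ACTION=="change" match as the udev rule quoted above to one received buffer. Socket setup and the receive loop are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the value of KEY in a raw kernel uevent
 * buffer, or NULL if absent. Layout: "action@devpath\0KEY=value\0...".
 */
static const char *uevent_get(const char *buf, size_t len, const char *key)
{
	size_t klen = strlen(key);
	size_t off = 0;

	while (off < len) {
		const char *entry = buf + off;
		const char *nul = memchr(entry, '\0', len - off);
		size_t elen;

		if (!nul)
			break;	/* unterminated trailing entry: ignore it */
		elen = (size_t)(nul - entry);
		if (elen > klen && entry[klen] == '=' &&
		    strncmp(entry, key, klen) == 0)
			return entry + klen + 1;
		off += elen + 1;	/* step past the NUL separator */
	}
	return NULL;
}

/* Hypothetical helper: same match as the udev rule (block + change). */
static bool uevent_is_block_change(const char *buf, size_t len)
{
	const char *subsys = uevent_get(buf, len, "SUBSYSTEM");
	const char *action = uevent_get(buf, len, "ACTION");

	return subsys && action &&
	       strcmp(subsys, "block") == 0 &&
	       strcmp(action, "change") == 0;
}
```

Without writing any code, `udevadm monitor --kernel --property --subsystem-match=block` should show the same "change" event for the head disk once the controller is LIVE again, assuming the patch is applied.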
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme