Linux-NVME Archive on lore.kernel.org
From: Hannes Reinecke <hare@suse.de>
To: Sagi Grimberg <sagi@grimberg.me>, Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <keith.busch@wdc.com>, linux-nvme@lists.infradead.org
Subject: Re: [PATCHv3] nvme: generate uevent once a multipath namespace is operational again
Date: Tue, 18 May 2021 08:59:09 +0200	[thread overview]
Message-ID: <2174fba2-9c43-b92c-ea73-da59cd91d3ca@suse.de> (raw)
In-Reply-To: <32bda760-9d71-c063-565e-e3a79b8c3135@grimberg.me>

On 5/17/21 7:49 PM, Sagi Grimberg wrote:
> 
>> When fast_io_fail_tmo is set, I/O will be aborted while recovery is
>> still ongoing. This causes MD to mark the namespace as failed, and
>> no further I/O will be submitted to that namespace.
>>
>> However, once the recovery succeeds and the namespace becomes
>> operational again the NVMe subsystem doesn't send a notification,
>> so MD cannot automatically reinstate operation and requires
>> manual interaction.
>>
>> This patch will send a KOBJ_CHANGE uevent per multipathed namespace
>> once the underlying controller transitions to LIVE, allowing an automatic
>> MD reassembly with these udev rules:
>>
>> /etc/udev/rules.d/65-md-auto-re-add.rules:
>> SUBSYSTEM!="block", GOTO="md_end"
>>
>> ACTION!="change", GOTO="md_end"
>> ENV{ID_FS_TYPE}!="linux_raid_member", GOTO="md_end"
>> PROGRAM="/sbin/md_raid_auto_readd.sh $devnode"
>> LABEL="md_end"
>>
>> /sbin/md_raid_auto_readd.sh:
>>
>> MDADM=/sbin/mdadm
>> DEVNAME=$1
>>
>> export $(${MDADM} --examine --export ${DEVNAME})
>>
>> if [ -z "${MD_UUID}" ]; then
>>      exit 1
>> fi
>>
>> UUID_LINK=$(readlink /dev/disk/by-id/md-uuid-${MD_UUID})
>> MD_DEVNAME=${UUID_LINK##*/}
>> export $(${MDADM} --detail --export /dev/${MD_DEVNAME})
>> if [ -z "${MD_METADATA}" ] ; then
>>      exit 1
>> fi
>> if [ $(cat /sys/block/${MD_DEVNAME}/md/degraded) != 1 ]; then
>>      echo "${MD_DEVNAME}: array not degraded, nothing to do"
>>      exit 0
>> fi
>> MD_STATE=$(cat /sys/block/${MD_DEVNAME}/md/array_state)
>> if [ ${MD_STATE} != "clean" ] ; then
>>      echo "${MD_DEVNAME}: array state ${MD_STATE}, cannot re-add"
>>      exit 1
>> fi
>> MD_VARNAME="MD_DEVICE_dev_${DEVNAME##*/}_ROLE"
>> if [ ${!MD_VARNAME} = "spare" ] ; then
>>      ${MDADM} --manage /dev/${MD_DEVNAME} --re-add ${DEVNAME}
>> fi
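
The script above depends on mdadm's --export output being a series of
NAME=value lines that "export $(...)" can split into environment
variables. A minimal sketch of that pattern, using invented stand-in
values rather than real mdadm output:

```shell
# Sketch of the KEY=VALUE mechanism the script relies on:
# `mdadm --examine --export` emits one NAME=value pair per line, and
# `export $(...)` word-splits them into exported environment variables.
# The values below are invented stand-ins, not real mdadm output.
fake_export_output="MD_UUID=0123abcd:4567ef89:aabbccdd:eeff0011
MD_LEVEL=raid1"
export $fake_export_output   # intentionally unquoted: split on whitespace
echo "UUID=${MD_UUID} LEVEL=${MD_LEVEL}"
```

This only works because mdadm values contain no whitespace; with
arbitrary values the unquoted expansion would mis-split.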
> 
> Is this auto-readd stuff going to util-linux?
> 
>>
>> Changes to v2:
>> - Add udev rules example to description
>> Changes to v1:
>> - use disk_uevent() as suggested by hch
> 
> This belongs after the '---' separator.
> 
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/nvme/host/multipath.c | 7 +++++--
>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/host/multipath.c 
>> b/drivers/nvme/host/multipath.c
>> index 0551796517e6..ecc99bd5f8ad 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -100,8 +100,11 @@ void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
>>       down_read(&ctrl->namespaces_rwsem);
>>       list_for_each_entry(ns, &ctrl->namespaces, list) {
>> -        if (ns->head->disk)
>> -            kblockd_schedule_work(&ns->head->requeue_work);
>> +        if (!ns->head->disk)
>> +            continue;
>> +        kblockd_schedule_work(&ns->head->requeue_work);
>> +        if (ctrl->state == NVME_CTRL_LIVE)
>> +            disk_uevent(ns->head->disk, KOBJ_CHANGE);
>>       }
> 
> I asked this on v1, is this only needed for mpath devices?

Yes; we need to send the KOBJ_CHANGE event on the mpath device as it's 
not backed by hardware. The only non-multipathed devices I've seen so 
far are PCI devices where events are generated by the PCI device itself.
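
To confirm the new event actually reaches udev on the mpath node, a
throwaway debug rule along these lines can log each change event (the
file name, the logger path, and the nvme* kernel-name match are
assumptions for illustration, not part of the patch):

```
# /etc/udev/rules.d/99-nvme-debug.rules (hypothetical debug rule)
# Logs every change uevent seen on an nvme block device via syslog.
ACTION=="change", SUBSYSTEM=="block", KERNEL=="nvme*", \
  RUN+="/usr/bin/logger nvme change event on %k"
```

Alternatively, "udevadm monitor --kernel --subsystem-match=block" shows
the KOBJ_CHANGE event as it is emitted.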

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


Thread overview: 10+ messages
2021-05-17  8:32 [PATCHv3] nvme: generate uevent once a multipath namespace is operational again Hannes Reinecke
2021-05-17 17:49 ` Sagi Grimberg
2021-05-18  6:59   ` Hannes Reinecke [this message]
2021-05-18  7:05     ` Christoph Hellwig
2021-05-18  7:49       ` Hannes Reinecke
2021-05-18 18:00     ` Sagi Grimberg
2021-05-18 18:09       ` Hannes Reinecke
2021-05-18 18:39         ` Sagi Grimberg
2021-05-18 18:49           ` Hannes Reinecke
2021-05-18 19:04             ` Sagi Grimberg
