From: Hannes Reinecke <hare@suse.de>
To: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <keith.busch@wdc.com>,
Sagi Grimberg <sagi@grimberg.me>,
linux-nvme@lists.infradead.org
Subject: Re: [PATCH 2/2] nvme: add 'queue_if_no_path' semantics
Date: Tue, 6 Oct 2020 15:45:01 +0200
Message-ID: <ce2c93e1-ba38-cebb-33b3-d506116a61aa@suse.de>
In-Reply-To: <00e75643-d422-ca12-1648-02ca89044182@suse.de>
On 10/6/20 3:30 PM, Hannes Reinecke wrote:
> On 10/6/20 10:39 AM, Christoph Hellwig wrote:
>> On Tue, Oct 06, 2020 at 10:29:49AM +0200, Hannes Reinecke wrote:
>>>> All multipath devices should behave the same. No special casing for
>>>> PCIe, please.
>>>>
>>> Even if the default behaviour breaks PCI hotplug?
>>
>> Why would it "break" PCI hotplug?
>>
> When running under MD RAID:
> Before hotplug:
> # nvme list
> Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
> ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> /dev/nvme0n1     SLESNVME1            QEMU NVMe Ctrl                           1         17.18 GB / 17.18 GB        512 B + 0 B      1.0
> /dev/nvme1n1     SLESNVME2            QEMU NVMe Ctrl                           1         4.29 GB / 4.29 GB          512 B + 0 B      1.0
> /dev/nvme2n1     SLESNVME3            QEMU NVMe Ctrl                           1         4.29 GB / 4.29 GB          512 B + 0 B      1.0
> After hotplug:
>
> # nvme list
> Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
> ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> /dev/nvme0n1     SLESNVME1            QEMU NVMe Ctrl                           1         17.18 GB / 17.18 GB        512 B + 0 B      1.0
> /dev/nvme1n1     SLESNVME2            QEMU NVMe Ctrl                           -1        0.00 B / 0.00 B            1 B + 0 B        1.0
> /dev/nvme1n2     SLESNVME2            QEMU NVMe Ctrl                           1         4.29 GB / 4.29 GB          512 B + 0 B      1.0
> /dev/nvme2n1     SLESNVME3            QEMU NVMe Ctrl                           1         4.29 GB / 4.29 GB          512 B + 0 B      1.0
>
> And MD hasn't been notified that the device is gone:
> # cat /proc/mdstat
> Personalities : [raid10]
> md0 : active raid10 nvme2n1[1] nvme1n1[0]
> 4189184 blocks super 1.2 2 near-copies [2/2] [UU]
> bitmap: 0/1 pages [0KB], 65536KB chunk
>
> unused devices: <none>
>
> Once I do some I/O to it, MD recognizes the faulty device:
>
> # cat /proc/mdstat
> Personalities : [raid10]
> md0 : active raid10 nvme2n1[1] nvme1n1[0](F)
> 4189184 blocks super 1.2 2 near-copies [2/1] [_U]
> bitmap: 0/1 pages [0KB], 65536KB chunk
>
> unused devices: <none>
>
> but the re-added device isn't added to the MD RAID.
> In fact, it has been assigned a _different_ block device name
> (nvme1n2 instead of nvme1n1), even though the NSID is still 1:
>
> [ 904.299065] pcieport 0000:00:08.0: pciehp: Slot(0-1): Card present
> [ 904.299067] pcieport 0000:00:08.0: pciehp: Slot(0-1): Link Up
> [ 904.435314] pci 0000:02:00.0: [8086:5845] type 00 class 0x010802
> [ 904.435523] pci 0000:02:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
> [ 904.435676] pci 0000:02:00.0: reg 0x20: [mem 0x00000000-0x00000fff]
> [ 904.436982] pci 0000:02:00.0: BAR 0: assigned [mem 0xc1200000-0xc1201fff 64bit]
> [ 904.437086] pci 0000:02:00.0: BAR 4: assigned [mem 0xc1202000-0xc1202fff]
> [ 904.437118] pcieport 0000:00:08.0: PCI bridge to [bus 02]
> [ 904.437137] pcieport 0000:00:08.0: bridge window [io 0x7000-0x7fff]
> [ 904.439024] pcieport 0000:00:08.0: bridge window [mem 0xc1200000-0xc13fffff]
> [ 904.440229] pcieport 0000:00:08.0: bridge window [mem 0x802000000-0x803ffffff 64bit pref]
> [ 904.447150] nvme nvme3: pci function 0000:02:00.0
> [ 904.447487] nvme 0000:02:00.0: enabling device (0000 -> 0002)
> [ 904.458880] nvme nvme3: 1/0/0 default/read/poll queues
> [ 904.461296] nvme1n2: detected capacity change from 0 to 4294967296
>
> and the 'old', pre-hotplug device still lingers on in the 'nvme list'
> output.
>
Compare that to the 'standard', non-CMIC NVMe device, where, with the same
setup, MD detaches the device on its own:
# cat /proc/mdstat
Personalities : [raid10]
md127 : active (auto-read-only) raid10 nvme2n1[1]
4189184 blocks super 1.2 2 near-copies [2/1] [_U]
bitmap: 0/1 pages [0KB], 65536KB chunk
unused devices: <none>
# nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     SLESNVME1            QEMU NVMe Ctrl                           1         17.18 GB / 17.18 GB        512 B + 0 B      1.0
/dev/nvme2n1     SLESNVME3            QEMU NVMe Ctrl                           1         4.29 GB / 4.29 GB          512 B + 0 B      1.0
And yes, this is exactly the same setup, the only difference being the
CMIC setting for the NVMe device.
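To compare the /proc/mdstat snapshots above side by side, a minimal Python sketch like the following (not part of the original report) extracts each array's member devices, whether MD has flagged them faulty with "(F)", and the [configured/active] counts from the status line. It assumes only the raid10 mdstat layout shown in this thread; MdArray, parse_mdstat() and faulty_members() are hypothetical names for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical model of one md array as printed in /proc/mdstat; it only
# covers the raid10 layout shown in this thread, not every md personality.
@dataclass
class MdArray:
    name: str
    # member device name -> True if md has flagged it "(F)" (faulty)
    members: dict[str, bool] = field(default_factory=dict)
    expected: int = 0  # first number of "[2/1]": configured member slots
    active: int = 0    # second number of "[2/1]": members currently in use

    @property
    def degraded(self) -> bool:
        return self.active < self.expected


_ARRAY_LINE = re.compile(r"^(md\d+)\s*:\s*(.*)$")  # "md0 : active raid10 ..."
_MEMBER = re.compile(r"([\w-]+)\[\d+\](\(F\))?")    # "nvme1n1[0]" or "nvme1n1[0](F)"
_COUNTS = re.compile(r"\[(\d+)/(\d+)\]")             # "[2/1]" on the status line


def parse_mdstat(text: str) -> list[MdArray]:
    """Parse array lines and the counts on their first status line."""
    arrays: list[MdArray] = []
    current: Optional[MdArray] = None
    for raw in text.splitlines():
        line = raw.strip()
        m = _ARRAY_LINE.match(line)
        if m:
            current = MdArray(name=m.group(1))
            # findall yields (device, "(F)" or "") for every "dev[n]" token
            for dev, faulty in _MEMBER.findall(m.group(2)):
                current.members[dev] = bool(faulty)
            arrays.append(current)
            continue
        counts = _COUNTS.search(line)
        if current is not None and counts:
            current.expected, current.active = map(int, counts.groups())
            current = None  # later lines (e.g. "bitmap:") carry no counts
    return arrays


def faulty_members(array: MdArray) -> list[str]:
    """Members md has marked with "(F)", sorted by name."""
    return sorted(dev for dev, faulty in array.members.items() if faulty)


if __name__ == "__main__":
    snapshot = """Personalities : [raid10]
md0 : active raid10 nvme2n1[1] nvme1n1[0](F)
      4189184 blocks super 1.2 2 near-copies [2/1] [_U]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>
"""
    for array in parse_mdstat(snapshot):
        state = "degraded" if array.degraded else "complete"
        print(array.name, state, faulty_members(array))
```

Fed the pre-I/O snapshot from the CMIC case, this reports md0 as [2/2] with no faulty members even though nvme1n1 has already been unplugged; only after I/O does nvme1n1 show up as faulty. In the non-CMIC snapshot the removed device is simply absent from md127's member list.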
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer