From: Klaus Jensen <its@irrelevant.dk>
To: Damien Hedde <dhedde@kalrayinc.com>
Cc: "Hannes Reinecke" <hare@suse.de>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
qemu-block@nongnu.org, "Keith Busch" <kbusch@kernel.org>,
qemu-devel <qemu-devel@nongnu.org>,
"Titouan Huard" <thuard@kalrayinc.com>,
"Markus Armbruster" <armbru@redhat.com>
Subject: Re: NVME hotplug support ?
Date: Mon, 29 Jan 2024 14:37:28 +0100
Message-ID: <ZbeqGMdCw2QlHccd@cormorant.local>
In-Reply-To: <7e35528b-cc66-d2f1-e3e3-7dece5620c52@kalrayinc.com>
On Jan 29 14:13, Damien Hedde wrote:
>
>
> On 1/24/24 08:47, Hannes Reinecke wrote:
> > On 1/24/24 07:52, Philippe Mathieu-Daudé wrote:
> > > Hi Hannes,
> > >
> > > [+Markus as QOM/QDev rubber duck]
> > >
> > > On 23/1/24 13:40, Hannes Reinecke wrote:
> > > > On 1/23/24 11:59, Damien Hedde wrote:
> > > > > Hi all,
> > > > >
> > > > > We are currently looking into hotplugging nvme devices and
> > > > > it is currently not possible:
> > > > > When nvme was introduced 2 years ago, the feature was disabled.
> > > > > > commit cc6fb6bc506e6c47ed604fcb7b7413dff0b7d845
> > > > > > Author: Klaus Jensen
> > > > > > Date: Tue Jul 6 10:48:40 2021 +0200
> > > > > >
> > > > > > hw/nvme: mark nvme-subsys non-hotpluggable
> > > > > > We currently lack the infrastructure to handle
> > > > > > subsystem hotplugging, so
> > > > > > disable it.
> > > > >
> > > > > > Does someone know what's lacking, or does anyone have tips/ideas
> > > > > > on what we should develop to add the support?
> > > > >
> > > > Problem is that the object model is messed up. In qemu
> > > > namespaces are attached to controllers, which in turn are
> > > > children of the PCI device.
> > > > There are subsystems, but these just reference the controller.
> > > >
> > > > So if you hotunplug the PCI device you detach/destroy the
> > > > controller and detach the namespaces from the controller.
> > > > > But if you hotplug the PCI device again, the NVMe controller will
> > > > > be attached to the PCI device, but the namespaces will still be
> > > > > detached.
> > > >
> > > > Klaus said he was going to fix that, and I dimly remember some patches
> > > > floating around. But apparently it never went anywhere.
> > > >
> > > > Fundamental problem is that the NVMe hierarchy as per spec is
> > > > incompatible with the qemu object model; qemu requires a strict
> > > > tree model where every object has exactly _one_ parent.
> > >
> > > The modelling problem is not clear to me.
> > > Do you have an example of how the NVMe hierarchy should be?
> > >
> > Sure.
> >
> > As per NVMe spec we have this hierarchy:
> >
> >        ---> subsys ---
> >       |              |
> >       |              V
> > controller      namespaces
> >
> > There can be several controllers, and several
> > namespaces.
> > The initiator (ie the linux 'nvme' driver) connects
> > to a controller, queries the subsystem for the attached
> > namespaces, and presents each namespace as a block device.
> >
> > For Qemu we have the problem that every device _must_ be
> > a direct descendant of the parent (expressed by the fact
> > that each 'parent' object is embedded in the device object).
> >
> > So if we were to present a NVMe PCI device, the controller
> > must be derived from the PCI device:
> >
> > pci -> controller
> >
> > but now we have to express the NVMe hierarchy, too:
> >
> > pci -> ctrl1 -> subsys1 -> namespace1
> >
> > which actually works.
> > We can easily attach several namespaces:
> >
> > pci -> ctrl1 -> subsys1 -> namespace2
> >
> > For a single controller and a single subsystem.
> > However, as mentioned above, there can be _several_
> > controllers attached to the same subsystem.
> > So we can express the second controller:
> >
> > pci -> ctrl2
> >
> > but we cannot attach the controller to 'subsys1'
> > as then 'subsys1' would need to be derived from
> > 'ctrl2', and not (as it is now) from 'ctrl1'.
> >
> > The most logical step would be to make 'subsystems'
> > their own entity, independent of any controllers.
> > But then the block devices (which are derived from
> > the namespaces) could not be traced back
> > to the PCI device, and a PCI hotplug would not
> > 'automatically' disconnect the nvme block devices.
> >
> > Plus the subsystem would be independent from the NVMe
> > PCI devices, so you could have a subsystem with
> > no controllers attached. And one would wonder who
> > should be responsible for cleaning up that.
> >
>
> Thanks for the details !
>
> My use case is the simple one with no nvme subsystem/namespaces:
> - hotplug a pci nvme device (nvme controller) as in the following CLI (which
> automatically puts the drive into a default namespace)
>
> ./qemu-system-aarch64 -nographic -M virt \
> -drive file=nvme0.disk,if=none,id=nvme-drive0 \
> -device nvme,serial=nvme0,id=nvmedev0,drive=nvme-drive0
>
AFAIK, you just need a PCIe root port to plug the device into:

    -drive file=nvme0.disk,if=none,id=nvme-drive0 \
    -device "pcie-root-port,id=pcie_root_port0,chassis=1,slot=0" \
    -device nvme,serial=nvme0,id=nvmedev0,drive=nvme-drive0

Then you can use the QEMU monitor to `device_del nvmedev0` and add it
back with `device_add nvme,serial=nvme0,id=nvmedev0,drive=nvme-drive0`. The
"drive" (blockdev) will stick around after the device_del.
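
If you would rather script the unplug/replug cycle than type it into the
monitor, the same two commands can be sent over QMP. The following is only a
rough sketch, not something taken from the tree: the helper names are made up
for illustration, the socket path is whatever you passed to a `-qmp unix:...`
option (not shown in the command line above), and the optional `bus` argument
is an assumption of mine for pinning the device to a given root port.

```python
import json
import socket


def build_unplug(dev_id):
    """QMP message equivalent to the monitor command `device_del <id>`."""
    return {"execute": "device_del", "arguments": {"id": dev_id}}


def build_replug(dev_id, serial, drive, bus=None):
    """QMP message equivalent to `device_add nvme,serial=..,id=..,drive=..`."""
    args = {"driver": "nvme", "id": dev_id, "serial": serial, "drive": drive}
    if bus is not None:
        # Assumption: not part of the original command, only added when the
        # caller wants to name the root port explicitly.
        args["bus"] = bus
    return {"execute": "device_add", "arguments": args}


def run(sock_path, messages):
    """Send messages over a QMP UNIX socket and return the raw lines read.

    Simplification: one line is read per command. QEMU may interleave
    asynchronous events (for example DEVICE_DELETED) with command replies,
    so a robust client would parse each line and tell the two apart.
    """
    lines = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        f = s.makefile("rw", encoding="utf-8", newline="\n")
        f.readline()  # greeting banner sent by QEMU on connect
        # Capabilities negotiation must happen before any other command.
        for msg in [{"execute": "qmp_capabilities"}] + list(messages):
            f.write(json.dumps(msg) + "\n")
            f.flush()
            lines.append(f.readline())
    return lines
```

Since device_del completes asynchronously, a real script should wait for the
DEVICE_DELETED event before sending the device_add; the sketch above does not
do that.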