From: Klaus Jensen <its@irrelevant.dk>
To: Damien Hedde <dhedde@kalrayinc.com>
Cc: "Hannes Reinecke" <hare@suse.de>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	qemu-block@nongnu.org, "Keith Busch" <kbusch@kernel.org>,
	qemu-devel <qemu-devel@nongnu.org>,
	"Titouan Huard" <thuard@kalrayinc.com>,
	"Markus Armbruster" <armbru@redhat.com>
Subject: Re: NVME hotplug support ?
Date: Mon, 29 Jan 2024 14:37:28 +0100
Message-ID: <ZbeqGMdCw2QlHccd@cormorant.local>
In-Reply-To: <7e35528b-cc66-d2f1-e3e3-7dece5620c52@kalrayinc.com>

On Jan 29 14:13, Damien Hedde wrote:
> 
> 
> On 1/24/24 08:47, Hannes Reinecke wrote:
> > On 1/24/24 07:52, Philippe Mathieu-Daudé wrote:
> > > Hi Hannes,
> > > 
> > > [+Markus as QOM/QDev rubber duck]
> > > 
> > > On 23/1/24 13:40, Hannes Reinecke wrote:
> > > > On 1/23/24 11:59, Damien Hedde wrote:
> > > > > Hi all,
> > > > > 
> > > > > We are currently looking into hotplugging nvme devices and
> > > > > it is currently not possible:
> > > > > When nvme was introduced 2 years ago, the feature was disabled.
> > > > > > commit cc6fb6bc506e6c47ed604fcb7b7413dff0b7d845
> > > > > > Author: Klaus Jensen
> > > > > > Date:   Tue Jul 6 10:48:40 2021 +0200
> > > > > > 
> > > > > >     hw/nvme: mark nvme-subsys non-hotpluggable
> > > > > > >     We currently lack the infrastructure to handle subsystem
> > > > > > >     hotplugging, so disable it.
> > > > > 
> > > > > > Does anyone know what's lacking, or have any tips or ideas on
> > > > > > what we should develop to add support?
> > > > > 
> > > > Problem is that the object model is messed up. In qemu
> > > > namespaces are attached to controllers, which in turn are
> > > > children of the PCI device.
> > > > There are subsystems, but these just reference the controller.
> > > > 
> > > > So if you hotunplug the PCI device you detach/destroy the
> > > > controller and detach the namespaces from the controller.
> > > > But if you hotplug the PCI device again the NVMe controller will
> > > > > be attached to the PCI device, but the namespaces are still
> > > > > detached.
> > > > 
> > > > Klaus said he was going to fix that, and I dimly remember some patches
> > > > floating around. But apparently it never went anywhere.
> > > > 
> > > > Fundamental problem is that the NVMe hierarchy as per spec is
> > > > incompatible with the qemu object model; qemu requires a strict
> > > > tree model where every object has exactly _one_ parent.
> > > 
> > > The modelling problem is not clear to me.
> > > Do you have an example of how the NVMe hierarchy should be?
> > > 
> > Sure.
> > 
> > As per NVMe spec we have this hierarchy:
> > 
> >       --->  subsys ---
> >      |                |
> >      |                V
> > controller      namespaces
> > 
> > There can be several controllers, and several
> > namespaces.
> > The initiator (ie the linux 'nvme' driver) connects
> > to a controller, queries the subsystem for the attached
> > namespaces, and presents each namespace as a block device.
> > 
> > For Qemu we have the problem that every device _must_ be
> > a direct descendant of the parent (expressed by the fact
> > that each 'parent' object is embedded in the device object).
> > 
> > So if we were to present a NVMe PCI device, the controller
> > must be derived from the PCI device:
> > 
> > pci -> controller
> > 
> > but now we have to express the NVMe hierarchy, too:
> > 
> > pci -> ctrl1 -> subsys1 -> namespace1
> > 
> > which actually works.
> > We can easily attach several namespaces:
> > 
> > pci -> ctrl1 -> subsys1 -> namespace2
> > 
> > For a single controller and a single subsystem.
> > However, as mentioned above, there can be _several_
> > controllers attached to the same subsystem.
> > So we can express the second controller:
> > 
> > pci -> ctrl2
> > 
> > but we cannot attach the controller to 'subsys1'
> > as then 'subsys1' would need to be derived from
> > 'ctrl2', and not (as it is now) from 'ctrl1'.
> > 
> > The most logical step would be to make 'subsystems'
> > their own entity, independent of any controllers.
> > But then the block devices (which are derived from
> > the namespaces) could not be traced back
> > to the PCI device, and a PCI hotplug would not
> > 'automatically' disconnect the nvme block devices.
> > 
> > Plus the subsystem would be independent from the NVMe
> > PCI devices, so you could have a subsystem with
> > no controllers attached. And one would wonder who
> > should be responsible for cleaning that up.
> > 
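To make the single-parent constraint described above concrete, here is a toy Python model. It is only a sketch of the argument, not QEMU code: `Node`, `attach` and `path` are hypothetical names invented for this illustration, and the tree shape simply follows the `pci -> ctrl -> subsys -> namespace` chains drawn in the quoted text.

```python
class Node:
    """A device-tree node that may have at most one parent (strict tree)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def attach(self, child):
        # Refuse a second parent: this is the strict-tree constraint.
        if child.parent is not None:
            raise ValueError(
                f"{child.name} already has parent {child.parent.name}")
        child.parent = self
        self.children.append(child)
        return child

    def path(self):
        # Walk up to the root, e.g. "pci0 -> ctrl1 -> subsys1".
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " -> ".join(reversed(parts))


# Hypothetical names mirroring the diagrams in the quoted message.
pci0, pci1 = Node("pci0"), Node("pci1")
ctrl1 = pci0.attach(Node("ctrl1"))
ctrl2 = pci1.attach(Node("ctrl2"))
subsys1 = ctrl1.attach(Node("subsys1"))
ns1 = subsys1.attach(Node("namespace1"))
ns2 = subsys1.attach(Node("namespace2"))

print(ns1.path())
print(ns2.path())

# A second controller cannot adopt the same subsystem in a strict tree.
try:
    ctrl2.attach(subsys1)
except ValueError as err:
    print("cannot share subsystem:", err)
```

The first controller and its namespaces fit the tree, but attaching `subsys1` under `ctrl2` fails because it already has `ctrl1` as its parent, which is the conflict the quoted text describes for multi-controller subsystems.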
> 
> Thanks for the details !
> 
> My use case is the simple one with no nvme subsystem/namespaces:
> - hotplug a pci nvme device (nvme controller) as in the following CLI (which
> automatically puts the drive into a default namespace)
> 
> ./qemu-system-aarch64 -nographic -M virt \
>    -drive file=nvme0.disk,if=none,id=nvme-drive0 \
>    -device nvme,serial=nvme0,id=nvmedev0,drive=nvme-drive0
> 

AFAIK, you just need a pci root port to plug the device into.

  -drive file=nvme0.disk,if=none,id=nvme-drive0 \
  -device "pcie-root-port,id=pcie_root_port0,chassis=1,slot=0" \
  -device nvme,serial=nvme0,id=nvmedev0,drive=nvme-drive0

Then you can use the qemu monitor to `device_del nvmedev0` and add it
back with `device_add nvme,serial=nvme0,id=nvmedev0,drive=nvme-drive0`.
The "drive" (blockdev) will stick around after the device_del.
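If you would rather script this than type it into the monitor, the same two steps can be expressed as QMP commands. The following is a minimal Python sketch that only builds the JSON messages; the helper names `qmp_device_del` and `qmp_device_add` are made up for this example, and it assumes you have a QMP socket (e.g. started with `-qmp`) on which `qmp_capabilities` has already been negotiated.

```python
import json


def qmp_device_del(dev_id):
    """Build the QMP form of the monitor command `device_del <id>`."""
    return {"execute": "device_del", "arguments": {"id": dev_id}}


def qmp_device_add(driver, dev_id, **props):
    """Build the QMP form of `device_add <driver>,id=<id>,prop=value,...`."""
    args = {"driver": driver, "id": dev_id}
    args.update(props)  # extra device properties, e.g. serial= and drive=
    return {"execute": "device_add", "arguments": args}


# Same ids and properties as in the monitor commands above.
unplug = qmp_device_del("nvmedev0")
replug = qmp_device_add("nvme", "nvmedev0",
                        serial="nvme0", drive="nvme-drive0")

# Each command is sent as one JSON object on the QMP connection.
for cmd in (unplug, replug):
    print(json.dumps(cmd))
```

Unplug is asynchronous from the guest's point of view, so a script should probably wait for the `DEVICE_DELETED` event before re-adding a device with the same id.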
