From: Markus Armbruster <armbru@redhat.com>
To: Klaus Jensen <its@irrelevant.dk>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
stefanha@redhat.com, qemu-devel@nongnu.org,
qemu-block@nongnu.org, mst@redhat.com
Subject: Re: making a qdev bus available from a (non-qtree?) device
Date: Fri, 21 May 2021 09:33:46 +0200
Message-ID: <878s48pmlh.fsf@dusky.pond.sub.org>
In-Reply-To: <YKIQsI4F49R4hEmd@apples.localdomain> (Klaus Jensen's message of "Mon, 17 May 2021 08:44:00 +0200")
I'm about to drop off for two weeks of much-needed vacation. I meant to
study your explanation and give design advice before I leave, but I'm
out of time. Regrettable. I hope Stefan can help you. Or perhaps
Paolo. If you still have questions when I'm back, feel free to contact
me again.
Klaus Jensen <its@irrelevant.dk> writes:
> On May 12 14:02, Markus Armbruster wrote:
>>Klaus Jensen <its@irrelevant.dk> writes:
>>
>>> Hi all,
>>>
>>> I need some help with grok'ing qdev busses. Stefan, Michael - David
>>> suggested on IRC that I CC'ed you guys since you might have solved a
>>> similar issue with virtio devices. I've tried to study how that works,
>>> but I'm not exactly sure how to apply it to the issue I'm having.
>>>
>>> Currently, to support multiple namespaces on the emulated nvme device,
>>> one can do something like this:
>>>
>>> -device nvme,id=nvme-ctrl-0,serial=foo,...
>>> -device nvme-ns,id=nvme-ns-0,bus=nvme-ctrl-0,...
>>> -device nvme-ns,id=nvme-ns-1,bus=nvme-ctrl-0,...
>>>
>>> The nvme device creates an 'nvme-bus' and the nvme-ns devices have
>>> dc->bus_type = TYPE_NVME_BUS. This all works very well and provides a
>>> nice overview in `info qtree`:
>>>
>>>   bus: main-system-bus
>>>     type System
>>>     ...
>>>     dev: q35-pcihost, id ""
>>>       ..
>>>       bus: pcie.0
>>>         type PCIE
>>>         ..
>>>         dev: nvme, id "nvme-ctrl-0"
>>>           ..
>>>           bus: nvme-ctrl-0
>>>             type nvme-bus
>>>             dev: nvme-ns, id "nvme-ns-0"
>>>               ..
>>>             dev: nvme-ns, id "nvme-ns-1"
>>>               ..
>>>
>>>
>>> Nice and qdevy.
>>>
>>> We have since introduced support for NVM Subsystems through an
>>> nvme-subsys device. The nvme-subsys device is just a TYPE_DEVICE and
>>> does not show in `info qtree`
>>
>>Yes.
>>
>>Most devices plug into a bus. DeviceClass member @bus_type specifies
>>the type of bus they plug into, and DeviceState member @parent_bus
>>points to the actual BusState. Example: PCI devices plug into a PCI
>>bus, and have ->bus_type = TYPE_PCI_BUS.
>>
>>Some devices don't. @bus_type and @parent_bus are NULL then.
>>
>>Most buses are provided by a device. BusState member @parent points to
>>the device.
>>
>>The main-system-bus isn't. Its @parent is null.
>>
>>"info qtree" only shows the qtree rooted at main-system-bus. It doesn't
>>show qtrees rooted at bus-less devices or device-less buses other than
>>main-system-bus. I doubt such buses exist.
>>
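The relationships above can be sketched with a toy model. This is not
QEMU code: ToyDevice, ToyBus and the toy_* helpers are hypothetical
stand-ins for DeviceState, BusState and the qdev API, and the PCI layers
are left out. It only illustrates why a walk rooted at main-system-bus
never reaches a bus provided by a bus-less device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for QEMU's DeviceState and BusState; NOT the real structs. */
#define MAX_CHILDREN 8

struct ToyBus;

typedef struct ToyDevice {
    const char *id;
    struct ToyBus *parent_bus;              /* NULL for a bus-less device */
    struct ToyBus *child_bus[MAX_CHILDREN]; /* buses this device provides */
    size_t n_child_bus;
} ToyDevice;

typedef struct ToyBus {
    const char *name;
    ToyDevice *parent;                      /* NULL for the root bus */
    ToyDevice *children[MAX_CHILDREN];      /* devices plugged into this bus */
    size_t n_children;
} ToyBus;

/* The device provides the bus, so the bus's parent points at the device. */
static void toy_provide_bus(ToyDevice *dev, ToyBus *bus)
{
    assert(dev->n_child_bus < MAX_CHILDREN);
    bus->parent = dev;
    dev->child_bus[dev->n_child_bus++] = bus;
}

/* Plug a device into a bus, so the device's parent_bus points at the bus. */
static void toy_plug(ToyDevice *dev, ToyBus *bus)
{
    assert(bus->n_children < MAX_CHILDREN);
    dev->parent_bus = bus;
    bus->children[bus->n_children++] = dev;
}

/*
 * Depth-first walk starting at one bus, the way a tree listing rooted at
 * that bus would proceed. Returns 1 if a device with the given id is found.
 */
static int toy_bus_reaches(const ToyBus *bus, const char *id)
{
    for (size_t i = 0; i < bus->n_children; i++) {
        const ToyDevice *dev = bus->children[i];

        if (strcmp(dev->id, id) == 0) {
            return 1;
        }
        for (size_t j = 0; j < dev->n_child_bus; j++) {
            if (toy_bus_reaches(dev->child_bus[j], id)) {
                return 1;
            }
        }
    }
    return 0;
}
```

In this model, a namespace plugged into a bus provided by a device that
itself has parent_bus == NULL is unreachable from the root bus, however
the bus was created.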
>
> Makes sense.
>
>>> (I wonder if this should actually just
>>> have been an -object?).
>>
>>Does nvme-subsys expose virtual hardware to the guest? Memory, IRQs,
>>...
>>
>>If yes, it needs to be a device.
>>
>>If no, object may be more appropriate. Tell us more about what it does.
>>
>
> It does not expose any virtual hardware. See below.
>
>>
>>> Anyway. The nvme device has a 'subsys' link
>>> parameter and we use this to manage the namespaces across the
>>> subsystem that may contain several nvme devices (controllers). The
>>> problem is that this doesn't work too well with unplugging since if the
>>> nvme device is `device_del`'ed, the nvme-ns devices on the nvme-bus
>>> are unrealized which is not what we want. We really want the
>>> namespaces to linger, preferably on an nvme-bus of the nvme-subsys
>>> device so they can be attached to other nvme devices that may show up
>>> (or already exist) in the subsystem.
>>>
>>> The core problem I'm having is that I can't seem to create an nvme-bus
>>> from the nvme-subsys device and make it available to the nvme-ns
>>> device on the command line:
>>>
>>> -device nvme-subsys,id=nvme-subsys-0,...
>>> -device nvme-ns,bus=nvme-subsys-0
>>>
>>> The above results in "No 'nvme-bus' bus found for device 'nvme-ns'",
>>> even though I do `qbus_create_inplace()` just like the nvme
>>> device. However, I *can* reparent the nvme-ns device in its realize()
>>> method, so if I instead define it like so:
>>>
>>> -device nvme-subsys,id=nvme-subsys-0,...
>>> -device nvme,id=nvme-ctrl-0,subsys=nvme-subsys-0
>>> -device nvme-ns,bus=nvme-ctrl-0
>>>
>>> I can then call `qdev_set_parent_bus()` and set the parent bus to the
>>> bus created in the nvme-subsys device. This solves the problem since
>>> the namespaces are not "garbage collected" when the nvme device is
>>> removed, but it just feels wrong, you know? Also, if possible, I'd of
>>> course really like to retain the nice entries in `info qtree`.
>>
>>I'm afraid I'm too ignorant on NVME to give useful advice.
>>
>>Can you give us a brief primer on the aspects of physical NVME devices
>>you'd like to model in QEMU? What are "controllers", "namespaces", and
>>"subsystems", and how do they work together?
>>
>>Once we understand the relevant aspects of physical devices, we can
>>discuss how to best model them in QEMU.
>>
>
> An "NVM Subsystem" is basically just a term to talk about a collection
> of controllers and namespaces. A namespace is just a quantity of
> non-volatile memory that the controller can use to store stuff on.
>
> Only the controller is a piece of virtual hardware. An example
> subsystem looks like this:
>
>
>     +------------------+      +------------------+
>     |   controller A   |      |   controller B   |
>     +------------------+      +------------------+
>     +--------++--------+      +--------++--------+
>     | NSID 1 || NSID 2 |      | NSID 3 || NSID 2 |
>     +--------++--------+      +--------++--------+
>     +--------+    |           +--------+    |
>     |  NS A  |    |           |  NS C  |    |
>     +--------+    |           +--------+    |
>                   |                         |
>                   +-------------------------+
>                                |
>                            +--------+
>                            |  NS B  |
>                            +--------+
>
>
> This is the example in Figure 5 in the NVMe v1.4 specification. Here,
> we have two controllers (that we model with the 'nvme' pci-based
> device). Each controller has one "private" namespace (NS A and NS C)
> and shares one namespace (NS B). The namespace IDs are unique across
> the subsystem and are assigned by the controller when attached to a
> namespace.
>
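The sharing arrangement in the figure can be sketched as a toy model in
which the subsystem, not the controller, owns the namespaces. All names
(ToySubsys, ToyCtrl, toy_*) are hypothetical and do not come from QEMU's
hw/nvme code; the point is only that removing a controller need not take
the namespaces with it when ownership sits in the subsystem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; none of these names come from QEMU. */
#define TOY_MAX_NSID 8   /* valid NSIDs are 1..TOY_MAX_NSID; 0 is unused */

typedef struct ToyNamespace {
    const char *name;
} ToyNamespace;

typedef struct ToySubsys {
    /* The subsystem owns the namespaces, indexed by subsystem-wide NSID. */
    ToyNamespace *ns[TOY_MAX_NSID + 1];
} ToySubsys;

typedef struct ToyCtrl {
    ToySubsys *subsys;
    bool attached[TOY_MAX_NSID + 1];  /* NSIDs visible through this controller */
} ToyCtrl;

/* Register a namespace in the subsystem under a free NSID. */
static void toy_subsys_add_ns(ToySubsys *s, unsigned nsid, ToyNamespace *ns)
{
    assert(nsid >= 1 && nsid <= TOY_MAX_NSID && s->ns[nsid] == NULL);
    s->ns[nsid] = ns;
}

/* Attach an existing subsystem namespace to a controller. */
static void toy_ctrl_attach(ToyCtrl *c, unsigned nsid)
{
    assert(nsid >= 1 && nsid <= TOY_MAX_NSID && c->subsys->ns[nsid] != NULL);
    c->attached[nsid] = true;
}

/* Look up a namespace through a controller; NULL unless it is attached. */
static ToyNamespace *toy_ctrl_ns(const ToyCtrl *c, unsigned nsid)
{
    if (nsid < 1 || nsid > TOY_MAX_NSID || !c->attached[nsid]) {
        return NULL;
    }
    return c->subsys->ns[nsid];
}

/* Removing a controller drops its attachments; the subsystem keeps the namespaces. */
static void toy_ctrl_remove(ToyCtrl *c)
{
    for (unsigned i = 1; i <= TOY_MAX_NSID; i++) {
        c->attached[i] = false;
    }
    c->subsys = NULL;
}
```

With NS A, NS B and NS C registered under NSIDs 1, 2 and 3, and attached
as in the figure, removing controller A leaves NS B reachable through
controller B and NS A still present, but unattached, in the subsystem.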
> We use the 'nvme-ns' device (TYPE_DEVICE) to model the namespaces, and
> I guess this could also just have been an -object, not sure if
> we can change that now. The 'nvme-ns' device mostly exists to hold the
> block backend configuration and related namespace-only
> parameters. Prior to the introduction of subsystems, while we could
> have multiple controllers on the PCI bus, they could not share
> namespaces. To support this we introduced the 'nvme-subsys' device to
> allow the namespaces to be shared. This support is considered
> experimental, so I think we can get away with changing this to be an
> object.
>
> As I explained in my first mail, we attach namespaces to controllers
> through a bus. This means that even in the absence of an explicit
> "bus=..." parameter on the nvme-ns device, it will "connect" on the
> most recently defined "nvme-bus" (of the most recently defined
> controller). With subsystems we would also like to model "unattached"
> namespaces that exist solely in the subsystem (i.e. NOT attached to
> any controllers). That is why I was trying to get the nvme-ns devices
> to attach to a bus created by the "non-bus-attached" subsystem
> device. And that is what I can't do. We could add a link property to
> the nvme-ns device instead, but then the bus magic in qemu would still
> happen and the namespace would end up "attached" (in qemu terms) to a
> controller anyway - and it would complain if we defined the namespace
> device prior to defining any controller devices since no usable bus
> exists.
>
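The bus selection described above can be sketched as follows. This is a
toy model of the behaviour as stated in the quoted text (an explicit
"bus=..." wins, otherwise the most recently defined bus of the right type
is used, and the lookup fails when none exists); it has not been checked
against QEMU's actual lookup code, and ToyBusList and the toy_* helpers
are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_BUSES 8

typedef struct ToyBusEntry {
    const char *type;   /* e.g. "nvme-bus" */
    const char *name;   /* e.g. "nvme-ctrl-0" */
} ToyBusEntry;

typedef struct ToyBusList {
    ToyBusEntry bus[TOY_MAX_BUSES];  /* kept in definition order */
    size_t n;
} ToyBusList;

/* Record a bus as defined, after all previously defined buses. */
static void toy_define_bus(ToyBusList *l, const char *type, const char *name)
{
    assert(l->n < TOY_MAX_BUSES);
    l->bus[l->n].type = type;
    l->bus[l->n].name = name;
    l->n++;
}

/*
 * Pick a bus of the given type. With wanted_name set (an explicit
 * "bus=..."), only a bus of that name matches; with wanted_name NULL,
 * the most recently defined bus of the type is returned. Returns NULL
 * when nothing matches, which corresponds to the "no usable bus" case.
 */
static const ToyBusEntry *toy_pick_bus(const ToyBusList *l, const char *type,
                                       const char *wanted_name)
{
    for (size_t i = l->n; i-- > 0; ) {
        const ToyBusEntry *b = &l->bus[i];

        if (strcmp(b->type, type) != 0) {
            continue;
        }
        if (wanted_name == NULL || strcmp(b->name, wanted_name) == 0) {
            return b;
        }
    }
    return NULL;
}
```

In this model, a namespace defined before any controller gets NULL back,
and one defined without "bus=..." always lands on some controller's bus;
there is no way to express "attached to no controller".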
> Thanks for helping out with this!