qemu-devel.nongnu.org archive mirror
* NVME hotplug support ?
@ 2024-01-23 10:59 Damien Hedde
  2024-01-23 11:15 ` Klaus Jensen
  2024-01-23 12:40 ` Hannes Reinecke
  0 siblings, 2 replies; 10+ messages in thread
From: Damien Hedde @ 2024-01-23 10:59 UTC (permalink / raw)
  To: qemu-block, Klaus Jensen, Keith Busch
  Cc: qemu-devel, Titouan Huard, Hannes Reinecke

Hi all,

We are currently looking into hotplugging NVMe devices, but this is currently not possible:
when nvme was introduced two years ago, the feature was disabled.
> commit cc6fb6bc506e6c47ed604fcb7b7413dff0b7d845
> Author: Klaus Jensen 
> Date:   Tue Jul 6 10:48:40 2021 +0200
>
>    hw/nvme: mark nvme-subsys non-hotpluggable
>    
>    We currently lack the infrastructure to handle subsystem hotplugging, so
>    disable it.

Does anyone know what is lacking, or have any tips/ideas about what we should develop to add this support?

Regards,
--
Damien

* Re: NVME hotplug support ?
  2024-01-23 10:59 NVME hotplug support ? Damien Hedde
@ 2024-01-23 11:15 ` Klaus Jensen
  2024-01-23 12:40 ` Hannes Reinecke
  1 sibling, 0 replies; 10+ messages in thread
From: Klaus Jensen @ 2024-01-23 11:15 UTC (permalink / raw)
  To: Damien Hedde
  Cc: qemu-block, Keith Busch, qemu-devel, Titouan Huard,
	Hannes Reinecke

On Jan 23 10:59, Damien Hedde wrote:
> Hi all,
> 
> We are currently looking into hotplugging nvme devices and it is currently not possible:
> When nvme was introduced 2 years ago, the feature was disabled.
> > commit cc6fb6bc506e6c47ed604fcb7b7413dff0b7d845
> > Author: Klaus Jensen 
> > Date:   Tue Jul 6 10:48:40 2021 +0200
> >
> >    hw/nvme: mark nvme-subsys non-hotpluggable
> >    
> >    We currently lack the infrastructure to handle subsystem hotplugging, so
> >    disable it.
> 
> Do someone know what's lacking or anyone have some tips/idea of what we should develop to add the support ?
> 
> Regards,
> --
> Damien 
> 

That's not entirely true.

The *subsystem* is non-hotpluggable, but individual controllers can be
hotplugged. Even into an existing subsystem.

However, you cannot hotplug PCI devices unless you set up a PCIe root
port. For example:

  -device "pcie-root-port,id=pcie_root_port0,chassis=1,slot=0"
  -device "nvme,id=nvme0,serial=nvme0,bus=pcie_root_port0"

nvme0 can then be removed with device_del and added back as a new device
with device_add.
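A toy C model of the rule above, assuming a simplified picture in which the root bus (pcie.0) refuses hot-plugged devices while a pcie-root-port exposes a single hotpluggable slot. ToyBus, toy_device_add and toy_device_del are hypothetical names for illustration, not QEMU's hotplug API.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy bus: it either accepts hot-plugged devices or it does not.
 * A PCIe root port has a single downstream slot, modelled as one child. */
typedef struct {
    const char *id;
    bool hotpluggable;  /* false for the root bus (pcie.0), true for a root port */
    const char *child;  /* id of the plugged device, or NULL if the slot is empty */
} ToyBus;

/* Rough analogue of a runtime "device_add ...,bus=<id>". Returns 0 on success. */
static int toy_device_add(ToyBus *bus, const char *dev_id)
{
    if (!bus->hotpluggable) {
        fprintf(stderr, "toy: bus '%s' cannot hotplug '%s'\n", bus->id, dev_id);
        return -1;
    }
    if (bus->child != NULL) {
        fprintf(stderr, "toy: bus '%s' already holds '%s'\n", bus->id, bus->child);
        return -1;
    }
    bus->child = dev_id;
    return 0;
}

/* Rough analogue of "device_del <id>". Returns 0 on success. */
static int toy_device_del(ToyBus *bus, const char *dev_id)
{
    if (!bus->hotpluggable || bus->child == NULL ||
        strcmp(bus->child, dev_id) != 0) {
        return -1;
    }
    bus->child = NULL;
    return 0;
}
```

The single-child slot mirrors why each hotpluggable NVMe controller gets its own root port.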


* Re: NVME hotplug support ?
  2024-01-23 10:59 NVME hotplug support ? Damien Hedde
  2024-01-23 11:15 ` Klaus Jensen
@ 2024-01-23 12:40 ` Hannes Reinecke
  2024-01-24  6:52   ` Philippe Mathieu-Daudé
  2024-01-24  7:39   ` Klaus Jensen
  1 sibling, 2 replies; 10+ messages in thread
From: Hannes Reinecke @ 2024-01-23 12:40 UTC (permalink / raw)
  To: Damien Hedde, qemu-block, Klaus Jensen, Keith Busch
  Cc: qemu-devel, Titouan Huard

On 1/23/24 11:59, Damien Hedde wrote:
> Hi all,
> 
> We are currently looking into hotplugging nvme devices and it is currently not possible:
> When nvme was introduced 2 years ago, the feature was disabled.
>> commit cc6fb6bc506e6c47ed604fcb7b7413dff0b7d845
>> Author: Klaus Jensen
>> Date:   Tue Jul 6 10:48:40 2021 +0200
>>
>>     hw/nvme: mark nvme-subsys non-hotpluggable
>>     
>>     We currently lack the infrastructure to handle subsystem hotplugging, so
>>     disable it.
> 
> Do someone know what's lacking or anyone have some tips/idea of what we should develop to add the support ?
> 
The problem is that the object model is messed up. In QEMU, namespaces are
attached to controllers, which in turn are children of the PCI device.
There are subsystems, but these just reference the controller.

So if you hot-unplug the PCI device, you detach/destroy the controller and
detach the namespaces from the controller.
But if you hotplug the PCI device again, the NVMe controller will be
attached to the PCI device, but the namespaces will still be detached.

Klaus said he was going to fix that, and I dimly remember some patches
floating around. But apparently it never went anywhere.

The fundamental problem is that the NVMe hierarchy as per the spec is
incompatible with the QEMU object model; QEMU requires a strict
tree model where every object has exactly _one_ parent.
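A minimal C sketch of the teardown behaviour described above, assuming a strict tree in which each node stores exactly one parent index and unplugging a node destroys its whole subtree. Node, Tree, tree_add and tree_unplug are hypothetical names; QEMU's real object tree is far richer.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 16

/* Hypothetical toy tree: every node has exactly one parent (or none for the root). */
typedef struct {
    const char *name;
    int parent;   /* index of the parent node, -1 for the root */
    bool alive;
} Node;

typedef struct {
    Node nodes[MAX_NODES];
    int count;
} Tree;

/* Adds a node under 'parent' and returns its index, or -1 if the tree is full. */
static int tree_add(Tree *t, const char *name, int parent)
{
    if (t->count >= MAX_NODES) {
        return -1;
    }
    t->nodes[t->count] = (Node){ name, parent, true };
    return t->count++;
}

/* Unplugging a node tears down its whole subtree: every live direct child
 * is unplugged recursively, so grandchildren go with it. */
static void tree_unplug(Tree *t, int idx)
{
    t->nodes[idx].alive = false;
    for (int i = 0; i < t->count; i++) {
        if (t->nodes[i].alive && t->nodes[i].parent == idx) {
            tree_unplug(t, i);
        }
    }
}
```

With namespaces hanging off a controller, unplugging the controller takes the namespaces with it, which is the behaviour being discussed.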

Cheers,

Hannes

* Re: NVME hotplug support ?
  2024-01-23 12:40 ` Hannes Reinecke
@ 2024-01-24  6:52   ` Philippe Mathieu-Daudé
  2024-01-24  7:47     ` Hannes Reinecke
  2024-01-24  7:39   ` Klaus Jensen
  1 sibling, 1 reply; 10+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-01-24  6:52 UTC (permalink / raw)
  To: Hannes Reinecke, Damien Hedde, qemu-block, Klaus Jensen,
	Keith Busch
  Cc: qemu-devel, Titouan Huard, Markus Armbruster

Hi Hannes,

[+Markus as QOM/QDev rubber duck]

On 23/1/24 13:40, Hannes Reinecke wrote:
> Problem is that the object model is messed up. In qemu namespaces are 
> attached to controllers, which in turn are children of the PCI device.
> There are subsystems, but these just reference the controller.
> 
> So if you hotunplug the PCI device you detach/destroy the controller and 
> detach the namespaces from the controller.
> But if you hotplug the PCI device again the NVMe controller will be 
> attached to the PCI device, but the namespace are still be detached.
> 
> Klaus said he was going to fix that, and I dimly remember some patches
> floating around. But apparently it never went anywhere.
> 
> Fundamental problem is that the NVMe hierarchy as per spec is 
> incompatible with the qemu object model; qemu requires a strict
> tree model where every object has exactly _one_ parent.

The modelling problem is not clear to me.
Do you have an example of how the NVMe hierarchy should be?

Thanks,

Phil.

* Re: NVME hotplug support ?
  2024-01-23 12:40 ` Hannes Reinecke
  2024-01-24  6:52   ` Philippe Mathieu-Daudé
@ 2024-01-24  7:39   ` Klaus Jensen
  1 sibling, 0 replies; 10+ messages in thread
From: Klaus Jensen @ 2024-01-24  7:39 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Damien Hedde, qemu-block, Keith Busch, qemu-devel, Titouan Huard

On Jan 23 13:40, Hannes Reinecke wrote:
> Problem is that the object model is messed up. In qemu namespaces are
> attached to controllers, which in turn are children of the PCI device.
> There are subsystems, but these just reference the controller.
> 
> So if you hotunplug the PCI device you detach/destroy the controller and
> detach the namespaces from the controller.
> But if you hotplug the PCI device again the NVMe controller will be attached
> to the PCI device, but the namespace are still be detached.
> 
> Klaus said he was going to fix that, and I dimly remember some patches
> floating around. But apparently it never went anywhere.
> 
> Fundamental problem is that the NVMe hierarchy as per spec is incompatible
> with the qemu object model; qemu requires a strict
> tree model where every object has exactly _one_ parent.
> 

A little history might help to nuance this just a bit. And to defend the
current model ;)

When we added support for multiple namespaces we did not consider
subsystem support, so the namespaces would just be associated directly
with a parent controller (in QDev terms, the parent has a bus that the
namespace devices are attached to).

When we added subsystems, where namespaces may be attached to several
controllers, it became necessary to break the controller/namespace
parent/child relationship. The problem was that removing the controller
would take all the bus children with it, causing namespaces to be
removed from other controllers in the subsystem. We fixed this by
reparenting the namespaces to the subsystem device instead.
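The reparenting can be sketched in C as follows, assuming a simplified subsystem that owns its namespaces while each controller only records which namespaces it has attached. Subsys, Ctrl, ctrl_attach and ctrl_remove are hypothetical names, not the hw/nvme data structures.

```c
#include <stdbool.h>

#define MAX_CTRLS 4
#define MAX_NS 4

/* Hypothetical controller: it references namespaces but does not own them. */
typedef struct {
    bool present;
    bool attached[MAX_NS];   /* attached[n]: namespace n is visible via this controller */
} Ctrl;

/* Hypothetical subsystem: the owner (parent) of the namespaces. */
typedef struct {
    bool ns_present[MAX_NS]; /* namespaces live as long as the subsystem does */
    Ctrl ctrls[MAX_CTRLS];
} Subsys;

static void ctrl_attach(Subsys *s, int c, int n)
{
    if (s->ctrls[c].present && s->ns_present[n]) {
        s->ctrls[c].attached[n] = true;
    }
}

/* Removing a controller drops only its own attachments; the namespaces,
 * and their attachments to other controllers, stay untouched. */
static void ctrl_remove(Subsys *s, int c)
{
    s->ctrls[c].present = false;
    for (int n = 0; n < MAX_NS; n++) {
        s->ctrls[c].attached[n] = false;
    }
}
```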

I think this model fits the NVMe hierarchy as well as possible.
Controllers and namespaces are considered children of the subsystem (as
they are in NVMe).

Now, the problem with namespaces not being re-attached is only partly true.
If the namespaces are 'shared=on', they will be automatically attached
to any new controller attached to the subsystem. However, if they are
private, that is not the case. In NVMe, a private namespace just
means a namespace that can only be attached to a single controller at a
time. It is not entirely unlikely that you have a private namespace that
you then reassign to controller B when controller A is removed. Now,
what we could do is track the last controller identifier that a private
namespace was attached to, and if the same controller identifier is
added to the subsystem, we could reattach the private namespace.
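The tracking idea above might look roughly like this, assuming a namespace only needs to know whether it is shared, whether it is currently attached, and the last controller identifier (cntlid) it was attached to. Namespace, NO_CNTLID, ns_new, ns_attach, on_ctrl_removed and on_ctrl_added are hypothetical names for illustration only.

```c
#include <stdbool.h>

#define NO_CNTLID (-1)

/* Hypothetical namespace state for the proposed reattach logic. */
typedef struct {
    bool shared;       /* shared=on: attach to every new controller */
    bool attached;     /* private namespaces: currently attached to a controller? */
    int last_cntlid;   /* private namespaces: last controller, NO_CNTLID if never */
} Namespace;

static Namespace ns_new(bool shared)
{
    return (Namespace){ .shared = shared, .attached = false, .last_cntlid = NO_CNTLID };
}

/* Manual (re)assignment; a private namespace allows one controller at a time. */
static bool ns_attach(Namespace *ns, int cntlid)
{
    if (!ns->shared && ns->attached) {
        return false;
    }
    ns->attached = true;
    ns->last_cntlid = cntlid;
    return true;
}

/* Controller 'cntlid' leaves the subsystem; remember it as the last owner. */
static void on_ctrl_removed(Namespace *ns, int cntlid)
{
    if (!ns->shared && ns->attached && ns->last_cntlid == cntlid) {
        ns->attached = false;   /* last_cntlid is kept on purpose */
    }
}

/* Controller 'cntlid' joins; returns true if the namespace gets attached to it. */
static bool on_ctrl_added(Namespace *ns, int cntlid)
{
    if (ns->shared) {
        return true;
    }
    if (!ns->attached && ns->last_cntlid == cntlid) {
        ns->attached = true;    /* same controller identifier came back */
        return true;
    }
    return false;
}
```

If the private namespace was reassigned to another controller in the meantime, ns_attach updates last_cntlid and the returning controller does not get it back.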

However, broadly, I think the current model does a pretty good job in
supporting experimentation with hotplug, multipath and failover
configurations.

* Re: NVME hotplug support ?
  2024-01-24  6:52   ` Philippe Mathieu-Daudé
@ 2024-01-24  7:47     ` Hannes Reinecke
  2024-01-29 13:13       ` Damien Hedde
  0 siblings, 1 reply; 10+ messages in thread
From: Hannes Reinecke @ 2024-01-24  7:47 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Damien Hedde, qemu-block,
	Klaus Jensen, Keith Busch
  Cc: qemu-devel, Titouan Huard, Markus Armbruster

On 1/24/24 07:52, Philippe Mathieu-Daudé wrote:
> Hi Hannes,
> 
> [+Markus as QOM/QDev rubber duck]
> 
> The modelling problem is not clear to me.
> Do you have an example of how the NVMe hierarchy should be?
> 
Sure.

As per the NVMe spec, we have this hierarchy:

      --->  subsys ---
     |                |
     |                V
controller      namespaces

There can be several controllers, and several
namespaces.
The initiator (i.e. the Linux 'nvme' driver) connects
to a controller, queries the subsystem for the attached
namespaces, and presents each namespace as a block device.

For QEMU we have the problem that every device _must_ be
a direct descendant of the parent (expressed by the fact
that each 'parent' object is embedded in the device object).

So if we were to present an NVMe PCI device, the controller
must be derived from the PCI device:

pci -> controller
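In C, this "derived from" relation is expressed by embedding the parent type as the first member of the child struct, which is the QOM convention (usually a field named parent_obj). A simplified sketch with hypothetical Toy* types and a hypothetical TOY_CONTAINER_OF macro, not the real QOM or PCI structures:

```c
#include <stddef.h>

/* Hypothetical base type. */
typedef struct {
    const char *type_name;
} Object;

/* A PCI device "is an" Object: the parent type is the first member. */
typedef struct {
    Object parent_obj;
    int devfn;
} ToyPCIDevice;

/* An NVMe controller "is a" PCI device, again by embedding. */
typedef struct {
    ToyPCIDevice parent_obj;
    const char *serial;
} ToyNvmeCtrl;

/* Recover the enclosing struct from a pointer to an embedded member. */
#define TOY_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static ToyNvmeCtrl *toy_nvme_from_pci(ToyPCIDevice *pci)
{
    return TOY_CONTAINER_OF(pci, ToyNvmeCtrl, parent_obj);
}
```

Because the parent is the first member, a pointer to the controller can be used where a PCI device (or a plain Object) is expected, and cast back.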

but now we have to express the NVMe hierarchy, too:

pci -> ctrl1 -> subsys1 -> namespace1

which actually works.
We can easily attach several namespaces:

pci -> ctrl1 -> subsys1 -> namespace2

That works for a single controller and a single subsystem.
However, as mentioned above, there can be _several_
controllers attached to the same subsystem.
So we can express the second controller:

pci -> ctrl2

but we cannot attach the controller to 'subsys1'
as then 'subsys1' would need to be derived from
'ctrl2', and not (as it is now) from 'ctrl1'.

The most logical step would be to make 'subsystems'
their own entity, independent of any controllers.
But then the block devices (which are derived from
the namespaces) could not be traced back
to the PCI device, and a PCI hotplug would not
'automatically' disconnect the nvme block devices.

Plus the subsystem would be independent from the NVMe
PCI devices, so you could have a subsystem with
no controllers attached. And one would wonder who
should be responsible for cleaning that up.
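A minimal C sketch of the single-parent constraint, assuming each node stores one parent pointer: once 'subsys1' is a child of 'ctrl1', it cannot also become a child of 'ctrl2'. ToyNode and toy_set_parent are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node in a strict tree: exactly one parent pointer. */
typedef struct ToyNode {
    const char *name;
    struct ToyNode *parent;
} ToyNode;

/* Gives 'child' a parent. Fails if it already has a different parent,
 * which is what a second controller sharing 'subsys1' would require. */
static bool toy_set_parent(ToyNode *child, ToyNode *parent)
{
    if (child->parent != NULL && child->parent != parent) {
        return false;
    }
    child->parent = parent;
    return true;
}
```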

Cheers,

Hannes

* Re: NVME hotplug support ?
  2024-01-24  7:47     ` Hannes Reinecke
@ 2024-01-29 13:13       ` Damien Hedde
  2024-01-29 13:37         ` Klaus Jensen
  2024-01-29 15:35         ` Hannes Reinecke
  0 siblings, 2 replies; 10+ messages in thread
From: Damien Hedde @ 2024-01-29 13:13 UTC (permalink / raw)
  To: Hannes Reinecke, Philippe Mathieu-Daudé, qemu-block,
	Klaus Jensen, Keith Busch
  Cc: qemu-devel, Titouan Huard, Markus Armbruster



On 1/24/24 08:47, Hannes Reinecke wrote:
> On 1/24/24 07:52, Philippe Mathieu-Daudé wrote:
>> Hi Hannes,
>>
>> [+Markus as QOM/QDev rubber duck]
>>
>> The modelling problem is not clear to me.
>> Do you have an example of how the NVMe hierarchy should be?
>>
> Sure.
> 
> As per NVMe spec we have this hierarchy:
> 
>       --->  subsys ---
>      |                |
>      |                V
> controller      namespaces
> 
> There can be several controllers, and several
> namespaces.
> The initiator (ie the linux 'nvme' driver) connects
> to a controller, queries the subsystem for the attached
> namespaces, and presents each namespace as a block device.
> 
> For Qemu we have the problem that every device _must_ be
> a direct descendant of the parent (expressed by the fact
> that each 'parent' object is embedded in the device object).
> 
> So if we were to present a NVMe PCI device, the controller
> must be derived from the PCI device:
> 
> pci -> controller
> 
> but now we have to express the NVMe hierarchy, too:
> 
> pci -> ctrl1 -> subsys1 -> namespace1
> 
> which actually works.
> We can easily attach several namespaces:
> 
> pci -> ctrl1 ->subsys1 -> namespace2
> 
> For a single controller and a single subsystem.
> However, as mentioned above, there can be _several_
> controllers attached to the same subsystem.
> So we can express the second controller:
> 
> pci -> ctrl2
> 
> but we cannot attach the controller to 'subsys1'
> as then 'subsys1' would need to be derived from
> 'ctrl2', and not (as it is now) from 'ctrl1'.
> 
> The most logical step would be to have 'subsystems'
> their own entity, independent of any controllers.
> But then the block devices (which are derived from
> the namespaces) could not be traced back
> to the PCI device, and a PCI hotplug would not
> 'automatically' disconnect the nvme block devices.
> 
> Plus the subsystem would be independent from the NVMe
> PCI devices, so you could have a subsystem with
> no controllers attached. And one would wonder who
> should be responsible for cleaning up that.
> 

Thanks for the details!

My use case is the simple one, with no nvme subsystem/namespaces:
- hotplug a PCI nvme device (nvme controller) as in the following CLI
  (which automatically puts the drive into a default namespace)

./qemu-system-aarch64 -nographic -M virt \
    -drive file=nvme0.disk,if=none,id=nvme-drive0 \
    -device nvme,serial=nvme0,id=nvmedev0,drive=nvme-drive0

In the simple tree approach, where subsystems and namespaces are not
shared by controllers, could we delete the whole nvme hierarchy under the
controller while unplugging it?

In your first message, you said
  > So if you hotunplug the PCI device you detach/destroy the controller
  > and detach the namespaces from the controller.
  > But if you hotplug the PCI device again the NVMe controller will be
  > attached to the PCI device, but the namespace are still be detached.

Do you mean that if we unplug the PCI device we HAVE to keep some nvme
objects so that we can recover them if we plug the device back?
Or just that it's hard to unplug nvme objects if they are not real QOM
children of the PCI device?

Thanks,
Damien

* Re: NVME hotplug support ?
  2024-01-29 13:13       ` Damien Hedde
@ 2024-01-29 13:37         ` Klaus Jensen
  2024-01-29 15:35         ` Hannes Reinecke
  1 sibling, 0 replies; 10+ messages in thread
From: Klaus Jensen @ 2024-01-29 13:37 UTC (permalink / raw)
  To: Damien Hedde
  Cc: Hannes Reinecke, Philippe Mathieu-Daudé, qemu-block,
	Keith Busch, qemu-devel, Titouan Huard, Markus Armbruster

On Jan 29 14:13, Damien Hedde wrote:
> Thanks for the details !
> 
> My use case is the simple one with no nvme subsystem/namespaces:
> - hotplug a pci nvme device (nvme controller) as in the following CLI (which
> automatically put the drive into a default namespace)
> 
> ./qemu-system-aarch64 -nographic -M virt \
>    -drive file=nvme0.disk,if=none,id=nvme-drive0 \
>    -device nvme,serial=nvme0,id=nvmedev0,drive=nvme-drive0
> 

AFAIK, you just need a PCIe root port to plug the device into:

  -drive file=nvme0.disk,if=none,id=nvme-drive0 \
  -device "pcie-root-port,id=pcie_root_port0,chassis=1,slot=0" \
  -device nvme,serial=nvme0,id=nvmedev0,drive=nvme-drive0,bus=pcie_root_port0

Then, you can use the qemu monitor to `device_del nvmedev0` and add it
back with
`device_add nvme,serial=nvme0,id=nvmedev0,drive=nvme-drive0,bus=pcie_root_port0`.
The "drive" (blockdev) will stick around after the device_del.

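For scripted testing, the same two monitor commands can also be sent over QMP as device_del and device_add messages. A small C sketch that only formats the JSON (opening the QMP socket and sending is left out); the helper names are hypothetical, and since this sketch does no JSON escaping, values containing quotes or backslashes are rejected. The bus name pcie_root_port0 is the root port id from the command line above.

```c
#include <stdio.h>
#include <string.h>

/* This sketch does no JSON escaping, so refuse values containing '"' or '\'. */
static int qmp_value_ok(const char *s)
{
    return s != NULL && strpbrk(s, "\"\\") == NULL;
}

/* QMP form of "device_del <id>". Returns snprintf's result, or -1 on bad input. */
static int qmp_device_del(char *buf, size_t len, const char *id)
{
    if (!qmp_value_ok(id)) {
        return -1;
    }
    return snprintf(buf, len,
                    "{\"execute\": \"device_del\", \"arguments\": {\"id\": \"%s\"}}",
                    id);
}

/* QMP form of "device_add nvme,id=...,bus=...,serial=...,drive=...". */
static int qmp_device_add_nvme(char *buf, size_t len, const char *id,
                               const char *bus, const char *serial,
                               const char *drive)
{
    if (!qmp_value_ok(id) || !qmp_value_ok(bus) ||
        !qmp_value_ok(serial) || !qmp_value_ok(drive)) {
        return -1;
    }
    return snprintf(buf, len,
                    "{\"execute\": \"device_add\", \"arguments\": "
                    "{\"driver\": \"nvme\", \"id\": \"%s\", \"bus\": \"%s\", "
                    "\"serial\": \"%s\", \"drive\": \"%s\"}}",
                    id, bus, serial, drive);
}
```

Callers should check that the return value is non-negative and smaller than the buffer size before sending.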

* Re: NVME hotplug support ?
  2024-01-29 13:13       ` Damien Hedde
  2024-01-29 13:37         ` Klaus Jensen
@ 2024-01-29 15:35         ` Hannes Reinecke
  2024-02-05 13:33           ` Damien Hedde
  1 sibling, 1 reply; 10+ messages in thread
From: Hannes Reinecke @ 2024-01-29 15:35 UTC (permalink / raw)
  To: Damien Hedde, Philippe Mathieu-Daudé, qemu-block,
	Klaus Jensen, Keith Busch
  Cc: qemu-devel, Titouan Huard, Markus Armbruster

On 1/29/24 14:13, Damien Hedde wrote:
> Thanks for the details !
> 
> My use case is the simple one with no nvme subsystem/namespaces:
> - hotplug a pci nvme device (nvme controller) as in the following CLI 
> (which automatically put the drive into a default namespace)
> 
> ./qemu-system-aarch64 -nographic -M virt \
>     -drive file=nvme0.disk,if=none,id=nvme-drive0 \
>     -device nvme,serial=nvme0,id=nvmedev0,drive=nvme-drive0
> 
> In the simple tree approach where subsystems and namespaces are not 
> shared by controllers. We could delete the whole nvme hiearchy under the 
> controller while unplugging it ?
> 
> In your first message, you said
>   > So if you hotunplug the PCI device you detach/destroy the controller
>   > and detach the namespaces from the controller.
>   > But if you hotplug the PCI device again the NVMe controller will be
>   > attached to the PCI device, but the namespace are still be detached.
> 
> Do you mean that if we unplug the pci device we HAVE to keep some nvme 
> objects so that if we plug the device back we can recover them ?
> Or just that it's hard to unplug nvme objects if they are not real qom 
> children of pci device ?
> 
The key point for trying out PCI hotplug with QEMU is to attach the PCI
device to its own PCI root port. Cf. the mail from Klaus Jensen for details.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich

* Re: NVME hotplug support ?
  2024-01-29 15:35         ` Hannes Reinecke
@ 2024-02-05 13:33           ` Damien Hedde
  0 siblings, 0 replies; 10+ messages in thread
From: Damien Hedde @ 2024-02-05 13:33 UTC (permalink / raw)
  To: Hannes Reinecke, qemu-block, Klaus Jensen
  Cc: qemu-devel, Keith Busch, Titouan Huard, Markus Armbruster,
	Philippe Mathieu-Daudé


On 1/29/24 16:35, Hannes Reinecke wrote:
> On 1/29/24 14:13, Damien Hedde wrote:
>>
>>
>> On 1/24/24 08:47, Hannes Reinecke wrote:
>>> On 1/24/24 07:52, Philippe Mathieu-Daudé wrote:
>>>> Hi Hannes,
>>>>
>>>> [+Markus as QOM/QDev rubber duck]
>>>>
>>>> On 23/1/24 13:40, Hannes Reinecke wrote:
>>>>> On 1/23/24 11:59, Damien Hedde wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> We are currently looking into hotplugging nvme devices and it is 
>>>>>> currently not possible:
>>>>>> When nvme was introduced 2 years ago, the feature was disabled.
>>>>>>> commit cc6fb6bc506e6c47ed604fcb7b7413dff0b7d845
>>>>>>> Author: Klaus Jensen
>>>>>>> Date:   Tue Jul 6 10:48:40 2021 +0200
>>>>>>>
>>>>>>>     hw/nvme: mark nvme-subsys non-hotpluggable
>>>>>>>     We currently lack the infrastructure to handle subsystem 
>>>>>>> hotplugging, so
>>>>>>>     disable it.
>>>>>>
>>>>>> Does anyone know what's lacking, or have any tips/ideas about
>>>>>> what we should develop to add that support?
>>>>>>
>>>>> The problem is that the object model is messed up. In QEMU, namespaces
>>>>> are attached to controllers, which in turn are children of the PCI
>>>>> device.
>>>>> There are subsystems, but these just reference the controller.
>>>>>
>>>>> So if you hot-unplug the PCI device, you detach/destroy the
>>>>> controller and detach the namespaces from the controller.
>>>>> But if you hotplug the PCI device again, the NVMe controller will be
>>>>> attached to the PCI device, but the namespaces will still be detached.
>>>>>
>>>>> Klaus said he was going to fix that, and I dimly remember some patches
>>>>> floating around. But apparently it never went anywhere.
>>>>>
>>>>> The fundamental problem is that the NVMe hierarchy as per the spec is
>>>>> incompatible with the QEMU object model; QEMU requires a strict
>>>>> tree model where every object has exactly _one_ parent.
>>>>
>>>> The modelling problem is not clear to me.
>>>> Do you have an example of how the NVMe hierarchy should be?
>>>>
>>> Sure.
>>>
>>> As per the NVMe spec we have this hierarchy:
>>>
>>>       --->  subsys ---
>>>      |                |
>>>      |                V
>>> controller      namespaces
>>>
>>> There can be several controllers, and several namespaces.
>>> The initiator (i.e. the Linux 'nvme' driver) connects
>>> to a controller, queries the subsystem for the attached
>>> namespaces, and presents each namespace as a block device.
>>>
>>> For QEMU we have the problem that every device _must_ be
>>> a direct descendant of its parent (expressed by the fact
>>> that each 'parent' object is embedded in the device object).
>>>
>>> So if we were to present an NVMe PCI device, the controller
>>> must be derived from the PCI device:
>>>
>>> pci -> controller
>>>
>>> but now we have to express the NVMe hierarchy, too:
>>>
>>> pci -> ctrl1 -> subsys1 -> namespace1
>>>
>>> which actually works.
>>> We can easily attach several namespaces:
>>>
>>> pci -> ctrl1 -> subsys1 -> namespace2
>>>
>>> For a single controller and a single subsystem.
>>> However, as mentioned above, there can be _several_
>>> controllers attached to the same subsystem.
>>> So we can express the second controller:
>>>
>>> pci -> ctrl2
>>>
>>> but we cannot attach the controller to 'subsys1'
>>> as then 'subsys1' would need to be derived from
>>> 'ctrl2', and not (as it is now) from 'ctrl1'.
>>>
>>> The most logical step would be to make 'subsystems'
>>> their own entity, independent of any controllers.
>>> But then the block devices (which are derived from
>>> the namespaces) could not be traced back
>>> to the PCI device, and a PCI hotplug would not
>>> 'automatically' disconnect the nvme block devices.
>>>
>>> Plus the subsystem would be independent from the NVMe
>>> PCI devices, so you could have a subsystem with
>>> no controllers attached. And one would wonder who
>>> should be responsible for cleaning that up.
>>>
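To make the shared-subsystem case concrete, here is a sketch that prints QEMU options for two controllers in one subsystem with a single shared namespace. It assumes a QEMU recent enough to provide the `nvme-subsys` device and the `shared` property on `nvme-ns`. The function name, ids, serials and image name are made up for illustration. Whatever the version, the `nvme-ns` device is still plugged into exactly one qdev bus, which is the single-parent constraint described above.

```shell
#!/bin/sh
# Illustrative only: emits QEMU options for the NVMe spec topology of
# several controllers sharing one subsystem, with one namespace reachable
# through both controllers. Ids, serials and the image name are hypothetical.
nvme_shared_subsys_args() {
  img=$1  # disk image backing namespace 1 (no spaces, see usage note)
  printf '%s\n' \
    "-drive file=$img,if=none,id=ns1-drive" \
    "-device nvme-subsys,id=subsys0,nqn=subsys0" \
    "-device nvme,id=ctrl1,serial=ctrl1,subsys=subsys0" \
    "-device nvme,id=ctrl2,serial=ctrl2,subsys=subsys0" \
    "-device nvme-ns,drive=ns1-drive,nsid=1,shared=on"
}
```

The output is meant to be word-split onto a command line, e.g. `qemu-system-aarch64 -nographic -M virt $(nvme_shared_subsys_args nvm-1.img)`, so the image path must not contain spaces. This topology can only be cold-plugged, since nvme-subsys is marked non-hotpluggable.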
>>
>> Thanks for the details !
>>
>> My use case is the simple one with no NVMe subsystem/namespaces:
>> - hotplug a PCI NVMe device (NVMe controller) as in the following CLI
>> (which automatically puts the drive into a default namespace)
>>
>> ./qemu-system-aarch64 -nographic -M virt \
>>     -drive file=nvme0.disk,if=none,id=nvme-drive0 \
>>     -device nvme,serial=nvme0,id=nvmedev0,drive=nvme-drive0
>>
>> In the simple tree approach, where subsystems and namespaces are not
>> shared by controllers, could we delete the whole NVMe hierarchy under
>> the controller while unplugging it?
>>
>> In your first message, you said
>>   > So if you hotunplug the PCI device you detach/destroy the controller
>>   > and detach the namespaces from the controller.
>>   > But if you hotplug the PCI device again the NVMe controller will be
>>   > attached to the PCI device, but the namespaces will still be detached.
>>
>> Do you mean that if we unplug the PCI device we HAVE to keep some NVMe
>> objects so that we can recover them if we plug the device back in?
>> Or just that it's hard to unplug NVMe objects if they are not real QOM
>> children of the PCI device?
>>
> The key point when trying PCI hotplug with QEMU is to attach the PCI
> device to its own PCI root port. See the mail from Klaus Jensen for
> details.
> 
> Cheers,
> 
> Hannes

Thanks a lot to both of you. I missed that.

Damien








end of thread, other threads:[~2024-02-05 13:34 UTC | newest]

Thread overview: 10+ messages
2024-01-23 10:59 NVME hotplug support ? Damien Hedde
2024-01-23 11:15 ` Klaus Jensen
2024-01-23 12:40 ` Hannes Reinecke
2024-01-24  6:52   ` Philippe Mathieu-Daudé
2024-01-24  7:47     ` Hannes Reinecke
2024-01-29 13:13       ` Damien Hedde
2024-01-29 13:37         ` Klaus Jensen
2024-01-29 15:35         ` Hannes Reinecke
2024-02-05 13:33           ` Damien Hedde
2024-01-24  7:39   ` Klaus Jensen
