virtio-iommu hotplug issue

qemu-devel.nongnu.org archive mirror

* virtio-iommu hotplug issue
@ 2023-04-13  4:49 Akihiko Odaki
  2023-04-13 10:40 ` Jean-Philippe Brucker
  0 siblings, 1 reply; 6+ messages in thread
From: Akihiko Odaki @ 2023-04-13  4:49 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Eric Auger
  Cc: virtio-dev, virtualization, linux-kernel, qemu-devel

Hi,

Recently I encountered a problem with the combination of Linux's 
virtio-iommu driver and QEMU when an SR-IOV virtual function gets 
disabled. I'd like to ask which solution is appropriate here, and to 
implement it if possible.

A PCIe device implementing the SR-IOV specification exports a virtual 
function, and the guest can enable or disable it at runtime by writing 
to a configuration register. To the guest, this effectively looks like 
a PCI device being hotplugged. When the VF is disabled, the kernel 
assumes the endpoint is detached from the virtio-iommu domain, but QEMU 
actually does not detach it.
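
For reference, on a Linux guest the VF can typically be enabled and 
disabled through the PF's sriov_numvfs attribute in sysfs. The following 
is a minimal sketch only: the helper name set_num_vfs and the PCI 
address used in the note below are hypothetical, and the sysfs root is a 
parameter so the helper can be exercised without real hardware.

```python
from pathlib import Path


def set_num_vfs(bdf, count, sysfs_root="/sys/bus/pci/devices"):
    """Write `count` to the PF's sriov_numvfs attribute.

    `bdf` is the PCI address of the physical function (hypothetical in
    this sketch). Writing 0 disables all VFs; writing N > 0 asks the PF
    driver to enable N VFs.
    """
    attr = Path(sysfs_root) / bdf / "sriov_numvfs"
    attr.write_text(f"{count}\n")
    return attr
```

With a PF at, say, 0000:00:02.0 (an assumed address), calling 
set_num_vfs("0000:00:02.0", 1) and then set_num_vfs("0000:00:02.0", 0) 
would correspond to steps 1 and 2 of the procedure below. Root 
privileges are needed on a real system.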

This inconsistent view of the removed device sometimes prevents the VM 
from correctly performing the following procedure, for example:
1. Enable a VF.
2. Disable the VF.
3. Open a vfio container.
4. Open the group which the PF belongs to.
5. Add the group to the vfio container.
6. Map some memory region.
7. Close the group.
8. Close the vfio container.
9. Repeat 3-8

As noted above, when the VF is disabled the kernel assumes the endpoint 
is detached from the IOMMU domain, while QEMU keeps it attached. The 
domain is later reused in steps 3-8.

In step 7, the PF is detached. The kernel then believes that no 
endpoint remains attached and that the mappings held by the domain have 
been cleared, but on the QEMU side the VF endpoint is still attached, so 
the mappings are kept intact.

In step 9, the same domain is reused again and the kernel requests a 
new mapping, which conflicts with the stale mapping and results in 
-EINVAL.
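
The sequence above can be illustrated with a toy model. This is a 
sketch only, not QEMU or Linux code: the class ToyIommuDevice, the 
function run_scenario, the domain and endpoint IDs, and the IOVA range 
are all hypothetical. It assumes the device drops a domain's mappings 
only when the last endpoint is detached, and it simplifies how the 
domain comes to be reused by using a fixed domain ID.

```python
import errno


class ToyIommuDevice:
    """Device-side state, i.e. what the emulated IOMMU believes."""

    def __init__(self):
        self.endpoints = {}  # domain id -> set of attached endpoint ids
        self.mappings = {}   # domain id -> list of (start, end), inclusive

    def attach(self, domain, endpoint):
        self.endpoints.setdefault(domain, set()).add(endpoint)
        self.mappings.setdefault(domain, [])
        return 0

    def detach(self, domain, endpoint):
        attached = self.endpoints.get(domain, set())
        attached.discard(endpoint)
        if not attached:
            # Last endpoint gone: the domain and its mappings go away.
            self.endpoints.pop(domain, None)
            self.mappings.pop(domain, None)
        return 0

    def map(self, domain, start, end):
        if domain not in self.mappings:
            return -errno.ENOENT
        for s, e in self.mappings[domain]:
            if start <= e and s <= end:  # overlaps an existing mapping
                return -errno.EINVAL
        self.mappings[domain].append((start, end))
        return 0


def run_scenario(detach_on_release):
    """Return the MAP results of two passes over steps 3-8."""
    dev = ToyIommuDevice()
    domain, pf, vf = 1, 0x10, 0x11   # hypothetical IDs
    dev.attach(domain, vf)           # step 1: VF enabled and attached
    if detach_on_release:            # step 2: VF disabled
        dev.detach(domain, vf)       # only if the guest sends DETACH
    results = []
    for _ in range(2):               # steps 3-8, then step 9 repeats them
        dev.attach(domain, pf)
        results.append(dev.map(domain, 0x1000, 0x1FFF))
        dev.detach(domain, pf)       # step 7: guest assumes mappings are gone
    return results
```

In this model, run_scenario(False) fails on the second pass with 
-EINVAL because the stale VF attachment keeps the first mapping alive, 
while run_scenario(True) succeeds on both passes.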

This problem can be fixed in either of two ways:
- requesting the detachment of the endpoint from the guest when the PCI 
device is unplugged (the VF is disabled)
- detecting that the PCI device is gone and automatically detaching it 
on the QEMU side.

It is not completely clear to me which solution is more appropriate, as 
the virtio-iommu specification is written in a way independent of the 
endpoint mechanism and does not say what should be done when a PCI 
device is unplugged.

Regards,
Akihiko Odaki


* Re: virtio-iommu hotplug issue
  2023-04-13  4:49 virtio-iommu hotplug issue Akihiko Odaki
@ 2023-04-13 10:40 ` Jean-Philippe Brucker
  2023-04-13 11:01   ` Akihiko Odaki
  0 siblings, 1 reply; 6+ messages in thread
From: Jean-Philippe Brucker @ 2023-04-13 10:40 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: Eric Auger, virtio-dev, virtualization, linux-kernel, qemu-devel

Hello,

On Thu, Apr 13, 2023 at 01:49:43PM +0900, Akihiko Odaki wrote:
> Hi,
> 
> Recently I encountered a problem with the combination of Linux's
> virtio-iommu driver and QEMU when a SR-IOV virtual function gets disabled.
> I'd like to ask you what kind of solution is appropriate here and implement
> the solution if possible.
> 
> A PCIe device implementing the SR-IOV specification exports a virtual
> function, and the guest can enable or disable it at runtime by writing to a
> configuration register. This effectively looks like a PCI device is
> hotplugged for the guest.

Just so I understand this better: the guest gets a whole PCIe device PF
that implements SR-IOV, and so the guest can dynamically create VFs?  Out
of curiosity, is that a hardware device assigned to the guest with VFIO,
or a device emulated by QEMU?

> In such a case, the kernel assumes the endpoint is
> detached from the virtio-iommu domain, but QEMU actually does not detach it.
> 
> This inconsistent view of the removed device sometimes prevents the VM from
> correctly performing the following procedure, for example:
> 1. Enable a VF.
> 2. Disable the VF.
> 3. Open a vfio container.
> 4. Open the group which the PF belongs to.
> 5. Add the group to the vfio container.
> 6. Map some memory region.
> 7. Close the group.
> 8. Close the vfio container.
> 9. Repeat 3-8
> 
> When the VF gets disabled, the kernel assumes the endpoint is detached from
> the IOMMU domain, but QEMU actually doesn't detach it. Later, the domain
> will be reused in step 3-8.
> 
> In step 7, the PF will be detached, and the kernel thinks there is no
> endpoint attached and the mapping the domain holds is cleared, but the VF
> endpoint is still attached and the mapping is kept intact.
> 
> In step 9, the same domain will be reused again, and the kernel requests to
> create a new mapping, but it will conflict with the existing mapping and
> result in -EINVAL.
> 
> This problem can be fixed by either of:
> - requesting the detachment of the endpoint from the guest when the PCI
> device is unplugged (the VF is disabled)

Yes, I think this is an issue in the virtio-iommu driver, which should
send a DETACH request when the VF is disabled, likely from
viommu_release_device(). I'll work on a fix unless you would like to do it.
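
To make the request concrete, here is a rough sketch of the
device-readable part of a DETACH request. It assumes the layout of
struct virtio_iommu_req_detach as I understand it from the virtio-iommu
specification: a one-byte type (0x02 for DETACH) with three reserved
bytes, the domain and endpoint IDs as little-endian 32-bit values, and
eight reserved bytes. The helper name build_detach_request is
hypothetical, and the layout should be checked against
include/uapi/linux/virtio_iommu.h before relying on it.

```python
import struct

# Request type for DETACH, as assumed from the virtio-iommu specification.
VIRTIO_IOMMU_T_DETACH = 0x02


def build_detach_request(domain, endpoint):
    """Pack the device-readable part of a DETACH request.

    Layout assumed here: u8 type, 3 reserved bytes, le32 domain,
    le32 endpoint, 8 reserved bytes. The device-writable tail that
    carries the status is not included.
    """
    return struct.pack("<B3xII8x", VIRTIO_IOMMU_T_DETACH, domain, endpoint)
```

The driver would queue one such request for each endpoint being
released, so that the device's view of the domain matches the kernel's.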

> - detecting that the PCI device is gone and automatically detach it on
> QEMU-side.
> 
> It is not completely clear for me which solution is more appropriate as the
> virtio-iommu specification is written in a way independent of the endpoint
> mechanism and does not say what should be done when a PCI device is
> unplugged.

Yes, I'm not sure it's in scope for the specification; it's more about
software guidance.

Thanks,
Jean



* Re: virtio-iommu hotplug issue
  2023-04-13 10:40 ` Jean-Philippe Brucker
@ 2023-04-13 11:01   ` Akihiko Odaki
  2023-04-13 13:39     ` Eric Auger
  2023-04-14 15:17     ` Jean-Philippe Brucker
  0 siblings, 2 replies; 6+ messages in thread
From: Akihiko Odaki @ 2023-04-13 11:01 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Eric Auger, virtio-dev, virtualization, linux-kernel, qemu-devel

On 2023/04/13 19:40, Jean-Philippe Brucker wrote:
> Hello,
> 
> On Thu, Apr 13, 2023 at 01:49:43PM +0900, Akihiko Odaki wrote:
>> Hi,
>>
>> Recently I encountered a problem with the combination of Linux's
>> virtio-iommu driver and QEMU when a SR-IOV virtual function gets disabled.
>> I'd like to ask you what kind of solution is appropriate here and implement
>> the solution if possible.
>>
>> A PCIe device implementing the SR-IOV specification exports a virtual
>> function, and the guest can enable or disable it at runtime by writing to a
>> configuration register. This effectively looks like a PCI device is
>> hotplugged for the guest.
> 
> Just so I understand this better: the guest gets a whole PCIe device PF
> that implements SR-IOV, and so the guest can dynamically create VFs?  Out
> of curiosity, is that a hardware device assigned to the guest with VFIO,
> or a device emulated by QEMU?

Yes, that's right. The guest can dynamically create and delete VFs. The 
device is emulated by QEMU: igb, an Intel NIC recently added to QEMU and 
projected to be released as part of QEMU 8.0.

> 
>> In such a case, the kernel assumes the endpoint is
>> detached from the virtio-iommu domain, but QEMU actually does not detach it.
>>
>> This inconsistent view of the removed device sometimes prevents the VM from
>> correctly performing the following procedure, for example:
>> 1. Enable a VF.
>> 2. Disable the VF.
>> 3. Open a vfio container.
>> 4. Open the group which the PF belongs to.
>> 5. Add the group to the vfio container.
>> 6. Map some memory region.
>> 7. Close the group.
>> 8. Close the vfio container.
>> 9. Repeat 3-8
>>
>> When the VF gets disabled, the kernel assumes the endpoint is detached from
>> the IOMMU domain, but QEMU actually doesn't detach it. Later, the domain
>> will be reused in step 3-8.
>>
>> In step 7, the PF will be detached, and the kernel thinks there is no
>> endpoint attached and the mapping the domain holds is cleared, but the VF
>> endpoint is still attached and the mapping is kept intact.
>>
>> In step 9, the same domain will be reused again, and the kernel requests to
>> create a new mapping, but it will conflict with the existing mapping and
>> result in -EINVAL.
>>
>> This problem can be fixed by either of:
>> - requesting the detachment of the endpoint from the guest when the PCI
>> device is unplugged (the VF is disabled)
> 
> Yes, I think this is an issue in the virtio-iommu driver, which should be
> sending a DETACH request when the VF is disabled, likely from
> viommu_release_device(). I'll work on a fix unless you would like to do it

It would be nice if you could prepare a fix. I will test your patch 
with my workload if you share it with me.

Regards,
Akihiko Odaki

> 
>> - detecting that the PCI device is gone and automatically detach it on
>> QEMU-side.
>>
>> It is not completely clear for me which solution is more appropriate as the
>> virtio-iommu specification is written in a way independent of the endpoint
>> mechanism and does not say what should be done when a PCI device is
>> unplugged.
> 
> Yes, I'm not sure it's in scope for the specification, it's more about
> software guidance
> 
> Thanks,
> Jean



* Re: virtio-iommu hotplug issue
  2023-04-13 11:01   ` Akihiko Odaki
@ 2023-04-13 13:39     ` Eric Auger
  2023-04-14  2:51       ` Akihiko Odaki
  2023-04-14 15:17     ` Jean-Philippe Brucker
  1 sibling, 1 reply; 6+ messages in thread
From: Eric Auger @ 2023-04-13 13:39 UTC (permalink / raw)
  To: Akihiko Odaki, Jean-Philippe Brucker
  Cc: virtio-dev, virtualization, linux-kernel, qemu-devel

Hi,

On 4/13/23 13:01, Akihiko Odaki wrote:
> On 2023/04/13 19:40, Jean-Philippe Brucker wrote:
>> Hello,
>>
>> On Thu, Apr 13, 2023 at 01:49:43PM +0900, Akihiko Odaki wrote:
>>> Hi,
>>>
>>> Recently I encountered a problem with the combination of Linux's
>>> virtio-iommu driver and QEMU when a SR-IOV virtual function gets
>>> disabled.
>>> I'd like to ask you what kind of solution is appropriate here and
>>> implement
>>> the solution if possible.
>>>
>>> A PCIe device implementing the SR-IOV specification exports a virtual
>>> function, and the guest can enable or disable it at runtime by
>>> writing to a
>>> configuration register. This effectively looks like a PCI device is
>>> hotplugged for the guest.
>>
>> Just so I understand this better: the guest gets a whole PCIe device PF
>> that implements SR-IOV, and so the guest can dynamically create VFs? 
>> Out
>> of curiosity, is that a hardware device assigned to the guest with VFIO,
>> or a device emulated by QEMU?
>
> Yes, that's right. The guest can dynamically create and delete VFs.
> The device is emulated by QEMU: igb, an Intel NIC recently added to
> QEMU and projected to be released as part of QEMU 8.0.
From the description below, I understand that you then bind this
emulated device to VFIO in the guest, correct?
>
>>
>>> In such a case, the kernel assumes the endpoint is
>>> detached from the virtio-iommu domain, but QEMU actually does not
>>> detach it.
The QEMU virtio-iommu device executes commands from the virtio-iommu
driver, and my understanding is that the VFIO infra is not at fault here.
As Jean suggested, a DETACH request is probably missing.
>>>
>>> This inconsistent view of the removed device sometimes prevents the
>>> VM from
>>> correctly performing the following procedure, for example:
>>> 1. Enable a VF.
>>> 2. Disable the VF.
>>> 3. Open a vfio container.
>>> 4. Open the group which the PF belongs to.
>>> 5. Add the group to the vfio container.
>>> 6. Map some memory region.
>>> 7. Close the group.
>>> 8. Close the vfio container.
>>> 9. Repeat 3-8
>>>
>>> When the VF gets disabled, the kernel assumes the endpoint is
>>> detached from
>>> the IOMMU domain, but QEMU actually doesn't detach it. Later, the
>>> domain
>>> will be reused in step 3-8.
>>>
>>> In step 7, the PF will be detached, and the kernel thinks there is no
>>> endpoint attached and the mapping the domain holds is cleared, but
>>> the VF
>>> endpoint is still attached and the mapping is kept intact.
>>>
>>> In step 9, the same domain will be reused again, and the kernel
>>> requests to
>>> create a new mapping, but it will conflict with the existing mapping
>>> and
>>> result in -EINVAL.
>>>
>>> This problem can be fixed by either of:
>>> - requesting the detachment of the endpoint from the guest when the PCI
>>> device is unplugged (the VF is disabled)
>>
>> Yes, I think this is an issue in the virtio-iommu driver, which
>> should be
>> sending a DETACH request when the VF is disabled, likely from
>> viommu_release_device(). I'll work on a fix unless you would like to
>> do it
>
> It will be nice if you prepare a fix. I will test your patch with my
> workload if you share it with me.

I can help with testing too.

Thanks

Eric
>
> Regards,
> Akihiko Odaki
>
>>
>>> - detecting that the PCI device is gone and automatically detach it on
>>> QEMU-side.
>>>
>>> It is not completely clear for me which solution is more appropriate
>>> as the
>>> virtio-iommu specification is written in a way independent of the
>>> endpoint
>>> mechanism and does not say what should be done when a PCI device is
>>> unplugged.
>>
>> Yes, I'm not sure it's in scope for the specification, it's more about
>> software guidance
>>
>> Thanks,
>> Jean
>




* Re: virtio-iommu hotplug issue
  2023-04-13 13:39     ` Eric Auger
@ 2023-04-14  2:51       ` Akihiko Odaki
  0 siblings, 0 replies; 6+ messages in thread
From: Akihiko Odaki @ 2023-04-14  2:51 UTC (permalink / raw)
  To: eric.auger, Jean-Philippe Brucker
  Cc: virtio-dev, virtualization, linux-kernel, qemu-devel

On 2023/04/13 22:39, Eric Auger wrote:
> Hi,
> 
> On 4/13/23 13:01, Akihiko Odaki wrote:
>> On 2023/04/13 19:40, Jean-Philippe Brucker wrote:
>>> Hello,
>>>
>>> On Thu, Apr 13, 2023 at 01:49:43PM +0900, Akihiko Odaki wrote:
>>>> Hi,
>>>>
>>>> Recently I encountered a problem with the combination of Linux's
>>>> virtio-iommu driver and QEMU when a SR-IOV virtual function gets
>>>> disabled.
>>>> I'd like to ask you what kind of solution is appropriate here and
>>>> implement
>>>> the solution if possible.
>>>>
>>>> A PCIe device implementing the SR-IOV specification exports a virtual
>>>> function, and the guest can enable or disable it at runtime by
>>>> writing to a
>>>> configuration register. This effectively looks like a PCI device is
>>>> hotplugged for the guest.
>>>
>>> Just so I understand this better: the guest gets a whole PCIe device PF
>>> that implements SR-IOV, and so the guest can dynamically create VFs?
>>> Out
>>> of curiosity, is that a hardware device assigned to the guest with VFIO,
>>> or a device emulated by QEMU?
>>
>> Yes, that's right. The guest can dynamically create and delete VFs.
>> The device is emulated by QEMU: igb, an Intel NIC recently added to
>> QEMU and projected to be released as part of QEMU 8.0.
>  From below description In understand you then bind this emulated device
> to VFIO on guest, correct?

Yes, that's correct.

>>
>>>
>>>> In such a case, the kernel assumes the endpoint is
>>>> detached from the virtio-iommu domain, but QEMU actually does not
>>>> detach it.
> The QEMU virtio-iommu device executes commands from the virtio-iommu
> driver and my understanding is the VFIO infra is not in trouble here. As
> suggested by Jean, a detach command probably is missed.

VFIO just illustrates the problem and the origin of the problem is 
indeed virtio-iommu.

Regards,
Akihiko Odaki

>>>>
>>>> This inconsistent view of the removed device sometimes prevents the
>>>> VM from
>>>> correctly performing the following procedure, for example:
>>>> 1. Enable a VF.
>>>> 2. Disable the VF.
>>>> 3. Open a vfio container.
>>>> 4. Open the group which the PF belongs to.
>>>> 5. Add the group to the vfio container.
>>>> 6. Map some memory region.
>>>> 7. Close the group.
>>>> 8. Close the vfio container.
>>>> 9. Repeat 3-8
>>>>
>>>> When the VF gets disabled, the kernel assumes the endpoint is
>>>> detached from
>>>> the IOMMU domain, but QEMU actually doesn't detach it. Later, the
>>>> domain
>>>> will be reused in step 3-8.
>>>>
>>>> In step 7, the PF will be detached, and the kernel thinks there is no
>>>> endpoint attached and the mapping the domain holds is cleared, but
>>>> the VF
>>>> endpoint is still attached and the mapping is kept intact.
>>>>
>>>> In step 9, the same domain will be reused again, and the kernel
>>>> requests to
>>>> create a new mapping, but it will conflict with the existing mapping
>>>> and
>>>> result in -EINVAL.
>>>>
>>>> This problem can be fixed by either of:
>>>> - requesting the detachment of the endpoint from the guest when the PCI
>>>> device is unplugged (the VF is disabled)
>>>
>>> Yes, I think this is an issue in the virtio-iommu driver, which
>>> should be
>>> sending a DETACH request when the VF is disabled, likely from
>>> viommu_release_device(). I'll work on a fix unless you would like to
>>> do it
>>
>> It will be nice if you prepare a fix. I will test your patch with my
>> workload if you share it with me.
> 
> I can help testing too
> 
> Thanks
> 
> Eric
>>
>> Regards,
>> Akihiko Odaki
>>
>>>
>>>> - detecting that the PCI device is gone and automatically detach it on
>>>> QEMU-side.
>>>>
>>>> It is not completely clear for me which solution is more appropriate
>>>> as the
>>>> virtio-iommu specification is written in a way independent of the
>>>> endpoint
>>>> mechanism and does not say what should be done when a PCI device is
>>>> unplugged.
>>>
>>> Yes, I'm not sure it's in scope for the specification, it's more about
>>> software guidance
>>>
>>> Thanks,
>>> Jean
>>
> 



* Re: virtio-iommu hotplug issue
  2023-04-13 11:01   ` Akihiko Odaki
  2023-04-13 13:39     ` Eric Auger
@ 2023-04-14 15:17     ` Jean-Philippe Brucker
  1 sibling, 0 replies; 6+ messages in thread
From: Jean-Philippe Brucker @ 2023-04-14 15:17 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: Eric Auger, virtio-dev, virtualization, linux-kernel, qemu-devel

On Thu, Apr 13, 2023 at 08:01:54PM +0900, Akihiko Odaki wrote:
> Yes, that's right. The guest can dynamically create and delete VFs. The
> device is emulated by QEMU: igb, an Intel NIC recently added to QEMU and
> projected to be released as part of QEMU 8.0.

Ah great, that's really useful, I'll add it to my tests

> > Yes, I think this is an issue in the virtio-iommu driver, which should be
> > sending a DETACH request when the VF is disabled, likely from
> > viommu_release_device(). I'll work on a fix unless you would like to do it
> 
> It will be nice if you prepare a fix. I will test your patch with my
> workload if you share it with me.

I sent a fix:
https://lore.kernel.org/linux-iommu/20230414150744.562456-1-jean-philippe@linaro.org/

Thank you for reporting this; it must have been annoying to debug.

Thanks,
Jean


