* [Qemu-devel] Questions about VFIO enabling MSIX vector
@ 2019-01-12 2:30 Heyi Guo
2019-01-14 11:26 ` Auger Eric
2019-01-14 16:07 ` Alex Williamson
0 siblings, 2 replies; 8+ messages in thread
From: Heyi Guo @ 2019-01-12 2:30 UTC (permalink / raw)
To: qemu-devel@nongnu.org; +Cc: Alex Williamson, Peter Maydell, wanghaibin 00208455
Hi folks,
I have some questions about vfio_msix_vector_do_use() in hw/vfio/pci.c; could you help explain?
We can see that when the guest tries to enable one specific MSI-X vector by unmasking its MSI-X Vector Control field, the access is trapped and routed into vfio_msix_vector_do_use(), where we may take the branch below at line 525:
(source: https://git.qemu.org/?p=qemu.git;a=blob;f=hw/vfio/pci.c;h=c0cb1ec289084eb1593f24dc423e647f4b29eb74;hb=HEAD#l520)

520     /*
521      * We don't want to have the host allocate all possible MSI vectors
522      * for a device if they're not in use, so we shutdown and incrementally
523      * increase them as needed.
524      */
525     if (vdev->nr_vectors < nr + 1) {
526         vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
527         vdev->nr_vectors = nr + 1;
528         ret = vfio_enable_vectors(vdev, true);
529         if (ret) {
530             error_report("vfio: failed to enable vectors, %d", ret);
531         }
Here all MSI-X vectors are disabled first and then re-enabled with one more vector. The comment is there, but I still don't quite understand it. It makes sense not to allocate all possible MSI vectors, but why do we shut down the whole MSI-X set when asked to enable one specific vector? Can't we just enable the vector indexed by "nr" that the caller requested?
What's more, on ARM64 systems with a GIC ITS, the kernel issues an ITS DISCARD command when disabling an MSI vector, which drops any currently pending MSI interrupt. If the device driver in the guest enables some MSIs first, interrupts may arrive at any time, and the driver then enables other MSIs, is it possible for the above code to lose interrupts?
I may misunderstand the whole thing... Any comment is appreciated.
Thanks,
Heyi
* Re: [Qemu-devel] Questions about VFIO enabling MSIX vector
2019-01-12 2:30 [Qemu-devel] Questions about VFIO enabling MSIX vector Heyi Guo
@ 2019-01-14 11:26 ` Auger Eric
2019-01-14 16:07 ` Alex Williamson
1 sibling, 0 replies; 8+ messages in thread
From: Auger Eric @ 2019-01-14 11:26 UTC (permalink / raw)
To: Heyi Guo, qemu-devel@nongnu.org
Cc: Peter Maydell, Alex Williamson, wanghaibin 00208455
Hi,
On 1/12/19 3:30 AM, Heyi Guo wrote:
> Hi folks,
>
> I have some questions about vfio_msix_vector_do_use() in hw/vfio/pci.c,
> could you help to explain?
>
> We can see that when guest tries to enable one specific MSIX vector by
> unmasking MSIX Vector Control, the access will be trapped and then into
> function vfio_msix_vector_do_use(). And we may go to the below branch in
> line 525:
>
>
> 520     /*
> 521      * We don't want to have the host allocate all possible MSI vectors
> 522      * for a device if they're not in use, so we shutdown and incrementally
> 523      * increase them as needed.
> 524      */
> 525     if (vdev->nr_vectors < nr + 1) {
> 526         vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
> 527         vdev->nr_vectors = nr + 1;
> 528         ret = vfio_enable_vectors(vdev, true);
> 529         if (ret) {
> 530             error_report("vfio: failed to enable vectors, %d", ret);
> 531         }
>
> Here all MSIX vectors will be disabled first and then enabled, with one
> more MSIX. The comment is there but I still don't quite understand. It
> makes sense for not allocating all possible MSI vectors, but why shall
> we shutdown the whole MSI when being requested to enable one specific
> vector? Can't we just enable the user specified vector indexed by "nr"?
This is not obvious to me either. vfio_enable_vectors() seems to have handled
nr_vectors as a whole since the origin of the code.
>
> What's more, on ARM64 systems with GIC ITS, the kernel will issue an ITS
> discard command when disabling a MSI vector, which will drop currently
> pending MSI interrupt. If device driver in guest system enables some
> MSIs first and interrupts may come at any time, and then it tries to
> enable other MSIs, is it possible for the above code to cause interrupts
> missing?
It seems we did originally consider this case of losing interrupts; that
handling was then removed by "vfio-pci: No spurious MSIs" (98cd5a5eaf).
Does this prevent any particular device from working?
Thanks
Eric
>
> I may misunderstand the whole thing... Any comment is appreciated.
>
> Thanks,
>
> Heyi
>
* Re: [Qemu-devel] Questions about VFIO enabling MSIX vector
2019-01-12 2:30 [Qemu-devel] Questions about VFIO enabling MSIX vector Heyi Guo
2019-01-14 11:26 ` Auger Eric
@ 2019-01-14 16:07 ` Alex Williamson
2019-01-15 3:21 ` Heyi Guo
1 sibling, 1 reply; 8+ messages in thread
From: Alex Williamson @ 2019-01-14 16:07 UTC (permalink / raw)
To: Heyi Guo; +Cc: qemu-devel@nongnu.org, Peter Maydell, wanghaibin 00208455
On Sat, 12 Jan 2019 10:30:40 +0800
Heyi Guo <guoheyi@huawei.com> wrote:
> Hi folks,
>
> I have some questions about vfio_msix_vector_do_use() in
> hw/vfio/pci.c, could you help to explain?
>
> We can see that when guest tries to enable one specific MSIX vector
> by unmasking MSIX Vector Control, the access will be trapped and then
> into function vfio_msix_vector_do_use(). And we may go to the below
> branch in line 525:
>
>
> 520 /*
> 521 * We don't want to have the host allocate all possible MSI vectors
> 522 * for a device if they're not in use, so we shutdown and incrementally
> 523 * increase them as needed.
> 524 */
> 525 if (vdev->nr_vectors < nr + 1) {
> 526 vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
> 527 vdev->nr_vectors = nr + 1;
> 528 ret = vfio_enable_vectors(vdev, true);
> 529 if (ret) {
> 530 error_report("vfio: failed to enable vectors, %d", ret);
> 531 }
>
> Here all MSIX vectors will be disabled first and then enabled, with
> one more MSIX. The comment is there but I still don't quite
> understand. It makes sense for not allocating all possible MSI
> vectors, but why shall we shutdown the whole MSI when being requested
> to enable one specific vector? Can't we just enable the user
> specified vector indexed by "nr"?
What internal kernel API would vfio-pci make use of to achieve that?
We can't know the intentions of the guest and we have a limited set of
tools to manage the MSI-X state in the host kernel. It's generally the
case that MSI-X vectors are only manipulated during driver setup and
shutdown, so while the current behavior isn't necessarily efficient, it
works within the constraints and generally doesn't have a runtime impact
on performance. We also recognize that this behavior may change in the
future depending on more dynamic internal host kernel interfaces, thus
we export the VFIO_IRQ_INFO_NORESIZE flag to indicate to userspace
whether this procedure is required.
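For reference, a minimal sketch of how userspace can query that flag (assuming an already-open VFIO device fd; everything except the uapi symbols is my own naming):

    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* Query the MSI-X IRQ index and report whether VFIO_IRQ_INFO_NORESIZE is
     * set, i.e. whether userspace must disable and re-enable the whole vector
     * set in order to grow it. */
    static int msix_needs_full_resize(int device_fd)
    {
        struct vfio_irq_info info;

        memset(&info, 0, sizeof(info));
        info.argsz = sizeof(info);
        info.index = VFIO_PCI_MSIX_IRQ_INDEX;

        if (ioctl(device_fd, VFIO_DEVICE_GET_IRQ_INFO, &info) < 0) {
            perror("VFIO_DEVICE_GET_IRQ_INFO");
            return -1;
        }

        printf("MSI-X: %u vectors, flags 0x%x\n", info.count, info.flags);

        return !!(info.flags & VFIO_IRQ_INFO_NORESIZE);
    }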
> What's more, on ARM64 systems with GIC ITS, the kernel will issue an
> ITS discard command when disabling a MSI vector, which will drop
> currently pending MSI interrupt. If device driver in guest system
> enables some MSIs first and interrupts may come at any time, and then
> it tries to enable other MSIs, is it possible for the above code to
> cause interrupts missing?
Interrupt reconfiguration generally happens during driver setup or teardown,
when the device is considered idle or lost interrupts are not a
concern. If you have a device which manipulates its interrupt
configuration at runtime, you can expect that lost interrupts won't be the
only problem; it's also likely to experience much more overhead in the
interrupt manipulation path under assignment than on bare metal. This
is another reason that well behaved drivers generally use relatively
static interrupt configurations initialized at driver setup time.
Thanks,
Alex
* Re: [Qemu-devel] Questions about VFIO enabling MSIX vector
2019-01-14 16:07 ` Alex Williamson
@ 2019-01-15 3:21 ` Heyi Guo
2019-01-15 16:18 ` Alex Williamson
0 siblings, 1 reply; 8+ messages in thread
From: Heyi Guo @ 2019-01-15 3:21 UTC (permalink / raw)
To: Alex Williamson; +Cc: qemu-devel@nongnu.org, Peter Maydell, Wanghaibin (D)
Hi Alex,
Really appreciate your comments. I have some more questions below.
On 2019/1/15 0:07, Alex Williamson wrote:
> On Sat, 12 Jan 2019 10:30:40 +0800
> Heyi Guo <guoheyi@huawei.com> wrote:
>
>> Hi folks,
>>
>> I have some questions about vfio_msix_vector_do_use() in
>> hw/vfio/pci.c, could you help to explain?
>>
>> We can see that when guest tries to enable one specific MSIX vector
>> by unmasking MSIX Vector Control, the access will be trapped and then
>> into function vfio_msix_vector_do_use(). And we may go to the below
>> branch in line 525:
>>
>>
>> 520 /*
>> 521 * We don't want to have the host allocate all possible MSI vectors
>> 522 * for a device if they're not in use, so we shutdown and incrementally
>> 523 * increase them as needed.
>> 524 */
>> 525 if (vdev->nr_vectors < nr + 1) {
>> 526 vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
>> 527 vdev->nr_vectors = nr + 1;
>> 528 ret = vfio_enable_vectors(vdev, true);
>> 529 if (ret) {
>> 530 error_report("vfio: failed to enable vectors, %d", ret);
>> 531 }
>>
>> Here all MSIX vectors will be disabled first and then enabled, with
>> one more MSIX. The comment is there but I still don't quite
>> understand. It makes sense for not allocating all possible MSI
>> vectors, but why shall we shutdown the whole MSI when being requested
>> to enable one specific vector? Can't we just enable the user
>> specified vector indexed by "nr"?
> What internal kernel API would vfio-pci make use of to achieve that?
> We can't know the intentions of the guest and we have a limited set of
> tools to manage the MSI-X state in the host kernel. It's generally the
> case that MSI-X vectors are only manipulated during driver setup and
> shutdown, so while the current behavior isn't necessarily efficient, it
> works within the constraints and generally doesn't have a runtime impact
> on performance. We also recognize that this behavior may change in the
> future depending on more dynamic internal host kernel interfaces, thus
> we export the VFIO_IRQ_INFO_NORESIZE flag to indicate to userspace
> whether this procedure is required.
I tried to enable only the specified MSI-X vector, and after always getting EINVAL from the VFIO_DEVICE_SET_IRQS ioctl I finally understood the original QEMU code.
Then I tried allocating all MSI-X vectors in vfio_msix_enable(), not only MSI-X vector 0. The change works, but it clearly goes against the comment in the code, "We don't want to have the host allocate all possible MSI vectors for a device if they're not in use". May I ask what the side effect is if we really do allocate all possible vectors at the beginning?
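For concreteness, this is roughly the shape of the VFIO_DEVICE_SET_IRQS call I was experimenting with (a simplified sketch; the helper name and the fds[] array of per-vector eventfds are placeholders, not QEMU code):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* Set eventfd triggers for "count" MSI-X vectors starting at "start".
     * While MSI-X is already enabled, asking for a range beyond the vectors
     * the host has set up is rejected with EINVAL, which is why QEMU disables
     * the whole index and re-enables it with nr + 1 vectors. */
    static int vfio_msix_set_eventfds(int device_fd, unsigned start,
                                      unsigned count, const int32_t *fds)
    {
        struct vfio_irq_set *set;
        size_t argsz = sizeof(*set) + count * sizeof(int32_t);
        int ret;

        set = calloc(1, argsz);
        if (!set) {
            return -1;
        }
        set->argsz = argsz;
        set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
        set->index = VFIO_PCI_MSIX_IRQ_INDEX;
        set->start = start;   /* my experiment: start = nr, count = 1 */
        set->count = count;   /* QEMU after the disable: start = 0, count = nr + 1 */
        memcpy(set->data, fds, count * sizeof(int32_t));

        ret = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set);
        free(set);
        return ret;
    }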
>
>> What's more, on ARM64 systems with GIC ITS, the kernel will issue an
>> ITS discard command when disabling a MSI vector, which will drop
>> currently pending MSI interrupt. If device driver in guest system
>> enables some MSIs first and interrupts may come at any time, and then
>> it tries to enable other MSIs, is it possible for the above code to
>> cause interrupts missing?
> Interrupt reconfiguration is generally during driver setup or teardown
> when the device is considered idle or lost interrupts are not a
> concern. If you have a device which manipulates interrupt
> configuration runtime, you can expect that lost interrupts won't be the
> only problem,
On a physical machine, if the driver enables vector 0 first and vector 1 later, is there any possibility that a vector 0 interrupt is lost? In my opinion the manipulation of vector 1 should not affect interrupts on vector 0, should it? But if this is possible in the virtual world, what do we consider it to be? A bug to be fixed in the future, or a known limitation of virtualization that won't be fixed and that all driver developers should pay attention to?
Thanks,
Heyi
> it's also likely to experience much more overhead in the
> interrupt manipulation path under assignment than on bare metal. This
> is another reason that well behaved drivers generally use relatively
> static interrupt configurations initialized at driver setup time.
> Thanks,
>
> Alex
>
> .
>
* Re: [Qemu-devel] Questions about VFIO enabling MSIX vector
2019-01-15 3:21 ` Heyi Guo
@ 2019-01-15 16:18 ` Alex Williamson
2019-01-17 12:55 ` Heyi Guo
0 siblings, 1 reply; 8+ messages in thread
From: Alex Williamson @ 2019-01-15 16:18 UTC (permalink / raw)
To: Heyi Guo; +Cc: qemu-devel@nongnu.org, Peter Maydell, Wanghaibin (D)
On Tue, 15 Jan 2019 11:21:51 +0800
Heyi Guo <guoheyi@huawei.com> wrote:
> Hi Alex,
>
> Really appreciate your comments. I have some more questions below.
>
>
> On 2019/1/15 0:07, Alex Williamson wrote:
> > On Sat, 12 Jan 2019 10:30:40 +0800
> > Heyi Guo <guoheyi@huawei.com> wrote:
> >
> >> Hi folks,
> >>
> >> I have some questions about vfio_msix_vector_do_use() in
> >> hw/vfio/pci.c, could you help to explain?
> >>
> >> We can see that when guest tries to enable one specific MSIX vector
> >> by unmasking MSIX Vector Control, the access will be trapped and then
> >> into function vfio_msix_vector_do_use(). And we may go to the below
> >> branch in line 525:
> >>
> >>
> >> 520 /*
> >> 521 * We don't want to have the host allocate all possible MSI vectors
> >> 522 * for a device if they're not in use, so we shutdown and incrementally
> >> 523 * increase them as needed.
> >> 524 */
> >> 525 if (vdev->nr_vectors < nr + 1) {
> >> 526 vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
> >> 527 vdev->nr_vectors = nr + 1;
> >> 528 ret = vfio_enable_vectors(vdev, true);
> >> 529 if (ret) {
> >> 530 error_report("vfio: failed to enable vectors, %d", ret);
> >> 531 }
> >>
> >> Here all MSIX vectors will be disabled first and then enabled, with
> >> one more MSIX. The comment is there but I still don't quite
> >> understand. It makes sense for not allocating all possible MSI
> >> vectors, but why shall we shutdown the whole MSI when being requested
> >> to enable one specific vector? Can't we just enable the user
> >> specified vector indexed by "nr"?
> >
> > What internal kernel API would vfio-pci make use of to achieve that?
> > We can't know the intentions of the guest and we have a limited set of
> > tools to manage the MSI-X state in the host kernel. It's generally the
> > case that MSI-X vectors are only manipulated during driver setup and
> > shutdown, so while the current behavior isn't necessarily efficient, it
> > works within the constraints and generally doesn't have a runtime impact
> > on performance. We also recognize that this behavior may change in the
> > future depending on more dynamic internal host kernel interfaces, thus
> > we export the VFIO_IRQ_INFO_NORESIZE flag to indicate to userspace
> > whether this procedure is required.
>
> I tried to enable the specified MSIX vector only, and finally
> understood the original QEMU code after always getting EINVAL from
> ioctl->VFIO_DEVICE_SET_IRQS. Then I tried to allocate all MSIX
> vectors in vfio_msix_enable(), not only MSIX vector 0. The change
> works, but it definitely doesn't follow the comment in the code "We
> don't want to have the host allocate all possible MSI vectors for a
> device if they're not in use". May I ask what the side effect is if
> we really allocate all possible vectors at the beginning?
Host IRQ vector exhaustion. Very few devices seem to make use of
their full complement of available vectors and some devices support a
very, very large number of vectors. The current approach only consumes
the host resources that the guest is actually making use of.
> >> What's more, on ARM64 systems with GIC ITS, the kernel will issue
> >> an ITS discard command when disabling a MSI vector, which will drop
> >> currently pending MSI interrupt. If device driver in guest system
> >> enables some MSIs first and interrupts may come at any time, and
> >> then it tries to enable other MSIs, is it possible for the above
> >> code to cause interrupts missing?
> >
> > Interrupt reconfiguration is generally during driver setup or
> > teardown when the device is considered idle or lost interrupts are
> > not a concern. If you have a device which manipulates interrupt
> > configuration runtime, you can expect that lost interrupts won't be
> > the only problem,
>
> On physical machine, if the driver enables vector 0 first, and then
> vector 1 later, will the vector 0 interrupt be lost for some
> possibility? In my opinion the manipulation of vector 1 should not
> affect the interrupt of vector 0, isn't it? But if this is possible
> in virtual world, what do we consider it as? A bug to be fixed in
> future, or a known limitation of virtualization that won't be fixed
> and all driver developers should pay attention to it?
Are you experiencing an actual problem with lost interrupts or is this
a theoretical question? There is a difference in programming of MSI-X
for an assigned device because we cannot know the intentions of the
guest with respect to how many vectors they intend to use. We can only
know as individual vectors are unmasked that a vector is now used.
Allocating the maximum complement of interrupts supported on the device
is wasteful of host resources. Given this, combined with the fact that
drivers generally configure vectors only at device setup time, the
existing code generally works well. Ideally drivers should not be
required to modify their behavior to support device assignment, but
certain actions, like high frequency masking/unmasking of interrupts
through the vector table, impose unavoidable virtualization overhead
and should be avoided. I believe older Linux guests (RHEL5 time frame)
had such behavior.
It's possible that current IRQ domains alleviate some of the concern
for host IRQ exhaustion and that vfio-pci could be made to allocate the
maximum supported vectors per device, removing the use of the NORESIZE
flag for the index. Userspace would need to continue to support both
modes of operation and the kernel would need to continue to be
compatible with that. Thanks,
Alex
* Re: [Qemu-devel] Questions about VFIO enabling MSIX vector
2019-01-15 16:18 ` Alex Williamson
@ 2019-01-17 12:55 ` Heyi Guo
2019-01-17 15:21 ` Alex Williamson
0 siblings, 1 reply; 8+ messages in thread
From: Heyi Guo @ 2019-01-17 12:55 UTC (permalink / raw)
To: Alex Williamson; +Cc: qemu-devel@nongnu.org, Peter Maydell, Wanghaibin (D)
On 2019/1/16 0:18, Alex Williamson wrote:
> On Tue, 15 Jan 2019 11:21:51 +0800
> Heyi Guo <guoheyi@huawei.com> wrote:
>
>> Hi Alex,
>>
>> Really appreciate your comments. I have some more questions below.
>>
>>
>> On 2019/1/15 0:07, Alex Williamson wrote:
>>> On Sat, 12 Jan 2019 10:30:40 +0800
>>> Heyi Guo <guoheyi@huawei.com> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> I have some questions about vfio_msix_vector_do_use() in
>>>> hw/vfio/pci.c, could you help to explain?
>>>>
>>>> We can see that when guest tries to enable one specific MSIX vector
>>>> by unmasking MSIX Vector Control, the access will be trapped and then
>>>> into function vfio_msix_vector_do_use(). And we may go to the below
>>>> branch in line 525:
>>>>
>>>>
>>>> 520 /*
>>>> 521 * We don't want to have the host allocate all possible MSI vectors
>>>> 522 * for a device if they're not in use, so we shutdown and incrementally
>>>> 523 * increase them as needed.
>>>> 524 */
>>>> 525 if (vdev->nr_vectors < nr + 1) {
>>>> 526 vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
>>>> 527 vdev->nr_vectors = nr + 1;
>>>> 528 ret = vfio_enable_vectors(vdev, true);
>>>> 529 if (ret) {
>>>> 530 error_report("vfio: failed to enable vectors, %d", ret);
>>>> 531 }
>>>>
>>>> Here all MSIX vectors will be disabled first and then enabled, with
>>>> one more MSIX. The comment is there but I still don't quite
>>>> understand. It makes sense for not allocating all possible MSI
>>>> vectors, but why shall we shutdown the whole MSI when being requested
>>>> to enable one specific vector? Can't we just enable the user
>>>> specified vector indexed by "nr"?
>>> What internal kernel API would vfio-pci make use of to achieve that?
>>> We can't know the intentions of the guest and we have a limited set of
>>> tools to manage the MSI-X state in the host kernel. It's generally the
>>> case that MSI-X vectors are only manipulated during driver setup and
>>> shutdown, so while the current behavior isn't necessarily efficient, it
>>> works within the constraints and generally doesn't have a runtime impact
>>> on performance. We also recognize that this behavior may change in the
>>> future depending on more dynamic internal host kernel interfaces, thus
>>> we export the VFIO_IRQ_INFO_NORESIZE flag to indicate to userspace
>>> whether this procedure is required.
>> I tried to enable the specified MSIX vector only, and finally
>> understood the original QEMU code after always getting EINVAL from
>> ioctl->VFIO_DEVICE_SET_IRQS. Then I tried to allocate all MSIX
>> vectors in vfio_msix_enable(), not only MSIX vector 0. The change
>> works, but it definitely doesn't follow the comment in the code "We
>> don't want to have the host allocate all possible MSI vectors for a
>> device if they're not in use". May I ask what the side effect is if
>> we really allocate all possible vectors at the beginning?
> Host IRQ vector exhaustion. Very few devices seem to make use of the
> their full compliment of available vectors and some devices support a
> very, very large number of vectors. The current approach only consumes
> the host resources that the guest is actually making use of.
>
>>>> What's more, on ARM64 systems with GIC ITS, the kernel will issue
>>>> an ITS discard command when disabling a MSI vector, which will drop
>>>> currently pending MSI interrupt. If device driver in guest system
>>>> enables some MSIs first and interrupts may come at any time, and
>>>> then it tries to enable other MSIs, is it possible for the above
>>>> code to cause interrupts missing?
>>> Interrupt reconfiguration is generally during driver setup or
>>> teardown when the device is considered idle or lost interrupts are
>>> not a concern. If you have a device which manipulates interrupt
>>> configuration runtime, you can expect that lost interrupts won't be
>>> the only problem,
>> On physical machine, if the driver enables vector 0 first, and then
>> vector 1 later, will the vector 0 interrupt be lost for some
>> possibility? In my opinion the manipulation of vector 1 should not
>> affect the interrupt of vector 0, isn't it? But if this is possible
>> in virtual world, what do we consider it as? A bug to be fixed in
>> future, or a known limitation of virtualization that won't be fixed
>> and all driver developers should pay attention to it?
> Are you experiencing an actual problem with lost interrupts or is this
> a theoretical question? There is a difference in programming of MSI-X
> for an assigned device because we cannot know the intentions of the
> guest with respect to how many vectors they intend to use. We can only
> know as individual vectors are unmasked that a vector is now used.
> Allocating the maximum compliment of interrupts supported on the device
> is wasteful of host resources. Given this, combined with the fact that
> drivers generally configure vectors only at device setup time, the
> existing code generally works well. Ideally drivers should not be
> required to modify their behavior to support device assignment, but
> certain actions, like high frequency masking/unmasking of interrupts
> through the vector table impose unavoidable virtualization overhead
> and should be avoided. I believe older Linux guests (RHEL5 time frame)
> had such behavior.
Yes, we are experiencing occasional missing interrupts with a Huawei Hi1822 NIC VF used in the guest. At first I suspected the workflow in QEMU could cause this issue, but after more investigation over the past two days we found it is likely only the trigger. The root cause may be in the kernel or in the ARM GICR on the chip: the kernel always maps a vPE to the first CPU (normally CPU 0) and then moves the mapping to the scheduled CPU, and there is a time window between these two actions in which a triggered interrupt may get lost. However, we have not come to a final conclusion yet.
>
> It's possible that current IRQ domains alleviate some of the concern
> for host IRQ exhaustion and that vfio-pci could be made to allocate the
> maximum supported vectors per device, removing the use of the NORESIZE
> flag for the index. Userspace would need to continue to support both
> modes of operation and the kernel would need to continue to be
> compatible with that. Thanks,
How about setting a threshold (maybe 16 or 32), initially allocating min(maximum supported vectors, threshold) vectors, and only going through the original workflow when the requested vector index is beyond what was initially allocated?
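Roughly what I have in mind, as a sketch only (VFIO_MSIX_PREALLOC is an invented constant and the error handling is simplified, not a tested patch):

    /* Pre-enable a capped number of vectors when MSI-X is first enabled, so
     * the disable + re-enable in vfio_msix_vector_do_use() only happens for
     * guests that actually unmask a vector beyond the cap. */
    #define VFIO_MSIX_PREALLOC 32

    static void vfio_msix_preallocate(VFIOPCIDevice *vdev)
    {
        vdev->nr_vectors = MIN(vdev->msix->entries, VFIO_MSIX_PREALLOC);

        if (vfio_enable_vectors(vdev, true)) {
            error_report("vfio: failed to pre-enable %d MSI-X vectors",
                         vdev->nr_vectors);
        }
    }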
Thanks,
Heyi
>
> Alex
>
> .
>
* Re: [Qemu-devel] Questions about VFIO enabling MSIX vector
2019-01-17 12:55 ` Heyi Guo
@ 2019-01-17 15:21 ` Alex Williamson
2019-01-18 0:26 ` Heyi Guo
0 siblings, 1 reply; 8+ messages in thread
From: Alex Williamson @ 2019-01-17 15:21 UTC (permalink / raw)
To: Heyi Guo; +Cc: qemu-devel@nongnu.org, Peter Maydell, Wanghaibin (D)
On Thu, 17 Jan 2019 20:55:05 +0800
Heyi Guo <guoheyi@huawei.com> wrote:
> On 2019/1/16 0:18, Alex Williamson wrote:
> > On Tue, 15 Jan 2019 11:21:51 +0800
> > Heyi Guo <guoheyi@huawei.com> wrote:
> >
> >> Hi Alex,
> >>
> >> Really appreciate your comments. I have some more questions below.
> >>
> >>
> >> On 2019/1/15 0:07, Alex Williamson wrote:
> >>> On Sat, 12 Jan 2019 10:30:40 +0800
> >>> Heyi Guo <guoheyi@huawei.com> wrote:
> >>>
> >>>> Hi folks,
> >>>>
> >>>> I have some questions about vfio_msix_vector_do_use() in
> >>>> hw/vfio/pci.c, could you help to explain?
> >>>>
> >>>> We can see that when guest tries to enable one specific MSIX vector
> >>>> by unmasking MSIX Vector Control, the access will be trapped and then
> >>>> into function vfio_msix_vector_do_use(). And we may go to the below
> >>>> branch in line 525:
> >>>>
> >>>>
> >>>> 520 /*
> >>>> 521 * We don't want to have the host allocate all possible MSI vectors
> >>>> 522 * for a device if they're not in use, so we shutdown and incrementally
> >>>> 523 * increase them as needed.
> >>>> 524 */
> >>>> 525 if (vdev->nr_vectors < nr + 1) {
> >>>> 526 vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
> >>>> 527 vdev->nr_vectors = nr + 1;
> >>>> 528 ret = vfio_enable_vectors(vdev, true);
> >>>> 529 if (ret) {
> >>>> 530 error_report("vfio: failed to enable vectors, %d", ret);
> >>>> 531 }
> >>>>
> >>>> Here all MSIX vectors will be disabled first and then enabled, with
> >>>> one more MSIX. The comment is there but I still don't quite
> >>>> understand. It makes sense for not allocating all possible MSI
> >>>> vectors, but why shall we shutdown the whole MSI when being requested
> >>>> to enable one specific vector? Can't we just enable the user
> >>>> specified vector indexed by "nr"?
> >>> What internal kernel API would vfio-pci make use of to achieve that?
> >>> We can't know the intentions of the guest and we have a limited set of
> >>> tools to manage the MSI-X state in the host kernel. It's generally the
> >>> case that MSI-X vectors are only manipulated during driver setup and
> >>> shutdown, so while the current behavior isn't necessarily efficient, it
> >>> works within the constraints and generally doesn't have a runtime impact
> >>> on performance. We also recognize that this behavior may change in the
> >>> future depending on more dynamic internal host kernel interfaces, thus
> >>> we export the VFIO_IRQ_INFO_NORESIZE flag to indicate to userspace
> >>> whether this procedure is required.
> >> I tried to enable the specified MSIX vector only, and finally
> >> understood the original QEMU code after always getting EINVAL from
> >> ioctl->VFIO_DEVICE_SET_IRQS. Then I tried to allocate all MSIX
> >> vectors in vfio_msix_enable(), not only MSIX vector 0. The change
> >> works, but it definitely doesn't follow the comment in the code "We
> >> don't want to have the host allocate all possible MSI vectors for a
> >> device if they're not in use". May I ask what the side effect is if
> >> we really allocate all possible vectors at the beginning?
> > Host IRQ vector exhaustion. Very few devices seem to make use of the
> > their full compliment of available vectors and some devices support a
> > very, very large number of vectors. The current approach only consumes
> > the host resources that the guest is actually making use of.
> >
> >>>> What's more, on ARM64 systems with GIC ITS, the kernel will issue
> >>>> an ITS discard command when disabling a MSI vector, which will drop
> >>>> currently pending MSI interrupt. If device driver in guest system
> >>>> enables some MSIs first and interrupts may come at any time, and
> >>>> then it tries to enable other MSIs, is it possible for the above
> >>>> code to cause interrupts missing?
> >>> Interrupt reconfiguration is generally during driver setup or
> >>> teardown when the device is considered idle or lost interrupts are
> >>> not a concern. If you have a device which manipulates interrupt
> >>> configuration runtime, you can expect that lost interrupts won't be
> >>> the only problem,
> >> On physical machine, if the driver enables vector 0 first, and then
> >> vector 1 later, will the vector 0 interrupt be lost for some
> >> possibility? In my opinion the manipulation of vector 1 should not
> >> affect the interrupt of vector 0, isn't it? But if this is possible
> >> in virtual world, what do we consider it as? A bug to be fixed in
> >> future, or a known limitation of virtualization that won't be fixed
> >> and all driver developers should pay attention to it?
> > Are you experiencing an actual problem with lost interrupts or is this
> > a theoretical question? There is a difference in programming of MSI-X
> > for an assigned device because we cannot know the intentions of the
> > guest with respect to how many vectors they intend to use. We can only
> > know as individual vectors are unmasked that a vector is now used.
> > Allocating the maximum compliment of interrupts supported on the device
> > is wasteful of host resources. Given this, combined with the fact that
> > drivers generally configure vectors only at device setup time, the
> > existing code generally works well. Ideally drivers should not be
> > required to modify their behavior to support device assignment, but
> > certain actions, like high frequency masking/unmasking of interrupts
> > through the vector table impose unavoidable virtualization overhead
> > and should be avoided. I believe older Linux guests (RHEL5 time frame)
> > had such behavior.
>
> Yes we are experiencing occasional interrupts missing for Huawei
> Hi1822 NIC VF used on Guest. Once I suspected the workflow in qemu
> could cause this issue, but after more investigation these two days,
> we found it should only be the trigger of the issue. The root cause
> may be on kernel or ARM GICR in the chip, for kernel always maps vPE
> to the first CPU (normally CPU 0), and then moves the map to the
> scheduled CPU. But there is a time window between these two actions,
> and interrupts may get lost when triggered in this short time window.
> However we have not come to a final conclusion yet.
>
> >
> > It's possible that current IRQ domains alleviate some of the concern
> > for host IRQ exhaustion and that vfio-pci could be made to allocate
> > the maximum supported vectors per device, removing the use of the
> > NORESIZE flag for the index. Userspace would need to continue to
> > support both modes of operation and the kernel would need to
> > continue to be compatible with that. Thanks,
>
> How about we set a threshold (maybe 16, 32) that only allocating
> vectors of number = min (maximum supported vectors, threshold), and
> go thru the original work flow if the specified vector is larger than
> the initially allocated?
I would not support such a solution; it's not only still wasteful of
interrupt vectors, it also introduces an inflection point in the behavior
which means that "high-end" configurations exercise code paths that
"low-end" configurations do not. Besides, it seems you haven't yet
even proven that this is an issue. If we wanted to remove the NORESIZE
flag for this vector index, I think the host would need to allocate all
vectors supported by the MSI-X capability in advance, which means we
need to verify that vector exhaustion is no longer a concern on the
host. Thanks,
Alex
* Re: [Qemu-devel] Questions about VFIO enabling MSIX vector
2019-01-17 15:21 ` Alex Williamson
@ 2019-01-18 0:26 ` Heyi Guo
0 siblings, 0 replies; 8+ messages in thread
From: Heyi Guo @ 2019-01-18 0:26 UTC (permalink / raw)
To: Alex Williamson; +Cc: qemu-devel@nongnu.org, Peter Maydell, Wanghaibin (D)
On 2019/1/17 23:21, Alex Williamson wrote:
> On Thu, 17 Jan 2019 20:55:05 +0800
> Heyi Guo <guoheyi@huawei.com> wrote:
>
>> On 2019/1/16 0:18, Alex Williamson wrote:
>>> On Tue, 15 Jan 2019 11:21:51 +0800
>>> Heyi Guo <guoheyi@huawei.com> wrote:
>>>
>>>> Hi Alex,
>>>>
>>>> Really appreciate your comments. I have some more questions below.
>>>>
>>>>
>>>> On 2019/1/15 0:07, Alex Williamson wrote:
>>>>> On Sat, 12 Jan 2019 10:30:40 +0800
>>>>> Heyi Guo <guoheyi@huawei.com> wrote:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> I have some questions about vfio_msix_vector_do_use() in
>>>>>> hw/vfio/pci.c, could you help to explain?
>>>>>>
>>>>>> We can see that when guest tries to enable one specific MSIX vector
>>>>>> by unmasking MSIX Vector Control, the access will be trapped and then
>>>>>> into function vfio_msix_vector_do_use(). And we may go to the below
>>>>>> branch in line 525:
>>>>>>
>>>>>>
>>>>>> 520 /*
>>>>>> 521 * We don't want to have the host allocate all possible MSI vectors
>>>>>> 522 * for a device if they're not in use, so we shutdown and incrementally
>>>>>> 523 * increase them as needed.
>>>>>> 524 */
>>>>>> 525 if (vdev->nr_vectors < nr + 1) {
>>>>>> 526 vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
>>>>>> 527 vdev->nr_vectors = nr + 1;
>>>>>> 528 ret = vfio_enable_vectors(vdev, true);
>>>>>> 529 if (ret) {
>>>>>> 530 error_report("vfio: failed to enable vectors, %d", ret);
>>>>>> 531 }
>>>>>>
>>>>>> Here all MSIX vectors will be disabled first and then enabled, with
>>>>>> one more MSIX. The comment is there but I still don't quite
>>>>>> understand. It makes sense for not allocating all possible MSI
>>>>>> vectors, but why shall we shutdown the whole MSI when being requested
>>>>>> to enable one specific vector? Can't we just enable the user
>>>>>> specified vector indexed by "nr"?
>>>>> What internal kernel API would vfio-pci make use of to achieve that?
>>>>> We can't know the intentions of the guest and we have a limited set of
>>>>> tools to manage the MSI-X state in the host kernel. It's generally the
>>>>> case that MSI-X vectors are only manipulated during driver setup and
>>>>> shutdown, so while the current behavior isn't necessarily efficient, it
>>>>> works within the constraints and generally doesn't have a runtime impact
>>>>> on performance. We also recognize that this behavior may change in the
>>>>> future depending on more dynamic internal host kernel interfaces, thus
>>>>> we export the VFIO_IRQ_INFO_NORESIZE flag to indicate to userspace
>>>>> whether this procedure is required.
>>>> I tried to enable the specified MSIX vector only, and finally
>>>> understood the original QEMU code after always getting EINVAL from
>>>> ioctl->VFIO_DEVICE_SET_IRQS. Then I tried to allocate all MSIX
>>>> vectors in vfio_msix_enable(), not only MSIX vector 0. The change
>>>> works, but it definitely doesn't follow the comment in the code "We
>>>> don't want to have the host allocate all possible MSI vectors for a
>>>> device if they're not in use". May I ask what the side effect is if
>>>> we really allocate all possible vectors at the beginning?
>>> Host IRQ vector exhaustion. Very few devices seem to make use of the
>>> their full compliment of available vectors and some devices support a
>>> very, very large number of vectors. The current approach only consumes
>>> the host resources that the guest is actually making use of.
>>>
>>>>>> What's more, on ARM64 systems with GIC ITS, the kernel will issue
>>>>>> an ITS discard command when disabling a MSI vector, which will drop
>>>>>> currently pending MSI interrupt. If device driver in guest system
>>>>>> enables some MSIs first and interrupts may come at any time, and
>>>>>> then it tries to enable other MSIs, is it possible for the above
>>>>>> code to cause interrupts missing?
>>>>> Interrupt reconfiguration is generally during driver setup or
>>>>> teardown when the device is considered idle or lost interrupts are
>>>>> not a concern. If you have a device which manipulates interrupt
>>>>> configuration runtime, you can expect that lost interrupts won't be
>>>>> the only problem,
>>>> On physical machine, if the driver enables vector 0 first, and then
>>>> vector 1 later, will the vector 0 interrupt be lost for some
>>>> possibility? In my opinion the manipulation of vector 1 should not
>>>> affect the interrupt of vector 0, isn't it? But if this is possible
>>>> in virtual world, what do we consider it as? A bug to be fixed in
>>>> future, or a known limitation of virtualization that won't be fixed
>>>> and all driver developers should pay attention to it?
>>> Are you experiencing an actual problem with lost interrupts or is this
>>> a theoretical question? There is a difference in programming of MSI-X
>>> for an assigned device because we cannot know the intentions of the
>>> guest with respect to how many vectors they intend to use. We can only
>>> know as individual vectors are unmasked that a vector is now used.
>>> Allocating the maximum compliment of interrupts supported on the device
>>> is wasteful of host resources. Given this, combined with the fact that
>>> drivers generally configure vectors only at device setup time, the
>>> existing code generally works well. Ideally drivers should not be
>>> required to modify their behavior to support device assignment, but
>>> certain actions, like high frequency masking/unmasking of interrupts
>>> through the vector table impose unavoidable virtualization overhead
>>> and should be avoided. I believe older Linux guests (RHEL5 time frame)
>>> had such behavior.
>> Yes we are experiencing occasional interrupts missing for Huawei
>> Hi1822 NIC VF used on Guest. Once I suspected the workflow in qemu
>> could cause this issue, but after more investigation these two days,
>> we found it should only be the trigger of the issue. The root cause
>> may be on kernel or ARM GICR in the chip, for kernel always maps vPE
>> to the first CPU (normally CPU 0), and then moves the map to the
>> scheduled CPU. But there is a time window between these two actions,
>> and interrupts may get lost when triggered in this short time window.
>> However we have not come to a final conclusion yet.
>>
>>> It's possible that current IRQ domains alleviate some of the concern
>>> for host IRQ exhaustion and that vfio-pci could be made to allocate
>>> the maximum supported vectors per device, removing the use of the
>>> NORESIZE flag for the index. Userspace would need to continue to
>>> support both modes of operation and the kernel would need to
>>> continue to be compatible with that. Thanks,
>> How about we set a threshold (maybe 16, 32) that only allocating
>> vectors of number = min (maximum supported vectors, threshold), and
>> go thru the original work flow if the specified vector is larger than
>> the initially allocated?
> I would not support such a solution, it's not only still wasteful of
> interrupt vectors but it introduces an inflection point in the behavior
> which means that "high-end" configurations exercise code paths that
> "low-end" configurations do not. Besides, it seems you haven't yet
> even proven that this is an issue. If we wanted to remove the NORESIZE
> flag for this vector index, I think the host would need to allocate all
> vectors supported by the MSI-X capability in advance, which means we
> need to verify that vector exhaustion is no longer a concern on the
> host. Thanks,
Sure, we can leave it as it is now.
Thanks,
Heyi
>
> Alex
>
> .
>