Re: kvm device assignment and MSI-X masking

kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jan Kiszka <jan.kiszka@siemens.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: kvm <kvm@vger.kernel.org>
Subject: Re: kvm device assignment and MSI-X masking
Date: Tue, 14 Aug 2012 17:47:04 +0200	[thread overview]
Message-ID: <502A72F8.4000502@siemens.com> (raw)
In-Reply-To: <1344958466.4683.260.camel@ul30vt.home>

On 2012-08-14 17:34, Alex Williamson wrote:
> On Tue, 2012-08-14 at 17:15 +0200, Jan Kiszka wrote:
>> On 2012-08-14 16:31, Alex Williamson wrote:
>>> On Tue, 2012-08-14 at 16:10 +0200, Jan Kiszka wrote:
>>>> On 2012-08-14 16:05, Alex Williamson wrote:
>>>>> On Tue, 2012-08-14 at 15:48 +0200, Jan Kiszka wrote:
>>>>>> Hi Alex,
>>>>>>
>>>>>> you once wrote this comment in device-assignment.c, msix_mmio_write:
>>>>>>
>>>>>>     if (!msix_masked(&orig) && msix_masked(entry)) {
>>>>>>         /*
>>>>>>          * Vector masked, disable it
>>>>>>          *
>>>>>>          * XXX It's not clear if we can or should actually attempt
>>>>>>          * to mask or disable the interrupt.  KVM doesn't have
>>>>>>          * support for pending bits and kvm_assign_set_msix_entry
>>>>>>          * doesn't modify the device hardware mask.  Interrupts
>>>>>>          * while masked are simply not injected to the guest, so
>>>>>>          * are lost.  Can we get away with always injecting an
>>>>>>          * interrupt on unmask?
>>>>>>          */
>>>>>>
>>>>>> I'm wondering what made you think that we won't inject if the vector is
>>>>>> masked like this (ie. in the shadow MSI-X table). Can you recall the
>>>>>> details?
>>>>>>
>>>>>> I'm trying to refactor this code to make the KVM interface a bit more
>>>>>> encapsulating the kernel interface details, not fixing anything. Still,
>>>>>> I would also like to avoid introducing regressions.
>>>>>
>>>>> Yeah, I didn't leave a very good comment there.  I'm sure it made more
>>>>> sense to me at the time.  I think I was trying to say that not only do
>>>>> we not have a way to mask the physical hardware, but if we did, we don't
>>>>> have a way to retrieve the pending bits, so any pending interrupts while
>>>>> masked would be lost.  We might be able to deal with that by posting a
>>>>> spurious interrupt on unmask, but for now we do nothing as masking is
>>>>> usually done just to update the vector.  Thanks,
>>>>
>>>> Ok, thanks for the clarification.
>>>>
>>>> As we are at it, do you also recall if this
>>>>
>>>> --- a/hw/device-assignment.c
>>>> +++ b/hw/device-assignment.c
>>>> @@ -1573,28 +1573,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
>>>>               */
>>>>          } else if (msix_masked(&orig) && !msix_masked(entry)) {
>>>>              /* Vector unmasked */
>>>> -            if (i >= adev->irq_entries_nr || !adev->entry[i].type) {
>>>> -                /* Previously unassigned vector, start from scratch */
>>>> -                assigned_dev_update_msix(pdev);
>>>> -                return;
>>>> -            } else {
>>>> -                /* Update an existing, previously masked vector */
>>>> -                struct kvm_irq_routing_entry orig = adev->entry[i];
>>>> -                int ret;
>>>> -
>>>> -                adev->entry[i].u.msi.address_lo = entry->addr_lo;
>>>> -                adev->entry[i].u.msi.address_hi = entry->addr_hi;
>>>> -                adev->entry[i].u.msi.data = entry->data;
>>>> -
>>>> -                ret = kvm_update_routing_entry(&orig, &adev->entry[i]);
>>>> -                if (ret) {
>>>> -                    fprintf(stderr,
>>>> -                            "Error updating irq routing entry (%d)\n", ret);
>>>> -                    return;
>>>> -                }
>>>> -
>>>> -                kvm_irqchip_commit_routes(kvm_state);
>>>> -            }
>>>> +            assigned_dev_update_msix(pdev);
>>>>          }
>>>>      }
>>>>  }
>>>>
>>>> would make a relevant difference for known workloads? I'm trying to get
>>>> rid of direct routing table manipulations, but I would also like to
>>>> avoid introducing things like kvm_irqchip_update_msi_route unless really
>>>> necessary. Or could VFIO make use of that as well?
>>>
>>> It makes me a little nervous, but I don't know that it won't work.
>>> There's a lot more latency in turning off MSI-X and completely
>>> rebuilding it than there is in updating the routing of a single vector.
>>> You can imagine that irqbalance could be triggering this path pretty
>>> regularly.  Increasing vectors beyond what was previously setup is more
>>> of an init-time event, so the latency doesn't bother me as much.  We'd
>>> probably have to send some spurious interrupts for anything we might
>>> have missed if we take the high latency path.
>>
>> Yeah, good points.
>>
>>>
>>> VFIO is already a little more abstracted, making use of the msix vector
>>> use and release interface, but we do still make use of the kvm_irqchip
>>> irqfd/virq interfaces.
>>
>> Hmm, but due to the nature of the callbacks, we always disable/reanable
>> on mask/unmask. So VFIO will be slower than current device assignment in
>> this regard.
> 
> It's a bit awkward, I'm not thrilled with those msix callbacks but they
> seem to work.  I have a similar comment in static void
> vfio_msix_vector_release that maybe we should just disable direct
> injection on mask so that qemu-msix can do the masking and fill in the
> PBA.

That will require an enhancement of the callback mechanism. So far it
does not allow to tell apart per-vector masking from general disabling.
When the latter happens, we still want to release resources, I suppose.

With such enhancement in place, we could even consider keeping the VIRQ
and MSI route active (and provide a route update service) to avoid the
tear-down/recreate overhead on fast mask/unmask cycles, e.g. for IRQ
migration.

> 
>> BTW, how do you handle the device's PBA? Pass it through to the guest?
> 
> We could but I'm trying to use qemu-msix infrastructure which handles
> the PBA.  We've been working happily w/o good PBA support for so long, I
> haven't bothered to work on a channel to get to the physical PBA yet.

I think bouncing should be OK performance-wise - until some strange
guest pops up that actually polls the PBA in high-load scenarios.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

     prev parent reply	other threads:[~2012-08-14 15:47 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-14 13:48 kvm device assignment and MSI-X masking Jan Kiszka
2012-08-14 14:05 ` Alex Williamson
2012-08-14 14:10   ` Jan Kiszka
2012-08-14 14:31     ` Alex Williamson
2012-08-14 15:15       ` Jan Kiszka
2012-08-14 15:34         ` Alex Williamson
2012-08-14 15:47           ` Jan Kiszka [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=502A72F8.4000502@siemens.com \
    --to=jan.kiszka@siemens.com \
    --cc=alex.williamson@redhat.com \
    --cc=kvm@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).