From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: "Cédric Le Goater" <clg@kaod.org>, qemu-ppc@nongnu.org
Cc: Frederic Barrat <fbarrat@linux.ibm.com>,
	Timothy Pearson <tpearson@raptorengineering.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	qemu-devel@nongnu.org, David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH qemu] spapr_pci: Disable IRQFD resampling on XIVE
Date: Thu, 28 Apr 2022 17:26:51 +1000
Message-ID: <b490310d-debb-fd92-73cd-de9558157098@ozlabs.ru>
In-Reply-To: <fcb5a8c0-385b-29c6-4573-6ee09f36a9e4@kaod.org>



On 4/28/22 16:25, Cédric Le Goater wrote:
> On 4/28/22 07:32, Alexey Kardashevskiy wrote:
>>
>>
>> On 4/27/22 17:36, Cédric Le Goater wrote:
>>> Hello Alexey,
>>>
>>> On 4/27/22 06:36, Alexey Kardashevskiy wrote:
>>>> VFIO-PCI has a "KVM_IRQFD_FLAG_RESAMPLE" optimization for INTx EOI
>>>> handling, where KVM can unmask PCI INTx (level-triggered interrupt)
>>>> without switching to userspace (== QEMU).
>>>>
>>>> Unfortunately XIVE does not support level interrupts, 
>>>
>>> That's not correctly phrased I think.
>>
>>
>> My bad, I meant "XIVE hardware".
> 
> ok. It makes more sense.
> 
> PSIHB and PHBs have internal latches to maintain the assertion level.
> XIVE has none.
> 
> 
>>
>>>
>>> The QEMU XIVE device supports LSIs but the POWER9 kernel-irqchips,
>>> KVM XICS-on-XIVE and XIVE native devices, are broken with respect
>>> to passthrough adapters using INTx.
>>>
>>>
>>>> QEMU emulates them
>>>> and therefore there is no existing code path to kick the resamplefd.
>>>> The problem appears when passing through a PCI adapter with
>>>> the "pci=nomsi" kernel parameter - the adapter's interrupt
>>>> count in /proc/interrupts will be stuck at "1".
>>>>
>>>> This disables the resampler when the XIVE interrupt controller is
>>>> configured.
>>>> This should not be very visible, though, as KVM already exits to QEMU
>>>> for INTx and XIVE-capable boxes (POWER9 and newer) do not seem to have
>>>> performance-critical INTx-only capable devices.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>>
>>>>
>>>> Cédric, this is what I meant when I said that spapr_pci.c was
>>>> unaware of the interrupt controller type; neither xics nor xive was
>>>> mentioned in the file before.
>>>>
>>>>
>>>> ---
>>>>   hw/ppc/spapr_pci.c | 14 +++++++++++---
>>>>   1 file changed, 11 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>>> index 5bfd4aa9e5aa..2675052601db 100644
>>>> --- a/hw/ppc/spapr_pci.c
>>>> +++ b/hw/ppc/spapr_pci.c
>>>> @@ -729,11 +729,19 @@ static void pci_spapr_set_irq(void *opaque, int irq_num, int level)
>>>>   static PCIINTxRoute spapr_route_intx_pin_to_irq(void *opaque, int pin)
>>>>   {
>>>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>>       SpaprPhbState *sphb = SPAPR_PCI_HOST_BRIDGE(opaque);
>>>> -    PCIINTxRoute route;
>>>> +    PCIINTxRoute route = { .mode = PCI_INTX_DISABLED };
>>>> -    route.mode = PCI_INTX_ENABLED;
>>>> -    route.irq = sphb->lsi_table[pin].irq;
>>>> +    /*
>>>> +     * Disable IRQFD resampler on XIVE as it does not support LSI and QEMU
>>>> +     * emulates those so the KVM kernel resamplefd kick is skipped and EOI
>>>> +     * is not delivered to VFIO-PCI.
>>>> +     */
>>>> +    if (!spapr->xive) {
>>>
>>> This is testing the availability of the XIVE interrupt mode, but not
>>> the active controller. See spapr_irq_init() which is called very
>>> early in the machine initialization.
>>>
>>> Is that what we want? Is everything fine if we start the machine with
>>> ic-mode=xics ? On a POWER9 host, this would use the KVM XICS-on-XIVE
>>> device which is broken also AFAICT.
>>
>> I should probably fix that in KVM, just not quite sure yet how for the
>> realmode handlers, or just drop those on P9 and then the fix is trivial.
>>
>>
>>> You should extend the SpaprInterruptControllerClass (for a routine) or
>>> simply SpaprIrq (for a bool) if you need to handle IRQ matters from a
>>> device model.
>>
>> It is a property of KVM rather than the interrupt controller so it 
>> probably makes more sense to just stop advertising 
>> KVM_CAP_IRQFD_RESAMPLE. Hmmm...
> 
> I would fix the realmode handlers of the KVM XICS-on-XIVE device
> first. The problem has been there for a while.


Are they really used on POWER9? TCE ones are not.


> Then, for the XIVE native mode, I would simply handle it at the QEMU
> level with a 'resample' bool in SpaprIrq. It would be tested in spapr
> pci when configuring the INTx routing.
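
For reference, that suggestion could look roughly like the sketch below.
It is a standalone model, not QEMU code: every name in it (IrqBackendSketch,
IntxRouteSketch, sketch_route_intx_pin, the SKETCH_* constants) is made up
for illustration, and the 'resample' field is only an assumption about
what such a bool in SpaprIrq might mean. The idea is that the INTx route is
reported as enabled only when the active interrupt backend says that
resampling works, and is left disabled otherwise so that INTx keeps
going through QEMU.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PCI_NUM_PINS 4

/* Hypothetical stand-ins; these are NOT the QEMU definitions. */
typedef enum {
    SKETCH_INTX_DISABLED,
    SKETCH_INTX_ENABLED
} IntxModeSketch;

typedef struct {
    IntxModeSketch mode;
    int irq;                /* -1 when the route is disabled */
} IntxRouteSketch;

typedef struct {
    const char *name;
    bool resample;          /* assumed flag: backend can kick the resamplefd */
} IrqBackendSketch;

/*
 * Report the INTx route for a pin. The route is only marked enabled when
 * the active backend claims resampling works; otherwise it stays disabled.
 */
static IntxRouteSketch sketch_route_intx_pin(const IrqBackendSketch *backend,
                                             const int *lsi_table, int pin)
{
    IntxRouteSketch route = { .mode = SKETCH_INTX_DISABLED, .irq = -1 };

    assert(pin >= 0 && pin < SKETCH_PCI_NUM_PINS);

    if (backend->resample) {
        route.mode = SKETCH_INTX_ENABLED;
        route.irq = lsi_table[pin];
    }
    return route;
}
```

Unlike the "!spapr->xive" test in the patch, the check here is made against
whichever backend is actually active, which is the point raised in the
review.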


But KVM already advertises a dedicated capability for this
(KVM_CAP_IRQFD_RESAMPLE), which is not correct, as we know that KVM
won't resample.
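
To illustrate what gating on the capability would mean for a consumer
such as VFIO-PCI, here is a minimal standalone sketch. It is not the
QEMU or KVM code: KvmCapsSketch, IntxEoiPathSketch and
sketch_pick_intx_eoi_path are hypothetical names, and the bool simply
models the answer to a KVM_CAP_IRQFD_RESAMPLE query. If the kernel
stopped advertising the capability where it cannot resample, the
consumer would fall back to handling EOI in userspace without needing
to know anything about the interrupt controller.

```c
#include <stdbool.h>

/* Hypothetical model of the EOI handling path chosen for INTx. */
typedef enum {
    SKETCH_EOI_KERNEL_RESAMPLE,   /* KVM kicks the resamplefd on EOI */
    SKETCH_EOI_USERSPACE          /* EOI is handled after an exit to QEMU */
} IntxEoiPathSketch;

/* Hypothetical model of the capability query result. */
typedef struct {
    bool cap_irqfd_resample;      /* models KVM_CAP_IRQFD_RESAMPLE */
} KvmCapsSketch;

/* Use the in-kernel resample path only when the capability is advertised. */
static IntxEoiPathSketch sketch_pick_intx_eoi_path(const KvmCapsSketch *caps)
{
    return caps->cap_irqfd_resample ? SKETCH_EOI_KERNEL_RESAMPLE
                                    : SKETCH_EOI_USERSPACE;
}
```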


> 
> 
> Thanks,
> 
> C.
> 



Thread overview: 6+ messages
2022-04-27  4:36 [PATCH qemu] spapr_pci: Disable IRQFD resampling on XIVE Alexey Kardashevskiy
2022-04-27  7:36 ` Cédric Le Goater
2022-04-28  5:32   ` Alexey Kardashevskiy
2022-04-28  6:25     ` Cédric Le Goater
2022-04-28  7:26       ` Alexey Kardashevskiy [this message]
2022-04-28  7:31         ` Cédric Le Goater
