From: Purcareata Bogdan <b43198@freescale.com>
To: Scott Wood <scottwood@freescale.com>
Cc: Laurentiu Tudor <b10716@freescale.com>,
linux-rt-users@vger.kernel.org,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Alexander Graf <agraf@suse.de>,
linux-kernel@vger.kernel.org,
Bogdan Purcareata <bogdan.purcareata@freescale.com>,
mihai.caraman@freescale.com, Paolo Bonzini <pbonzini@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 0/2] powerpc/kvm: Enable running guests on RT Linux
Date: Mon, 27 Apr 2015 09:45:09 +0300
Message-ID: <553DDAF5.6030005@freescale.com>
In-Reply-To: <1429824418.16357.26.camel@freescale.com>

On 24.04.2015 00:26, Scott Wood wrote:
> On Thu, 2015-04-23 at 15:31 +0300, Purcareata Bogdan wrote:
>> On 23.04.2015 03:30, Scott Wood wrote:
>>> On Wed, 2015-04-22 at 15:06 +0300, Purcareata Bogdan wrote:
>>>> On 21.04.2015 03:52, Scott Wood wrote:
>>>>> On Mon, 2015-04-20 at 13:53 +0300, Purcareata Bogdan wrote:
>>>>>> There was a weird situation for .kvmppc_mpic_set_epr - its corresponding inner
>>>>>> function is kvmppc_set_epr, which is a static inline. Removing the static inline
>>>>>> yields a compiler crash (Segmentation fault (core dumped) -
>>>>>> scripts/Makefile.build:441: recipe for target 'arch/powerpc/kvm/kvm.o' failed),
>>>>>> but that's a different story, so I just let it be for now. The point is that the
>>>>>> measured time may include other work done after the lock has been released but
>>>>>> before the function returns. I noticed this was the case for .kvm_set_msi, which
>>>>>> could take up to 90 ms, not actually spent under the lock. This made me change
>>>>>> what I'm looking at.
>>>>>
>>>>> kvm_set_msi does pretty much nothing outside the lock -- I suspect
>>>>> you're measuring an interrupt that happened as soon as the lock was
>>>>> released.
>>>>
>>>> That's exactly right. I've seen things like a timer interrupt occurring right
>>>> after the spin_unlock_irqrestore, but before kvm_set_msi actually returned.
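
A minimal userland sketch of the measurement point under discussion: take the end timestamp while the lock is still held, so that an interrupt arriving right after the release (in the kernel, right after spin_unlock_irqrestore() restores the interrupt state) is not charged to the critical section. The lock, the workload and all function names here are hypothetical stand-ins, not the code in mpic.c.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the MPIC lock: a test-and-set spinlock. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

static void demo_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&demo_lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void demo_lock_release(void)
{
	atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Worst observed hold time; only updated while the lock is held. */
static int64_t max_hold_ns;

/* Hypothetical workload standing in for the code run under the lock. */
static unsigned long critical_work(unsigned long n)
{
	unsigned long sum = 0;

	for (unsigned long i = 0; i < n; i++)
		sum += i;
	return sum;
}

unsigned long locked_section(unsigned long n)
{
	demo_lock_acquire();
	int64_t start = now_ns();

	unsigned long ret = critical_work(n);

	/*
	 * Read the clock BEFORE releasing the lock: anything that runs after
	 * the release (e.g. an interrupt taken as soon as it is allowed again)
	 * is then excluded from the measured hold time.
	 */
	int64_t held = now_ns() - start;
	if (held > max_hold_ns)
		max_hold_ns = held;

	demo_lock_release();
	return ret;
}
```

Measuring at function entry and exit instead, as a function tracer does, would also count whatever runs between the unlock and the return, which is exactly the effect seen with kvm_set_msi.
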
>>>>
>>>> [...]
>>>>
>>>>>> Or perhaps a different stress scenario involving a lot of VCPUs
>>>>>> and external interrupts?
>>>>>
>>>>> You could instrument the MPIC code to find out how many loop iterations
>>>>> you maxed out on, and compare that to the theoretical maximum.
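
The instrumentation suggested above could look like the following userland sketch: count the iterations of each pass through a loop, keep the maximum across passes, and compare it with a bound worked out from the configuration. The struct and function names are hypothetical; in the kernel the counters would need to be updated under the same lock as the loop they measure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical iteration counters for a loop that runs under a lock. */
struct loop_stats {
	unsigned long cur;	/* iterations in the current pass */
	unsigned long max;	/* largest pass observed so far */
};

static void loop_stats_begin(struct loop_stats *s)
{
	s->cur = 0;
}

static void loop_stats_tick(struct loop_stats *s)
{
	s->cur++;
}

static void loop_stats_end(struct loop_stats *s)
{
	if (s->cur > s->max)
		s->max = s->cur;
}

/* Example instrumented loop: visits every slot and counts the pending ones. */
int scan_pending(const bool *pending, size_t n, struct loop_stats *s)
{
	int found = 0;

	loop_stats_begin(s);
	for (size_t i = 0; i < n; i++) {
		loop_stats_tick(s);
		if (pending[i])
			found++;
	}
	loop_stats_end(s);
	return found;
}

/* True if the worst observed pass stays within the computed bound. */
bool within_bound(const struct loop_stats *s, unsigned long theoretical_max)
{
	return s->max <= theoretical_max;
}
```
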
>>>>
>>>> Numbers are pretty low, and I'll try to explain based on my observations.
>>>>
>>>> The problematic section in openpic_update_irq is this [1], since it loops
>>>> through all VCPUs, and IRQ_local_pipe further calls IRQ_check, which loops
>>>> through all pending interrupts for a VCPU [2].
>>>>
>>>> The guest interfaces are virtio-vhostnet, which are based on MSI
>>>> (/proc/interrupts in guest shows they are MSI). For external interrupts to the
>>>> guest, the irq_source destmask currently selects only VCPU 0 (destmask 1) and
>>>> last_cpu is 0 (uninitialized), so [1] will go on and deliver the interrupt
>>>> directly and unicast (no VCPUs loop).
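
The delivery decision described above can be modeled as three branches: a single allowed destination (unicast, no VCPU loop), directed delivery to every VCPU in destmask, and distributed delivery to one VCPU picked round-robin after last_cpu. This is a simplified, hypothetical model based on our reading of openpic_update_irq, not the kernel code; the struct, the `distributed` flag and the function name are invented for illustration, and it assumes at most 32 VCPUs.

```c
#include <stdint.h>

/* Hypothetical, simplified interrupt source (not the kernel's struct). */
struct demo_src {
	uint32_t destmask;	/* bit i set: VCPU i may receive this IRQ */
	int last_cpu;		/* VCPU that last received it (0..31) */
	int distributed;	/* nonzero: distributed (round-robin) mode */
};

/*
 * Deliver one IRQ and record the receiving VCPUs in out[].
 * Returns the number of VCPUs it was delivered to; in the emulation each
 * of those would then rescan its own pending queue.
 */
int demo_deliver(struct demo_src *src, int nb_cpus, int *out)
{
	int n = 0;

	if (src->destmask == (UINT32_C(1) << src->last_cpu)) {
		/* Single allowed destination: unicast, no loop over VCPUs. */
		out[n++] = src->last_cpu;
	} else if (!src->distributed) {
		/* Directed delivery: every VCPU in destmask receives it. */
		for (int i = 0; i < nb_cpus; i++)
			if (src->destmask & (UINT32_C(1) << i))
				out[n++] = i;
	} else {
		/* Distributed delivery: first VCPU in destmask after last_cpu. */
		for (int k = 1; k <= nb_cpus; k++) {
			int i = (src->last_cpu + k) % nb_cpus;

			if (src->destmask & (UINT32_C(1) << i)) {
				out[n++] = i;
				src->last_cpu = i;
				break;
			}
		}
	}
	return n;
}
```

Only the directed branch does work proportional to the number of destination VCPUs, which is why the unicast MSI case observed here stays cheap.
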
>>>>
>>>> I activated the pr_debugs in arch/powerpc/kvm/mpic.c, to see how many interrupts
>>>> are actually pending for the destination VCPU. At most, there were 3 interrupts
>>>> - n_IRQ = {224,225,226} - even for 24 flows of ping flood. I understand that
>>>> guest virtio interrupts are cascaded over 1 or a couple of shared MSI interrupts.
>>>>
>>>> So worst case, in this scenario, was checking the priorities for 3 pending
>>>> interrupts for 1 VCPU. Something like this (some of my prints included):
>>>>
>>>> [61010.582033] openpic_update_irq: destmask 1 last_cpu 0
>>>> [61010.582034] openpic_update_irq: Only one CPU is allowed to receive this IRQ
>>>> [61010.582036] IRQ_local_pipe: IRQ 224 active 0 was 1
>>>> [61010.582037] IRQ_check: irq 226 set ivpr_pr=8 pr=-1
>>>> [61010.582038] IRQ_check: irq 225 set ivpr_pr=8 pr=-1
>>>> [61010.582039] IRQ_check: irq 224 set ivpr_pr=8 pr=-1
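
The per-VCPU scan behind the IRQ_check lines above can be modeled as a walk over the pending sources that keeps the one with the highest priority. This is a hypothetical sketch: the array layout, DEMO_MAX_IRQ and the function name are invented, and the real emulation iterates over a per-VCPU queue of pending sources rather than these flat arrays.

```c
#include <stdbool.h>

#define DEMO_MAX_IRQ 256	/* hypothetical size of the source table */

/* Result of a scan: the pending IRQ to present next, or -1 if none. */
struct demo_next {
	int irq;
	int priority;
};

/*
 * Walk every pending source and keep the highest-priority one. The cost is
 * linear in the number of sources scanned, and the scan runs once for each
 * VCPU the interrupt is delivered to.
 */
struct demo_next demo_irq_check(const bool pending[DEMO_MAX_IRQ],
				const int priority[DEMO_MAX_IRQ])
{
	struct demo_next next = { .irq = -1, .priority = -1 };

	for (int irq = 0; irq < DEMO_MAX_IRQ; irq++) {
		if (!pending[irq])
			continue;
		/* Strict '>' keeps the lowest-numbered IRQ among equal priorities. */
		if (priority[irq] > next.priority) {
			next.irq = irq;
			next.priority = priority[irq];
		}
	}
	return next;
}
```

In this model, ties go to the lowest-numbered IRQ; the emulation's own tie-breaking may differ.
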
>>>>
>>>> It would be really helpful to get your comments on whether these are
>>>> realistic numbers for everyday use, or whether they are relevant only to this
>>>> particular scenario.
>>>
>>> RT isn't about "realistic numbers for everyday use". It's about worst
>>> cases.
>>>
>>>> - Can these interrupts be used in directed delivery, so that the destination
>>>> mask can include multiple VCPUs?
>>>
>>> The Freescale MPIC does not support multiple destinations for most
>>> interrupts, but the (non-FSL-specific) emulation code appears to allow
>>> it.
>>>
>>>> The MPIC manual states that timer and IPI
>>>> interrupts are supported for directed delivery, although I'm not sure how much of
>>>> this is used in the emulation. I know that kvmppc uses the decrementer outside
>>>> of the MPIC.
>>>>
>>>> - How are virtio interrupts cascaded over the shared MSI interrupts?
>>>> /proc/device-tree/soc@e0000000/msi@41600/interrupts in the guest shows 8 values
>>>> - 224 - 231 - so at most there might be 8 pending interrupts in IRQ_check, is
>>>> that correct?
>>>
>>> It looks like that's currently the case, but actual hardware supports
>>> more than that, so it's possible (albeit unlikely any time soon) that
>>> the emulation eventually does as well.
>>>
>>> But it's possible to have interrupts other than MSIs...
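
The bound implied by this exchange is the number of destination VCPUs times the number of pending sources each of them rescans. In the observed run that is 1 x 3; with the 8 MSI sources (224-231) and a single destination it is 1 x 8; directed delivery to several VCPUs, or non-MSI sources, raise one factor or the other. The sketch below is a hypothetical helper that counts only these pending-source checks, not the cost of locating pending sources in the first place.

```c
#include <stdint.h>

/* Count VCPUs selected by a destination mask, limited to nb_cpus (max 32). */
static unsigned int dest_count(uint32_t destmask, unsigned int nb_cpus)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < nb_cpus && i < 32; i++)
		if (destmask & (UINT32_C(1) << i))
			n++;
	return n;
}

/*
 * Upper bound on pending-source checks done under the lock for one update:
 * every destination VCPU rescans up to max_pending sources.
 */
unsigned long worst_case_checks(uint32_t destmask, unsigned int nb_cpus,
				unsigned int max_pending)
{
	return (unsigned long)dest_count(destmask, nb_cpus) * max_pending;
}
```
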
>>
>> Right.
>>
>> So given that the raw spinlock conversion is not suitable for all the scenarios
>> supported by the OpenPIC emulation, is it OK if my next step is to send a patch
>> containing both the raw spinlock conversion and a mandatory disabling of the
>> in-kernel MPIC? This is actually the conclusion we reached some time ago, but I
>> guess it was good to get some more insight into how things actually work (at
>> least for me).
>
> Fine with me. Have you given any thought to ways to restructure the
> code to eliminate the problem?

My first thought would be to create a separate lock for each VCPU's pending
interrupt queue, so that the locking in openpic_update_irq becomes more
fine-grained. However, this is just a very preliminary thought. Before I can
come up with anything worthy of consideration, I must read the OpenPIC
specification and the current KVM emulated OpenPIC implementation thoroughly.
I currently have other things on my hands, and will come back to this once I
have some time.
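
A rough userland sketch of the per-VCPU locking idea, with a C11 atomic_flag standing in for what would presumably be one raw_spinlock_t per queue in the kernel. All names are hypothetical, and the sketch deliberately leaves out the source-level state (activity flags, last_cpu, priority selection) that a real split would also have to protect.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_NR_VCPUS 4
#define DEMO_NR_IRQS 256

/* Hypothetical per-VCPU pending queue, each with its own spinlock. */
struct demo_vcpu_queue {
	atomic_flag lock;
	bool pending[DEMO_NR_IRQS];
};

static struct demo_vcpu_queue queues[DEMO_NR_VCPUS];

static void queue_lock(struct demo_vcpu_queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		; /* spin */
}

static void queue_unlock(struct demo_vcpu_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* atomic_flag is only guaranteed clear via ATOMIC_FLAG_INIT or a clear. */
void demo_init(void)
{
	for (int i = 0; i < DEMO_NR_VCPUS; i++)
		atomic_flag_clear(&queues[i].lock);
}

/* Raise irq on one VCPU while holding only that VCPU's queue lock. */
void demo_raise(int vcpu, int irq)
{
	struct demo_vcpu_queue *q = &queues[vcpu];

	queue_lock(q);
	q->pending[irq] = true;
	queue_unlock(q);
}

/* Directed delivery: take and drop each destination's lock in turn. */
int demo_raise_directed(unsigned int destmask, int irq)
{
	int n = 0;

	for (int i = 0; i < DEMO_NR_VCPUS; i++) {
		if (destmask & (1u << i)) {
			demo_raise(i, irq);
			n++;
		}
	}
	return n;
}

bool demo_is_pending(int vcpu, int irq)
{
	struct demo_vcpu_queue *q = &queues[vcpu];
	bool ret;

	queue_lock(q);
	ret = q->pending[irq];
	queue_unlock(q);
	return ret;
}
```

Directed delivery still does work proportional to the number of destination VCPUs, but each lock is held only for a single queue update, so the longest non-preemptible section no longer grows with the VCPU count.
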

Meanwhile, I've sent a v2 of this raw_spinlock conversion to the PPC and RT
mailing lists, together with disabling the in-kernel MPIC emulation for
PREEMPT_RT. I would be grateful for your feedback on it, so that it can be
applied.

Thank you,
Bogdan P.