From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
joao.m.martins@oracle.com, boris.ostrovsky@oracle.com,
mark.kanda@oracle.com, suravee.suthikulpanit@amd.com,
mlevitsk@redhat.com
Subject: Re: [RFC 0/3] Export APICv-related state via binary stats interface
Date: Tue, 16 Apr 2024 15:34:55 -0700 [thread overview]
Message-ID: <Zh79D2BdtS0jKO6W@google.com> (raw)
In-Reply-To: <CABgObfZ4kqaXLaOAOj4aGB5GAe9GxOmJmOP+7kdke6OqA35HzA@mail.gmail.com>
On Tue, Apr 16, 2024, Paolo Bonzini wrote:
> On Tue, Apr 16, 2024 at 8:08 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Feb 15, 2024, Alejandro Jimenez wrote:
> > > The goal of this RFC is to agree on a mechanism for querying the state (and
> > > related stats) of APICv/AVIC. I clearly have an AVIC bias when approaching this
> > > topic since that is the side that I have mostly looked at, and has the greater
> > > number of possible inhibits, but I believe the argument applies for both
> > > vendor's technologies.
> > >
> > > Currently, a user or monitoring app trying to determine if APICv is actually
> > > being used needs implementation-specific knowlegde in order to look for specific
> > > types of #VMEXIT (i.e. AVIC_INCOMPLETE_IPI/AVIC_NOACCEL), checking GALog events
> > > by watching /proc/interrupts for AMD-Vi*-GA, etc. There are existing tracepoints
> > > (e.g. kvm_apicv_accept_irq, kvm_avic_ga_log) that make this task easier, but
> > > tracefs is not viable in some scenarios. Adding kvm debugfs entries has similar
> > > downsides. Suravee has previously proposed a new IOCTL interface[0] to expose
> > > this information, but there has not been any development in that direction.
> > > Sean has mentioned a preference for using BPF to extract info from the current
> > > tracepoints, which would require reworking existing structs to access some
> > > desired data, but as far as I know there isn't any work done on that approach
> > > yet.
> > >
> > > Recently Joao mentioned another alternative: the binary stats framework that is
> > > already supported by kernel[1] and QEMU[2].
> >
> > The hiccup with stats are that they are ABI, e.g. we can't (easily) ditch stats
> > once they're added, and KVM needs to maintain the exact behavior.
>
> Stats are not ABI---why would they be?
Because they exposed through an ioctl(), and userspace can and will use stats for
functional purposes? Maybe I just had the wrong takeaway from an old thread about
adding a big pile of stats[1], where unfortunately (for me) you weighed in on
whether or not tracepoints are ABI, but not stats.
And because I've have been operating under the assumption that stats are ABI, I've
been guiding people to using stats to make decisions in userspace. E.g. KVM doesn't
currently expose is_guest_mode() in kvm_run, but it is a stat, so it's not hard
to imagine userspace using the stat to make decisions without needing to call back
into KVM[2].
And based on the old discussion[1] I doubt I'm though only one that has this view.
That said, I'm definitely not opposed to stats _not_ being ABI, because that would
give us a ton of flexibility. E.g. we have a non-trivial number of internal stats
that are super useful _for us_, but are rather heavy and might not be desirable
for most environments. If stats aren't considered ABI, then I'm pretty sure we
could land some of the more generally useful ones upstream, but off-by-default
and guarded by a Kconfig. E.g. we have a pile of stats related to mmu_lock that
are very helpful in identifying performance issues, but they aren't things I would
want enabled by default.
But if we do decide stats aren't ABI, then we need to document and disseminate
that information.
[1] https://lore.kernel.org/all/CALzav=cuzT=u6G0TCVZUfEgAKOCKTSCDE8x2v5qc-Gd_NL-pzg@mail.gmail.com
[2] https://lore.kernel.org/all/Zh6-e9hy7U6DD2QM@google.com
> They have an established meaning and it's not a good idea to change it, but
> it's not an absolute no-no(*); and removing them is not a problem at all.
>
> For example, if stats were ABI, there would be no need for the
> introspection mechanism, you could just use a struct like ethtool
> stats (which *are* ABO).
>
> Not everything makes a good stat but, if in doubt and it's cheap
> enough to collect it, go ahead and add it.
Marc raised the (IMO) valid concern that "if it's cheap, add it" will lead to
death by a thousand cuts. E.g. add a few hundred vCPU stats and suddenly vCPUs
consumes an extra KiB or three of memory.
A few APIC stats obviously aren't going to move the needle much, I'm just pointing
out that not everyone agrees that KVM should be hyper permissive when it comes to
adding new stats.
> Cheap collection is the important point, because tracepoints in a hot path
> can be so expensive as to slow down the guest substantially, at least in
> microbenchmarks.
>
> In this case I'm not sure _all_ inhibits makes sense and I certainly
> wouldn't want a bitmask, but a generic APICv-enabled stat certainly
> makes sense, and perhaps another for a weirdly-configured local APIC.
>
> Paolo
>
> (*) you have to draw a line somewhere. New processor models may
> virtualize parts of the CPU in such a way that some stats become
> meaningless or just stay at zero. Should KVM not support those
> features because it is not possible anymore to introspect the guest
> through stat?
next prev parent reply other threads:[~2024-04-16 22:34 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-15 16:01 [RFC 0/3] Export APICv-related state via binary stats interface Alejandro Jimenez
2024-02-15 16:01 ` [RFC 1/3] x86: KVM: stats: Add a stat to report status of APICv inhibition Alejandro Jimenez
2024-04-09 14:54 ` Suthikulpanit, Suravee
2024-04-16 18:19 ` Sean Christopherson
2024-04-16 19:53 ` Paolo Bonzini
2024-04-16 19:59 ` Alejandro Jimenez
2024-02-15 16:01 ` [RFC 2/3] x86: KVM: stats: Add stat counter for IRQs injected via APICv Alejandro Jimenez
2024-02-15 16:16 ` Dongli Zhang
2024-02-15 18:12 ` Alejandro Jimenez
2024-04-16 18:25 ` Sean Christopherson
2024-02-15 16:01 ` [RFC 3/3] x86: KVM: stats: Add a stat counter for GALog events Alejandro Jimenez
2024-04-09 6:45 ` Chao Gao
2024-04-10 1:31 ` Alejandro Jimenez
2024-04-12 10:45 ` Chao Gao
2024-04-16 18:35 ` Sean Christopherson
2024-04-24 0:50 ` Alejandro Jimenez
2024-04-09 5:09 ` [RFC 0/3] Export APICv-related state via binary stats interface Vasant Hegde
2024-04-10 1:36 ` Alejandro Jimenez
2024-04-16 18:08 ` Sean Christopherson
2024-04-16 19:51 ` Paolo Bonzini
2024-04-16 21:29 ` Alejandro Jimenez
2024-04-16 21:36 ` Paolo Bonzini
2024-04-16 22:34 ` Sean Christopherson [this message]
2024-04-17 9:48 ` Paolo Bonzini
2024-04-23 20:43 ` Alejandro Jimenez
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zh79D2BdtS0jKO6W@google.com \
--to=seanjc@google.com \
--cc=alejandro.j.jimenez@oracle.com \
--cc=boris.ostrovsky@oracle.com \
--cc=joao.m.martins@oracle.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.kanda@oracle.com \
--cc=mlevitsk@redhat.com \
--cc=pbonzini@redhat.com \
--cc=suravee.suthikulpanit@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox