Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: <milanpa@amazon.com>
To: Alexander Graf <graf@amazon.de>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Milan Pandurov <milanpa@amazon.de>, <kvm@vger.kernel.org>
Cc: <rkrcmar@redhat.com>, <borntraeger@de.ibm.com>
Subject: Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters
Date: Mon, 20 Jan 2020 19:57:30 +0100	[thread overview]
Message-ID: <30358a22-084c-6b0b-ae67-acfb7e69ba8e@amazon.com> (raw)
In-Reply-To: <e77a2477-6010-ae1d-0afd-0c5498ba2117@amazon.de>

On 1/20/20 6:53 PM, Alexander Graf wrote:
>
>
> On 18.01.20 00:38, Paolo Bonzini wrote:
>> On 15/01/20 15:59, Alexander Graf wrote:
>>> On 15.01.20 15:43, milanpa@amazon.com wrote:
>>>>>> Let's expose new interface to userspace for garhering these
>>>>>> statistics with one ioctl.
>>>>>>
>>>>>> Userspace application can read counter description once using
>>>>>> KVM_GET_SUPPORTED_DEBUGFS_STAT and periodically invoke the
>>>>>> KVM_GET_DEBUGFS_VALUES to get value update.
>>>>>
>>>>> This is an interface that requires a lot of logic and buffers from
>>>>> user space to retrieve individual, explicit counters. What if I just
>>>>> wanted to monitor the number of exits on every user space exit?
>>>>
>>>> In case we want to cover such latency sensitive use cases solution b),
>>>> with mmap'ed structs you suggested, would be a way to go, IMO.
>>>>
>>>>> Also, we're suddenly making the debugfs names a full ABI, because
>>>>> through this interface we only identify the individual stats through
>>>>> their names. That means we can not remove stats or change their
>>>>> names, because people may rely on them, no? Thining about this again,
>>>>> maybe they already are an ABI because people rely on them in debugfs
>>>>> though?
>>
>> In theory not, in practice I have treated them as a kind of "soft" ABI:
>> if the meaning changes you should rename them, and removing them is
>> fine, but you shouldn't for example change the unit of measure (which is
>> not hard since they are all counters :) but perhaps you could have
>> nanoseconds vs TSC cycles in the future for some counters).
>>
>>>>> I see two alternatives to this approach here:
>>>>>
>>>>> a) ONE_REG
>>>>>
>>>>> We can just add a new DEBUG arch in ONE_REG and expose ONE_REG per VM
>>>>> as well (if we really have to?). That gives us explicit identifiers
>>>>> for each stat with an explicit path to introduce new ones with very
>>>>> unique identifiers.
>> ONE_REG would force us to define constants for each counter, and would
>> make it hard to retire them.  I don't like this.
>
> Why does it make it hard to retire them? We would just return -EINVAL 
> on retrieval, like we do for any other non-supported ONE_REG.
>
> It's the same as a file not existing in debugfs/statfs. Or an entry in 
> the array of this patch to disappear.
>
>>
>>>>> b) part of the mmap'ed vcpu struct
>>
>> Same here.  Even if we say the semantics of the struct would be exposed
>> to userspace via KVM_GET_SUPPORTED_DEBUGFS_STAT, someone might end up
>> getting this wrong and expecting a particular layout.  Milan's proposed
>> API has the big advantage of being hard to get wrong for userspace.  And
>> pushing the aggregation to userspace is not a huge chore, but it's still
>> a chore.
>>
>> So unless someone has a usecase for latency-sensitive monitoring I'd
>> prefer to keep it simple (usually these kind of stats can even make
>> sense if you gather them over relatively large period of time, because
>> then you'll probably use something else like tracepoints to actually
>> pinpoint what's going on).
>
> I tend to agree. Fetching them via an explicit call into the kernel is 
> definitely the safer route.
>
>>
>>>>> 2) vcpu counters
>>>>>
>>>>> Most of the counters count on vcpu granularity, but debugfs only
>>>>> gives us a full VM view. Whatever we do to improve the situation, we
>>>>> should definitely try to build something that allows us to get the
>>>>> counters per vcpu (as well).
>>>>>
>>>>> The main purpose of these counters is monitoring. It can be quite
>>>>> important to know that only a single vCPU is going wild, compared to
>>>>> all of them for example.
>>>>
>>>> I agree, exposing per vcpu counters can be useful. I guess it didn't
>>>> make much sense exposing them through debugfs so aggregation was done
>>>> in kernel. However if we chose to go with approach 1-b) mmap counters
>>>> struct in userspace, we could do this.
>>>
>>> The reason I dislike the debugfs/statfs approach is that it generates a
>>> completely separate permission and access paths to the stats. That's
>>> great for full system monitoring, but really bad when you have multiple
>>> individual tenants on a single host.
>>
>> I agree, anything in sysfs is complementary to vmfd/vcpufd access.
>
> Cool :).
>
> So we only need to agree on ONE_REG vs. this patch mostly?

What about extending KVM_GET_SUPPORTED_DEBUGFS_STAT with some additional 
information like the data type and unit? ONE_REG exposes additional 
semantics about data.

>
> Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879

next prev parent reply	other threads:[~2020-01-20 18:57 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-15 13:43 [PATCH 2/2] kvm: Add ioctl for gathering debug counters Milan Pandurov
2020-01-15 14:04 ` Alexander Graf
2020-01-15 14:43   ` milanpa
2020-01-15 14:59     ` Alexander Graf
2020-01-17 23:38       ` Paolo Bonzini
2020-01-20 17:53         ` Alexander Graf
2020-01-20 18:57           ` milanpa [this message]
2020-01-21 15:38             ` Alexander Graf
2020-01-23 12:08               ` Paolo Bonzini
2020-01-23 12:32                 ` Alexander Graf
2020-01-23 14:19                   ` Paolo Bonzini
2020-01-23 14:45                     ` Alexander Graf
2020-01-23 14:50                       ` Paolo Bonzini
2020-01-23 14:58                         ` Alexander Graf
2020-01-23 15:05                           ` Paolo Bonzini
2020-01-23 15:27                             ` milanpa
2020-01-23 16:15                               ` Paolo Bonzini
2020-01-23 18:31                                 ` milanpa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=30358a22-084c-6b0b-ae67-acfb7e69ba8e@amazon.com \
    --to=milanpa@amazon.com \
    --cc=borntraeger@de.ibm.com \
    --cc=graf@amazon.de \
    --cc=kvm@vger.kernel.org \
    --cc=milanpa@amazon.de \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox