Re: [PATCH 0/5] Alter steal time reporting in KVM

From: Michael Wolf <mjw@linux.vnet.ibm.com>
To: Glauber Costa <glommer@parallels.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	linux-kernel@vger.kernel.org, riel@redhat.com,
	kvm@vger.kernel.org, peterz@infradead.org, mingo@redhat.com,
	anthony@codemonkey.ws
Subject: Re: [PATCH 0/5] Alter steal time reporting in KVM
Date: Fri, 07 Dec 2012 09:50:46 -0600
Message-ID: <50C21056.5090905@linux.vnet.ibm.com>
In-Reply-To: <50BF4225.9030709@parallels.com>

On 12/05/2012 06:46 AM, Glauber Costa wrote:
> I am deeply sorry.
>
> I was busy the first time I read this, so I postponed answering and ended
> up forgetting.
>
> Sorry
>>> include/linux/sched.h:
>>> unsigned long long run_delay; /* time spent waiting on a runqueue */
>>>
>>> So if you are out of the runqueue, you won't get steal time accounted,
>>> and then I truly fail to understand what you are doing.
>> So I looked at something like this in the past. To make sure things
>> haven't changed, I set up a cgroup on my test server running a kernel
>> built from the latest tip tree.
>>
>> [root]# cat cpu.cfs_quota_us
>> 50000
>> [root]# cat cpu.cfs_period_us
>> 100000
>> [root]# cat cpuset.cpus
>> 1
>> [root]# cat cpuset.mems
>> 0
>>
>> Next I put the PID of the cpu thread into tasks. When I start a script
>> that hogs the cpu, I see the following in top on the guest:
>> Cpu(s):  1.9%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa, 48.3%hi,  0.0%si, 49.8%st
>>
>> So the steal time here is in line with the bandwidth control settings.
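A minimal sketch of the arithmetic behind that observation, assuming the guest has a single vCPU whose thread is the only task in the cgroup and that the guest keeps that vCPU fully busy. The quota and period come from the transcript above; the helper names are hypothetical.

```python
# Hypothetical helpers: expected steal time under CFS bandwidth control.
# Assumes one busy vCPU thread is the only task in the cgroup (cgroup v1
# files cpu.cfs_quota_us / cpu.cfs_period_us, values in microseconds).

def cfs_entitlement(quota_us: int, period_us: int) -> float:
    """Fraction of one CPU the group may run per period; quota -1 means no cap."""
    if quota_us < 0:
        return float("inf")
    return quota_us / period_us


def expected_steal_pct(quota_us: int, period_us: int) -> float:
    """Steal percentage a single, fully busy vCPU should report in the guest."""
    # cpuset.cpus pins the group to one CPU, so the share cannot exceed 1.0.
    share = min(cfs_entitlement(quota_us, period_us), 1.0)
    return (1.0 - share) * 100.0


if __name__ == "__main__":
    expected = expected_steal_pct(50000, 100000)  # quota/period from the transcript
    observed = 49.8                               # %st reported by top in the guest
    print(f"expected {expected:.1f}% steal, observed {observed:.1f}%")
```

With 50000 us of quota per 100000 us period, the model predicts 50% steal, which matches the 49.8%st reported by top.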
> Ok. So I was wrong in my hunch that it would be outside the runqueue
> and would therefore work automatically. Still, the host kernel has all
> the information in cgroups.
>
>> So then the steal time did not show on the guest. You have no value
>> that needs to be passed around. What I did not like about this approach
>> was:
>> * It only works for CFS bandwidth control. If another type of hard
>>   limit were added to the kernel, the code would potentially need to
>>   change.
> This is true for almost everything we have in the kernel!
> It is *very* unlikely for other bandwidth control mechanisms to ever
> appear. If one ever does, it's *their* burden to make sure it works for
> steal time (provided it is merged). Code in tree gets precedence.

Ok, I will work on a patch that uses the cgroup information for
bandwidth control to separate out the time.
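One way the cgroup information could be used is to split the steal time a guest observes into the part the bandwidth cap explains (what this series calls consigned time) and the remainder. The sketch below assumes a single vCPU that was runnable for the whole sampling interval and uses only the cgroup v1 quota/period values; `split_steal` and `StealSplit` are hypothetical names, not code from the patch series.

```python
from dataclasses import dataclass

# Hypothetical split of observed steal time into "consigned" time (explained by
# the CFS quota/period cap) and excess steal (contention beyond the cap).
# Assumes one vCPU that was runnable for the entire interval; all values in us.

@dataclass
class StealSplit:
    consigned_us: int  # steal attributable to the configured cap
    excess_us: int     # steal the cap does not explain


def split_steal(steal_us: int, interval_us: int,
                quota_us: int, period_us: int) -> StealSplit:
    if quota_us < 0:
        # No CFS bandwidth limit: all observed steal is ordinary steal.
        return StealSplit(consigned_us=0, excess_us=steal_us)
    share = min(quota_us / period_us, 1.0)
    # Time the cap withholds from a vCPU that wants to run the whole interval.
    capped_us = round(interval_us * (1.0 - share))
    consigned = min(steal_us, capped_us)
    return StealSplit(consigned_us=consigned, excess_us=steal_us - consigned)


# Example: 600 ms of steal in a 1 s interval with a 50% cap.
print(split_steal(600_000, 1_000_000, 50_000, 100_000))
# StealSplit(consigned_us=500000, excess_us=100000)
```

Only steal beyond what the cap accounts for is reported as excess, so a guest running exactly at its cap would show no excess steal.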

>
>> * This approach doesn't help if the limits are set by overcommitting
>>   the cpus. It is my understanding that this is a common approach.
>>
> I can't say anything about commonality, but common or not, it is a
> *crazy* approach.
>
> When you simply overcommit, you have no way to differentiate between
> intended steal time and non-intended steal time. Moreover, when you
> overcommit, your cpu usage will vary over time. If two guests use the
> cpu to their full power, you will have 50 % each. But if one of them
> slows down, the other gets more. What is your entitlement value? How do
> you define this?
>
> And then, after you define it, if you end up using more than this,
> what is your cpu usage? 130 %?

Yes, exactly: you would ideally show a boosted amount of cpu. However, to
do that you would need to either create a new tool or modify the current
accounting tools, such as top.

My understanding is that in this case you are not capping so much as
guaranteeing a minimum level of performance.
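The overcommit question can be made concrete with a simplified model: equally weighted guests sharing one host CPU, with CPU time divided max-min fairly. This is a rough stand-in for CFS with equal shares, not its actual algorithm, and treating 1/N of the CPU as each guest's nominal entitlement is an assumption. The point of the sketch is that a guest's share, and therefore any "boosted" percentage, moves with the other guest's demand.

```python
# Simplified overcommit model (an assumption, not the CFS algorithm):
# equally weighted guests share `capacity` CPUs max-min fairly, and each
# guest's nominal entitlement is taken to be capacity / number_of_guests.

def fair_share(demands: list[float], capacity: float = 1.0) -> list[float]:
    """Max-min fair allocation: nobody gets more than it asks for, and
    unused capacity is redistributed equally among the remaining guests."""
    alloc = [0.0] * len(demands)
    remaining = capacity
    pending = sorted(range(len(demands)), key=lambda i: demands[i])
    while pending:
        equal = remaining / len(pending)
        smallest = pending[0]
        if demands[smallest] <= equal:
            # Smallest demand fits in an equal split: satisfy it fully.
            alloc[smallest] = demands[smallest]
            remaining -= demands[smallest]
            pending.pop(0)
        else:
            # Everyone left wants more than an equal split: split evenly.
            for i in pending:
                alloc[i] = equal
            break
    return alloc


def usage_vs_entitlement(demands: list[float], capacity: float = 1.0) -> list[float]:
    """Each guest's allocation as a percentage of its nominal 1/N entitlement."""
    entitlement = capacity / len(demands)
    return [100.0 * a / entitlement for a in fair_share(demands, capacity)]


if __name__ == "__main__":
    print(usage_vs_entitlement([1.0, 1.0]))   # both busy: 100% of entitlement each
    print(usage_vs_entitlement([1.0, 0.35]))  # guest 2 mostly idle: guest 1 gets ~130%
```

Under this model the same configuration gives guest 1 either 50% or about 65% of the CPU depending only on what guest 2 does, which is why a fixed entitlement has to be stated explicitly somewhere.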

>
>
> The only sane way to do it is to communicate this value to the kernel
> somehow. The bandwidth controller is the interface we have for that. So
> everybody that wants to *intentionally* overcommit needs to communicate
> this to the controller. IOW: any sane configuration should be explicit
> about its capping.
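Being explicit about the cap through the bandwidth controller amounts to writing the quota and period into the guest's cgroup, as in the transcript earlier in the thread. A minimal sketch, assuming the cgroup v1 cpu controller files shown there and a cgroup directory that already exists; the directory path and thread ID in the usage line are hypothetical. The kernel enforces its own limits on these values; the bounds checked here (period between 1 ms and 1 s, quota at least 1 ms) are stated as assumptions.

```python
from pathlib import Path

# Sketch: express an intended CPU entitlement explicitly through the cgroup v1
# CFS bandwidth files. Bounds below (1 ms..1 s period, 1 ms minimum quota) are
# assumptions; check them against the documentation for your kernel.
MIN_US = 1_000
MAX_PERIOD_US = 1_000_000


def cfs_cap_settings(entitlement_cpus: float, period_us: int = 100_000) -> dict[str, str]:
    """Return {file name: value} for a cap of `entitlement_cpus` CPUs."""
    if not MIN_US <= period_us <= MAX_PERIOD_US:
        raise ValueError("period out of range")
    quota_us = round(entitlement_cpus * period_us)
    if quota_us < MIN_US:
        raise ValueError("quota below minimum")
    # Period is listed first so it is written before the quota.
    return {"cpu.cfs_period_us": str(period_us), "cpu.cfs_quota_us": str(quota_us)}


def apply_cfs_cap(cgroup_dir: str, vcpu_tid: int, entitlement_cpus: float) -> None:
    """Write the cap, then move the vCPU thread into the group's tasks file."""
    d = Path(cgroup_dir)
    for name, value in cfs_cap_settings(entitlement_cpus).items():
        (d / name).write_text(value + "\n")
    (d / "tasks").write_text(f"{vcpu_tid}\n")
```

For example, `apply_cfs_cap("/sys/fs/cgroup/cpu/guest1", 12345, 0.5)` (hypothetical path and TID) would reproduce the 50000/100000 setting from the transcript.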
>
>>>>>>          Add an ioctl to communicate the consign limit to the host.
>>> This definitely should go away.
>>>
>>> More specifically, *whatever* way we use to cap the processor, the host
>>> system will have all the information at all times.
>> I'm not understanding that comment. If you are capping by simply
>> controlling the amount of overcommit on the host, then wouldn't you
>> still need some value to indicate the desired amount?
> No, that is just crazy, and I don't like it a single bit.
>
> So in the light of it: Whatever capping mechanism we have, we need to be
> explicit about the expected entitlement. At this point, the kernel
> already knows what it is, and needs no extra ioctls or anything like that.

Thread overview: 20+ messages
2012-11-26 20:36 [PATCH 0/5] Alter steal time reporting in KVM Michael Wolf
2012-11-26 20:36 ` [PATCH 1/5] Alter the amount of steal time reported by the guest Michael Wolf
2012-11-26 20:36 ` [PATCH 2/5] Expand the steal time msr to also contain the consigned time Michael Wolf
2012-11-27 21:03   ` Konrad Rzeszutek Wilk
2012-11-28 15:23     ` Michael Wolf
2012-11-26 20:36 ` [PATCH 3/5] Add the code to send the consigned time from the host to the guest Michael Wolf
2012-11-26 20:37 ` [PATCH 4/5] Add a timer to allow the separation of consigned from steal time Michael Wolf
2012-11-26 20:37 ` [PATCH 5/5] Add an ioctl to communicate the consign limit to the host Michael Wolf
2012-11-27  8:48 ` [PATCH 0/5] Alter steal time reporting in KVM Glauber Costa
2012-11-27 15:10   ` Michael Wolf
2012-11-28  8:45     ` Glauber Costa
2012-11-28 18:44       ` Michael Wolf
2012-11-28 19:16   ` Anthony Liguori
2012-11-27 23:24 ` Marcelo Tosatti
2012-11-28  0:32   ` Marcelo Tosatti
2012-11-28 18:43   ` Michael Wolf
2012-11-28 20:55     ` Glauber Costa
2012-11-29 17:43       ` Michael Wolf
2012-12-05 12:46         ` Glauber Costa
2012-12-07 15:50           ` Michael Wolf [this message]
