public inbox for kvm@vger.kernel.org
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-pm@lists.linux-foundation.org,
	Radim Krcmar <rkrcmar@redhat.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>
Subject: Re: [patch 0/3] KVM CPU frequency change hypercalls (resend)
Date: Tue, 14 Mar 2017 20:27:51 -0300
Message-ID: <20170314232748.GA15962@amt.cnet>
In-Reply-To: <fa831f8a-e9f7-f431-a1bc-9e9e0dda6d44@redhat.com>

Hi Paolo,

On Tue, Mar 14, 2017 at 05:40:21PM +0100, Paolo Bonzini wrote:
> 
> 
> On 02/03/2017 14:59, Marcelo Tosatti wrote:
> > On Thu, Mar 02, 2017 at 11:15:00AM +0100, Paolo Bonzini wrote:
> >>  one obvious downside is that any application that you
> >> run after DPDK will have its CPU frequency hardcoded to something that
> >> is not appropriate.  
> > 
> > To isolate the CPU where DPDK runs it is already necessary to perform
> > special procedures such as changing the cpumask of other tasks, changing
> > cpumask of interrupt handlers (to remove the isolated CPU from that
> > cpumask), etc. Changing the cpufreq governor to userspace is another
> > step of that setup phase.
> > 
> > On shutdown (or CPU unpin), you can switch back the CPU to the previous
> > governor, which can switch the frequency to whatever it finds suitable.
> 
> But I thought that one of the reasons to do NFV is to simplify this
> setup.  If you now have to do the same thing on virtual machines, things
> become more complicated to set up, and I don't think that NFV virtual
> machines are _that_ special.
> 
> In addition, in the list of setup steps above you forgot "chmod the
> sysfs files for cpufreq so that DPDK can access it".  Doing that chmod
> is a very explicit act, and that's unlike the functionality of this patch.
> 
> By letting virtual machines do the same with a simple hypercall, you're
> giving powers to whoever opens /dev/kvm that they didn't have before
> (unless the userspace process also had access to sysfs).  Worse, the
> effects last beyond the moment /dev/kvm is closed.

This can be fixed by requiring the qemu-kvm vcpu thread, which runs
the hypercall, to have sufficient priority (similar to other cpufreq
users). Fine, good point.
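
For reference, a minimal sketch of what an existing userspace cpufreq
user does today, assuming the standard sysfs layout under
/sys/devices/system/cpu/cpuN/cpufreq: save the current governor, switch
to "userspace", write the target frequency (in kHz) to scaling_setspeed,
and restore the saved governor on shutdown. The function names and the
root parameter are hypothetical, added only for illustration. The writes
fail with a permission error unless the caller has been given access to
those files.

```python
from pathlib import Path

# Standard sysfs location of the per-CPU cpufreq directories.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def cpufreq_dir(cpu, root=CPUFREQ_ROOT):
    """Return the cpufreq policy directory for one CPU."""
    return Path(root) / f"cpu{cpu}" / "cpufreq"


def pin_frequency(cpu, khz, root=CPUFREQ_ROOT):
    """Switch `cpu` to the userspace governor and request `khz`.

    Returns the governor that was active before, so the caller can
    restore it later. Raises PermissionError if the sysfs files are
    not writable by the caller.
    """
    d = cpufreq_dir(cpu, root)
    previous = (d / "scaling_governor").read_text().strip()
    (d / "scaling_governor").write_text("userspace\n")
    (d / "scaling_setspeed").write_text(f"{khz}\n")
    return previous


def restore_governor(cpu, governor, root=CPUFREQ_ROOT):
    """Hand frequency selection back to the previously active governor."""
    (cpufreq_dir(cpu, root) / "scaling_governor").write_text(f"{governor}\n")
```

Usage would be along the lines of prev = pin_frequency(3, 1200000) during
setup and restore_governor(3, prev) on shutdown or CPU unpin.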

> So, the question then is how to design the hypervisor so that these NFV
> virtual machines can play with cpufreq, but there are no adverse
> indefinite effects. 

OK, we can modify the cpufreq cgroups patch so that the hypercalls
set its capacity limits. From that series:

"The first three patches of this series introduces
capacity_{min,max} tracking
in the core scheduler, as an extension of the CPU controller."

The hypercalls would set capacity_min == capacity_max, which forces the
CPU to run at that frequency, given there are no other tasks requesting
a frequency on that CPU.

This is good enough for DPDK.
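
As a rough illustration of why equal limits pin the frequency, here is a
conceptual model of a min/max clamp. It is not the interface of the
cgroups series: the function name is hypothetical, and the capacity
values are assumed to be on the scheduler's 0..1024 scale.

```python
def effective_capacity(requested, capacity_min, capacity_max):
    """Clamp a requested capacity into [capacity_min, capacity_max].

    Conceptual model only; it ignores other tasks on the same CPU.
    """
    if capacity_min > capacity_max:
        raise ValueError("capacity_min must not exceed capacity_max")
    return max(capacity_min, min(requested, capacity_max))


# With capacity_min == capacity_max, every request maps to one value.
low = effective_capacity(200, 512, 512)
high = effective_capacity(1024, 512, 512)
```

Both low and high come out as 512, so whatever the governor would have
picked, the result is the single value the limits allow.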

> One possibility is to have some kind of per-task
> cpufreq.  Another is to do everything in userspace with virtual ACPI
> P-states and the userspace governor in the VM.

Virtual ACPI P-states are an option. But why not do it in-kernel?
The exit to userspace can be a significant fraction of the total
time if the frequency change itself is fast (say, 10us for the
frequency change and 5us for the userspace exit).
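
To put a number on that, a small sketch using the example figures above
(10us and 5us are illustrative values, not measurements, and the
function name is hypothetical):

```python
def exit_fraction(freq_change_us, userspace_exit_us):
    """Share of the total request latency spent on the userspace exit."""
    total_us = freq_change_us + userspace_exit_us
    return userspace_exit_us / total_us


# 5us of exit on top of a 10us frequency change: 5 / 15 of the total.
fraction = exit_fraction(10, 5)
```

With those figures the exit accounts for one third of the total latency.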

> I was hoping to get more feedback from linux-pm.
> 
> >> Here are two possibilities that I could think of:
> >>
> >> 1) Introduce a mechanism that allows a task to override the governor's
> >> choice of CPU frequency.  This could be a ioctl, a prctl, a cgroup-based
> >> mechanism or whatever else.  As Marcelo pointed out in the original kvm@
> >> thread, the latency and overhead of switching frequencies make it
> >> impractical to associate a desired CPU frequency with a task, because
> >> multiple tasks could be requesting a given frequency.  One possibility
> >> could be to treat the per-task CPU frequency as advisory
> > 
> > DPDK can't afford the frequency as advisory: failure in setting the
> > processor frequency when requested means dropped packets (not 
> > dropping packets being a requirement).
> 
> It can be advisory if you document a proper configuration where it's obeyed.

Sure.

> 
> Paolo
> 
> >>  and only obey
> >> it in restricted cases---for example only if nohz_full is in effect.
> > 
> > From cpufreq documentation:
> > 
> > "On all other cpufreq implementations, these boundaries still need to
> > be set. Then, a "governor" must be selected. Such a "governor" decides
> > what speed the processor shall run within the boundaries. One such
> > "governor" is the "userspace" governor. This one allows the user - or
> > a yet-to-implement userspace program - to decide what specific speed
> > the processor shall run at."
> > 
> > (it seems the cpufreq-hypercall+cpufreq-userspace combination is in 
> > accord with what cpufreq-userspace has been designed for).
> > 
> > Secondly, setting frequencies for multiple tasks is somewhat
> > contradictory:
> > 
> > In the DPDK context, or in any context actually, it makes sense for a
> > program to lower processor frequency when it decides the current 
> > frequency is sufficient to handle the job: that is lowering the
> > frequency will still make it possible to handle the load.
> > 
> > With multiple applications sharing that processor, the percentage 
> > of time given to a certain application also interferes with the
> > time it spends handling the job. So the other variable that 
> > affects "instructions per second" is timeslice given to the
> > task by the scheduler, not only "frequency".
> > 
> > Having a task request for a particular frequency in that case becomes
> > ambiguous: you could be asking for "increased timeslice".
