From: Marcelo Tosatti <mtosatti@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-pm@lists.linux-foundation.org,
	Radim Krcmar <rkrcmar@redhat.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>
Subject: Re: [patch 0/3] KVM CPU frequency change hypercalls (resend)
Date: Tue, 14 Mar 2017 20:27:51 -0300
Message-ID: <20170314232748.GA15962@amt.cnet>
In-Reply-To: <fa831f8a-e9f7-f431-a1bc-9e9e0dda6d44@redhat.com>

Hi Paolo,

On Tue, Mar 14, 2017 at 05:40:21PM +0100, Paolo Bonzini wrote:
> 
> 
> On 02/03/2017 14:59, Marcelo Tosatti wrote:
> > On Thu, Mar 02, 2017 at 11:15:00AM +0100, Paolo Bonzini wrote:
> >>  one obvious downside is that any application that you
> >> run after DPDK will have its CPU frequency hardcoded to something that
> >> is not appropriate.  
> > 
> > To isolate the CPU where DPDK runs it is already necessary to perform
> > special procedures such as changing the cpumask of other tasks, changing
> > cpumask of interrupt handlers (to remove the isolated CPU from that
> > cpumask), etc. Changing the cpufreq governor to userspace is another
> > step of that setup phase.
> > 
> > On shutdown (or CPU unpin), you can switch back the CPU to the previous
> > governor, which can switch the frequency to whatever it finds suitable.
> 
> But I thought that one of the reasons to do NFV is to simplify this
> setup.  If you now have to do the same thing on virtual machines, things
> become more complicated to set up, and I don't think that NFV virtual
> machines are _that_ special.
> 
> In addition, in the list of setup steps above you forgot "chmod the
> sysfs files for cpufreq so that DPDK can access it".  Doing that chmod
> is a very explicit act, and that's unlike the functionality of this patch.
> 
> By letting virtual machines do the same with a simple hypercall, you're
> giving powers to whoever opens /dev/kvm that they didn't have before
> (unless the userspace process also had access to sysfs).  Worse, the
> effects last beyond the moment /dev/kvm is closed.

Fine, good point. This can be fixed by requiring the qemu-kvm vCPU
thread that executes the hypercall to have sufficient priority
(similar to other cpufreq users).

> So, the question then is how to design the hypervisor so that these NFV
> virtual machines can play with cpufreq, but there are no adverse
> indefinite effects. 

OK, we can modify the cpufreq cgroups patch so that the hypercalls set
capacity_min == capacity_max. From that series:

"The first three patches of this series introduces capacity_{min,max}
tracking in the core scheduler, as an extension of the CPU controller."

Setting capacity_min == capacity_max forces the CPU to run at that
frequency, provided no other task on that CPU is placing its own
frequency request.

This is good enough for DPDK.

> One possibility is to have some kind of per-task
> cpufreq.  Another is to do everything in userspace with virtual ACPI
> P-states and the userspace governor in the VM.

Virtual ACPI P-states are an option. But why not do it in-kernel? The
exit to userspace can be a significant fraction of the total cost when
the frequency change itself is fast (say, 10us for the frequency change
and 5us for the userspace exit).

> I was hoping to get more feedback from linux-pm.
> 
> >> Here are two possibilities that I could think of:
> >>
> >> 1) Introduce a mechanism that allows a task to override the governor's
> >> choice of CPU frequency.  This could be a ioctl, a prctl, a cgroup-based
> >> mechanism or whatever else.  As Marcelo pointed out in the original kvm@
> >> thread, the latency and overhead of switching frequencies make it
> >> impractical to associate a desired CPU frequency with a task, because
> >> multiple tasks could be requesting a given frequency.  One possibility
> >> could be to treat the per-task CPU frequency as advisory
> > 
> > DPDK can't afford the frequency as advisory: failure in setting the
> > processor frequency when requested means dropped packets (not 
> > dropping packets being a requirement).
> 
> It can be advisory if you document a proper configuration where it's obeyed.

Sure.

> 
> Paolo
> 
> >>  and only obey
> >> it in restricted cases---for example only if nohz_full is in effect.
> > 
> > From cpufreq documentation:
> > 
> > "On all other cpufreq implementations, these boundaries still need to
> > be set. Then, a "governor" must be selected. Such a "governor" decides
> > what speed the processor shall run within the boundaries. One such
> > "governor" is the "userspace" governor. This one allows the user - or
> > a yet-to-implement userspace program - to decide what specific speed
> > the processor shall run at."
> > 
> > (it seems the cpufreq-hypercall+cpufreq-userspace combination is in 
> > accord with what cpufreq-userspace has been designed for).
> > 
> > Secondly, setting frequencies for multiple tasks is somewhat
> > contradictory:
> > 
> > In the DPDK context, or in any context actually, it makes sense for a
> > program to lower processor frequency when it decides the current 
> > frequency is sufficient to handle the job: that is lowering the
> > frequency will still make it possible to handle the load.
> > 
> > With multiple applications sharing that processor, the percentage 
> > of time given to a certain application also interferes with the
> > time it spends handling the job. So the other variable that 
> > affects "instructions per second" is timeslice given to the
> > task by the scheduler, not only "frequency".
> > 
> > Having a task request for a particular frequency in that case becomes
> > ambiguous: you could be asking for "increased timeslice".

Thread overview: 10+ messages
2017-03-01 15:04 [patch 0/3] KVM CPU frequency change hypercalls (resend) Marcelo Tosatti
2017-03-01 15:04 ` [patch 1/3] cpufreq: implement min/max/up/down functions Marcelo Tosatti
2017-03-01 15:04 ` [patch 2/3] KVM: x86: introduce ioctl to allow frequency hypercalls Marcelo Tosatti
2017-03-01 15:04 ` [patch 3/3] KVM: x86: frequency change hypercalls Marcelo Tosatti
2017-03-02 10:15 ` [patch 0/3] KVM CPU frequency change hypercalls (resend) Paolo Bonzini
2017-03-02 13:59   ` Marcelo Tosatti
2017-03-14 16:40     ` Paolo Bonzini
2017-03-14 23:27       ` Marcelo Tosatti [this message]
2017-03-15  8:23         ` Paolo Bonzini
2017-03-15 18:30           ` Marcelo Tosatti