[Qemu-devel] cpufreq and QEMU guests

qemu-devel.nongnu.org archive mirror

* [Qemu-devel] cpufreq and QEMU guests
@ 2013-09-16 12:15 Benoît Canet
  2013-09-16 14:39 ` Alexander Graf
From: Benoît Canet @ 2013-09-16 12:15 UTC
  To: cpufreq, qemu-devel
  Cc: peter.maydell, gleb, viresh.kumar, agraf, rjw, pbonzini

Hello,

I know of a cloud provider who is worried that /proc/cpuinfo in his
guests reports a bogus frequency to his customer.

QEMU and the guest kernel currently have no way to reflect host frequency
changes in the guests.

The customer's compute-intensive application then reads this information and
makes wrong decisions.

I looked at the various Linux cpufreq drivers and they all seem to be
table-based. Is that true?

For example, the acpi-cpufreq driver has 16 different P-states at hand to look
up in the P-state table to get the frequency.

Given that guests can migrate from one host to slightly different hardware,
the table may become wrong after live migration.

What would be the best hardware to emulate in order to pass an arbitrary
frequency to the guest?

Would a paravirtualized "pvfreq" QEMU device, plus a guest driver implementing
only the callbacks needed to read the frequency, be a good idea?

Best regards

Benoît

PS:
I am CCing the other QEMU arch maintainers because the problem must be
the same everywhere KVM runs.
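To make the live-migration concern concrete, here is a minimal Python sketch of what a table-based lookup does. The two P-state tables and the helper `freq_from_pstate` are hypothetical, with made-up values; they are not taken from acpi-cpufreq or any real firmware. The point is only that a table cached at boot keeps mapping a P-state index to the old host's frequency after the guest moves.

```python
# Hypothetical P-state tables in MHz, index 0 = fastest state.
# Real tables come from host firmware and differ between CPU models.
HOST_A_PSTATES = [3400, 3000, 2600, 2200, 1800, 1600]
HOST_B_PSTATES = [3200, 2900, 2500, 2000, 1700, 1200]


def freq_from_pstate(table, pstate):
    """Return the frequency (MHz) a table-based driver would report for `pstate`."""
    if not 0 <= pstate < len(table):
        raise ValueError(f"P-state {pstate} is not in a {len(table)}-entry table")
    return table[pstate]


# The guest caches the table it saw when it booted on host A.
cached_table = list(HOST_A_PSTATES)

# After live migration the guest runs on host B but keeps using its cached table.
pstate = 2
reported = freq_from_pstate(cached_table, pstate)
actual = freq_from_pstate(HOST_B_PSTATES, pstate)
print(f"P{pstate}: guest reports {reported} MHz, host B actually runs {actual} MHz")
```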


* Re: [Qemu-devel] cpufreq and QEMU guests
  2013-09-16 12:15 [Qemu-devel] cpufreq and QEMU guests Benoît Canet
@ 2013-09-16 14:39 ` Alexander Graf
  2013-09-16 15:05   ` Benoît Canet
From: Alexander Graf @ 2013-09-16 14:39 UTC
  To: Benoît Canet
  Cc: peter.maydell@linaro.org, gleb@redhat.com,
	viresh.kumar@linaro.org, qemu-devel@nongnu.org,
	cpufreq@vger.kernel.org, rjw@sisk.pl, pbonzini@redhat.com



On 16.09.2013 at 07:15, Benoît Canet <benoit.canet@irqsave.net> wrote:

> Hello,
> 
> I know of a cloud provider who is worried that /proc/cpuinfo in his
> guests reports a bogus frequency to his customer.
> 
> QEMU and the guest kernel currently have no way to reflect host frequency
> changes in the guests.
> 
> The customer's compute-intensive application then reads this information and
> makes wrong decisions.

Why do they care about the frequency? Is it for scheduling workloads? The only other case I can think of would be the TSC, and that should be fixed-frequency these days.

If it's scheduling, you could maybe expose the unavailable compute time as steal time to the guest. Exposing frequency in a virtual environment feels backwards.

Alex
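Alex's steal-time suggestion builds on a counter that Linux guests already export: the eighth value on the aggregate `cpu` line of /proc/stat is steal time, in USER_HZ ticks. A minimal Python sketch of how a guest-side tool might turn two samples into the fraction of CPU time the host withheld; the helper names and sample numbers are illustrative, not part of any existing tool.

```python
def parse_cpu_line(stat_text):
    """Return the counters on the aggregate 'cpu' line of /proc/stat-style text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_fraction(before, after):
    """Fraction of elapsed CPU time stolen by the host between two samples.

    Only the first eight counters (user, nice, system, idle, iowait, irq,
    softirq, steal) are summed: guest and guest_nice are already included
    in user and nice, so adding them again would double count.
    """
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [new - old for new, old in zip(a[:8], b[:8])]
    if len(deltas) < 8:
        raise ValueError("expected at least eight counters on the cpu line")
    total = sum(deltas)
    return deltas[7] / total if total > 0 else 0.0


# Two made-up samples taken some time apart.
sample1 = "cpu  100 0 50 800 10 0 0 40 0 0\n"
sample2 = "cpu  200 0 100 1500 20 0 0 180 0 0\n"
print(f"steal: {steal_fraction(sample1, sample2):.0%}")
```

On a real guest, `before` and `after` would be two reads of /proc/stat taken a few seconds apart. Whether steal time is a usable substitute for frequency is exactly the open question in this thread; the sketch only shows what the counter measures.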


* Re: [Qemu-devel] cpufreq and QEMU guests
  2013-09-16 14:39 ` Alexander Graf
@ 2013-09-16 15:05   ` Benoît Canet
  2013-09-16 15:32     ` Gleb Natapov
From: Benoît Canet @ 2013-09-16 15:05 UTC
  To: Alexander Graf
  Cc: Benoît Canet, peter.maydell@linaro.org, gleb@redhat.com,
	viresh.kumar@linaro.org, qemu-devel@nongnu.org,
	cpufreq@vger.kernel.org, rjw@sisk.pl, pbonzini@redhat.com

On Monday 16 Sep 2013 at 09:39:10 (-0500), Alexander Graf wrote:
> Why do they care about the frequency? Is it for scheduling workloads? The only other case I can think of would be the TSC, and that should be fixed-frequency these days.
> 
> If it's scheduling, you could maybe expose the unavailable compute time as steal time to the guest. Exposing frequency in a virtual environment feels backwards.

The end customer has a compute-intensive workload.
At startup, the code retrieves the CPU cache topology, the CPU model, and various
other information, including the guest CPU frequency, before starting the compute job.
The QEMU instances typically use -cpu host.

The code inspects the CPU frequency as seen by the guests to choose the number
of VMs to instantiate for the given task.
They even destroy and recreate VMs that appear to be underperforming, to
mitigate the high inter-VM communication costs.

Do you think the steal time trick would work for this?

Best regards

Benoît

* Re: [Qemu-devel] cpufreq and QEMU guests
  2013-09-16 15:05   ` Benoît Canet
@ 2013-09-16 15:32     ` Gleb Natapov
  2013-09-16 15:46       ` Benoît Canet
From: Gleb Natapov @ 2013-09-16 15:32 UTC
  To: Benoît Canet
  Cc: peter.maydell@linaro.org, viresh.kumar@linaro.org, Alexander Graf,
	cpufreq@vger.kernel.org, qemu-devel@nongnu.org, rjw@sisk.pl,
	pbonzini@redhat.com

On Mon, Sep 16, 2013 at 05:05:45PM +0200, Benoît Canet wrote:
> The end customer has a compute-intensive workload.
> At startup, the code retrieves the CPU cache topology, the CPU model, and various
> other information, including the guest CPU frequency, before starting the compute job.
> The QEMU instances typically use -cpu host.
> 
> The code inspects the CPU frequency as seen by the guests to choose the number
> of VMs to instantiate for the given task.
I am not sure I understand. They look at the guest CPU frequency to estimate
the guest's performance?

> They even destroy and recreate VMs that appear to be underperforming, to
> mitigate the high inter-VM communication costs.
> 
> Do you think the steal time trick would work for this?
> 

--
			Gleb.


* Re: [Qemu-devel] cpufreq and QEMU guests
  2013-09-16 15:32     ` Gleb Natapov
@ 2013-09-16 15:46       ` Benoît Canet
  2013-09-16 15:58         ` Gleb Natapov
From: Benoît Canet @ 2013-09-16 15:46 UTC
  To: Gleb Natapov
  Cc: Benoît Canet, peter.maydell@linaro.org,
	viresh.kumar@linaro.org, Alexander Graf, cpufreq@vger.kernel.org,
	qemu-devel@nongnu.org, rjw@sisk.pl, pbonzini@redhat.com

On Monday 16 Sep 2013 at 18:32:39 (+0300), Gleb Natapov wrote:
> On Mon, Sep 16, 2013 at 05:05:45PM +0200, Benoît Canet wrote:
> > The code inspects the CPU frequency as seen by the guests to choose the number
> > of VMs to instantiate for the given task.
> I am not sure I understand. They look at the guest CPU frequency to estimate
> the guest's performance?

Yes, they take the guest CPU count, model, and frequency to estimate the performance
of the guest.
Then, using this estimate, they cluster enough guests to be able to compute the job
in a given time.

Best regards

Benoît
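For reference, the sizing approach Benoît describes above can be sketched in a few lines of Python. This is a hypothetical reconstruction, not the customer's code: it counts `processor` entries and sums the `cpu MHz` values from /proc/cpuinfo (x86 field names), then divides a made-up capacity target by the per-VM total. It inherits exactly the weakness discussed in this thread, since the reported MHz reflects neither turbo, frequency scaling, nor overcommit on the host.

```python
import math


def parse_cpuinfo(text):
    """Return (vcpu_count, [cpu MHz per vCPU]) from /proc/cpuinfo-style text."""
    count, mhz = 0, []
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "processor":
            count += 1
        elif key == "cpu MHz":
            mhz.append(float(value))
    return count, mhz


def naive_vms_needed(cpuinfo_text, target_mhz):
    """Number of identical VMs whose summed 'cpu MHz' reaches target_mhz."""
    _, mhz = parse_cpuinfo(cpuinfo_text)
    per_vm = sum(mhz)
    if per_vm <= 0:
        raise ValueError("cpuinfo reports no usable frequency")
    return math.ceil(target_mhz / per_vm)


# A trimmed, made-up /proc/cpuinfo from a 2-vCPU guest.
SAMPLE = """processor\t: 0
model name\t: Example CPU @ 2.60GHz
cpu MHz\t\t: 2600.000

processor\t: 1
model name\t: Example CPU @ 2.60GHz
cpu MHz\t\t: 2600.000
"""

print(parse_cpuinfo(SAMPLE))            # (2, [2600.0, 2600.0])
print(naive_vms_needed(SAMPLE, 20000))  # 4
```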


* Re: [Qemu-devel] cpufreq and QEMU guests
  2013-09-16 15:46       ` Benoît Canet
@ 2013-09-16 15:58         ` Gleb Natapov
  2013-09-16 18:42           ` Benoît Canet
From: Gleb Natapov @ 2013-09-16 15:58 UTC
  To: Benoît Canet
  Cc: peter.maydell@linaro.org, viresh.kumar@linaro.org, Alexander Graf,
	cpufreq@vger.kernel.org, qemu-devel@nongnu.org, rjw@sisk.pl,
	pbonzini@redhat.com

On Mon, Sep 16, 2013 at 05:46:04PM +0200, Benoît Canet wrote:
> On Monday 16 Sep 2013 at 18:32:39 (+0300), Gleb Natapov wrote:
> > I am not sure I understand. They look at the guest CPU frequency to estimate
> > the guest's performance?
> 
> Yes, they take the guest CPU count, model, and frequency to estimate the performance
> of the guest.
> Then, using this estimate, they cluster enough guests to be able to compute the job
> in a given time.
> 
They are doing it wrong. They should take the guest CPU count, the host CPU model and
frequency, pCPU/vCPU overcommit (if any), and guest/host memory overcommit
(if any), and estimate performance based on that. For purely computational
workloads, guest core performance should be close to host core
performance if there is no CPU or memory overcommit. With a lot of I/O,
things become more complicated.

--
			Gleb.
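One way to read Gleb's suggestion is as a capacity estimate driven by host-side facts rather than the guest's view of the clock. The Python sketch below is an illustration under stated assumptions, not a formula from this thread: it assumes each vCPU gets at most its fair share of a physical CPU under overcommit, and it models memory overcommit as a single operator-supplied slowdown factor (`mem_penalty`, hypothetical). As Gleb notes, I/O-heavy workloads are not covered.

```python
from dataclasses import dataclass


@dataclass
class HostInfo:
    host_mhz: float           # nominal frequency of the host CPU model
    vcpus_per_pcpu: float     # total vCPUs / physical CPUs on the host (<= 1.0: no overcommit)
    mem_penalty: float = 1.0  # hypothetical slowdown factor in (0, 1] for memory overcommit


def estimated_capacity(guest_vcpus, host):
    """Crude compute capacity of one guest, in host-MHz equivalents."""
    if guest_vcpus <= 0 or host.vcpus_per_pcpu <= 0 or not 0 < host.mem_penalty <= 1:
        raise ValueError("invalid guest or host parameters")
    # Assumption: with N vCPUs per pCPU, each vCPU gets roughly 1/N of a core.
    cpu_share = min(1.0, 1.0 / host.vcpus_per_pcpu)
    return guest_vcpus * host.host_mhz * cpu_share * host.mem_penalty


print(estimated_capacity(4, HostInfo(host_mhz=2600, vcpus_per_pcpu=1.0)))  # 10400.0
print(estimated_capacity(4, HostInfo(host_mhz=2600, vcpus_per_pcpu=2.0)))  # 5200.0
```

The catch, as the next message explains, is that the inputs to such a model live on the host side, which this particular cloud does not expose to the guest.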


* Re: [Qemu-devel] cpufreq and QEMU guests
  2013-09-16 15:58         ` Gleb Natapov
@ 2013-09-16 18:42           ` Benoît Canet
  2013-09-17  8:58             ` Gleb Natapov
From: Benoît Canet @ 2013-09-16 18:42 UTC
  To: Gleb Natapov
  Cc: Benoît Canet, peter.maydell@linaro.org,
	viresh.kumar@linaro.org, qemu-devel@nongnu.org,
	cpufreq@vger.kernel.org, Alexander Graf, rjw@sisk.pl,
	pbonzini@redhat.com

On Monday 16 Sep 2013 at 18:58:40 (+0300), Gleb Natapov wrote:
> They are doing it wrong. They should take the guest CPU count, the host CPU model and
> frequency, pCPU/vCPU overcommit (if any), and guest/host memory overcommit
> (if any), and estimate performance based on that. For purely computational
> workloads, guest core performance should be close to host core
> performance if there is no CPU or memory overcommit. With a lot of I/O,
> things become more complicated.

I omitted some details of the use case.

The cloud is an Amazon-compatible one, which means there is no guest agent in the
guest to help retrieve the host frequency and model.

Also, the AWS APIs don't provide a way to communicate host CPU information to the
program responsible for VM orchestration.

So the only interface to the host CPU information is QEMU, which is started with
-cpu host to pass the CPU model through to the guest.

What hurts the end customer badly is that the guest's /proc/cpuinfo shows the
regular maximum frequency of the host CPU, but not the turbo frequency or a
scaled-down one.

Best regards

Benoît


* Re: [Qemu-devel] cpufreq and QEMU guests
  2013-09-16 18:42           ` Benoît Canet
@ 2013-09-17  8:58             ` Gleb Natapov
From: Gleb Natapov @ 2013-09-17  8:58 UTC
  To: Benoît Canet
  Cc: peter.maydell@linaro.org, viresh.kumar@linaro.org,
	qemu-devel@nongnu.org, cpufreq@vger.kernel.org, Alexander Graf,
	rjw@sisk.pl, pbonzini@redhat.com

On Mon, Sep 16, 2013 at 08:42:58PM +0200, Benoît Canet wrote:
> I omitted some details of the use case.
> 
> The cloud is an Amazon-compatible one, which means there is no guest agent in the
> guest to help retrieve the host frequency and model.
>
> Also, the AWS APIs don't provide a way to communicate host CPU information to the
> program responsible for VM orchestration.
> 
> So the only interface to the host CPU information is QEMU, which is started with
> -cpu host to pass the CPU model through to the guest.
> 
Why are they sure the guests are started with "-cpu host"? Do they know whether
the host is overcommitted, or whether the guest's vCPU usage is restricted by
some other means?

> What hurts the end customer badly is that the guest's /proc/cpuinfo shows the
> regular maximum frequency of the host CPU, but not the turbo frequency or a
> scaled-down one.
> 
What he sees is the host TSC frequency of the CPU the guest was booted on
[1], which should be adequate to estimate performance if the guest is not
migrated. The frequency at which the host CPU runs at any given moment is
out of the guest's control and depends on the host frequency governor and load.

[1] The value comes from the host; for hosts without a constant TSC, this is
    the maximum possible frequency.

--
			Gleb.
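The footnote's distinction matters because a constant-rate (invariant) TSC ticks at the same rate whatever P-state or turbo level the core is in. The minimal Python sketch below, with a hypothetical 2.6 GHz TSC, shows that a frequency derived from TSC deltas over wall-clock time stays the same while the core clock changes, which is why the guest cannot see turbo or scaled-down frequencies this way. `TSC_HZ` and `tsc_mhz` are illustrative names, not kernel or QEMU interfaces.

```python
TSC_HZ = 2_600_000_000  # hypothetical invariant TSC rate of the boot host


def tsc_mhz(delta_tsc_cycles, delta_ns):
    """Frequency implied by a TSC delta over a wall-clock interval, in MHz."""
    if delta_ns <= 0:
        raise ValueError("interval must be positive")
    return delta_tsc_cycles * 1000.0 / delta_ns


interval_ns = 1_000_000_000  # one second of wall-clock time
for core_mhz in (1600, 2600, 3400):  # hypothetical scaled-down, nominal, turbo clocks
    # An invariant TSC advances at TSC_HZ no matter what core_mhz is.
    delta = TSC_HZ * interval_ns // 1_000_000_000
    print(f"core at {core_mhz} MHz -> TSC-derived {tsc_mhz(delta, interval_ns):.0f} MHz")
```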



Thread overview: 8+ messages
2013-09-16 12:15 [Qemu-devel] cpufreq and QEMU guests Benoît Canet
2013-09-16 14:39 ` Alexander Graf
2013-09-16 15:05   ` Benoît Canet
2013-09-16 15:32     ` Gleb Natapov
2013-09-16 15:46       ` Benoît Canet
2013-09-16 15:58         ` Gleb Natapov
2013-09-16 18:42           ` Benoît Canet
2013-09-17  8:58             ` Gleb Natapov
