* [Qemu-devel] cpufreq and QEMU guests
From: Benoît Canet @ 2013-09-16 12:15 UTC
To: cpufreq, qemu-devel
Cc: peter.maydell, gleb, viresh.kumar, agraf, rjw, pbonzini

Hello,

I know a cloud provider who is worried that the /proc/cpuinfo of his guests
gives a bogus frequency to his customers.

QEMU and the guest kernel currently have no way to reflect host frequency
changes to the guests.

The customer's compute-intensive application then reads this information and
makes wrong decisions.

I looked at the various Linux cpufreq drivers and they all seem to be table
based. Is that true?

For example, the acpi cpufreq driver has 16 different pstates at hand to look
up in the pstate table to get the frequency.

Given that guests can migrate from one piece of hardware to slightly different
hardware, the table may become wrong after live migration.

What would be the best hardware to emulate in order to pass an arbitrary
frequency to the guest?

Would a pvfreq paravirtualized QEMU hardware and a guest driver implementing
only the callbacks needed to read the frequency be a good idea?

Best regards

Benoît

ps:
I CC this mail to the other QEMU arch maintainers because the problem must be
the same everywhere KVM runs.
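For readers who want to see which value is under discussion, here is a minimal Python sketch of how an application might pull the per-processor frequency out of /proc/cpuinfo. It assumes the x86 layout, where each processor entry carries a `cpu MHz` field; other architectures may not expose that field at all. The function names `parse_cpu_mhz` and `read_cpu_mhz` are made up for this illustration and do not come from the thread.

```python
import re

# Matches lines such as "cpu MHz\t\t: 2400.000" (x86 layout, assumed here).
_CPU_MHZ_RE = re.compile(
    r"^cpu MHz[ \t]*:[ \t]*([0-9]+(?:\.[0-9]+)?)[ \t]*$",
    flags=re.MULTILINE,
)


def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of every processor entry, in file order."""
    return [float(value) for value in _CPU_MHZ_RE.findall(cpuinfo_text)]


def read_cpu_mhz(path="/proc/cpuinfo"):
    """Read the file at `path` and return its 'cpu MHz' values as floats."""
    with open(path) as handle:
        return parse_cpu_mhz(handle.read())
```

Inside a guest, `read_cpu_mhz()` returns whatever the guest kernel reports, which is exactly the number the thread argues may not track the host's current frequency.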
* Re: [Qemu-devel] cpufreq and QEMU guests
From: Alexander Graf @ 2013-09-16 14:39 UTC
To: Benoît Canet
Cc: peter.maydell@linaro.org, gleb@redhat.com, viresh.kumar@linaro.org, qemu-devel@nongnu.org, cpufreq@vger.kernel.org, rjw@sisk.pl, pbonzini@redhat.com

On 16.09.2013 at 07:15, Benoît Canet <benoit.canet@irqsave.net> wrote:

> Hello,
>
> I know a cloud provider who is worried that the /proc/cpuinfo of his guests
> gives a bogus frequency to his customers.
>
> QEMU and the guest kernel currently have no way to reflect host frequency
> changes to the guests.
>
> The customer's compute-intensive application then reads this information and
> makes wrong decisions.

Why do they care about the frequency? Is it for scheduling workloads? The only
other case I can think of would be the TSC, and that should be fixed frequency
these days.

If it's scheduling, you could maybe expose the unavailable compute time as
steal time to the guest. Exposing frequency in a virtual environment feels
backwards.

Alex
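The steal time mentioned above is already visible to a Linux guest through the aggregate `cpu` line of /proc/stat, whose columns are user, nice, system, idle, iowait, irq, softirq, steal, guest and guest_nice. The following is a minimal Python sketch, under the assumption that the application samples that file twice and compares the counters; `parse_cpu_times` and `steal_fraction` are hypothetical helper names, not part of any existing tool.

```python
def parse_cpu_times(stat_text):
    """Return the counters of the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_fraction(before, after):
    """Fraction of elapsed CPU time accounted as steal between two samples."""
    deltas = [new - old for old, new in zip(before, after)]
    # Only the first eight columns are summed: guest and guest_nice
    # (columns 9 and 10) are already included in user and nice.
    total = sum(deltas[:8])
    steal = deltas[7] if len(deltas) > 7 else 0
    return steal / total if total > 0 else 0.0
```

A caller would read /proc/stat, wait for an interval, read it again, and pass both parsed samples to `steal_fraction`. A persistently high value suggests the vCPUs are not getting the physical CPU time the guest expects, which is a different signal from the clock frequency.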
* Re: [Qemu-devel] cpufreq and QEMU guests
From: Benoît Canet @ 2013-09-16 15:05 UTC
To: Alexander Graf
Cc: Benoît Canet, peter.maydell@linaro.org, gleb@redhat.com, viresh.kumar@linaro.org, qemu-devel@nongnu.org, cpufreq@vger.kernel.org, rjw@sisk.pl, pbonzini@redhat.com

On Monday 16 Sep 2013 at 09:39:10 (-0500), Alexander Graf wrote:
> Why do they care about the frequency? Is it for scheduling workloads? The
> only other case I can think of would be the TSC, and that should be fixed
> frequency these days.
>
> If it's scheduling, you could maybe expose the unavailable compute time as
> steal time to the guest. Exposing frequency in a virtual environment feels
> backwards.

The final customer has a compute-intensive workload.
At startup the code retrieves the CPU cache topology, the CPU model, and
various other information, including the guest CPU frequency, before starting
the compute job.
The QEMU instance typically uses -cpu host.

The code inspects the CPU frequency as seen by the guests to choose the number
of VMs to instantiate to compute the given task.
They even destroy and recreate some VMs that would be underperforming, to
mitigate the high inter-VM communication costs.

Do you think the steal time trick would work for this?

Best regards

Benoît
* Re: [Qemu-devel] cpufreq and QEMU guests
From: Gleb Natapov @ 2013-09-16 15:32 UTC
To: Benoît Canet
Cc: peter.maydell@linaro.org, viresh.kumar@linaro.org, Alexander Graf, cpufreq@vger.kernel.org, qemu-devel@nongnu.org, rjw@sisk.pl, pbonzini@redhat.com

On Mon, Sep 16, 2013 at 05:05:45PM +0200, Benoît Canet wrote:
> The final customer has a compute-intensive workload.
> At startup the code retrieves the CPU cache topology, the CPU model, and
> various other information, including the guest CPU frequency, before
> starting the compute job.
> The QEMU instance typically uses -cpu host.
>
> The code inspects the CPU frequency as seen by the guests to choose the
> number of VMs to instantiate to compute the given task.

I am not sure I understand. They look at the guest CPU frequency to estimate
the guest's performance?

> They even destroy and recreate some VMs that would be underperforming, to
> mitigate the high inter-VM communication costs.
>
> Do you think the steal time trick would work for this?

--
	Gleb.
* Re: [Qemu-devel] cpufreq and QEMU guests
From: Benoît Canet @ 2013-09-16 15:46 UTC
To: Gleb Natapov
Cc: Benoît Canet, peter.maydell@linaro.org, viresh.kumar@linaro.org, Alexander Graf, cpufreq@vger.kernel.org, qemu-devel@nongnu.org, rjw@sisk.pl, pbonzini@redhat.com

On Monday 16 Sep 2013 at 18:32:39 (+0300), Gleb Natapov wrote:
> On Mon, Sep 16, 2013 at 05:05:45PM +0200, Benoît Canet wrote:
> > The code inspects the CPU frequency as seen by the guests to choose the
> > number of VMs to instantiate to compute the given task.
> I am not sure I understand. They look at the guest CPU frequency to estimate
> the guest's performance?

Yes, they take the guest CPU count, model and frequency to estimate the
performance of the guest.
Next, they use this estimate to cluster enough guests to be able to compute
the job in a given time.

Best regards

Benoît
* Re: [Qemu-devel] cpufreq and QEMU guests
From: Gleb Natapov @ 2013-09-16 15:58 UTC
To: Benoît Canet
Cc: peter.maydell@linaro.org, viresh.kumar@linaro.org, Alexander Graf, cpufreq@vger.kernel.org, qemu-devel@nongnu.org, rjw@sisk.pl, pbonzini@redhat.com

On Mon, Sep 16, 2013 at 05:46:04PM +0200, Benoît Canet wrote:
> Yes, they take the guest CPU count, model and frequency to estimate the
> performance of the guest.
> Next, they use this estimate to cluster enough guests to be able to compute
> the job in a given time.
>
They do it wrong. They should take the guest CPU count, the host CPU model and
frequency, the pcpu/vcpu overcommit (if any) and the guest/host memory
overcommit (if any), and estimate performance based on this. For pure
computational performance, guest core performance should be close to host
core performance if there is no CPU/memory overcommit. With a lot of IO,
things become more complicated.

--
	Gleb.
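The estimate described in this message can be sketched in a few lines of Python. This is an illustration only and rests on loudly stated assumptions that are not in the thread: performance is modeled as linear in vCPU count and host frequency, CPU overcommit is a single ratio of vCPUs to physical CPUs (1.0 meaning none), and memory overcommit and IO are ignored even though the message says they matter. The names `effective_guest_mhz` and `guests_needed` and the "MHz-seconds" unit of work are hypothetical.

```python
import math


def effective_guest_mhz(vcpus, host_mhz, cpu_overcommit=1.0):
    """Rough compute capacity of one guest, in MHz-equivalents.

    cpu_overcommit is the assumed vcpu/pcpu ratio on the host; 1.0 means each
    vCPU has a physical core to itself.
    """
    if vcpus <= 0 or host_mhz <= 0:
        raise ValueError("vcpus and host_mhz must be positive")
    if cpu_overcommit < 1.0:
        raise ValueError("cpu_overcommit must be >= 1.0")
    return vcpus * host_mhz / cpu_overcommit


def guests_needed(work_mhz_seconds, deadline_s, per_guest_mhz):
    """Smallest number of identical guests that finishes the job in time."""
    if deadline_s <= 0 or per_guest_mhz <= 0:
        raise ValueError("deadline_s and per_guest_mhz must be positive")
    return math.ceil(work_mhz_seconds / (deadline_s * per_guest_mhz))
```

For example, a 4-vCPU guest on a 2400 MHz host with 2:1 CPU overcommit comes out at 4800 MHz-equivalents under this model, half of what the guest-visible frequency alone would suggest. That gap is the reason the host-side inputs matter for the estimate.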
* Re: [Qemu-devel] cpufreq and QEMU guests
From: Benoît Canet @ 2013-09-16 18:42 UTC
To: Gleb Natapov
Cc: Benoît Canet, peter.maydell@linaro.org, viresh.kumar@linaro.org, qemu-devel@nongnu.org, cpufreq@vger.kernel.org, Alexander Graf, rjw@sisk.pl, pbonzini@redhat.com

On Monday 16 Sep 2013 at 18:58:40 (+0300), Gleb Natapov wrote:
> They do it wrong. They should take the guest CPU count, the host CPU model
> and frequency, the pcpu/vcpu overcommit (if any) and the guest/host memory
> overcommit (if any), and estimate performance based on this. For pure
> computational performance, guest core performance should be close to host
> core performance if there is no CPU/memory overcommit. With a lot of IO,
> things become more complicated.

I omitted some details of the use case.

The cloud is an Amazon-compatible one; this means there is no guest agent in
the guest to help retrieve the host frequency and model.

Also, the AWS APIs don't provide a way to communicate the host CPU info to the
program responsible for the VM orchestration.

So the only interface to access the host CPU info is QEMU, and it is started
with -cpu host to pass the CPU model through to the guest.

What hurts the final customer badly is that the guest's /proc/cpuinfo shows
the regular max frequency of the host CPU but won't show the turbo frequency
or a scaled-down one.

Best regards

Benoît
* Re: [Qemu-devel] cpufreq and QEMU guests
From: Gleb Natapov @ 2013-09-17 8:58 UTC
To: Benoît Canet
Cc: peter.maydell@linaro.org, viresh.kumar@linaro.org, qemu-devel@nongnu.org, cpufreq@vger.kernel.org, Alexander Graf, rjw@sisk.pl, pbonzini@redhat.com

On Mon, Sep 16, 2013 at 08:42:58PM +0200, Benoît Canet wrote:
> I omitted some details of the use case.
>
> The cloud is an Amazon-compatible one; this means there is no guest agent in
> the guest to help retrieve the host frequency and model.
>
> Also, the AWS APIs don't provide a way to communicate the host CPU info to
> the program responsible for the VM orchestration.
>
> So the only interface to access the host CPU info is QEMU, and it is started
> with -cpu host to pass the CPU model through to the guest.
>
Why are they sure they are started with "-cpu host"? Do they know if the host
is overcommitted or if the guest's vcpu usage is restricted by any other means?

> What hurts the final customer badly is that the guest's /proc/cpuinfo shows
> the regular max frequency of the host CPU but won't show the turbo frequency
> or a scaled-down one.
>
What he sees is the host TSC frequency of the CPU the guest was booted on [1],
which should be adequate to estimate performance if the guest is not migrated.
The frequency the host CPU is running at at any given moment is out of the
guest's control and depends on the host frequency governor and load.

[1] The value comes from the host; for hosts without a constant TSC this is
    the max possible frequency.

--
	Gleb.