8bf00a529967dafbbb210b377c38a15834d1e979

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

* 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression?
@ 2013-10-31  0:21 Michael S. Tsirkin
  2013-10-31  7:48 ` Gleb Natapov
  2013-11-04 19:30 ` Michael S. Tsirkin
  0 siblings, 2 replies; 17+ messages in thread
From: Michael S. Tsirkin @ 2013-10-31  0:21 UTC (permalink / raw)
  To: kvm; +Cc: gleb

commit 8bf00a529967dafbbb210b377c38a15834d1e979:
"    KVM: VMX: add support for switching of PERF_GLOBAL_CTRL " was
as far as I can tell supposed to bring about performance improvement
on hardware that supports it?
Instead it seems to make the typical case (not running guest
under perf) a bit slower than it used to be.
the cost of VMexit goes up by about 50 cycles
on sandy bridge where the optimization in question
actually is activated.

Why that's not a large regression, it's a far cry from
actually helping performance.

So are we better off not using this feature?
What kind of workload is improved by this change?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression?
  2013-10-31  0:21 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression? Michael S. Tsirkin
@ 2013-10-31  7:48 ` Gleb Natapov
  2013-11-04 20:11   ` Michael S. Tsirkin
  2013-11-04 19:30 ` Michael S. Tsirkin
  1 sibling, 1 reply; 17+ messages in thread
From: Gleb Natapov @ 2013-10-31  7:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm

On Thu, Oct 31, 2013 at 02:21:46AM +0200, Michael S. Tsirkin wrote:
> commit 8bf00a529967dafbbb210b377c38a15834d1e979:
> "    KVM: VMX: add support for switching of PERF_GLOBAL_CTRL " was
> as far as I can tell supposed to bring about performance improvement
> on hardware that supports it?
No, it (and commits after it) supposed to fix a bug which it did.

> Instead it seems to make the typical case (not running guest
> under perf) a bit slower than it used to be.
> the cost of VMexit goes up by about 50 cycles
> on sandy bridge where the optimization in question
> actually is activated.
>
You seams to be confused. 8bf00a529967dafbbb210 adds support for special
PERF_GLOBAL_CTRL switching, but does not add code to switch anything,
so the commit itself is a nop. Next commit d7cd97964ba6d70c5
uses add_atomic_switch_msr()/clear_atomic_switch_msr()
to switch PERF_GLOBAL_CTRL, but it does not depend on
VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL support which previous
patch added, if the support is not there the switching will use
another mechanism which is even slower. So MSR is switched no matter
if PERF_GLOBAL_CTRL is enabled or not.  If you saying that using
VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL is slower than using generic
vmentry MSR switching then I pretty much doubt it since the only purpose
of special VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL is to be faster
then general mechanism.

--
			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression?
  2013-10-31  0:21 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression? Michael S. Tsirkin
  2013-10-31  7:48 ` Gleb Natapov
@ 2013-11-04 19:30 ` Michael S. Tsirkin
  2013-11-04 19:39   ` Gleb Natapov
  1 sibling, 1 reply; 17+ messages in thread
From: Michael S. Tsirkin @ 2013-11-04 19:30 UTC (permalink / raw)
  To: kvm; +Cc: gleb

On Thu, Oct 31, 2013 at 02:21:46AM +0200, Michael S. Tsirkin wrote:
> commit 8bf00a529967dafbbb210b377c38a15834d1e979:
> "    KVM: VMX: add support for switching of PERF_GLOBAL_CTRL " was
> as far as I can tell supposed to bring about performance improvement
> on hardware that supports it?
> Instead it seems to make the typical case (not running guest
> under perf) a bit slower than it used to be.
> the cost of VMexit goes up by about 50 cycles
> on sandy bridge where the optimization in question
> actually is activated.
> 
> Why that's not a large regression, it's a far cry from
> actually helping performance.
> 
> So are we better off not using this feature?
> What kind of workload is improved by this change?

Ping.
Let's revert 8bf00a529967dafbbb210b377c38a15834d1e979?
I see a small performance gain from reverting it ...

-- 
MST

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression?
  2013-11-04 19:30 ` Michael S. Tsirkin
@ 2013-11-04 19:39   ` Gleb Natapov
  0 siblings, 0 replies; 17+ messages in thread
From: Gleb Natapov @ 2013-11-04 19:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm

On Mon, Nov 04, 2013 at 09:30:42PM +0200, Michael S. Tsirkin wrote:
> On Thu, Oct 31, 2013 at 02:21:46AM +0200, Michael S. Tsirkin wrote:
> > commit 8bf00a529967dafbbb210b377c38a15834d1e979:
> > "    KVM: VMX: add support for switching of PERF_GLOBAL_CTRL " was
> > as far as I can tell supposed to bring about performance improvement
> > on hardware that supports it?
> > Instead it seems to make the typical case (not running guest
> > under perf) a bit slower than it used to be.
> > the cost of VMexit goes up by about 50 cycles
> > on sandy bridge where the optimization in question
> > actually is activated.
> > 
> > Why that's not a large regression, it's a far cry from
> > actually helping performance.
> > 
> > So are we better off not using this feature?
> > What kind of workload is improved by this change?
> 
> Ping.
I answer it already.

--
			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression?
  2013-10-31  7:48 ` Gleb Natapov
@ 2013-11-04 20:11   ` Michael S. Tsirkin
  2013-11-04 20:13     ` Gleb Natapov
  0 siblings, 1 reply; 17+ messages in thread
From: Michael S. Tsirkin @ 2013-11-04 20:11 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm

On Thu, Oct 31, 2013 at 09:48:08AM +0200, Gleb Natapov wrote:
> On Thu, Oct 31, 2013 at 02:21:46AM +0200, Michael S. Tsirkin wrote:
> > commit 8bf00a529967dafbbb210b377c38a15834d1e979:
> > "    KVM: VMX: add support for switching of PERF_GLOBAL_CTRL " was
> > as far as I can tell supposed to bring about performance improvement
> > on hardware that supports it?
> No, it (and commits after it) supposed to fix a bug which it did.
> 
> > Instead it seems to make the typical case (not running guest
> > under perf) a bit slower than it used to be.
> > the cost of VMexit goes up by about 50 cycles
> > on sandy bridge where the optimization in question
> > actually is activated.
> >
> You seams to be confused. 8bf00a529967dafbbb210 adds support for special
> PERF_GLOBAL_CTRL switching, but does not add code to switch anything,
> so the commit itself is a nop.

It does add code to add_atomic_switch_msr.

> Next commit d7cd97964ba6d70c5
> uses add_atomic_switch_msr()/clear_atomic_switch_msr()
> to switch PERF_GLOBAL_CTRL, but it does not depend on
> VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL support which previous
> patch added, if the support is not there the switching will use
> another mechanism which is even slower. So MSR is switched no matter
> if PERF_GLOBAL_CTRL is enabled or not.  If you saying that using
> VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL is slower than using generic
> vmentry MSR switching then I pretty much doubt it since the only purpose
> of special VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL is to be faster
> then general mechanism.
> 
> --
> 			Gleb.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression?
  2013-11-04 20:11   ` Michael S. Tsirkin
@ 2013-11-04 20:13     ` Gleb Natapov
  2013-11-04 20:33       ` Michael S. Tsirkin
  0 siblings, 1 reply; 17+ messages in thread
From: Gleb Natapov @ 2013-11-04 20:13 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm

On Mon, Nov 04, 2013 at 10:11:33PM +0200, Michael S. Tsirkin wrote:
> On Thu, Oct 31, 2013 at 09:48:08AM +0200, Gleb Natapov wrote:
> > On Thu, Oct 31, 2013 at 02:21:46AM +0200, Michael S. Tsirkin wrote:
> > > commit 8bf00a529967dafbbb210b377c38a15834d1e979:
> > > "    KVM: VMX: add support for switching of PERF_GLOBAL_CTRL " was
> > > as far as I can tell supposed to bring about performance improvement
> > > on hardware that supports it?
> > No, it (and commits after it) supposed to fix a bug which it did.
> > 
> > > Instead it seems to make the typical case (not running guest
> > > under perf) a bit slower than it used to be.
> > > the cost of VMexit goes up by about 50 cycles
> > > on sandy bridge where the optimization in question
> > > actually is activated.
> > >
> > You seams to be confused. 8bf00a529967dafbbb210 adds support for special
> > PERF_GLOBAL_CTRL switching, but does not add code to switch anything,
> > so the commit itself is a nop.
> 
> It does add code to add_atomic_switch_msr.
> 
So what? You do not read what I wrote.

> > Next commit d7cd97964ba6d70c5
> > uses add_atomic_switch_msr()/clear_atomic_switch_msr()
> > to switch PERF_GLOBAL_CTRL, but it does not depend on
> > VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL support which previous
> > patch added, if the support is not there the switching will use
> > another mechanism which is even slower. So MSR is switched no matter
> > if PERF_GLOBAL_CTRL is enabled or not.  If you saying that using
> > VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL is slower than using generic
> > vmentry MSR switching then I pretty much doubt it since the only purpose
> > of special VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL is to be faster
> > then general mechanism.
> > 
> > --
> > 			Gleb.
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression?
  2013-11-04 20:13     ` Gleb Natapov
@ 2013-11-04 20:33       ` Michael S. Tsirkin
  2013-11-04 20:44         ` Gleb Natapov
  0 siblings, 1 reply; 17+ messages in thread
From: Michael S. Tsirkin @ 2013-11-04 20:33 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm

On Mon, Nov 04, 2013 at 10:13:39PM +0200, Gleb Natapov wrote:
> On Mon, Nov 04, 2013 at 10:11:33PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Oct 31, 2013 at 09:48:08AM +0200, Gleb Natapov wrote:
> > > On Thu, Oct 31, 2013 at 02:21:46AM +0200, Michael S. Tsirkin wrote:
> > > > commit 8bf00a529967dafbbb210b377c38a15834d1e979:
> > > > "    KVM: VMX: add support for switching of PERF_GLOBAL_CTRL " was
> > > > as far as I can tell supposed to bring about performance improvement
> > > > on hardware that supports it?
> > > No, it (and commits after it) supposed to fix a bug which it did.
> > > 
> > > > Instead it seems to make the typical case (not running guest
> > > > under perf) a bit slower than it used to be.
> > > > the cost of VMexit goes up by about 50 cycles
> > > > on sandy bridge where the optimization in question
> > > > actually is activated.
> > > >
> > > You seams to be confused. 8bf00a529967dafbbb210 adds support for special
> > > PERF_GLOBAL_CTRL switching, but does not add code to switch anything,
> > > so the commit itself is a nop.
> > 
> > It does add code to add_atomic_switch_msr.
> > 
> So what? You do not read what I wrote.


It's simple: if I revert 8bf00a529967dafbbb210 then exit latency
is reduced.
You seem to tell me it should be a nop, but in practice it isn't.

> > > Next commit d7cd97964ba6d70c5
> > > uses add_atomic_switch_msr()/clear_atomic_switch_msr()
> > > to switch PERF_GLOBAL_CTRL, but it does not depend on
> > > VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL support which previous
> > > patch added, if the support is not there the switching will use
> > > another mechanism which is even slower. So MSR is switched no matter
> > > if PERF_GLOBAL_CTRL is enabled or not.  If you saying that using
> > > VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL is slower than using generic
> > > vmentry MSR switching then I pretty much doubt it since the only purpose
> > > of special VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL is to be faster
> > > then general mechanism.
> > > 
> > > --
> > > 			Gleb.
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> 			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression?
  2013-11-04 20:33       ` Michael S. Tsirkin
@ 2013-11-04 20:44         ` Gleb Natapov
  2013-11-05 10:18           ` Michael S. Tsirkin
  0 siblings, 1 reply; 17+ messages in thread
From: Gleb Natapov @ 2013-11-04 20:44 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm

On Mon, Nov 04, 2013 at 10:33:57PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 04, 2013 at 10:13:39PM +0200, Gleb Natapov wrote:
> > On Mon, Nov 04, 2013 at 10:11:33PM +0200, Michael S. Tsirkin wrote:
> > > On Thu, Oct 31, 2013 at 09:48:08AM +0200, Gleb Natapov wrote:
> > > > On Thu, Oct 31, 2013 at 02:21:46AM +0200, Michael S. Tsirkin wrote:
> > > > > commit 8bf00a529967dafbbb210b377c38a15834d1e979:
> > > > > "    KVM: VMX: add support for switching of PERF_GLOBAL_CTRL " was
> > > > > as far as I can tell supposed to bring about performance improvement
> > > > > on hardware that supports it?
> > > > No, it (and commits after it) supposed to fix a bug which it did.
> > > > 
> > > > > Instead it seems to make the typical case (not running guest
> > > > > under perf) a bit slower than it used to be.
> > > > > the cost of VMexit goes up by about 50 cycles
> > > > > on sandy bridge where the optimization in question
> > > > > actually is activated.
> > > > >
> > > > You seams to be confused. 8bf00a529967dafbbb210 adds support for special
> > > > PERF_GLOBAL_CTRL switching, but does not add code to switch anything,
> > > > so the commit itself is a nop.
> > > 
> > > It does add code to add_atomic_switch_msr.
> > > 
> > So what? You do not read what I wrote.
> 
> 
> It's simple: if I revert 8bf00a529967dafbbb210 then exit latency
> is reduced.
> You seem to tell me it should be a nop, but in practice it isn't.
> 

No, if you read below I am saying that it looks like you are claiming that
generic msr switch mechanism is faster and I am not buying that. If you
believe this to be the case ask Intel for explanation. Your claim about
"not running guest under perf" is even stranger since in this case no msr
switch should happen regardless of the aforementioned commit (unless guest
or host runs nmi watchdog, but then switch will happen no matter if perf
is running, so again not running guest under perf" does not make sense).
So, in short, you do not really know where the slow down is coming
from. My guess is that it comes from the fact that we unconditionally
call clear_atomic_switch_msr() in atomic_switch_perf_msrs(), but then
fix that instead of reverting the patch.

> > > > Next commit d7cd97964ba6d70c5
> > > > uses add_atomic_switch_msr()/clear_atomic_switch_msr()
> > > > to switch PERF_GLOBAL_CTRL, but it does not depend on
> > > > VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL support which previous
> > > > patch added, if the support is not there the switching will use
> > > > another mechanism which is even slower. So MSR is switched no matter
> > > > if PERF_GLOBAL_CTRL is enabled or not.  If you saying that using
> > > > VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL is slower than using generic
> > > > vmentry MSR switching then I pretty much doubt it since the only purpose
> > > > of special VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL is to be faster
> > > > then general mechanism.
> > > > 
> > > > --
> > > > 			Gleb.
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > --
> > 			Gleb.

--
			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression?
  2013-11-04 20:44         ` Gleb Natapov
@ 2013-11-05 10:18           ` Michael S. Tsirkin
  2013-11-05 10:22             ` Gleb Natapov
  0 siblings, 1 reply; 17+ messages in thread
From: Michael S. Tsirkin @ 2013-11-05 10:18 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm

On Mon, Nov 04, 2013 at 10:44:43PM +0200, Gleb Natapov wrote:
> On Mon, Nov 04, 2013 at 10:33:57PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 04, 2013 at 10:13:39PM +0200, Gleb Natapov wrote:
> > > On Mon, Nov 04, 2013 at 10:11:33PM +0200, Michael S. Tsirkin wrote:
> > > > On Thu, Oct 31, 2013 at 09:48:08AM +0200, Gleb Natapov wrote:
> > > > > On Thu, Oct 31, 2013 at 02:21:46AM +0200, Michael S. Tsirkin wrote:
> > > > > > commit 8bf00a529967dafbbb210b377c38a15834d1e979:
> > > > > > "    KVM: VMX: add support for switching of PERF_GLOBAL_CTRL " was
> > > > > > as far as I can tell supposed to bring about performance improvement
> > > > > > on hardware that supports it?
> > > > > No, it (and commits after it) supposed to fix a bug which it did.
> > > > > 
> > > > > > Instead it seems to make the typical case (not running guest
> > > > > > under perf) a bit slower than it used to be.
> > > > > > the cost of VMexit goes up by about 50 cycles
> > > > > > on sandy bridge where the optimization in question
> > > > > > actually is activated.
> > > > > >
> > > > > You seams to be confused. 8bf00a529967dafbbb210 adds support for special
> > > > > PERF_GLOBAL_CTRL switching, but does not add code to switch anything,
> > > > > so the commit itself is a nop.
> > > > 
> > > > It does add code to add_atomic_switch_msr.
> > > > 
> > > So what? You do not read what I wrote.
> > 
> > 
> > It's simple: if I revert 8bf00a529967dafbbb210 then exit latency
> > is reduced.
> > You seem to tell me it should be a nop, but in practice it isn't.
> > 
> 
> No, if you read below I am saying that it looks like you are claiming that
> generic msr switch mechanism is faster and I am not buying that. If you
> believe this to be the case ask Intel for explanation. Your claim about
> "not running guest under perf" is even stranger since in this case no msr
> switch should happen regardless of the aforementioned commit (unless guest
> or host runs nmi watchdog, but then switch will happen no matter if perf
> is running, so again not running guest under perf" does not make sense).
> So, in short, you do not really know where the slow down is coming
> from.

That's true.

> My guess is that it comes from the fact that we unconditionally
> call clear_atomic_switch_msr() in atomic_switch_perf_msrs(), but then
> fix that instead of reverting the patch.

We can try, but reverting is much simpler, it removes code instead of
adding code.  Do you know which workload is actually improved by
8bf00a529967dafbbb210?

> > > > > Next commit d7cd97964ba6d70c5
> > > > > uses add_atomic_switch_msr()/clear_atomic_switch_msr()
> > > > > to switch PERF_GLOBAL_CTRL, but it does not depend on
> > > > > VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL support which previous
> > > > > patch added, if the support is not there the switching will use
> > > > > another mechanism which is even slower. So MSR is switched no matter
> > > > > if PERF_GLOBAL_CTRL is enabled or not.  If you saying that using
> > > > > VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL is slower than using generic
> > > > > vmentry MSR switching then I pretty much doubt it since the only purpose
> > > > > of special VM_(ENTRY|EXIT)_LOAD_IA32_PERF_GLOBAL_CTRL is to be faster
> > > > > then general mechanism.
> > > > > 
> > > > > --
> > > > > 			Gleb.
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > > > the body of a message to majordomo@vger.kernel.org
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > > --
> > > 			Gleb.
> 
> --
> 			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression?
  2013-11-05 10:18           ` Michael S. Tsirkin
@ 2013-11-05 10:22             ` Gleb Natapov
  2013-11-05 11:20               ` Gleb Natapov
  0 siblings, 1 reply; 17+ messages in thread
From: Gleb Natapov @ 2013-11-05 10:22 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm

On Tue, Nov 05, 2013 at 12:18:57PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 04, 2013 at 10:44:43PM +0200, Gleb Natapov wrote:
> > On Mon, Nov 04, 2013 at 10:33:57PM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Nov 04, 2013 at 10:13:39PM +0200, Gleb Natapov wrote:
> > > > On Mon, Nov 04, 2013 at 10:11:33PM +0200, Michael S. Tsirkin wrote:
> > > > > On Thu, Oct 31, 2013 at 09:48:08AM +0200, Gleb Natapov wrote:
> > > > > > On Thu, Oct 31, 2013 at 02:21:46AM +0200, Michael S. Tsirkin wrote:
> > > > > > > commit 8bf00a529967dafbbb210b377c38a15834d1e979:
> > > > > > > "    KVM: VMX: add support for switching of PERF_GLOBAL_CTRL " was
> > > > > > > as far as I can tell supposed to bring about performance improvement
> > > > > > > on hardware that supports it?
> > > > > > No, it (and commits after it) supposed to fix a bug which it did.
> > > > > > 
> > > > > > > Instead it seems to make the typical case (not running guest
> > > > > > > under perf) a bit slower than it used to be.
> > > > > > > the cost of VMexit goes up by about 50 cycles
> > > > > > > on sandy bridge where the optimization in question
> > > > > > > actually is activated.
> > > > > > >
> > > > > > You seams to be confused. 8bf00a529967dafbbb210 adds support for special
> > > > > > PERF_GLOBAL_CTRL switching, but does not add code to switch anything,
> > > > > > so the commit itself is a nop.
> > > > > 
> > > > > It does add code to add_atomic_switch_msr.
> > > > > 
> > > > So what? You do not read what I wrote.
> > > 
> > > 
> > > It's simple: if I revert 8bf00a529967dafbbb210 then exit latency
> > > is reduced.
> > > You seem to tell me it should be a nop, but in practice it isn't.
> > > 
> > 
> > No, if you read below I am saying that it looks like you are claiming that
> > generic msr switch mechanism is faster and I am not buying that. If you
> > believe this to be the case ask Intel for explanation. Your claim about
> > "not running guest under perf" is even stranger since in this case no msr
> > switch should happen regardless of the aforementioned commit (unless guest
> > or host runs nmi watchdog, but then switch will happen no matter if perf
> > is running, so again not running guest under perf" does not make sense).
> > So, in short, you do not really know where the slow down is coming
> > from.
> 
> That's true.
> 
Then dig dipper.

> > My guess is that it comes from the fact that we unconditionally
> > call clear_atomic_switch_msr() in atomic_switch_perf_msrs(), but then
> > fix that instead of reverting the patch.
> 
> We can try, but reverting is much simpler, it removes code instead of
> adding code. 
Well, this is such absurd statement I do not really know what to say :)

>               Do you know which workload is actually improved by
> 8bf00a529967dafbbb210?
> 
Switching PERF_GLOBAL_CTRL when it needs to be switched.

--
			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression?
  2013-11-05 10:22             ` Gleb Natapov
@ 2013-11-05 11:20               ` Gleb Natapov
  2013-11-05 16:36                 ` Michael S. Tsirkin
  2013-11-18 19:04                 ` Michael S. Tsirkin
  0 siblings, 2 replies; 17+ messages in thread
From: Gleb Natapov @ 2013-11-05 11:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm

On Tue, Nov 05, 2013 at 12:22:49PM +0200, Gleb Natapov wrote:
> On Tue, Nov 05, 2013 at 12:18:57PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 04, 2013 at 10:44:43PM +0200, Gleb Natapov wrote:
> > > On Mon, Nov 04, 2013 at 10:33:57PM +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Nov 04, 2013 at 10:13:39PM +0200, Gleb Natapov wrote:
> > > > > On Mon, Nov 04, 2013 at 10:11:33PM +0200, Michael S. Tsirkin wrote:
> > > > > > On Thu, Oct 31, 2013 at 09:48:08AM +0200, Gleb Natapov wrote:
> > > > > > > On Thu, Oct 31, 2013 at 02:21:46AM +0200, Michael S. Tsirkin wrote:
> > > > > > > > commit 8bf00a529967dafbbb210b377c38a15834d1e979:
> > > > > > > > "    KVM: VMX: add support for switching of PERF_GLOBAL_CTRL " was
> > > > > > > > as far as I can tell supposed to bring about performance improvement
> > > > > > > > on hardware that supports it?
> > > > > > > No, it (and commits after it) supposed to fix a bug which it did.
> > > > > > > 
> > > > > > > > Instead it seems to make the typical case (not running guest
> > > > > > > > under perf) a bit slower than it used to be.
> > > > > > > > the cost of VMexit goes up by about 50 cycles
> > > > > > > > on sandy bridge where the optimization in question
> > > > > > > > actually is activated.
> > > > > > > >
> > > > > > > You seams to be confused. 8bf00a529967dafbbb210 adds support for special
> > > > > > > PERF_GLOBAL_CTRL switching, but does not add code to switch anything,
> > > > > > > so the commit itself is a nop.
> > > > > > 
> > > > > > It does add code to add_atomic_switch_msr.
> > > > > > 
> > > > > So what? You do not read what I wrote.
> > > > 
> > > > 
> > > > It's simple: if I revert 8bf00a529967dafbbb210 then exit latency
> > > > is reduced.
> > > > You seem to tell me it should be a nop, but in practice it isn't.
> > > > 
> > > 
> > > No, if you read below I am saying that it looks like you are claiming that
> > > generic msr switch mechanism is faster and I am not buying that. If you
> > > believe this to be the case ask Intel for explanation. Your claim about
> > > "not running guest under perf" is even stranger since in this case no msr
> > > switch should happen regardless of the aforementioned commit (unless guest
> > > or host runs nmi watchdog, but then switch will happen no matter if perf
> > > is running, so again not running guest under perf" does not make sense).
> > > So, in short, you do not really know where the slow down is coming
> > > from.
> > 
> > That's true.
> > 
> Then dig dipper.
> 
So quick and dirty patch to not needlessly write into VM_ENTRY_CONTROLS
when no PERF_GLOBAL_CTRL switching needed removes all the overhead. But we
probably need more generic code to shadow entire VM_ENTRY_CONTROLS/VM_EXIT_CONTROLS.


diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e293a62..be64221 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -413,6 +413,7 @@ struct vcpu_vmx {
 	struct shared_msr_entry *guest_msrs;
 	int                   nmsrs;
 	int                   save_nmsrs;
+	bool                  core_perf_global_ctrl;
 	unsigned long	      host_idt_base;
 #ifdef CONFIG_X86_64
 	u64 		      msr_host_kernel_gs_base;
@@ -1432,9 +1433,12 @@ static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr)
 		break;
 	case MSR_CORE_PERF_GLOBAL_CTRL:
 		if (cpu_has_load_perf_global_ctrl) {
+			if (vmx->core_perf_global_ctrl) {
 			clear_atomic_switch_msr_special(
 					VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
 					VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL);
+				vmx->core_perf_global_ctrl = false;
+			}
 			return;
 		}
 		break;
@@ -1488,6 +1492,7 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
 					GUEST_IA32_PERF_GLOBAL_CTRL,
 					HOST_IA32_PERF_GLOBAL_CTRL,
 					guest_val, host_val);
+			vmx->core_perf_global_ctrl = true;
 			return;
 		}
 		break;
--
			Gleb.

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression?
  2013-11-05 11:20               ` Gleb Natapov
@ 2013-11-05 16:36                 ` Michael S. Tsirkin
  2013-11-05 16:40                   ` Gleb Natapov
  2013-11-18 19:04                 ` Michael S. Tsirkin
  1 sibling, 1 reply; 17+ messages in thread
From: Michael S. Tsirkin @ 2013-11-05 16:36 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm

On Tue, Nov 05, 2013 at 01:20:10PM +0200, Gleb Natapov wrote:
> On Tue, Nov 05, 2013 at 12:22:49PM +0200, Gleb Natapov wrote:
> > On Tue, Nov 05, 2013 at 12:18:57PM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Nov 04, 2013 at 10:44:43PM +0200, Gleb Natapov wrote:
> > > > On Mon, Nov 04, 2013 at 10:33:57PM +0200, Michael S. Tsirkin wrote:
> > > > > On Mon, Nov 04, 2013 at 10:13:39PM +0200, Gleb Natapov wrote:
> > > > > > On Mon, Nov 04, 2013 at 10:11:33PM +0200, Michael S. Tsirkin wrote:
> > > > > > > On Thu, Oct 31, 2013 at 09:48:08AM +0200, Gleb Natapov wrote:
> > > > > > > > On Thu, Oct 31, 2013 at 02:21:46AM +0200, Michael S. Tsirkin wrote:
> > > > > > > > > commit 8bf00a529967dafbbb210b377c38a15834d1e979:
> > > > > > > > > "    KVM: VMX: add support for switching of PERF_GLOBAL_CTRL " was
> > > > > > > > > as far as I can tell supposed to bring about performance improvement
> > > > > > > > > on hardware that supports it?
> > > > > > > > No, it (and commits after it) supposed to fix a bug which it did.
> > > > > > > > 
> > > > > > > > > Instead it seems to make the typical case (not running guest
> > > > > > > > > under perf) a bit slower than it used to be.
> > > > > > > > > the cost of VMexit goes up by about 50 cycles
> > > > > > > > > on sandy bridge where the optimization in question
> > > > > > > > > actually is activated.
> > > > > > > > >
> > > > > > > > You seams to be confused. 8bf00a529967dafbbb210 adds support for special
> > > > > > > > PERF_GLOBAL_CTRL switching, but does not add code to switch anything,
> > > > > > > > so the commit itself is a nop.
> > > > > > > 
> > > > > > > It does add code to add_atomic_switch_msr.
> > > > > > > 
> > > > > > So what? You do not read what I wrote.
> > > > > 
> > > > > 
> > > > > It's simple: if I revert 8bf00a529967dafbbb210 then exit latency
> > > > > is reduced.
> > > > > You seem to tell me it should be a nop, but in practice it isn't.
> > > > > 
> > > > 
> > > > No, if you read below I am saying that it looks like you are claiming that
> > > > generic msr switch mechanism is faster and I am not buying that. If you
> > > > believe this to be the case ask Intel for explanation. Your claim about
> > > > "not running guest under perf" is even stranger since in this case no msr
> > > > switch should happen regardless of the aforementioned commit (unless guest
> > > > or host runs nmi watchdog, but then switch will happen no matter if perf
> > > > is running, so again not running guest under perf" does not make sense).
> > > > So, in short, you do not really know where the slow down is coming
> > > > from.
> > > 
> > > That's true.
> > > 
> > Then dig dipper.
> > 
> So quick and dirty patch to not needlessly write into VM_ENTRY_CONTROLS
> when no PERF_GLOBAL_CTRL switching needed removes all the overhead. But we
> probably need more generic code to shadow entire VM_ENTRY_CONTROLS/VM_EXIT_CONTROLS.

Hmm, seems to help but less than 50 cycles, seems to be
around 20 and that kind of gain is getting hard to measure.

> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index e293a62..be64221 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -413,6 +413,7 @@ struct vcpu_vmx {
>  	struct shared_msr_entry *guest_msrs;
>  	int                   nmsrs;
>  	int                   save_nmsrs;
> +	bool                  core_perf_global_ctrl;
>  	unsigned long	      host_idt_base;
>  #ifdef CONFIG_X86_64
>  	u64 		      msr_host_kernel_gs_base;
> @@ -1432,9 +1433,12 @@ static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr)
>  		break;
>  	case MSR_CORE_PERF_GLOBAL_CTRL:
>  		if (cpu_has_load_perf_global_ctrl) {
> +			if (vmx->core_perf_global_ctrl) {
>  			clear_atomic_switch_msr_special(
>  					VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
>  					VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL);
> +				vmx->core_perf_global_ctrl = false;
> +			}
>  			return;
>  		}
>  		break;
> @@ -1488,6 +1492,7 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
>  					GUEST_IA32_PERF_GLOBAL_CTRL,
>  					HOST_IA32_PERF_GLOBAL_CTRL,
>  					guest_val, host_val);
> +			vmx->core_perf_global_ctrl = true;
>  			return;
>  		}
>  		break;
> --
> 			Gleb.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression?
  2013-11-05 16:36                 ` Michael S. Tsirkin
@ 2013-11-05 16:40                   ` Gleb Natapov
  2013-11-05 16:44                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 17+ messages in thread
From: Gleb Natapov @ 2013-11-05 16:40 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm

On Tue, Nov 05, 2013 at 06:36:41PM +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 05, 2013 at 01:20:10PM +0200, Gleb Natapov wrote:
> > On Tue, Nov 05, 2013 at 12:22:49PM +0200, Gleb Natapov wrote:
> > > On Tue, Nov 05, 2013 at 12:18:57PM +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Nov 04, 2013 at 10:44:43PM +0200, Gleb Natapov wrote:
> > > > > On Mon, Nov 04, 2013 at 10:33:57PM +0200, Michael S. Tsirkin wrote:
> > > > > > On Mon, Nov 04, 2013 at 10:13:39PM +0200, Gleb Natapov wrote:
> > > > > > > On Mon, Nov 04, 2013 at 10:11:33PM +0200, Michael S. Tsirkin wrote:
> > > > > > > > On Thu, Oct 31, 2013 at 09:48:08AM +0200, Gleb Natapov wrote:
> > > > > > > > > On Thu, Oct 31, 2013 at 02:21:46AM +0200, Michael S. Tsirkin wrote:
> > > > > > > > > > commit 8bf00a529967dafbbb210b377c38a15834d1e979:
> > > > > > > > > > "    KVM: VMX: add support for switching of PERF_GLOBAL_CTRL " was
> > > > > > > > > > as far as I can tell supposed to bring about performance improvement
> > > > > > > > > > on hardware that supports it?
> > > > > > > > > No, it (and commits after it) supposed to fix a bug which it did.
> > > > > > > > > 
> > > > > > > > > > Instead it seems to make the typical case (not running guest
> > > > > > > > > > under perf) a bit slower than it used to be.
> > > > > > > > > > the cost of VMexit goes up by about 50 cycles
> > > > > > > > > > on sandy bridge where the optimization in question
> > > > > > > > > > actually is activated.
> > > > > > > > > >
> > > > > > > > > You seams to be confused. 8bf00a529967dafbbb210 adds support for special
> > > > > > > > > PERF_GLOBAL_CTRL switching, but does not add code to switch anything,
> > > > > > > > > so the commit itself is a nop.
> > > > > > > > 
> > > > > > > > It does add code to add_atomic_switch_msr.
> > > > > > > > 
> > > > > > > So what? You do not read what I wrote.
> > > > > > 
> > > > > > 
> > > > > > It's simple: if I revert 8bf00a529967dafbbb210 then exit latency
> > > > > > is reduced.
> > > > > > You seem to tell me it should be a nop, but in practice it isn't.
> > > > > > 
> > > > > 
> > > > > No, if you read below I am saying that it looks like you are claiming that
> > > > > generic msr switch mechanism is faster and I am not buying that. If you
> > > > > believe this to be the case ask Intel for explanation. Your claim about
> > > > > "not running guest under perf" is even stranger since in this case no msr
> > > > > switch should happen regardless of the aforementioned commit (unless guest
> > > > > or host runs nmi watchdog, but then switch will happen no matter if perf
> > > > > is running, so again not running guest under perf" does not make sense).
> > > > > So, in short, you do not really know where the slow down is coming
> > > > > from.
> > > > 
> > > > That's true.
> > > > 
> > > Then dig dipper.
> > > 
> > So quick and dirty patch to not needlessly write into VM_ENTRY_CONTROLS
> > when no PERF_GLOBAL_CTRL switching needed removes all the overhead. But we
> > probably need more generic code to shadow entire VM_ENTRY_CONTROLS/VM_EXIT_CONTROLS.
> 
> Hmm, seems to help but less than 50 cycles, seems to be
> around 20 and that kind of gain is getting hard to measure.
> 
With the patch the code with and without MSR_CORE_PERF_GLOBAL_CTRL switch
is exactly same when no switching is needed (nothing is done basically),
so I doubt your measurement. For me result is exactly same (it is noisy,
but noisy in the same way).

--
			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression?
  2013-11-05 16:40                   ` Gleb Natapov
@ 2013-11-05 16:44                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 17+ messages in thread
From: Michael S. Tsirkin @ 2013-11-05 16:44 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm

On Tue, Nov 05, 2013 at 06:40:04PM +0200, Gleb Natapov wrote:
> On Tue, Nov 05, 2013 at 06:36:41PM +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 05, 2013 at 01:20:10PM +0200, Gleb Natapov wrote:
> > > On Tue, Nov 05, 2013 at 12:22:49PM +0200, Gleb Natapov wrote:
> > > > On Tue, Nov 05, 2013 at 12:18:57PM +0200, Michael S. Tsirkin wrote:
> > > > > On Mon, Nov 04, 2013 at 10:44:43PM +0200, Gleb Natapov wrote:
> > > > > > On Mon, Nov 04, 2013 at 10:33:57PM +0200, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Nov 04, 2013 at 10:13:39PM +0200, Gleb Natapov wrote:
> > > > > > > > On Mon, Nov 04, 2013 at 10:11:33PM +0200, Michael S. Tsirkin wrote:
> > > > > > > > > On Thu, Oct 31, 2013 at 09:48:08AM +0200, Gleb Natapov wrote:
> > > > > > > > > > On Thu, Oct 31, 2013 at 02:21:46AM +0200, Michael S. Tsirkin wrote:
> > > > > > > > > > > commit 8bf00a529967dafbbb210b377c38a15834d1e979:
> > > > > > > > > > > "    KVM: VMX: add support for switching of PERF_GLOBAL_CTRL " was
> > > > > > > > > > > as far as I can tell supposed to bring about performance improvement
> > > > > > > > > > > on hardware that supports it?
> > > > > > > > > > No, it (and commits after it) supposed to fix a bug which it did.
> > > > > > > > > > 
> > > > > > > > > > > Instead it seems to make the typical case (not running guest
> > > > > > > > > > > under perf) a bit slower than it used to be.
> > > > > > > > > > > the cost of VMexit goes up by about 50 cycles
> > > > > > > > > > > on sandy bridge where the optimization in question
> > > > > > > > > > > actually is activated.
> > > > > > > > > > >
> > > > > > > > > > You seams to be confused. 8bf00a529967dafbbb210 adds support for special
> > > > > > > > > > PERF_GLOBAL_CTRL switching, but does not add code to switch anything,
> > > > > > > > > > so the commit itself is a nop.
> > > > > > > > > 
> > > > > > > > > It does add code to add_atomic_switch_msr.
> > > > > > > > > 
> > > > > > > > So what? You do not read what I wrote.
> > > > > > > 
> > > > > > > 
> > > > > > > It's simple: if I revert 8bf00a529967dafbbb210 then exit latency
> > > > > > > is reduced.
> > > > > > > You seem to tell me it should be a nop, but in practice it isn't.
> > > > > > > 
> > > > > > 
> > > > > > No, if you read below I am saying that it looks like you are claiming that
> > > > > > generic msr switch mechanism is faster and I am not buying that. If you
> > > > > > believe this to be the case ask Intel for explanation. Your claim about
> > > > > > "not running guest under perf" is even stranger since in this case no msr
> > > > > > switch should happen regardless of the aforementioned commit (unless guest
> > > > > > or host runs nmi watchdog, but then switch will happen no matter if perf
> > > > > > is running, so again not running guest under perf" does not make sense).
> > > > > > So, in short, you do not really know where the slow down is coming
> > > > > > from.
> > > > > 
> > > > > That's true.
> > > > > 
> > > > Then dig dipper.
> > > > 
> > > So quick and dirty patch to not needlessly write into VM_ENTRY_CONTROLS
> > > when no PERF_GLOBAL_CTRL switching needed removes all the overhead. But we
> > > probably need more generic code to shadow entire VM_ENTRY_CONTROLS/VM_EXIT_CONTROLS.
> > 
> > Hmm, seems to help but less than 50 cycles, seems to be
> > around 20 and that kind of gain is getting hard to measure.
> > 
> With the patch the code with and without MSR_CORE_PERF_GLOBAL_CTRL switch
> is exactly same when no switching is needed (nothing is done basically),
> so I doubt your measurement.

OK I'll retry when I have the time.

> For me result is exactly same (it is noisy,
> but noisy in the same way).
> 
> --
> 			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression?
  2013-11-05 11:20               ` Gleb Natapov
  2013-11-05 16:36                 ` Michael S. Tsirkin
@ 2013-11-18 19:04                 ` Michael S. Tsirkin
  2013-11-18 19:09                   ` Gleb Natapov
  1 sibling, 1 reply; 17+ messages in thread
From: Michael S. Tsirkin @ 2013-11-18 19:04 UTC (permalink / raw)
  To: Gleb Natapov, pbonzini; +Cc: kvm

On Tue, Nov 05, 2013 at 01:20:10PM +0200, Gleb Natapov wrote:
> On Tue, Nov 05, 2013 at 12:22:49PM +0200, Gleb Natapov wrote:
> > On Tue, Nov 05, 2013 at 12:18:57PM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Nov 04, 2013 at 10:44:43PM +0200, Gleb Natapov wrote:
> > > > On Mon, Nov 04, 2013 at 10:33:57PM +0200, Michael S. Tsirkin wrote:
> > > > > On Mon, Nov 04, 2013 at 10:13:39PM +0200, Gleb Natapov wrote:
> > > > > > On Mon, Nov 04, 2013 at 10:11:33PM +0200, Michael S. Tsirkin wrote:
> > > > > > > On Thu, Oct 31, 2013 at 09:48:08AM +0200, Gleb Natapov wrote:
> > > > > > > > On Thu, Oct 31, 2013 at 02:21:46AM +0200, Michael S. Tsirkin wrote:
> > > > > > > > > commit 8bf00a529967dafbbb210b377c38a15834d1e979:
> > > > > > > > > "    KVM: VMX: add support for switching of PERF_GLOBAL_CTRL " was
> > > > > > > > > as far as I can tell supposed to bring about performance improvement
> > > > > > > > > on hardware that supports it?
> > > > > > > > No, it (and commits after it) supposed to fix a bug which it did.
> > > > > > > > 
> > > > > > > > > Instead it seems to make the typical case (not running guest
> > > > > > > > > under perf) a bit slower than it used to be.
> > > > > > > > > the cost of VMexit goes up by about 50 cycles
> > > > > > > > > on sandy bridge where the optimization in question
> > > > > > > > > actually is activated.
> > > > > > > > >
> > > > > > > > You seams to be confused. 8bf00a529967dafbbb210 adds support for special
> > > > > > > > PERF_GLOBAL_CTRL switching, but does not add code to switch anything,
> > > > > > > > so the commit itself is a nop.
> > > > > > > 
> > > > > > > It does add code to add_atomic_switch_msr.
> > > > > > > 
> > > > > > So what? You do not read what I wrote.
> > > > > 
> > > > > 
> > > > > It's simple: if I revert 8bf00a529967dafbbb210 then exit latency
> > > > > is reduced.
> > > > > You seem to tell me it should be a nop, but in practice it isn't.
> > > > > 
> > > > 
> > > > No, if you read below I am saying that it looks like you are claiming that
> > > > generic msr switch mechanism is faster and I am not buying that. If you
> > > > believe this to be the case ask Intel for explanation. Your claim about
> > > > "not running guest under perf" is even stranger since in this case no msr
> > > > switch should happen regardless of the aforementioned commit (unless guest
> > > > or host runs nmi watchdog, but then switch will happen no matter if perf
> > > > is running, so again not running guest under perf" does not make sense).
> > > > So, in short, you do not really know where the slow down is coming
> > > > from.
> > > 
> > > That's true.
> > > 
> > Then dig dipper.
> > 
> So quick and dirty patch to not needlessly write into VM_ENTRY_CONTROLS
> when no PERF_GLOBAL_CTRL switching needed removes all the overhead. But we
> probably need more generic code to shadow entire VM_ENTRY_CONTROLS/VM_EXIT_CONTROLS.
> 

Do you plan to complete this? Merge as is?
You said this saves about 50 cycles per exit for you,
did you not?


> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index e293a62..be64221 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -413,6 +413,7 @@ struct vcpu_vmx {
>  	struct shared_msr_entry *guest_msrs;
>  	int                   nmsrs;
>  	int                   save_nmsrs;
> +	bool                  core_perf_global_ctrl;
>  	unsigned long	      host_idt_base;
>  #ifdef CONFIG_X86_64
>  	u64 		      msr_host_kernel_gs_base;
> @@ -1432,9 +1433,12 @@ static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr)
>  		break;
>  	case MSR_CORE_PERF_GLOBAL_CTRL:
>  		if (cpu_has_load_perf_global_ctrl) {
> +			if (vmx->core_perf_global_ctrl) {
>  			clear_atomic_switch_msr_special(
>  					VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
>  					VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL);
> +				vmx->core_perf_global_ctrl = false;
> +			}
>  			return;
>  		}
>  		break;
> @@ -1488,6 +1492,7 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
>  					GUEST_IA32_PERF_GLOBAL_CTRL,
>  					HOST_IA32_PERF_GLOBAL_CTRL,
>  					guest_val, host_val);
> +			vmx->core_perf_global_ctrl = true;
>  			return;
>  		}
>  		break;
> --
> 			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression?
  2013-11-18 19:04                 ` Michael S. Tsirkin
@ 2013-11-18 19:09                   ` Gleb Natapov
  2013-11-18 19:37                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 17+ messages in thread
From: Gleb Natapov @ 2013-11-18 19:09 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: pbonzini, kvm

On Mon, Nov 18, 2013 at 09:04:51PM +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 05, 2013 at 01:20:10PM +0200, Gleb Natapov wrote:
> > On Tue, Nov 05, 2013 at 12:22:49PM +0200, Gleb Natapov wrote:
> > > On Tue, Nov 05, 2013 at 12:18:57PM +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Nov 04, 2013 at 10:44:43PM +0200, Gleb Natapov wrote:
> > > > > On Mon, Nov 04, 2013 at 10:33:57PM +0200, Michael S. Tsirkin wrote:
> > > > > > On Mon, Nov 04, 2013 at 10:13:39PM +0200, Gleb Natapov wrote:
> > > > > > > On Mon, Nov 04, 2013 at 10:11:33PM +0200, Michael S. Tsirkin wrote:
> > > > > > > > On Thu, Oct 31, 2013 at 09:48:08AM +0200, Gleb Natapov wrote:
> > > > > > > > > On Thu, Oct 31, 2013 at 02:21:46AM +0200, Michael S. Tsirkin wrote:
> > > > > > > > > > commit 8bf00a529967dafbbb210b377c38a15834d1e979:
> > > > > > > > > > "    KVM: VMX: add support for switching of PERF_GLOBAL_CTRL " was
> > > > > > > > > > as far as I can tell supposed to bring about performance improvement
> > > > > > > > > > on hardware that supports it?
> > > > > > > > > No, it (and commits after it) supposed to fix a bug which it did.
> > > > > > > > > 
> > > > > > > > > > Instead it seems to make the typical case (not running guest
> > > > > > > > > > under perf) a bit slower than it used to be.
> > > > > > > > > > the cost of VMexit goes up by about 50 cycles
> > > > > > > > > > on sandy bridge where the optimization in question
> > > > > > > > > > actually is activated.
> > > > > > > > > >
> > > > > > > > > You seams to be confused. 8bf00a529967dafbbb210 adds support for special
> > > > > > > > > PERF_GLOBAL_CTRL switching, but does not add code to switch anything,
> > > > > > > > > so the commit itself is a nop.
> > > > > > > > 
> > > > > > > > It does add code to add_atomic_switch_msr.
> > > > > > > > 
> > > > > > > So what? You do not read what I wrote.
> > > > > > 
> > > > > > 
> > > > > > It's simple: if I revert 8bf00a529967dafbbb210 then exit latency
> > > > > > is reduced.
> > > > > > You seem to tell me it should be a nop, but in practice it isn't.
> > > > > > 
> > > > > 
> > > > > No, if you read below I am saying that it looks like you are claiming that
> > > > > generic msr switch mechanism is faster and I am not buying that. If you
> > > > > believe this to be the case ask Intel for explanation. Your claim about
> > > > > "not running guest under perf" is even stranger since in this case no msr
> > > > > switch should happen regardless of the aforementioned commit (unless guest
> > > > > or host runs nmi watchdog, but then switch will happen no matter if perf
> > > > > is running, so again not running guest under perf" does not make sense).
> > > > > So, in short, you do not really know where the slow down is coming
> > > > > from.
> > > > 
> > > > That's true.
> > > > 
> > > Then dig dipper.
> > > 
> > So quick and dirty patch to not needlessly write into VM_ENTRY_CONTROLS
> > when no PERF_GLOBAL_CTRL switching needed removes all the overhead. But we
> > probably need more generic code to shadow entire VM_ENTRY_CONTROLS/VM_EXIT_CONTROLS.
> > 
> 
> Do you plan to complete this? Merge as is?
I have a patch to shadow VM_ENTRY_CONTROLS/VM_EXIT_CONTROLS now, will
post it shortly, but it will have to wait for the end of a merge window
anyway.

> You said this saves about 50 cycles per exit for you,
> did you not?
Let's not exaggerate :) May be 20 after removing all the noise.

> 
> 
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index e293a62..be64221 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -413,6 +413,7 @@ struct vcpu_vmx {
> >  	struct shared_msr_entry *guest_msrs;
> >  	int                   nmsrs;
> >  	int                   save_nmsrs;
> > +	bool                  core_perf_global_ctrl;
> >  	unsigned long	      host_idt_base;
> >  #ifdef CONFIG_X86_64
> >  	u64 		      msr_host_kernel_gs_base;
> > @@ -1432,9 +1433,12 @@ static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr)
> >  		break;
> >  	case MSR_CORE_PERF_GLOBAL_CTRL:
> >  		if (cpu_has_load_perf_global_ctrl) {
> > +			if (vmx->core_perf_global_ctrl) {
> >  			clear_atomic_switch_msr_special(
> >  					VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
> >  					VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL);
> > +				vmx->core_perf_global_ctrl = false;
> > +			}
> >  			return;
> >  		}
> >  		break;
> > @@ -1488,6 +1492,7 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
> >  					GUEST_IA32_PERF_GLOBAL_CTRL,
> >  					HOST_IA32_PERF_GLOBAL_CTRL,
> >  					guest_val, host_val);
> > +			vmx->core_perf_global_ctrl = true;
> >  			return;
> >  		}
> >  		break;
> > --
> > 			Gleb.

--
			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression?
  2013-11-18 19:09                   ` Gleb Natapov
@ 2013-11-18 19:37                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 17+ messages in thread
From: Michael S. Tsirkin @ 2013-11-18 19:37 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: pbonzini, kvm

On Mon, Nov 18, 2013 at 09:09:04PM +0200, Gleb Natapov wrote:
> On Mon, Nov 18, 2013 at 09:04:51PM +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 05, 2013 at 01:20:10PM +0200, Gleb Natapov wrote:
> > > On Tue, Nov 05, 2013 at 12:22:49PM +0200, Gleb Natapov wrote:
> > > > On Tue, Nov 05, 2013 at 12:18:57PM +0200, Michael S. Tsirkin wrote:
> > > > > On Mon, Nov 04, 2013 at 10:44:43PM +0200, Gleb Natapov wrote:
> > > > > > On Mon, Nov 04, 2013 at 10:33:57PM +0200, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Nov 04, 2013 at 10:13:39PM +0200, Gleb Natapov wrote:
> > > > > > > > On Mon, Nov 04, 2013 at 10:11:33PM +0200, Michael S. Tsirkin wrote:
> > > > > > > > > On Thu, Oct 31, 2013 at 09:48:08AM +0200, Gleb Natapov wrote:
> > > > > > > > > > On Thu, Oct 31, 2013 at 02:21:46AM +0200, Michael S. Tsirkin wrote:
> > > > > > > > > > > commit 8bf00a529967dafbbb210b377c38a15834d1e979:
> > > > > > > > > > > "    KVM: VMX: add support for switching of PERF_GLOBAL_CTRL " was
> > > > > > > > > > > as far as I can tell supposed to bring about performance improvement
> > > > > > > > > > > on hardware that supports it?
> > > > > > > > > > No, it (and commits after it) supposed to fix a bug which it did.
> > > > > > > > > > 
> > > > > > > > > > > Instead it seems to make the typical case (not running guest
> > > > > > > > > > > under perf) a bit slower than it used to be.
> > > > > > > > > > > the cost of VMexit goes up by about 50 cycles
> > > > > > > > > > > on sandy bridge where the optimization in question
> > > > > > > > > > > actually is activated.
> > > > > > > > > > >
> > > > > > > > > > You seams to be confused. 8bf00a529967dafbbb210 adds support for special
> > > > > > > > > > PERF_GLOBAL_CTRL switching, but does not add code to switch anything,
> > > > > > > > > > so the commit itself is a nop.
> > > > > > > > > 
> > > > > > > > > It does add code to add_atomic_switch_msr.
> > > > > > > > > 
> > > > > > > > So what? You do not read what I wrote.
> > > > > > > 
> > > > > > > 
> > > > > > > It's simple: if I revert 8bf00a529967dafbbb210 then exit latency
> > > > > > > is reduced.
> > > > > > > You seem to tell me it should be a nop, but in practice it isn't.
> > > > > > > 
> > > > > > 
> > > > > > No, if you read below I am saying that it looks like you are claiming that
> > > > > > generic msr switch mechanism is faster and I am not buying that. If you
> > > > > > believe this to be the case ask Intel for explanation. Your claim about
> > > > > > "not running guest under perf" is even stranger since in this case no msr
> > > > > > switch should happen regardless of the aforementioned commit (unless guest
> > > > > > or host runs nmi watchdog, but then switch will happen no matter if perf
> > > > > > is running, so again not running guest under perf" does not make sense).
> > > > > > So, in short, you do not really know where the slow down is coming
> > > > > > from.
> > > > > 
> > > > > That's true.
> > > > > 
> > > > Then dig dipper.
> > > > 
> > > So quick and dirty patch to not needlessly write into VM_ENTRY_CONTROLS
> > > when no PERF_GLOBAL_CTRL switching needed removes all the overhead. But we
> > > probably need more generic code to shadow entire VM_ENTRY_CONTROLS/VM_EXIT_CONTROLS.
> > > 
> > 
> > Do you plan to complete this? Merge as is?
> I have a patch to shadow VM_ENTRY_CONTROLS/VM_EXIT_CONTROLS now, will
> post it shortly, but it will have to wait for the end of a merge window
> anyway.

No rush.

> > You said this saves about 50 cycles per exit for you,
> > did you not?
> Let's not exaggerate :) May be 20 after removing all the noise.

Interesting. I remember reverting the feature
gave me 50 cycles. I'll have to retest, but 20 cycles
on all exits is nice too.

> > 
> > 
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > index e293a62..be64221 100644
> > > --- a/arch/x86/kvm/vmx.c
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -413,6 +413,7 @@ struct vcpu_vmx {
> > >  	struct shared_msr_entry *guest_msrs;
> > >  	int                   nmsrs;
> > >  	int                   save_nmsrs;
> > > +	bool                  core_perf_global_ctrl;
> > >  	unsigned long	      host_idt_base;
> > >  #ifdef CONFIG_X86_64
> > >  	u64 		      msr_host_kernel_gs_base;
> > > @@ -1432,9 +1433,12 @@ static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr)
> > >  		break;
> > >  	case MSR_CORE_PERF_GLOBAL_CTRL:
> > >  		if (cpu_has_load_perf_global_ctrl) {
> > > +			if (vmx->core_perf_global_ctrl) {
> > >  			clear_atomic_switch_msr_special(
> > >  					VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
> > >  					VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL);
> > > +				vmx->core_perf_global_ctrl = false;
> > > +			}
> > >  			return;
> > >  		}
> > >  		break;
> > > @@ -1488,6 +1492,7 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
> > >  					GUEST_IA32_PERF_GLOBAL_CTRL,
> > >  					HOST_IA32_PERF_GLOBAL_CTRL,
> > >  					guest_val, host_val);
> > > +			vmx->core_perf_global_ctrl = true;
> > >  			return;
> > >  		}
> > >  		break;
> > > --
> > > 			Gleb.
> 
> --
> 			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-11-18 19:34 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-31  0:21 8bf00a529967dafbbb210b377c38a15834d1e979 - performance regression? Michael S. Tsirkin
2013-10-31  7:48 ` Gleb Natapov
2013-11-04 20:11   ` Michael S. Tsirkin
2013-11-04 20:13     ` Gleb Natapov
2013-11-04 20:33       ` Michael S. Tsirkin
2013-11-04 20:44         ` Gleb Natapov
2013-11-05 10:18           ` Michael S. Tsirkin
2013-11-05 10:22             ` Gleb Natapov
2013-11-05 11:20               ` Gleb Natapov
2013-11-05 16:36                 ` Michael S. Tsirkin
2013-11-05 16:40                   ` Gleb Natapov
2013-11-05 16:44                     ` Michael S. Tsirkin
2013-11-18 19:04                 ` Michael S. Tsirkin
2013-11-18 19:09                   ` Gleb Natapov
2013-11-18 19:37                     ` Michael S. Tsirkin
2013-11-04 19:30 ` Michael S. Tsirkin
2013-11-04 19:39   ` Gleb Natapov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox