xen-devel.lists.xenproject.org archive mirror
* heavy P2M lock contention on guest HPET counter reads
@ 2014-07-30 10:08 Jan Beulich
  2014-07-30 10:29 ` Andrew Cooper
  2014-07-30 14:22 ` Jan Beulich
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Beulich @ 2014-07-30 10:08 UTC (permalink / raw)
  To: Andres Lagar-Cavilla, Tim Deegan; +Cc: xen-devel

Hi,

with 40+ vCPU Win2012R2 guests we're observing apparent guest live
locks. The 'd' debug key reveals more than half of the vCPU-s doing an
inlined HPET main counter read from KeQueryPerformanceCounter(),
resulting in all of them racing for the lock at the beginning of
__get_gfn_type_access(). Assuming it is really necessary to always
take the write lock (rather than just the read one) here, would it perhaps
be reasonable to introduce a bypass in hvm_hap_nested_page_fault()
for the HPET page similar to the LAPIC one?
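
For context, the existing LAPIC bypass being referred to looks roughly like
this in hvm_hap_nested_page_fault() (a sketch reconstructed from the patch
quoted later in this thread; the tail of the block is elided):

    /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
     * a fast path for LAPIC accesses, skipping the p2m lookup. */
    if ( !nestedhvm_vcpu_in_guestmode(v)
         && is_hvm_vcpu(v)
         && gfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) )
    {
        if ( !handle_mmio() )
            hvm_inject_hw_exception(TRAP_gp_fault, 0);
        /* ... */
    }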

Thanks, Jan


* Re: heavy P2M lock contention on guest HPET counter reads
  2014-07-30 10:08 heavy P2M lock contention on guest HPET counter reads Jan Beulich
@ 2014-07-30 10:29 ` Andrew Cooper
  2014-07-30 10:46   ` Jan Beulich
  2014-07-30 14:22 ` Jan Beulich
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2014-07-30 10:29 UTC (permalink / raw)
  To: Jan Beulich, Andres Lagar-Cavilla, Tim Deegan; +Cc: xen-devel

On 30/07/14 11:08, Jan Beulich wrote:
> Hi,
>
> with 40+ vCPU Win2012R2 guests we're observing apparent guest live
> locks. The 'd' debug key reveals more than half of the vCPU-s doing an
> inlined HPET main counter read from KeQueryPerformanceCounter(),
> resulting in all of them racing for the lock at the beginning of
> __get_gfn_type_access(). Assuming it is really necessary to always
> take the write lock (rather than just the read one) here, would it perhaps
> be reasonable to introduce a bypass in hvm_hap_nested_page_fault()
> for the HPET page similar to the LAPIC one?
>
> Thanks, Jan

We have received similar concerns about the emulation overhead of the
vhpet, but not with a guest that size.

One complication is that the HPET can optionally be emulated by qemu
based on an HVMPARAM, so the fastpath has to consider creating an
ioreq.  (Frankly, I think this ability is completely mad, and I doubt
anyone would notice if it ceased to work.  There is no reason to use
qemu's emulation in preference to Xen's, especially as Xen's emulator
was copied from qemu at some point in the past)

Have you enabled the viridian extensions?  Providing the TSC
enlightenments should cause Windows to completely ignore the vhpet.
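
For reference, a minimal sketch of what enabling them looks like in an xl
guest config (assuming a plain HVM guest created without them; purely
illustrative):

    # hypothetical guest config fragment
    viridian = 1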

~Andrew


* Re: heavy P2M lock contention on guest HPET counter reads
  2014-07-30 10:29 ` Andrew Cooper
@ 2014-07-30 10:46   ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2014-07-30 10:46 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Tim Deegan, Andres Lagar-Cavilla

>>> On 30.07.14 at 12:29, <andrew.cooper3@citrix.com> wrote:
> One complication is that the HPET can optionally be emulated by qemu
> based on an HVMPARAM, so the fastpath has to consider creating an
> ioreq.  (Frankly, I think this ability is completely mad, and I doubt
> anyone would notice if it ceased to work.  There is no reason to use
> qemu's emulation in preference to Xen's, especially as Xen's emulator
> was copied from qemu at some point in the past)

No, we wouldn't need to care about that mode. For one, the qemu
path should be unreachable for a well-behaved guest, since with
"hpet=0" there's no ACPI table advertising an HPET. And then, even
if it were reachable (e.g. due to a guest guessing/probing it), I don't
think we need to be concerned about that path's performance.

> Have you enabled the viridian extensions?  Providing the TSC
> enlightenments should cause Windows to completely ignore the vhpet.

I'm told this flag doesn't make any difference, though I'll have them
double-check.

Jan


* Re: heavy P2M lock contention on guest HPET counter reads
  2014-07-30 10:08 heavy P2M lock contention on guest HPET counter reads Jan Beulich
  2014-07-30 10:29 ` Andrew Cooper
@ 2014-07-30 14:22 ` Jan Beulich
  2014-07-30 18:15   ` Andres Lagar Cavilla
  1 sibling, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2014-07-30 14:22 UTC (permalink / raw)
  To: Andres Lagar-Cavilla, Tim Deegan; +Cc: xen-devel

>>> On 30.07.14 at 12:08, <JBeulich@suse.com> wrote:
> with 40+ vCPU Win2012R2 guests we're observing apparent guest live
> locks. The 'd' debug key reveals more than half of the vCPU-s doing an
> inlined HPET main counter read from KeQueryPerformanceCounter(),
> resulting in all of them racing for the lock at the beginning of
> __get_gfn_type_access(). Assuming it is really necessary to always
> take the write lock (rather than just the read one) here, would it perhaps
> be reasonable to introduce a bypass in hvm_hap_nested_page_fault()
> for the HPET page similar to the LAPIC one?

Like this (for 4.4, if it matters, and with no tester feedback yet),
raising the question whether we shouldn't just do this for all
internal MMIO handlers.

--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1523,11 +1523,16 @@ int hvm_hap_nested_page_fault(paddr_t gp
         }
     }
 
-    /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
-     * a fast path for LAPIC accesses, skipping the p2m lookup. */
+    /*
+     * For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
+     * a fast path for LAPIC accesses, skipping the p2m lookup.
+     * Similarly for newer Windows (like Server 2012) a fast path for
+     * HPET accesses.
+     */
     if ( !nestedhvm_vcpu_in_guestmode(v)
          && is_hvm_vcpu(v)
-         && gfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) )
+         && (gfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v)))
+             || hpet_mmio_handler.check_handler(v, gpa)) )
     {
         if ( !handle_mmio() )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);


* Re: heavy P2M lock contention on guest HPET counter reads
  2014-07-30 14:22 ` Jan Beulich
@ 2014-07-30 18:15   ` Andres Lagar Cavilla
  2014-08-01  6:38     ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Andres Lagar Cavilla @ 2014-07-30 18:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Tim Deegan



On Wed, Jul 30, 2014 at 10:22 AM, Jan Beulich <JBeulich@suse.com> wrote:

> >>> On 30.07.14 at 12:08, <JBeulich@suse.com> wrote:
> > with 40+ vCPU Win2012R2 guests we're observing apparent guest live
> > locks. The 'd' debug key reveals more than half of the vCPU-s doing an
> > inlined HPET main counter read from KeQueryPerformanceCounter(),
> > resulting in all of them racing for the lock at the beginning of
> > __get_gfn_type_access(). Assuming it is really necessary to always
> > take the write lock (rather than just the read one) here, would it perhaps
> > be reasonable to introduce a bypass in hvm_hap_nested_page_fault()
> > for the HPET page similar to the LAPIC one?
>
> Like this (for 4.4, if it matters, and with no tester feedback yet),
> raising the question whether we shouldn't just do this for all
> internal MMIO handlers.
>
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -1523,11 +1523,16 @@ int hvm_hap_nested_page_fault(paddr_t gp
>          }
>      }
>
> -    /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
> -     * a fast path for LAPIC accesses, skipping the p2m lookup. */
> +    /*
> +     * For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
> +     * a fast path for LAPIC accesses, skipping the p2m lookup.
> +     * Similarly for newer Windows (like Server 2012) a fast path for
> +     * HPET accesses.
> +     */
>      if ( !nestedhvm_vcpu_in_guestmode(v)
>           && is_hvm_vcpu(v)
> -         && gfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) )
> +         && (gfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v)))
> +             || hpet_mmio_handler.check_handler(v, gpa)) )
>

This is reasonable. I suggest knee-capping the open coding and moving this
block into its own little inline function (e.g. try_shortcut_hvm_mmio). I
think soon enough we may run into the next Xen-implemented mmio range that
is getting pounded. In any case it will make for better-looking code. Roughly:

int try_shortcut_hvm_mmio(struct vcpu *v, paddr_t gpa)
{
    int i;

    if ( nestedhvm_vcpu_in_guestmode(v)
         || !is_hvm_vcpu(v) )
        return 0;

    for ( i = 0; i < HVM_MMIO_HANDLER_NR; i++ )
        if ( hvm_mmio_handlers[i]->check_handler(v, gpa) )
            return 1;

    /* No internal MMIO handler claims this address. */
    return 0;
}

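With a helper along those lines, the call site in hvm_hap_nested_page_fault()
could then shrink to roughly the following (a sketch based on the context
lines of the patch above; the rest of the existing block stays as it is):

    if ( try_shortcut_hvm_mmio(v, gpa) )
    {
        if ( !handle_mmio() )
            hvm_inject_hw_exception(TRAP_gp_fault, 0);
        /* ... remainder of the existing fast-path body ... */
    }
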
Having said that, I don't know off the top of my head that it is obviously
correct to shortcut the p2m lookup for the msix table and iommu. I think so,
but...

Andres

>      {
>          if ( !handle_mmio() )
>              hvm_inject_hw_exception(TRAP_gp_fault, 0);


* Re: heavy P2M lock contention on guest HPET counter reads
  2014-07-30 18:15   ` Andres Lagar Cavilla
@ 2014-08-01  6:38     ` Jan Beulich
  2014-08-02  2:33       ` Andres Lagar Cavilla
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2014-08-01  6:38 UTC (permalink / raw)
  To: Andres Lagar Cavilla; +Cc: xen-devel, Tim Deegan

>>> On 30.07.14 at 20:15, <andres@lagarcavilla.org> wrote:
> Having said that, I don't know off the top of my head that it is obviously
> correct to shortcut the p2m lookup for the msix table and iommu. I think so,
> but...

All internal MMIO ranges are what you'd call "positively decoded" on
real hardware, i.e. they ought to supersede any other address
decoding rules anyway. Since they're also not associated with
possibly varying MFNs, the P2M lookup on them is simply pointless,
and short-circuiting them imo also makes for better guarantees of
consistent behavior.

Jan


* Re: heavy P2M lock contention on guest HPET counter reads
  2014-08-01  6:38     ` Jan Beulich
@ 2014-08-02  2:33       ` Andres Lagar Cavilla
  2014-08-04  7:00         ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Andres Lagar Cavilla @ 2014-08-02  2:33 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Tim Deegan



On Fri, Aug 1, 2014 at 2:38 AM, Jan Beulich <JBeulich@suse.com> wrote:

> >>> On 30.07.14 at 20:15, <andres@lagarcavilla.org> wrote:
> > Having said that, I don't know off the top of my head that it is obviously
> > correct to shortcut the p2m lookup for the msix table and iommu. I think so,
> > but...
>
> All internal MMIO ranges are what you'd call "positively decoded" on
> real hardware, i.e. they ought to supersede any other address
> decoding rules anyway. Since they're also not associated with
> possibly varying MFNs, the P2M lookup on them is simply pointless,
> and short-circuiting them imo also makes for better guarantees of
> consistent behavior.
>
Ok. Good. However: this is a very hot path. I wonder if this iteration can
be made more efficient (i.e. sort the resulting mmio regions and do a binary
search).
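
A minimal sketch of that idea, assuming a hypothetical table of disjoint
(start, end, handler) records kept sorted by start address; none of these
names exist in the tree, they are purely illustrative:

struct mmio_range {
    paddr_t start, end;        /* inclusive bounds covered by the handler */
    unsigned int handler_idx;  /* index into hvm_mmio_handlers[] */
};

static int find_internal_mmio_handler(const struct mmio_range *tbl,
                                      unsigned int nr, paddr_t gpa)
{
    unsigned int lo = 0, hi = nr;

    while ( lo < hi )
    {
        unsigned int mid = (lo + hi) / 2;

        if ( gpa < tbl[mid].start )
            hi = mid;
        else if ( gpa > tbl[mid].end )
            lo = mid + 1;
        else
            return tbl[mid].handler_idx;  /* gpa is inside this range */
    }

    return -1;  /* no internal handler claims gpa */
}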

Andres

>
> Jan
>
>


* Re: heavy P2M lock contention on guest HPET counter reads
  2014-08-02  2:33       ` Andres Lagar Cavilla
@ 2014-08-04  7:00         ` Jan Beulich
  2014-08-08  2:46           ` Andres Lagar Cavilla
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2014-08-04  7:00 UTC (permalink / raw)
  To: Andres Lagar Cavilla; +Cc: xen-devel, Tim Deegan

>>> On 02.08.14 at 04:33, <andres@lagarcavilla.org> wrote:
> On Fri, Aug 1, 2014 at 2:38 AM, Jan Beulich <JBeulich@suse.com> wrote:
> 
>> >>> On 30.07.14 at 20:15, <andres@lagarcavilla.org> wrote:
>> > Having said that, I don't know off the top of my head that it is obviously
>> > correct to shortcut the p2m lookup for the msix table and iommu. I think so,
>> > but...
>>
>> All internal MMIO ranges are what you'd call "positively decoded" on
>> real hardware, i.e. they ought to supersede any other address
>> decoding rules anyway. Since they're also not associated with
>> possibly varying MFNs, the P2M lookup on them is simply pointless,
>> and short-circuiting them imo also makes for better guarantees of
>> consistent behavior.
>>
> Ok. Good. However: this is a very hot path. I wonder if this iteration can
> be made more efficient (i.e. sort the resulting mmio regions and do a binary
> search).

How much more efficient would a binary search be for currently five
handlers? Furthermore, ->check_handler() involves more than just
an address range comparison in at least some of the cases.

Jan


* Re: heavy P2M lock contention on guest HPET counter reads
  2014-08-04  7:00         ` Jan Beulich
@ 2014-08-08  2:46           ` Andres Lagar Cavilla
  0 siblings, 0 replies; 9+ messages in thread
From: Andres Lagar Cavilla @ 2014-08-08  2:46 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Tim Deegan



On Mon, Aug 4, 2014 at 3:00 AM, Jan Beulich <JBeulich@suse.com> wrote:

> >>> On 02.08.14 at 04:33, <andres@lagarcavilla.org> wrote:
> > On Fri, Aug 1, 2014 at 2:38 AM, Jan Beulich <JBeulich@suse.com> wrote:
> >
> >> >>> On 30.07.14 at 20:15, <andres@lagarcavilla.org> wrote:
> >> > Having said that, I don't know off the top of my head that it is obviously
> >> > correct to shortcut the p2m lookup for the msix table and iommu. I think so,
> >> > but...
> >>
> >> All internal MMIO ranges are what you'd call "positively decoded" on
> >> real hardware, i.e. they ought to supersede any other address
> >> decoding rules anyway. Since they're also not associated with
> >> possibly varying MFNs, the P2M lookup on them is simply pointless,
> >> and short-circuiting them imo also makes for better guarantees of
> >> consistent behavior.
> >>
> > Ok. Good. However: this is a very hot path. I wonder if this iteration can
> > be made more efficient (i.e. sort the resulting mmio regions and do a binary
> > search).
>
> How much more efficient would a binary search be for currently five
> handlers? Furthermore, ->check_handler() involves more than just
> an address range comparison in at least some of the cases.
>

Not much more efficient, given the almost zero odds of having more than
five mmio handlers that can skip the p2m lookup.

The only better-performing option seems to be open-coding. Given the
sensitivity of the path, I'd say stick with the original plan of two
open-coded if's (blatant backtrack!).

Let me know if you'd rather we follow the route I laid out here.

Thanks
Andres


> Jan
>
>

