From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>,
	xen-devel <xen-devel@lists.xenproject.org>
Cc: Ian Campbell <Ian.Campbell@eu.citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Keir Fraser <keir@xen.org>, Tim Deegan <tim@xen.org>
Subject: Re: [PATCH 2/2] x86/HVM: batch vCPU wakeups
Date: Thu, 11 Sep 2014 11:48:00 +0100
Message-ID: <54117DE0.1010203@citrix.com>
In-Reply-To: <54118A3A0200007800033BCD@mail.emea.novell.com>


On 11/09/14 10:40, Jan Beulich wrote:
> Mass wakeups (via vlapic_ipi()) can take enormous amounts of time,
> especially when many of the remote pCPU-s are in deep C-states. For
> 64-vCPU Windows Server 2012 R2 guests on Ivy Bridge hardware,
> accumulated times of over 2ms were observed (average 1.1ms).
> Considering that Windows broadcasts IPIs from its timer interrupt,
> which at least at certain times can run at 1kHz, it is clear that this
> can't result in good guest behavior. In fact, on said hardware, guests
> with significantly more than 40 vCPU-s simply hung when e.g.
> ServerManager was started.
>
> This doesn't just reduce the number of ICR writes when the host APICs
> run in clustered mode; it also suppresses the sends altogether when,
> by the time cpu_raise_softirq_batch_finish() is reached, the remote
> CPU has already managed to handle the softirq. Furthermore, when using
> MONITOR/MWAIT, the update of softirq_pending(cpu), being on the
> monitored cache line, should make the remote CPU wake up ahead of the
> ICR being sent, allowing the wait-for-ICR-idle latencies to be reduced
> (perhaps in large part due to overlapping the wakeups of multiple
> CPUs).
>
> With this alone (i.e. without the IPI avoidance patch in place),
> average broadcast times for a 64-vCPU guest went down to a measured
> maximum of 310us. With that other patch in place, improvements aren't
> as clear anymore (short-term averages only went down from 255us to
> 250us, which is clearly within the error range of the measurements),
> but over the longer term an improvement in the averages is still
> visible. Depending on hardware, long-term maxima were observed to go
> down quite a bit (on the aforementioned hardware), while they were
> seen to go up again on a (single-core) Nehalem (where instead the
> improvement in the average values was more visible).
>
> Of course this necessarily increases the latency of the remote CPU
> wakeup at least slightly. To balance these effects against one
> another, the condition used to enable batching in vlapic_ipi() may
> need further tuning.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -409,6 +409,26 @@ void vlapic_handle_EOI_induced_exit(stru
>      hvm_dpci_msi_eoi(current->domain, vector);
>  }
>  
> +static bool_t is_multicast_dest(struct vlapic *vlapic, unsigned int short_hand,
> +                                uint32_t dest, bool_t dest_mode)
> +{
> +    if ( vlapic_domain(vlapic)->max_vcpus <= 2 )
> +        return 0;
> +
> +    if ( short_hand )
> +        return short_hand != APIC_DEST_SELF;
> +
> +    if ( vlapic_x2apic_mode(vlapic) )
> +        return dest_mode ? hweight16(dest) > 1 : dest == 0xffffffff;
> +
> +    if ( dest_mode )
> +        return hweight8(dest &
> +                        GET_xAPIC_DEST_FIELD(vlapic_get_reg(vlapic,
> +                                                            APIC_DFR))) > 1;
> +
> +    return dest == 0xff;
> +}

Much more readable!
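
(As an aside, for anyone trying to picture what typically lands here: per
the patch description, the trigger of interest is the guest broadcasting a
fixed-vector IPI, e.g. Windows' timer-tick broadcast. A rough guest-side
illustration follows; it is not part of the patch, and the constants are
defined locally to match the architectural xAPIC ICR offset and shorthand
encoding:)

    #include <stdint.h>

    #define XAPIC_ICR_LO     0x300        /* ICR low dword, xAPIC MMIO offset */
    #define APIC_DEST_ALLBUT 0x000C0000u  /* shorthand: all excluding self */
    #define APIC_DM_FIXED    0x00000000u  /* fixed delivery mode */

    /* Broadcast a fixed IPI to all other processors.  With a shorthand the
     * destination field is ignored; on the Xen side this reaches
     * vlapic_ipi()'s default case, which walks every vCPU, and
     * is_multicast_dest() returns true. */
    static inline void xapic_broadcast_ipi(volatile uint32_t *apic_mmio,
                                           uint8_t vector)
    {
        apic_mmio[XAPIC_ICR_LO / 4] = APIC_DEST_ALLBUT | APIC_DM_FIXED | vector;
    }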

> +
>  void vlapic_ipi(
>      struct vlapic *vlapic, uint32_t icr_low, uint32_t icr_high)
>  {
> @@ -447,12 +467,18 @@ void vlapic_ipi(
>  
>      default: {
>          struct vcpu *v;
> +        bool_t batch = is_multicast_dest(vlapic, short_hand, dest, dest_mode);
> +
> +        if ( batch )
> +            cpu_raise_softirq_batch_begin();
>          for_each_vcpu ( vlapic_domain(vlapic), v )
>          {
>              if ( vlapic_match_dest(vcpu_vlapic(v), vlapic,
>                                     short_hand, dest, dest_mode) )
>                  vlapic_accept_irq(v, icr_low);
>          }
> +        if ( batch )
> +            cpu_raise_softirq_batch_finish();
>          break;
>      }
>      }
> --- a/xen/common/softirq.c
> +++ b/xen/common/softirq.c
> @@ -23,6 +23,9 @@ irq_cpustat_t irq_stat[NR_CPUS];
>  
>  static softirq_handler softirq_handlers[NR_SOFTIRQS];
>  
> +static DEFINE_PER_CPU(cpumask_t, batch_mask);
> +static DEFINE_PER_CPU(unsigned int, batching);
> +
>  static void __do_softirq(unsigned long ignore_mask)
>  {
>      unsigned int i, cpu;
> @@ -71,24 +74,58 @@ void open_softirq(int nr, softirq_handle
>  void cpumask_raise_softirq(const cpumask_t *mask, unsigned int nr)
>  {
>      unsigned int cpu, this_cpu = smp_processor_id();
> -    cpumask_t send_mask;
> +    cpumask_t send_mask, *raise_mask;
> +
> +    if ( !per_cpu(batching, this_cpu) || in_irq() )
> +    {
> +        cpumask_clear(&send_mask);
> +        raise_mask = &send_mask;
> +    }
> +    else
> +        raise_mask = &per_cpu(batch_mask, this_cpu);
>  
> -    cpumask_clear(&send_mask);
>      for_each_cpu(cpu, mask)
>          if ( !test_and_set_bit(nr, &softirq_pending(cpu)) &&
>               cpu != this_cpu &&
>               !arch_skip_send_event_check(cpu) )
> -            cpumask_set_cpu(cpu, &send_mask);
> +            cpumask_set_cpu(cpu, raise_mask);
>  
> -    smp_send_event_check_mask(&send_mask);
> +    if ( raise_mask == &send_mask )
> +        smp_send_event_check_mask(raise_mask);
>  }
>  
>  void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
>  {
> -    if ( !test_and_set_bit(nr, &softirq_pending(cpu))
> -         && (cpu != smp_processor_id())
> -         && !arch_skip_send_event_check(cpu) )
> +    unsigned int this_cpu = smp_processor_id();
> +
> +    if ( test_and_set_bit(nr, &softirq_pending(cpu))
> +         || (cpu == this_cpu)
> +         || arch_skip_send_event_check(cpu) )
> +        return;
> +
> +    if ( !per_cpu(batching, this_cpu) || in_irq() )
>          smp_send_event_check_cpu(cpu);
> +    else
> +        set_bit(nr, &per_cpu(batch_mask, this_cpu));

Under what circumstances would it be sensible to batch calls to
cpu_raise_softirq()?

All of the current callers raise single-shot events, and their use
during a batched period could only come about as a result of a timer
interrupt, which bypasses the batching.
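
For the avoidance of doubt, the only shape of caller that could benefit
is something like the hypothetical sketch below; example_batched_raise()
and its parameters are invented purely for illustration and nothing
in-tree looks like this today:

    #include <xen/cpumask.h>
    #include <xen/softirq.h>

    /* Hypothetical sketch only: coalesce several cross-CPU softirq raises
     * into (at most) one event-check IPI per destination CPU. */
    static void example_batched_raise(const cpumask_t *targets, unsigned int nr)
    {
        unsigned int cpu;

        cpu_raise_softirq_batch_begin();

        /* While batching (and outside IRQ context), each raise accumulates
         * into this CPU's batch_mask instead of sending an IPI directly. */
        for_each_cpu ( cpu, targets )
            cpu_raise_softirq(cpu, nr);

        /* One smp_send_event_check_mask() covers all still-pending targets. */
        cpu_raise_softirq_batch_finish();
    }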

~Andrew

> +}
> +
> +void cpu_raise_softirq_batch_begin(void)
> +{
> +    ++this_cpu(batching);
> +}
> +
> +void cpu_raise_softirq_batch_finish(void)
> +{
> +    unsigned int cpu, this_cpu = smp_processor_id();
> +    cpumask_t *mask = &per_cpu(batch_mask, this_cpu);
> +
> +    ASSERT(per_cpu(batching, this_cpu));
> +    for_each_cpu ( cpu, mask )
> +        if ( !softirq_pending(cpu) )
> +            cpumask_clear_cpu(cpu, mask);
> +    smp_send_event_check_mask(mask);
> +    cpumask_clear(mask);
> +    --per_cpu(batching, this_cpu);
>  }
>  
>  void raise_softirq(unsigned int nr)
> --- a/xen/include/xen/softirq.h
> +++ b/xen/include/xen/softirq.h
> @@ -30,6 +30,9 @@ void cpumask_raise_softirq(const cpumask
>  void cpu_raise_softirq(unsigned int cpu, unsigned int nr);
>  void raise_softirq(unsigned int nr);
>  
> +void cpu_raise_softirq_batch_begin(void);
> +void cpu_raise_softirq_batch_finish(void);
> +
>  /*
>   * Process pending softirqs on this CPU. This should be called periodically
>   * when performing work that prevents softirqs from running in a timely manner.
>

