From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>,
    xen-devel <xen-devel@lists.xenproject.org>
Cc: Ian Campbell <Ian.Campbell@eu.citrix.com>,
    Ian Jackson <Ian.Jackson@eu.citrix.com>,
    Keir Fraser <keir@xen.org>, Tim Deegan <tim@xen.org>
Subject: Re: [PATCH 2/2] x86/HVM: batch vCPU wakeups
Date: Thu, 11 Sep 2014 11:48:00 +0100
Message-ID: <54117DE0.1010203@citrix.com>
In-Reply-To: <54118A3A0200007800033BCD@mail.emea.novell.com>
On 11/09/14 10:40, Jan Beulich wrote:
> Mass wakeups (via vlapic_ipi()) can take enormous amounts of time,
> especially when many of the remote pCPU-s are in deep C-states. For
> 64-vCPU Windows Server 2012 R2 guests on Ivybridge hardware,
> accumulated times of over 2ms were observed (average 1.1ms).
> Considering that Windows broadcasts IPIs from its timer interrupt,
> which at least at certain times can run at 1kHz, it is clear that this
> can't result in good guest behavior. In fact, on said hardware guests
> with significantly beyond 40 vCPU-s simply hung when e.g. ServerManager
> gets started.
>
> This isn't just helping to reduce the number of ICR writes when the
> host APICs run in clustered mode, it also reduces them by suppressing
> the sends altogether when - by the time
> cpu_raise_softirq_batch_finish() is reached - the remote CPU already
> managed to handle the softirq. Plus - when using MONITOR/MWAIT - the
> update of softirq_pending(cpu), being on the monitored cache line -
> should make the remote CPU wake up ahead of the ICR being sent,
> allowing the wait-for-ICR-idle latencies to be reduced (perhaps to a
> large part due to overlapping the wakeups of multiple CPUs).
>
> With this alone (i.e. without the IPI avoidance patch in place),
> average broadcast times for a 64-vCPU guest went down to a measured
> maximum of 310us. With that other patch in place, improvements aren't
> as clear anymore (short term averages only went down from 255us to
> 250us, which clearly is within the error range of the measurements),
> but longer term an improvement of the averages is still visible.
> Depending on hardware, long term maxima were observed to go down quite
> a bit (on aforementioned hardware), while they were seen to go up
> again on a (single core) Nehalem (where instead the improvement on the
> average values was more visible).
>
> Of course this necessarily increases the latencies for the remote
> CPU wakeup at least slightly. To weigh between the effects, the
> condition to enable batching in vlapic_ipi() may need further tuning.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -409,6 +409,26 @@ void vlapic_handle_EOI_induced_exit(stru
> hvm_dpci_msi_eoi(current->domain, vector);
> }
>
> +static bool_t is_multicast_dest(struct vlapic *vlapic, unsigned int short_hand,
> + uint32_t dest, bool_t dest_mode)
> +{
> + if ( vlapic_domain(vlapic)->max_vcpus <= 2 )
> + return 0;
> +
> + if ( short_hand )
> + return short_hand != APIC_DEST_SELF;
> +
> + if ( vlapic_x2apic_mode(vlapic) )
> + return dest_mode ? hweight16(dest) > 1 : dest == 0xffffffff;
> +
> + if ( dest_mode )
> + return hweight8(dest &
> + GET_xAPIC_DEST_FIELD(vlapic_get_reg(vlapic,
> + APIC_DFR))) > 1;
> +
> + return dest == 0xff;
> +}
Much more readable!
> +
> void vlapic_ipi(
> struct vlapic *vlapic, uint32_t icr_low, uint32_t icr_high)
> {
> @@ -447,12 +467,18 @@ void vlapic_ipi(
>
> default: {
> struct vcpu *v;
> + bool_t batch = is_multicast_dest(vlapic, short_hand, dest, dest_mode);
> +
> + if ( batch )
> + cpu_raise_softirq_batch_begin();
> for_each_vcpu ( vlapic_domain(vlapic), v )
> {
> if ( vlapic_match_dest(vcpu_vlapic(v), vlapic,
> short_hand, dest, dest_mode) )
> vlapic_accept_irq(v, icr_low);
> }
> + if ( batch )
> + cpu_raise_softirq_batch_finish();
> break;
> }
> }
> --- a/xen/common/softirq.c
> +++ b/xen/common/softirq.c
> @@ -23,6 +23,9 @@ irq_cpustat_t irq_stat[NR_CPUS];
>
> static softirq_handler softirq_handlers[NR_SOFTIRQS];
>
> +static DEFINE_PER_CPU(cpumask_t, batch_mask);
> +static DEFINE_PER_CPU(unsigned int, batching);
> +
> static void __do_softirq(unsigned long ignore_mask)
> {
> unsigned int i, cpu;
> @@ -71,24 +74,58 @@ void open_softirq(int nr, softirq_handle
> void cpumask_raise_softirq(const cpumask_t *mask, unsigned int nr)
> {
> unsigned int cpu, this_cpu = smp_processor_id();
> - cpumask_t send_mask;
> + cpumask_t send_mask, *raise_mask;
> +
> + if ( !per_cpu(batching, this_cpu) || in_irq() )
> + {
> + cpumask_clear(&send_mask);
> + raise_mask = &send_mask;
> + }
> + else
> + raise_mask = &per_cpu(batch_mask, this_cpu);
>
> - cpumask_clear(&send_mask);
> for_each_cpu(cpu, mask)
> if ( !test_and_set_bit(nr, &softirq_pending(cpu)) &&
> cpu != this_cpu &&
> !arch_skip_send_event_check(cpu) )
> - cpumask_set_cpu(cpu, &send_mask);
> + cpumask_set_cpu(cpu, raise_mask);
>
> - smp_send_event_check_mask(&send_mask);
> + if ( raise_mask == &send_mask )
> + smp_send_event_check_mask(raise_mask);
> }
>
> void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
> {
> - if ( !test_and_set_bit(nr, &softirq_pending(cpu))
> - && (cpu != smp_processor_id())
> - && !arch_skip_send_event_check(cpu) )
> + unsigned int this_cpu = smp_processor_id();
> +
> + if ( test_and_set_bit(nr, &softirq_pending(cpu))
> + || (cpu == this_cpu)
> + || arch_skip_send_event_check(cpu) )
> + return;
> +
> + if ( !per_cpu(batching, this_cpu) || in_irq() )
> smp_send_event_check_cpu(cpu);
> + else
> + set_bit(nr, &per_cpu(batch_mask, this_cpu));
Under what circumstances would it be sensible to batch calls to
cpu_raise_softirq()?

All of the current callers are single-shot events, and they could only
run during a batched period as a result of a timer interrupt, which
bypasses the batching (via the in_irq() check).

~Andrew
> +}
> +
> +void cpu_raise_softirq_batch_begin(void)
> +{
> + ++this_cpu(batching);
> +}
> +
> +void cpu_raise_softirq_batch_finish(void)
> +{
> + unsigned int cpu, this_cpu = smp_processor_id();
> + cpumask_t *mask = &per_cpu(batch_mask, this_cpu);
> +
> + ASSERT(per_cpu(batching, this_cpu));
> + for_each_cpu ( cpu, mask )
> + if ( !softirq_pending(cpu) )
> + cpumask_clear_cpu(cpu, mask);
> + smp_send_event_check_mask(mask);
> + cpumask_clear(mask);
> + --per_cpu(batching, this_cpu);
> }
>
> void raise_softirq(unsigned int nr)
> --- a/xen/include/xen/softirq.h
> +++ b/xen/include/xen/softirq.h
> @@ -30,6 +30,9 @@ void cpumask_raise_softirq(const cpumask
> void cpu_raise_softirq(unsigned int cpu, unsigned int nr);
> void raise_softirq(unsigned int nr);
>
> +void cpu_raise_softirq_batch_begin(void);
> +void cpu_raise_softirq_batch_finish(void);
> +
> /*
> * Process pending softirqs on this CPU. This should be called periodically
> * when performing work that prevents softirqs from running in a timely manner.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
2014-09-11 9:36 [PATCH 0/2] x86: improve remote CPU wakeup Jan Beulich
2014-09-11 9:40 ` [PATCH 1/2] x86: suppress event check IPI to MWAITing CPUs Jan Beulich
2014-09-11 10:02 ` Andrew Cooper
2014-09-11 10:07 ` Jan Beulich
2014-09-11 10:09 ` Andrew Cooper
2014-09-11 10:26 ` Jan Beulich
2014-09-11 9:40 ` [PATCH 2/2] x86/HVM: batch vCPU wakeups Jan Beulich
2014-09-11 10:48 ` Andrew Cooper [this message]
2014-09-11 11:03 ` Jan Beulich
2014-09-11 11:11 ` Andrew Cooper
2014-09-18 10:59 ` [PATCH 0/2] x86: improve remote CPU wakeup Tim Deegan