From: Chao Gao <chao.gao@intel.com>
To: xen-devel@lists.xen.org
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
Kevin Tian <kevin.tian@intel.com>,
Jun Nakajima <jun.nakajima@intel.com>,
Jan Beulich <jbeulich@suse.com>, Chao Gao <chao.gao@intel.com>
Subject: [PATCH v3 3/3] VT-d PI: restrict the vcpu number on a given pcpu
Date: Wed, 24 May 2017 14:56:17 +0800
Message-ID: <1495608977-15921-4-git-send-email-chao.gao@intel.com>
In-Reply-To: <1495608977-15921-1-git-send-email-chao.gao@intel.com>
Currently, a blocked vCPU is put on its pCPU's PI blocking list. If
too many vCPUs are blocked on a given pCPU, the list can grow very
long. In the worst case, with 32k domains and 128 vCPUs per domain,
about 4M vCPUs may be blocked on one pCPU's PI blocking list. When a
wakeup interrupt arrives, the list is traversed to find the specific
vCPUs to wake up, and in that case the traversal would take a very
long time.
To mitigate this issue, this patch limits the number of vCPUs on a
given pCPU's blocking list, taking into account the performance of the
common case, the current HVM vCPU count and the current pCPU count.
With this method, the common case stays fast, and in extreme cases the
list length is kept under control.
The change in vmx_pi_unblock_vcpu() handles the following case:
vCPU is running -> it tries to block (this patch may change NDST to
another pCPU), but a notification arrives in time, so the vCPU goes
back to the running state -> VM-entry (NDST must be set again,
reverting the change made to it in vmx_vcpu_block())
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
xen/arch/x86/hvm/vmx/vmx.c | 70 +++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 63 insertions(+), 7 deletions(-)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index abbf16b..91ee65b 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -100,16 +100,62 @@ void vmx_pi_per_cpu_init(unsigned int cpu)
spin_lock_init(&per_cpu(vmx_pi_blocking, cpu).lock);
}
+/*
+ * By default, the local pcpu (i.e. the one the vcpu is currently running on)
+ * is chosen as the destination of the wakeup interrupt. But if the number of
+ * vcpus on that pcpu's blocking list exceeds a limit, other pcpus are tried
+ * in turn until a suitable one is found.
+ *
+ * Currently, (v_tot/p_tot) + K is used as the limit on the vcpu count,
+ * where v_tot is the total number of hvm vcpus on the system, p_tot is the
+ * number of online pcpus, and K is a fixed number. Experiments show that the
+ * maximum time to wake up a vcpu from a 128-entry blocking list is about
+ * 22us, which is tolerable, so 128 is chosen as the fixed number K.
+ *
+ * This policy ensures that:
+ * 1) in common cases, the limit won't be reached and the local pcpu is used,
+ * which is beneficial to performance (at the very least, it avoids an IPI
+ * when unblocking the vcpu).
+ * 2) in the worst case, the blocking list length scales with the vcpu count
+ * divided by the pcpu count.
+ */
+#define PI_LIST_FIXED_NUM 128
+#define PI_LIST_LIMIT (atomic_read(&num_hvm_vcpus) / num_online_cpus() + \
+ PI_LIST_FIXED_NUM)
+
+static bool pi_over_limit(int count)
+{
+ /* Compare w/ constant first to save an atomic read in the common case */
+ return ((count > PI_LIST_FIXED_NUM) &&
+ (count > (atomic_read(&num_hvm_vcpus) / num_online_cpus()) +
+ PI_LIST_FIXED_NUM));
+}
+
static void vmx_vcpu_block(struct vcpu *v)
{
unsigned long flags;
- unsigned int dest;
+ unsigned int dest, pi_cpu;
spinlock_t *old_lock;
- spinlock_t *pi_blocking_list_lock =
- &per_cpu(vmx_pi_blocking, v->processor).lock;
struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+ spinlock_t *pi_blocking_list_lock;
+
+ pi_cpu = v->processor;
+ retry:
+ pi_blocking_list_lock = &per_cpu(vmx_pi_blocking, pi_cpu).lock;
spin_lock_irqsave(pi_blocking_list_lock, flags);
+ /*
+ * Since pi_cpu may now be one other than the one v is currently
+ * running on, check to make sure that it's still up.
+ */
+ if ( unlikely((!cpu_online(pi_cpu)) ||
+ pi_over_limit(per_cpu(vmx_pi_blocking, pi_cpu).counter)) )
+ {
+ pi_cpu = cpumask_cycle(pi_cpu, &cpu_online_map);
+ spin_unlock_irqrestore(pi_blocking_list_lock, flags);
+ goto retry;
+ }
+
old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
pi_blocking_list_lock);
@@ -120,11 +166,11 @@ static void vmx_vcpu_block(struct vcpu *v)
*/
ASSERT(old_lock == NULL);
- per_cpu(vmx_pi_blocking, v->processor).counter++;
- TRACE_4D(TRC_HVM_PI_LIST_ADD, v->domain->domain_id, v->vcpu_id,
- v->processor, per_cpu(vmx_pi_blocking, v->processor).counter);
+ per_cpu(vmx_pi_blocking, pi_cpu).counter++;
+ TRACE_4D(TRC_HVM_PI_LIST_ADD, v->domain->domain_id, v->vcpu_id, pi_cpu,
+ per_cpu(vmx_pi_blocking, pi_cpu).counter);
list_add_tail(&v->arch.hvm_vmx.pi_blocking.list,
- &per_cpu(vmx_pi_blocking, v->processor).list);
+ &per_cpu(vmx_pi_blocking, pi_cpu).list);
spin_unlock_irqrestore(pi_blocking_list_lock, flags);
ASSERT(!pi_test_sn(pi_desc));
@@ -134,6 +180,13 @@ static void vmx_vcpu_block(struct vcpu *v)
ASSERT(pi_desc->ndst ==
(x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK)));
+ if ( unlikely(pi_cpu != v->processor) )
+ {
+ dest = cpu_physical_id(pi_cpu);
+ write_atomic(&pi_desc->ndst,
+ (x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK)));
+ }
+
write_atomic(&pi_desc->nv, pi_wakeup_vector);
}
@@ -163,6 +216,7 @@ static void vmx_pi_unblock_vcpu(struct vcpu *v)
unsigned long flags;
spinlock_t *pi_blocking_list_lock;
struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+ unsigned int dest = cpu_physical_id(v->processor);
/*
* Set 'NV' field back to posted_intr_vector, so the
@@ -170,6 +224,8 @@ static void vmx_pi_unblock_vcpu(struct vcpu *v)
* it is running in non-root mode.
*/
write_atomic(&pi_desc->nv, posted_intr_vector);
+ write_atomic(&pi_desc->ndst,
+ x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
pi_blocking_list_lock = v->arch.hvm_vmx.pi_blocking.lock;
--
1.8.3.1
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Thread overview: 13+ messages
2017-05-24 6:56 [PATCH v3 0/3] mitigate the per-pCPU blocking list may be too long Chao Gao
2017-05-24 6:56 ` [PATCH v3] VT-d PI: track the vcpu number in pi blocking list Chao Gao
2017-06-16 14:34 ` Jan Beulich
2017-06-22 5:16 ` Chao Gao
2017-06-22 6:51 ` Jan Beulich
2017-05-24 6:56 ` [PATCH v3 2/3] vcpu: track hvm vcpu number on the system Chao Gao
2017-06-16 14:44 ` Jan Beulich
2017-05-24 6:56 ` Chao Gao [this message]
2017-06-16 15:09 ` [PATCH v3 3/3] VT-d PI: restrict the vcpu number on a given pcpu Jan Beulich
2017-06-23 4:22 ` Chao Gao
2017-06-23 7:58 ` Jan Beulich
2017-06-23 8:33 ` Chao Gao
2017-06-23 9:05 ` Jan Beulich