xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Chao Gao <chao.gao@intel.com>
To: xen-devel@lists.xen.org
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Jan Beulich <jbeulich@suse.com>,
	Jun Nakajima <jun.nakajima@intel.com>,
	Chao Gao <chao.gao@intel.com>
Subject: [PATCH 2/4] VT-d PI: Randomly Distribute entries to all online pCPUs' pi blocking list
Date: Wed, 26 Apr 2017 08:52:45 +0800	[thread overview]
Message-ID: <1493167967-74144-3-git-send-email-chao.gao@intel.com> (raw)
In-Reply-To: <1493167967-74144-1-git-send-email-chao.gao@intel.com>

Currently, a blocked vCPU is put in its pCPU's pi blocking list. If
too many vCPUs are blocked on one pCPU, it will incur that the list
grows too long. After a simple analysis, we can have 32k domains and
128 vcpu per domain, thus about 4M vCPUs may be blocked in one pCPU's
PI blocking list. When a wakeup interrupt arrives, the list is
travelled to find some specific vCPUs to wake them up. This travel in
that case would consume much time.

To mitigate this issue, this patch randomly distributes entries to all
online pCPUs' pi blocking list, other than the pCPU which the vCPU
(the entry) is running on. With this method, the number of entries in
the PI blocking list can reduce by N-times theoretically. N is the
number of online pCPUs.

The change in vmx_pi_unblock_vcpu() is for the following case:
vcpu is running -> try to block (this patch changes NSDT to
another random pCPU) but notification comes in time, thus the vcpu
goes back to running station -> VM-entry (we should set NSDT again,
reverting the change we make to NSDT in vmx_vcpu_block())

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 36 +++++++++++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index a2dac56..4c9af59 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -103,13 +103,28 @@ void vmx_pi_per_cpu_init(unsigned int cpu)
 static void vmx_vcpu_block(struct vcpu *v)
 {
     unsigned long flags;
-    unsigned int dest;
+    unsigned int dest, dest_cpu;
     spinlock_t *old_lock;
-    spinlock_t *pi_blocking_list_lock =
-		&per_cpu(vmx_pi_blocking, v->processor).lock;
     struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+    spinlock_t *pi_blocking_list_lock;
+
+    /*
+     * After pCPU goes down, the per-cpu PI blocking list is cleared.
+     * To make sure the parameter vCPU is added to the chosen pCPU's
+     * PI blocking list before the list is cleared, just retry when
+     * finding the pCPU has gone down.
+     */
+ retry:
+    dest_cpu = cpumask_any(&cpu_online_map);
+    pi_blocking_list_lock = &per_cpu(vmx_pi_blocking, dest_cpu).lock;
 
     spin_lock_irqsave(pi_blocking_list_lock, flags);
+    if ( unlikely(!cpu_online(dest_cpu)) )
+    {
+        spin_unlock_irqrestore(pi_blocking_list_lock, flags);
+        goto retry;
+    }
+
     old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
                        pi_blocking_list_lock);
 
@@ -120,11 +135,11 @@ static void vmx_vcpu_block(struct vcpu *v)
      */
     ASSERT(old_lock == NULL);
 
-    atomic_inc(&per_cpu(vmx_pi_blocking, v->processor).counter);
-    HVMTRACE_4D(VT_D_PI_BLOCK, v->domain->domain_id, v->vcpu_id, v->processor,
-                atomic_read(&per_cpu(vmx_pi_blocking, v->processor).counter));
+    atomic_inc(&per_cpu(vmx_pi_blocking, dest_cpu).counter);
+    HVMTRACE_4D(VT_D_PI_BLOCK, v->domain->domain_id, v->vcpu_id, dest_cpu,
+                atomic_read(&per_cpu(vmx_pi_blocking, dest_cpu).counter));
     list_add_tail(&v->arch.hvm_vmx.pi_blocking.list,
-                  &per_cpu(vmx_pi_blocking, v->processor).list);
+                  &per_cpu(vmx_pi_blocking, dest_cpu).list);
     spin_unlock_irqrestore(pi_blocking_list_lock, flags);
 
     ASSERT(!pi_test_sn(pi_desc));
@@ -134,7 +149,11 @@ static void vmx_vcpu_block(struct vcpu *v)
     ASSERT(pi_desc->ndst ==
            (x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK)));
 
+    dest = cpu_physical_id(dest_cpu);
+
     write_atomic(&pi_desc->nv, pi_wakeup_vector);
+    write_atomic(&pi_desc->ndst,
+                 (x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK)));
 }
 
 static void vmx_pi_switch_from(struct vcpu *v)
@@ -163,6 +182,7 @@ static void vmx_pi_unblock_vcpu(struct vcpu *v)
     unsigned long flags;
     spinlock_t *pi_blocking_list_lock;
     struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+    unsigned int dest = cpu_physical_id(v->processor);
 
     /*
      * Set 'NV' field back to posted_intr_vector, so the
@@ -170,6 +190,8 @@ static void vmx_pi_unblock_vcpu(struct vcpu *v)
      * it is running in non-root mode.
      */
     write_atomic(&pi_desc->nv, posted_intr_vector);
+    write_atomic(&pi_desc->ndst,
+                 x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
 
     pi_blocking_list_lock = v->arch.hvm_vmx.pi_blocking.lock;
 
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

  parent reply	other threads:[~2017-04-26  0:52 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-26  0:52 [PATCH 0/4] mitigate the per-pCPU blocking list may be too long Chao Gao
2017-04-26  0:52 ` [PATCH 1/4] xentrace: add TRC_HVM_VT_D_PI_BLOCK Chao Gao
2017-04-26  0:52 ` Chao Gao [this message]
2017-04-26  0:52 ` [PATCH 3/4] VT-d PI: Add reference count to pi_desc Chao Gao
2017-04-26  0:52 ` [PATCH 4/4] VT-d PI: Don't add vCPU to PI blocking list for a case Chao Gao
2017-04-26  8:19 ` [PATCH 0/4] mitigate the per-pCPU blocking list may be too long Jan Beulich
2017-04-26  3:30   ` Chao Gao
2017-04-26 10:52     ` Jan Beulich
2017-04-26 16:39 ` George Dunlap
2017-04-27  0:43   ` Chao Gao
2017-04-27  9:44     ` George Dunlap
2017-04-27  5:02       ` Chao Gao
2017-05-02  5:45   ` Chao Gao
2017-05-03 10:08     ` George Dunlap
2017-05-03 10:21       ` Jan Beulich
2017-05-08 16:15         ` Chao Gao
2017-05-08  8:39           ` Jan Beulich
2017-05-08 16:38             ` Chao Gao
2017-05-08  9:13           ` George Dunlap
2017-05-08  9:24             ` Jan Beulich
2017-05-08 17:37               ` Chao Gao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1493167967-74144-3-git-send-email-chao.gao@intel.com \
    --to=chao.gao@intel.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=jun.nakajima@intel.com \
    --cc=kevin.tian@intel.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).