[PATCHv6 0/6] gnttab: Improve scaleability

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCHv6 0/6] gnttab: Improve scaleability
@ 2015-04-22 16:00 David Vrabel
  2015-04-22 16:00 ` [PATCHv6 1/5] gnttab: add locking documentation David Vrabel
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: David Vrabel @ 2015-04-22 16:00 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Jan Beulich, Christoph Egger, Tim Deegan,
	David Vrabel, Matt Wilson, Ian Campbell

The series makes the grant table locking for fine-grained and add
per-VCPU maptrack free lists, which greatly improves scalability.

The series builds on the original series by Matt Wilson and Christoph
Egger from Amazon.

The per-VCPU maptrack free lists makes one of our aggregate intrahost
network throughput benchmarks increases from 15 Gbit/s to 75 Gbit/s,
when compared to just Amazon's original patches.

v6:
  * Remove most uses of the grant table lock.
  * Make the grant table lock a spin lock again (there were only
    writers left after the above)
  * Add per-VCPU maptrack free lists.
v5:
  * Addressed locking issue pointed out by Jan Beulich
  * Fixed git rebase merge issue introduced in v4
    (acquiring locking twice)
  * Change for ()-loop in grant_map_exists
  * Coding style fixes
v4:
  * Coding style nits from Jan Beulich
  * Fixup read locks pointed out by Jan Beulich
  * renamed double_gt_(un)lock to double_maptrack_(un)lock
    per request from Jan Beulich
  * Addressed ASSERT()'s from Jan Beulich
  * Addressed locking issues in unmap_common pointed out
    by Jan Beulich
v3:
  * Addressed gnttab_swap_grant_ref() comment from Andrew Cooper
v2:
  * Add arm part per request from Julien Grall

David

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCHv6 1/5] gnttab: add locking documentation
  2015-04-22 16:00 [PATCHv6 0/6] gnttab: Improve scaleability David Vrabel
@ 2015-04-22 16:00 ` David Vrabel
  2015-04-22 16:00 ` [PATCHv6 2/5] gnttab: introduce per-active entry locks David Vrabel
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: David Vrabel @ 2015-04-22 16:00 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Jan Beulich, Christoph Egger, Tim Deegan,
	David Vrabel, Matt Wilson, Ian Campbell

From: Matt Wilson <msw@amazon.com>

The grant table locking is becomes more fine-grained in subsequent
commits.  Describe how it will work.

Signed-off-by: Matt Wilson <msw@amazon.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 docs/misc/grant-tables.txt |   35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/docs/misc/grant-tables.txt b/docs/misc/grant-tables.txt
index 19db4ec..626f40b 100644
--- a/docs/misc/grant-tables.txt
+++ b/docs/misc/grant-tables.txt
@@ -63,6 +63,7 @@ is complete.
   act->domid : remote domain being granted rights
   act->frame : machine frame being granted
   act->pin   : used to hold reference counts
+  act->lock  : spinlock used to serialize access to active entry state
 
  Map tracking
  ~~~~~~~~~~~~
@@ -74,7 +75,39 @@ is complete.
  matching map track entry is then removed, as if unmap had been invoked.
  These are not used by the transfer mechanism.
   map->domid         : owner of the mapped frame
-  map->ref_and_flags : grant reference, ro/rw, mapped for host or device access
+  map->ref           : grant reference
+  map->flags         : ro/rw, mapped for host or device access
+
+********************************************************************************
+ Locking
+ ~~~~~~~
+ Xen uses several locks to serialize access to the internal grant table state.
+
+  grant_table->lock          : lock used to protect grant table size, version,
+                               etc. updates
+  grant_table->maptrack_lock : spinlock used to protect the maptrack state
+  active_grant_entry->lock   : spinlock used to serialize modifications to
+                               active entries
+
+ The primary lock for the grant table is a spinlock. All functions
+ that modify or access members of struct grant_table must acquire the
+ lock around critical sections.  As an exception, testing that ref in
+ in range of nr_grant_frames and nr_status_frames or gt_version is
+ initialized, does _not_ require the grant table lock.  This is safe
+ because the grant table only grows and the table version is only set
+ once (to 1 or 2).  In particular, acquiring an active entry (see
+ below) does not require the grant table lock to be held.
+
+ The maptrack state is protected by its own spinlock. Any access (read
+ or write) of struct grant_table members that have a "maptrack_"
+ prefix must be made while holding the maptrack lock. The maptrack
+ state can be rapidly modified under some workloads, and the critical
+ sections are very small, thus we use a spinlock to protect them.
+
+ Active entries are obtained by calling active_entry_acquire(gt, ref).
+ This function returns a pointer to the active entry after locking its
+ spinlock. Once all access to the active entry is complete, release
+ the lock by calling active_entry_release(act).
 
 ********************************************************************************
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCHv6 2/5] gnttab: introduce per-active entry locks
  2015-04-22 16:00 [PATCHv6 0/6] gnttab: Improve scaleability David Vrabel
  2015-04-22 16:00 ` [PATCHv6 1/5] gnttab: add locking documentation David Vrabel
@ 2015-04-22 16:00 ` David Vrabel
  2015-04-23 12:42   ` Jan Beulich
  2015-04-22 16:00 ` [PATCHv6 3/5] gnttab: split grant table lock into table and maptrack locks David Vrabel
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: David Vrabel @ 2015-04-22 16:00 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Jan Beulich, Christoph Egger, Tim Deegan,
	David Vrabel, Matt Wilson, Ian Campbell

From: Matt Wilson <msw@amazon.com>

Instead of protecting the state of a grant table active entry with the
grant table lock, use per active entry locks.

Active entries must be acquired with active_entry_acquire() and
released with active_entry_release() which lock and unlock the entry's
spinlock.

This is the first step to improving grant table scalability and will
allow the heaviliy contented grant table lock to be used less.

Signed-off-by: Matt Wilson <msw@amazon.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/grant_table.c |  213 ++++++++++++++++++++++++++++------------------
 1 file changed, 129 insertions(+), 84 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index dfb45f8..3a555f9 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -157,10 +157,13 @@ struct active_grant_entry {
                                in the page.                           */
     unsigned      length:16; /* For sub-page grants, the length of the
                                 grant.                                */
+    spinlock_t    lock;      /* lock to protect access of this entry.
+                                see docs/misc/grant-tables.txt for
+                                locking protocol                      */
 };
 
 #define ACGNT_PER_PAGE (PAGE_SIZE / sizeof(struct active_grant_entry))
-#define active_entry(t, e) \
+#define _active_entry(t, e) \
     ((t)->active[(e)/ACGNT_PER_PAGE][(e)%ACGNT_PER_PAGE])
 
 static inline void gnttab_flush_tlb(const struct domain *d)
@@ -188,6 +191,24 @@ nr_active_grant_frames(struct grant_table *gt)
     return num_act_frames_from_sha_frames(nr_grant_frames(gt));
 }
 
+static inline struct active_grant_entry *
+active_entry_acquire(struct grant_table *t, grant_ref_t e)
+{
+    struct active_grant_entry *act;
+
+    ASSERT(spin_is_locked(&t->lock));
+
+    act = &_active_entry(t, e);
+    spin_lock(&act->lock);
+
+    return act;
+}
+
+static inline void active_entry_release(struct active_grant_entry *act)
+{
+    spin_unlock(&act->lock);
+}
+    
 /* Check if the page has been paged out, or needs unsharing. 
    If rc == GNTST_okay, *page contains the page struct with a ref taken.
    Caller must do put_page(*page).
@@ -228,30 +249,6 @@ static int __get_paged_frame(unsigned long gfn, unsigned long *frame, struct pag
     return rc;
 }
 
-static inline void
-double_gt_lock(struct grant_table *lgt, struct grant_table *rgt)
-{
-    if ( lgt < rgt )
-    {
-        spin_lock(&lgt->lock);
-        spin_lock(&rgt->lock);
-    }
-    else
-    {
-        if ( lgt != rgt )
-            spin_lock(&rgt->lock);
-        spin_lock(&lgt->lock);
-    }
-}
-
-static inline void
-double_gt_unlock(struct grant_table *lgt, struct grant_table *rgt)
-{
-    spin_unlock(&lgt->lock);
-    if ( lgt != rgt )
-        spin_unlock(&rgt->lock);
-}
-
 static inline int
 __get_maptrack_handle(
     struct grant_table *t)
@@ -505,26 +502,23 @@ static int grant_map_exists(const struct domain *ld,
                             unsigned long mfn,
                             unsigned int *ref_count)
 {
-    const struct active_grant_entry *act;
+    struct active_grant_entry *act;
     unsigned int ref, max_iter;
     
     ASSERT(spin_is_locked(&rgt->lock));
 
     max_iter = min(*ref_count + (1 << GNTTABOP_CONTINUATION_ARG_SHIFT),
                    nr_grant_entries(rgt));
-    for ( ref = *ref_count; ref < max_iter; ref++ )
+    for ( ref = *ref_count; ref < max_iter; active_entry_release(act), ref++ )
     {
-        act = &active_entry(rgt, ref);
+        act = active_entry_acquire(rgt, ref);
 
-        if ( !act->pin )
-            continue;
-
-        if ( act->domid != ld->domain_id )
-            continue;
-
-        if ( act->frame != mfn )
+        if ( !act->pin ||
+             act->domid != ld->domain_id ||
+             act->frame != mfn )
             continue;
 
+        active_entry_release(act);
         return 0;
     }
 
@@ -548,11 +542,14 @@ static void mapcount(
 
     for ( handle = 0; handle < lgt->maptrack_limit; handle++ )
     {
+        struct active_grant_entry *act;
+
         map = &maptrack_entry(lgt, handle);
         if ( !(map->flags & (GNTMAP_device_map|GNTMAP_host_map)) ||
              map->domid != rd->domain_id )
             continue;
-        if ( active_entry(rd->grant_table, map->ref).frame == mfn )
+        act = &_active_entry(rd->grant_table, map->ref);
+        if ( act->frame == mfn )
             (map->flags & GNTMAP_readonly) ? (*rdc)++ : (*wrc)++;
     }
 }
@@ -576,7 +573,6 @@ __gnttab_map_grant_ref(
     struct page_info *pg = NULL;
     int            rc = GNTST_okay;
     u32            old_pin;
-    u32            act_pin;
     unsigned int   cache_flags;
     struct active_grant_entry *act = NULL;
     struct grant_mapping *mt;
@@ -639,7 +635,7 @@ __gnttab_map_grant_ref(
     if ( unlikely(op->ref >= nr_grant_entries(rgt)))
         PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref (%d).\n", op->ref);
 
-    act = &active_entry(rgt, op->ref);
+    act = active_entry_acquire(rgt, op->ref);
     shah = shared_entry_header(rgt, op->ref);
     if (rgt->gt_version == 1) {
         sha1 = &shared_entry_v1(rgt, op->ref);
@@ -656,7 +652,7 @@ __gnttab_map_grant_ref(
          ((act->domid != ld->domain_id) ||
           (act->pin & 0x80808080U) != 0 ||
           (act->is_sub_page)) )
-        PIN_FAIL(unlock_out, GNTST_general_error,
+        PIN_FAIL(act_release_out, GNTST_general_error,
                  "Bad domain (%d != %d), or risk of counter overflow %08x, or subpage %d\n",
                  act->domid, ld->domain_id, act->pin, act->is_sub_page);
 
@@ -667,7 +663,7 @@ __gnttab_map_grant_ref(
         if ( (rc = _set_status(rgt->gt_version, ld->domain_id,
                                op->flags & GNTMAP_readonly,
                                1, shah, act, status) ) != GNTST_okay )
-             goto unlock_out;
+            goto act_release_out;
 
         if ( !act->pin )
         {
@@ -698,12 +694,9 @@ __gnttab_map_grant_ref(
             GNTPIN_hstr_inc : GNTPIN_hstw_inc;
 
     frame = act->frame;
-    act_pin = act->pin;
 
     cache_flags = (shah->flags & (GTF_PAT | GTF_PWT | GTF_PCD) );
 
-    spin_unlock(&rgt->lock);
-
     /* pg may be set, with a refcount included, from __get_paged_frame */
     if ( !pg )
     {
@@ -778,8 +771,6 @@ __gnttab_map_grant_ref(
         goto undo_out;
     }
 
-    double_gt_lock(lgt, rgt);
-
     if ( gnttab_need_iommu_mapping(ld) )
     {
         unsigned int wrc, rdc;
@@ -787,21 +778,20 @@ __gnttab_map_grant_ref(
         /* We're not translated, so we know that gmfns and mfns are
            the same things, so the IOMMU entry is always 1-to-1. */
         mapcount(lgt, rd, frame, &wrc, &rdc);
-        if ( (act_pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) &&
+        if ( (act->pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) &&
              !(old_pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) )
         {
             if ( wrc == 0 )
                 err = iommu_map_page(ld, frame, frame,
                                      IOMMUF_readable|IOMMUF_writable);
         }
-        else if ( act_pin && !old_pin )
+        else if ( act->pin && !old_pin )
         {
             if ( (wrc + rdc) == 0 )
                 err = iommu_map_page(ld, frame, frame, IOMMUF_readable);
         }
         if ( err )
         {
-            double_gt_unlock(lgt, rgt);
             rc = GNTST_general_error;
             goto undo_out;
         }
@@ -814,7 +804,8 @@ __gnttab_map_grant_ref(
     mt->ref   = op->ref;
     mt->flags = op->flags;
 
-    double_gt_unlock(lgt, rgt);
+    active_entry_release(act);
+    spin_unlock(&rgt->lock);
 
     op->dev_bus_addr = (u64)frame << PAGE_SHIFT;
     op->handle       = handle;
@@ -837,10 +828,6 @@ __gnttab_map_grant_ref(
         put_page(pg);
     }
 
-    spin_lock(&rgt->lock);
-
-    act = &active_entry(rgt, op->ref);
-
     if ( op->flags & GNTMAP_device_map )
         act->pin -= (op->flags & GNTMAP_readonly) ?
             GNTPIN_devr_inc : GNTPIN_devw_inc;
@@ -856,6 +843,9 @@ __gnttab_map_grant_ref(
     if ( !act->pin )
         gnttab_clear_flag(_GTF_reading, status);
 
+ act_release_out:
+    active_entry_release(act);
+
  unlock_out:
     spin_unlock(&rgt->lock);
     op->status = rc;
@@ -939,18 +929,19 @@ __gnttab_unmap_common(
     TRACE_1D(TRC_MEM_PAGE_GRANT_UNMAP, dom);
 
     rgt = rd->grant_table;
-    double_gt_lock(lgt, rgt);
+
+    spin_lock(&rgt->lock);
+    act = active_entry_acquire(rgt, op->map->ref);
 
     op->flags = op->map->flags;
     if ( unlikely(!op->flags) || unlikely(op->map->domid != dom) )
     {
         gdprintk(XENLOG_WARNING, "Unstable handle %u\n", op->handle);
         rc = GNTST_bad_handle;
-        goto unmap_out;
+        goto act_release_out;
     }
 
     op->rd = rd;
-    act = &active_entry(rgt, op->map->ref);
 
     if ( op->frame == 0 )
     {
@@ -959,7 +950,7 @@ __gnttab_unmap_common(
     else
     {
         if ( unlikely(op->frame != act->frame) )
-            PIN_FAIL(unmap_out, GNTST_general_error,
+            PIN_FAIL(act_release_out, GNTST_general_error,
                      "Bad frame number doesn't match gntref. (%lx != %lx)\n",
                      op->frame, act->frame);
         if ( op->flags & GNTMAP_device_map )
@@ -978,7 +969,7 @@ __gnttab_unmap_common(
         if ( (rc = replace_grant_host_mapping(op->host_addr,
                                               op->frame, op->new_addr, 
                                               op->flags)) < 0 )
-            goto unmap_out;
+            goto act_release_out;
 
         ASSERT(act->pin & (GNTPIN_hstw_mask | GNTPIN_hstr_mask));
         op->map->flags &= ~GNTMAP_host_map;
@@ -1000,7 +991,7 @@ __gnttab_unmap_common(
         if ( err )
         {
             rc = GNTST_general_error;
-            goto unmap_out;
+            goto act_release_out;
         }
     }
 
@@ -1008,8 +999,10 @@ __gnttab_unmap_common(
     if ( !(op->flags & GNTMAP_readonly) )
          gnttab_mark_dirty(rd, op->frame);
 
- unmap_out:
-    double_gt_unlock(lgt, rgt);
+ act_release_out:
+    active_entry_release(act);
+    spin_unlock(&rgt->lock);
+
     op->status = rc;
     rcu_unlock_domain(rd);
 }
@@ -1039,12 +1032,12 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
 
     rcu_lock_domain(rd);
     rgt = rd->grant_table;
-    spin_lock(&rgt->lock);
 
+    spin_lock(&rgt->lock);
     if ( rgt->gt_version == 0 )
-        goto unmap_out;
+        goto unlock_out;
 
-    act = &active_entry(rgt, op->map->ref);
+    act = active_entry_acquire(rgt, op->map->ref);
     sha = shared_entry_header(rgt, op->map->ref);
 
     if ( rgt->gt_version == 1 )
@@ -1058,7 +1051,7 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
          * Suggests that __gntab_unmap_common failed early and so
          * nothing further to do
          */
-        goto unmap_out;
+        goto act_release_out;
     }
 
     pg = mfn_to_page(op->frame);
@@ -1082,7 +1075,7 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
              * Suggests that __gntab_unmap_common failed in
              * replace_grant_host_mapping() so nothing further to do
              */
-            goto unmap_out;
+            goto act_release_out;
         }
 
         if ( !is_iomem_page(op->frame) ) 
@@ -1103,8 +1096,12 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
     if ( act->pin == 0 )
         gnttab_clear_flag(_GTF_reading, status);
 
- unmap_out:
+ act_release_out:
+    active_entry_release(act);
+    
+ unlock_out:
     spin_unlock(&rgt->lock);
+
     if ( put_handle )
     {
         op->map->flags = 0;
@@ -1290,13 +1287,17 @@ gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
     gt->nr_status_frames = 0;
 }
 
+/* 
+ * Grow the grant table. The caller must hold the grant table's
+ * write lock before calling this function.
+ */
 int
 gnttab_grow_table(struct domain *d, unsigned int req_nr_frames)
 {
-    /* d's grant table lock must be held by the caller */
+    /* d's grant table write lock must be held by the caller */
 
     struct grant_table *gt = d->grant_table;
-    unsigned int i;
+    unsigned int i, j;
 
     ASSERT(req_nr_frames <= max_grant_frames);
 
@@ -1311,6 +1312,8 @@ gnttab_grow_table(struct domain *d, unsigned int req_nr_frames)
         if ( (gt->active[i] = alloc_xenheap_page()) == NULL )
             goto active_alloc_failed;
         clear_page(gt->active[i]);
+        for ( j = 0; j < ACGNT_PER_PAGE; j++ )
+            spin_lock_init(&gt->active[i][j].lock);
     }
 
     /* Shared */
@@ -1806,7 +1809,7 @@ __release_grant_for_copy(
 
     spin_lock(&rgt->lock);
 
-    act = &active_entry(rgt, gref);
+    act = active_entry_acquire(rgt, gref);
     sha = shared_entry_header(rgt, gref);
     r_frame = act->frame;
 
@@ -1845,6 +1848,7 @@ __release_grant_for_copy(
         released_read = 1;
     }
 
+    active_entry_release(act);
     spin_unlock(&rgt->lock);
 
     if ( td != rd )
@@ -1906,14 +1910,14 @@ __acquire_grant_for_copy(
     spin_lock(&rgt->lock);
 
     if ( rgt->gt_version == 0 )
-        PIN_FAIL(unlock_out, GNTST_general_error,
+        PIN_FAIL(gnt_unlock_out, GNTST_general_error,
                  "remote grant table not ready\n");
 
     if ( unlikely(gref >= nr_grant_entries(rgt)) )
-        PIN_FAIL(unlock_out, GNTST_bad_gntref,
+        PIN_FAIL(gnt_unlock_out, GNTST_bad_gntref,
                  "Bad grant reference %ld\n", gref);
 
-    act = &active_entry(rgt, gref);
+    act = active_entry_acquire(rgt, gref);
     shah = shared_entry_header(rgt, gref);
     if ( rgt->gt_version == 1 )
     {
@@ -1972,6 +1976,13 @@ __acquire_grant_for_copy(
                 PIN_FAIL(unlock_out_clear, GNTST_general_error,
                          "transitive grant referenced bad domain %d\n",
                          trans_domid);
+
+            /* 
+             * __acquire_grant_for_copy() could take the read lock on
+             * the right table (if rd == td), so we have to drop the
+             * lock here and reacquire
+             */
+            active_entry_release(act);
             spin_unlock(&rgt->lock);
 
             rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
@@ -1979,10 +1990,13 @@ __acquire_grant_for_copy(
                                           &trans_page_off, &trans_length, 0);
 
             spin_lock(&rgt->lock);
+            act = active_entry_acquire(rgt, gref);
+
             if ( rc != GNTST_okay ) {
                 __fixup_status_for_copy_pin(act, status);
-                rcu_unlock_domain(td);
+                active_entry_release(act);
                 spin_unlock(&rgt->lock);
+                rcu_unlock_domain(td);
                 return rc;
             }
 
@@ -1994,6 +2008,7 @@ __acquire_grant_for_copy(
             {
                 __fixup_status_for_copy_pin(act, status);
                 rcu_unlock_domain(td);
+                active_entry_release(act);
                 spin_unlock(&rgt->lock);
                 put_page(*page);
                 return __acquire_grant_for_copy(rd, gref, ldom, readonly,
@@ -2062,6 +2077,7 @@ __acquire_grant_for_copy(
     *length = act->length;
     *frame = act->frame;
 
+    active_entry_release(act);
     spin_unlock(&rgt->lock);
     return rc;
  
@@ -2074,7 +2090,11 @@ __acquire_grant_for_copy(
         gnttab_clear_flag(_GTF_reading, status);
 
  unlock_out:
+    active_entry_release(act);
+
+ gnt_unlock_out:
     spin_unlock(&rgt->lock);
+
     return rc;
 }
 
@@ -2399,16 +2419,18 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
     {
         for ( i = GNTTAB_NR_RESERVED_ENTRIES; i < nr_grant_entries(gt); i++ )
         {
-            act = &active_entry(gt, i);
+            act = active_entry_acquire(gt, i);
             if ( act->pin != 0 )
             {
                 gdprintk(XENLOG_WARNING,
                          "tried to change grant table version from %d to %d, but some grant entries still in use\n",
                          gt->gt_version,
                          op.version);
+                active_entry_release(act);
                 res = -EBUSY;
                 goto out_unlock;
             }
+            active_entry_release(act);
         }
     }
 
@@ -2587,9 +2609,17 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
 {
     struct domain *d = rcu_lock_current_domain();
     struct grant_table *gt = d->grant_table;
-    struct active_grant_entry *act;
+    struct active_grant_entry *act_a = NULL;
+    struct active_grant_entry *act_b = NULL;
     s16 rc = GNTST_okay;
 
+    if ( ref_a == ref_b )
+        /*
+         * noop, so avoid acquiring the same active entry
+         * twice which would case a deadlock.
+         */
+        return rc;
+
     spin_lock(&gt->lock);
 
     /* Bounds check on the grant refs */
@@ -2598,12 +2628,12 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
     if ( unlikely(ref_b >= nr_grant_entries(d->grant_table)))
         PIN_FAIL(out, GNTST_bad_gntref, "Bad ref-b (%d).\n", ref_b);
 
-    act = &active_entry(gt, ref_a);
-    if ( act->pin )
+    act_a = active_entry_acquire(gt, ref_a);
+    if ( act_a->pin )
         PIN_FAIL(out, GNTST_eagain, "ref a %ld busy\n", (long)ref_a);
 
-    act = &active_entry(gt, ref_b);
-    if ( act->pin )
+    act_b = active_entry_acquire(gt, ref_b);
+    if ( act_b->pin )
         PIN_FAIL(out, GNTST_eagain, "ref b %ld busy\n", (long)ref_b);
 
     if ( gt->gt_version == 1 )
@@ -2630,6 +2660,10 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
     }
 
 out:
+    if ( act_b != NULL )
+        active_entry_release(act_b);
+    if ( act_a != NULL )
+        active_entry_release(act_a);
     spin_unlock(&gt->lock);
 
     rcu_unlock_domain(d);
@@ -2939,7 +2973,7 @@ grant_table_create(
     struct domain *d)
 {
     struct grant_table *t;
-    int                 i;
+    int                 i, j;
 
     if ( (t = xzalloc(struct grant_table)) == NULL )
         goto no_mem_0;
@@ -2958,6 +2992,8 @@ grant_table_create(
         if ( (t->active[i] = alloc_xenheap_page()) == NULL )
             goto no_mem_2;
         clear_page(t->active[i]);
+        for ( j = 0; j < ACGNT_PER_PAGE; j++ )
+            spin_lock_init(&t->active[i][j].lock);
     }
 
     /* Tracking of mapped foreign frames table */
@@ -3054,7 +3090,7 @@ gnttab_release_mappings(
         rgt = rd->grant_table;
         spin_lock(&rgt->lock);
 
-        act = &active_entry(rgt, ref);
+        act = active_entry_acquire(rgt, ref);
         sha = shared_entry_header(rgt, ref);
         if (rgt->gt_version == 1)
             status = &sha->flags;
@@ -3112,6 +3148,7 @@ gnttab_release_mappings(
         if ( act->pin == 0 )
             gnttab_clear_flag(_GTF_reading, status);
 
+        active_entry_release(act);
         spin_unlock(&rgt->lock);
 
         rcu_unlock_domain(rd);
@@ -3130,7 +3167,9 @@ grant_table_destroy(
 
     if ( t == NULL )
         return;
-    
+
+    spin_lock(&t->lock);
+
     for ( i = 0; i < nr_grant_frames(t); i++ )
         free_xenheap_page(t->shared_raw[i]);
     xfree(t->shared_raw);
@@ -3147,6 +3186,8 @@ grant_table_destroy(
         free_xenheap_page(t->status[i]);
     xfree(t->status);
 
+    spin_unlock(&t->lock);
+
     xfree(t);
     d->grant_table = NULL;
 }
@@ -3174,9 +3215,12 @@ static void gnttab_usage_print(struct domain *rd)
         uint16_t status;
         uint64_t frame;
 
-        act = &active_entry(gt, ref);
+        act = active_entry_acquire(gt, ref);
         if ( !act->pin )
+        {
+            active_entry_release(act);
             continue;
+        }
 
         sha = shared_entry_header(gt, ref);
 
@@ -3206,6 +3250,7 @@ static void gnttab_usage_print(struct domain *rd)
         printk("[%3d]    %5d 0x%06lx 0x%08x      %5d 0x%06"PRIx64" 0x%02x\n",
                ref, act->domid, act->frame, act->pin,
                sha->domid, frame, status);
+        active_entry_release(act);
     }
 
  out:
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCHv6 3/5] gnttab: split grant table lock into table and maptrack locks
  2015-04-22 16:00 [PATCHv6 0/6] gnttab: Improve scaleability David Vrabel
  2015-04-22 16:00 ` [PATCHv6 1/5] gnttab: add locking documentation David Vrabel
  2015-04-22 16:00 ` [PATCHv6 2/5] gnttab: introduce per-active entry locks David Vrabel
@ 2015-04-22 16:00 ` David Vrabel
  2015-04-23 15:04   ` Jan Beulich
  2015-04-22 16:00 ` [PATCHv6 4/5] gnttab: remove unnecessary grant table locks David Vrabel
  2015-04-22 16:00 ` [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists David Vrabel
  4 siblings, 1 reply; 21+ messages in thread
From: David Vrabel @ 2015-04-22 16:00 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Jan Beulich, Christoph Egger, Tim Deegan,
	David Vrabel, Matt Wilson, Ian Campbell

From: Matt Wilson <msw@amazon.com>

The maptrack lock protects the maptrack state only.

Signed-off-by: Matt Wilson <msw@amazon.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
Subsequent changes make both these locks uncontented.  Is this patch
really necessary? -- dvrabel
---
 xen/common/grant_table.c      |   33 ++++++++++++++++++++++++++++-----
 xen/include/xen/grant_table.h |    7 ++++++-
 2 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 3a555f9..a5470c8 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -264,10 +264,10 @@ static inline void
 put_maptrack_handle(
     struct grant_table *t, int handle)
 {
-    spin_lock(&t->lock);
+    spin_lock(&t->maptrack_lock);
     maptrack_entry(t, handle).ref = t->maptrack_head;
     t->maptrack_head = handle;
-    spin_unlock(&t->lock);
+    spin_unlock(&t->maptrack_lock);
 }
 
 static inline int
@@ -279,7 +279,7 @@ get_maptrack_handle(
     struct grant_mapping *new_mt;
     unsigned int          new_mt_limit, nr_frames;
 
-    spin_lock(&lgt->lock);
+    spin_lock(&lgt->maptrack_lock);
 
     while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
     {
@@ -308,12 +308,15 @@ get_maptrack_handle(
                  nr_frames + 1);
     }
 
-    spin_unlock(&lgt->lock);
+    spin_unlock(&lgt->maptrack_lock);
 
     return handle;
 }
 
-/* Number of grant table entries. Caller must hold d's grant table lock. */
+/* 
+ * Number of grant table entries. Caller must hold d's grant table
+ * read lock.
+ */
 static unsigned int nr_grant_entries(struct grant_table *gt)
 {
     ASSERT(gt->gt_version != 0);
@@ -531,6 +534,13 @@ static int grant_map_exists(const struct domain *ld,
     return -EINVAL;
 }
 
+/* 
+ * Count the number of mapped ro or rw entries tracked in the left
+ * grant table given a provided mfn provided by a foreign domain. 
+ *
+ * This function takes the maptrack lock from the left grant table and
+ * must be called with the right grant table's rwlock held.
+ */
 static void mapcount(
     struct grant_table *lgt, struct domain *rd, unsigned long mfn,
     unsigned int *wrc, unsigned int *rdc)
@@ -540,6 +550,16 @@ static void mapcount(
 
     *wrc = *rdc = 0;
 
+    /* 
+     * While taking the local maptrack spinlock prevents any mapping
+     * changes, the remote active entries could be changing while we
+     * are counting. The caller has to hold the grant table lock, or
+     * some other mechanism should be used to prevent concurrent
+     * changes during this operation.
+     */
+    ASSERT(spin_is_locked(&rd->grant_table->lock));
+    spin_lock(&lgt->maptrack_lock);
+
     for ( handle = 0; handle < lgt->maptrack_limit; handle++ )
     {
         struct active_grant_entry *act;
@@ -552,6 +572,8 @@ static void mapcount(
         if ( act->frame == mfn )
             (map->flags & GNTMAP_readonly) ? (*rdc)++ : (*wrc)++;
     }
+
+    spin_unlock(&lgt->maptrack_lock);
 }
 
 /*
@@ -2980,6 +3002,7 @@ grant_table_create(
 
     /* Simple stuff. */
     spin_lock_init(&t->lock);
+    spin_lock_init(&t->maptrack_lock);
     t->nr_grant_frames = INITIAL_NR_GRANT_FRAMES;
 
     /* Active grant table. */
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index 32f5786..8ff1883 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -82,7 +82,12 @@ struct grant_table {
     struct grant_mapping **maptrack;
     unsigned int          maptrack_head;
     unsigned int          maptrack_limit;
-    /* Lock protecting updates to active and shared grant tables. */
+    /* Lock protecting the maptrack page list, head, and limit */
+    spinlock_t            maptrack_lock;
+    /* 
+     * Lock protecting updates to grant table state (version, active
+     * entry list, etc.)
+     */
     spinlock_t            lock;
     /* The defined versions are 1 and 2.  Set to 0 if we don't know
        what version to use yet. */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCHv6 4/5] gnttab: remove unnecessary grant table locks
  2015-04-22 16:00 [PATCHv6 0/6] gnttab: Improve scaleability David Vrabel
                   ` (2 preceding siblings ...)
  2015-04-22 16:00 ` [PATCHv6 3/5] gnttab: split grant table lock into table and maptrack locks David Vrabel
@ 2015-04-22 16:00 ` David Vrabel
  2015-04-23 11:23   ` Tim Deegan
  2015-04-23 15:31   ` Jan Beulich
  2015-04-22 16:00 ` [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists David Vrabel
  4 siblings, 2 replies; 21+ messages in thread
From: David Vrabel @ 2015-04-22 16:00 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Jan Beulich, Christoph Egger, Tim Deegan,
	David Vrabel, Matt Wilson, Malcolm Crossley, Ian Campbell

From: Malcolm Crossley <malcolm.crossley@citrix.com>

The grant table lock is not required to protect reads of the
table version, the active entry array, or the map track array.

This is safe because: a) the grant table version only changes once
from 0 to 1 or 2; b) the active entry array only grows; and c) the map
track array only grows.

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/grant_table.c      |   76 +++++++++--------------------------------
 xen/include/xen/grant_table.h |    8 ++---
 2 files changed, 21 insertions(+), 63 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index a5470c8..8f3d125 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -196,8 +196,6 @@ active_entry_acquire(struct grant_table *t, grant_ref_t e)
 {
     struct active_grant_entry *act;
 
-    ASSERT(spin_is_locked(&t->lock));
-
     act = &_active_entry(t, e);
     spin_lock(&act->lock);
 
@@ -647,7 +645,6 @@ __gnttab_map_grant_ref(
     }
 
     rgt = rd->grant_table;
-    spin_lock(&rgt->lock);
 
     if ( rgt->gt_version == 0 )
         PIN_FAIL(unlock_out, GNTST_general_error,
@@ -827,7 +824,6 @@ __gnttab_map_grant_ref(
     mt->flags = op->flags;
 
     active_entry_release(act);
-    spin_unlock(&rgt->lock);
 
     op->dev_bus_addr = (u64)frame << PAGE_SHIFT;
     op->handle       = handle;
@@ -869,7 +865,6 @@ __gnttab_map_grant_ref(
     active_entry_release(act);
 
  unlock_out:
-    spin_unlock(&rgt->lock);
     op->status = rc;
     put_maptrack_handle(lgt, handle);
     rcu_unlock_domain(rd);
@@ -919,18 +914,15 @@ __gnttab_unmap_common(
     }
 
     op->map = &maptrack_entry(lgt, op->handle);
-    spin_lock(&lgt->lock);
 
     if ( unlikely(!op->map->flags) )
     {
-        spin_unlock(&lgt->lock);
         gdprintk(XENLOG_INFO, "Zero flags for handle (%d).\n", op->handle);
         op->status = GNTST_bad_handle;
         return;
     }
 
     dom = op->map->domid;
-    spin_unlock(&lgt->lock);
 
     if ( unlikely((rd = rcu_lock_domain_by_id(dom)) == NULL) )
     {
@@ -952,7 +944,6 @@ __gnttab_unmap_common(
 
     rgt = rd->grant_table;
 
-    spin_lock(&rgt->lock);
     act = active_entry_acquire(rgt, op->map->ref);
 
     op->flags = op->map->flags;
@@ -1023,7 +1014,6 @@ __gnttab_unmap_common(
 
  act_release_out:
     active_entry_release(act);
-    spin_unlock(&rgt->lock);
 
     op->status = rc;
     rcu_unlock_domain(rd);
@@ -1055,7 +1045,6 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
     rcu_lock_domain(rd);
     rgt = rd->grant_table;
 
-    spin_lock(&rgt->lock);
     if ( rgt->gt_version == 0 )
         goto unlock_out;
 
@@ -1122,8 +1111,6 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
     active_entry_release(act);
     
  unlock_out:
-    spin_unlock(&rgt->lock);
-
     if ( put_handle )
     {
         op->map->flags = 0;
@@ -1277,7 +1264,7 @@ gnttab_populate_status_frames(struct domain *d, struct grant_table *gt,
     for ( i = nr_status_frames(gt); i < req_status_frames; i++ )
         gnttab_create_status_page(d, gt, i);
 
-    gt->nr_status_frames = req_status_frames;
+    atomic_set(&gt->nr_status_frames, req_status_frames);
 
     return 0;
 
@@ -1306,7 +1293,7 @@ gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
         free_xenheap_page(gt->status[i]);
         gt->status[i] = NULL;
     }
-    gt->nr_status_frames = 0;
+    atomic_set(&gt->nr_status_frames, 0);
 }
 
 /* 
@@ -1356,7 +1343,7 @@ gnttab_grow_table(struct domain *d, unsigned int req_nr_frames)
     /* Share the new shared frames with the recipient domain */
     for ( i = nr_grant_frames(gt); i < req_nr_frames; i++ )
         gnttab_create_shared_page(d, gt, i);
-    gt->nr_grant_frames = req_nr_frames;
+    atomic_set(&gt->nr_grant_frames, req_nr_frames);
 
     return 1;
 
@@ -1423,7 +1410,17 @@ gnttab_setup_table(
     }
 
     gt = d->grant_table;
-    spin_lock(&gt->lock);
+
+    /* Tracking of mapped foreign frames table */
+    if ( (gt->maptrack = xzalloc_array(struct grant_mapping *,
+                                       max_maptrack_frames * d->max_vcpus)) == NULL )
+        goto out2;
+    for_each_vcpu( d, v )
+    {
+        v->maptrack_head = MAPTRACK_TAIL;
+        v->maptrack_tail = MAPTRACK_TAIL;
+    }
+    gt->maptrack_pages = 0;
 
     if ( gt->gt_version == 0 )
         gt->gt_version = 1;
@@ -1437,7 +1434,7 @@ gnttab_setup_table(
                  "Expand grant table to %u failed. Current: %u Max: %u\n",
                  op.nr_frames, nr_grant_frames(gt), max_grant_frames);
         op.status = GNTST_general_error;
-        goto out3;
+        goto out2;
     }
  
     op.status = GNTST_okay;
@@ -1450,8 +1447,6 @@ gnttab_setup_table(
             op.status = GNTST_bad_virt_addr;
     }
 
- out3:
-    spin_unlock(&gt->lock);
  out2:
     rcu_unlock_domain(d);
  out1:
@@ -1525,8 +1520,6 @@ gnttab_prepare_for_transfer(
     union grant_combo   scombo, prev_scombo, new_scombo;
     int                 retries = 0;
 
-    spin_lock(&rgt->lock);
-
     if ( rgt->gt_version == 0 )
     {
         gdprintk(XENLOG_INFO,
@@ -1575,12 +1568,9 @@ gnttab_prepare_for_transfer(
 
         scombo = prev_scombo;
     }
-
-    spin_unlock(&rgt->lock);
     return 1;
 
  fail:
-    spin_unlock(&rgt->lock);
     return 0;
 }
 
@@ -1773,8 +1763,6 @@ gnttab_transfer(
         TRACE_1D(TRC_MEM_PAGE_GRANT_TRANSFER, e->domain_id);
 
         /* Tell the guest about its new page frame. */
-        spin_lock(&e->grant_table->lock);
-
         if ( e->grant_table->gt_version == 1 )
         {
             grant_entry_v1_t *sha = &shared_entry_v1(e->grant_table, gop.ref);
@@ -1791,8 +1779,6 @@ gnttab_transfer(
         shared_entry_header(e->grant_table, gop.ref)->flags |=
             GTF_transfer_completed;
 
-        spin_unlock(&e->grant_table->lock);
-
         rcu_unlock_domain(e);
 
         gop.status = GNTST_okay;
@@ -1829,8 +1815,6 @@ __release_grant_for_copy(
     released_read = 0;
     released_write = 0;
 
-    spin_lock(&rgt->lock);
-
     act = active_entry_acquire(rgt, gref);
     sha = shared_entry_header(rgt, gref);
     r_frame = act->frame;
@@ -1871,7 +1855,6 @@ __release_grant_for_copy(
     }
 
     active_entry_release(act);
-    spin_unlock(&rgt->lock);
 
     if ( td != rd )
     {
@@ -1929,8 +1912,6 @@ __acquire_grant_for_copy(
 
     *page = NULL;
 
-    spin_lock(&rgt->lock);
-
     if ( rgt->gt_version == 0 )
         PIN_FAIL(gnt_unlock_out, GNTST_general_error,
                  "remote grant table not ready\n");
@@ -2005,19 +1986,16 @@ __acquire_grant_for_copy(
              * lock here and reacquire
              */
             active_entry_release(act);
-            spin_unlock(&rgt->lock);
 
             rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
                                           readonly, &grant_frame, page,
                                           &trans_page_off, &trans_length, 0);
 
-            spin_lock(&rgt->lock);
             act = active_entry_acquire(rgt, gref);
 
             if ( rc != GNTST_okay ) {
                 __fixup_status_for_copy_pin(act, status);
                 active_entry_release(act);
-                spin_unlock(&rgt->lock);
                 rcu_unlock_domain(td);
                 return rc;
             }
@@ -2031,7 +2009,6 @@ __acquire_grant_for_copy(
                 __fixup_status_for_copy_pin(act, status);
                 rcu_unlock_domain(td);
                 active_entry_release(act);
-                spin_unlock(&rgt->lock);
                 put_page(*page);
                 return __acquire_grant_for_copy(rd, gref, ldom, readonly,
                                                 frame, page, page_off, length,
@@ -2100,7 +2077,6 @@ __acquire_grant_for_copy(
     *frame = act->frame;
 
     active_entry_release(act);
-    spin_unlock(&rgt->lock);
     return rc;
  
  unlock_out_clear:
@@ -2115,8 +2091,6 @@ __acquire_grant_for_copy(
     active_entry_release(act);
 
  gnt_unlock_out:
-    spin_unlock(&rgt->lock);
-
     return rc;
 }
 
@@ -2432,7 +2406,6 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
     if ( gt->gt_version == op.version )
         goto out;
 
-    spin_lock(&gt->lock);
     /* Make sure that the grant table isn't currently in use when we
        change the version number, except for the first 8 entries which
        are allowed to be in use (xenstore/xenconsole keeps them mapped).
@@ -2642,8 +2615,6 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
          */
         return rc;
 
-    spin_lock(&gt->lock);
-
     /* Bounds check on the grant refs */
     if ( unlikely(ref_a >= nr_grant_entries(d->grant_table)))
         PIN_FAIL(out, GNTST_bad_gntref, "Bad ref-a (%d).\n", ref_a);
@@ -2686,7 +2657,6 @@ out:
         active_entry_release(act_b);
     if ( act_a != NULL )
         active_entry_release(act_a);
-    spin_unlock(&gt->lock);
 
     rcu_unlock_domain(d);
 
@@ -2757,12 +2727,9 @@ static int __gnttab_cache_flush(gnttab_cache_flush_t *cflush,
 
     if ( d != owner )
     {
-        spin_lock(&owner->grant_table->lock);
-
         ret = grant_map_exists(d, owner->grant_table, mfn, ref_count);
         if ( ret != 0 )
         {
-            spin_unlock(&owner->grant_table->lock);
             rcu_unlock_domain(d);
             put_page(page);
             return ret;
@@ -2781,8 +2748,6 @@ static int __gnttab_cache_flush(gnttab_cache_flush_t *cflush,
     else
         ret = 0;
 
-    if ( d != owner )
-        spin_unlock(&owner->grant_table->lock);
     unmap_domain_page(v);
     put_page(page);
 
@@ -3003,7 +2968,7 @@ grant_table_create(
     /* Simple stuff. */
     spin_lock_init(&t->lock);
     spin_lock_init(&t->maptrack_lock);
-    t->nr_grant_frames = INITIAL_NR_GRANT_FRAMES;
+    atomic_set(&t->nr_grant_frames, INITIAL_NR_GRANT_FRAMES);
 
     /* Active grant table. */
     if ( (t->active = xzalloc_array(struct active_grant_entry *,
@@ -3050,7 +3015,7 @@ grant_table_create(
     for ( i = 0; i < INITIAL_NR_GRANT_FRAMES; i++ )
         gnttab_create_shared_page(d, t, i);
 
-    t->nr_status_frames = 0;
+    atomic_set(&t->nr_status_frames, 0);
 
     /* Okay, install the structure. */
     d->grant_table = t;
@@ -3111,8 +3076,6 @@ gnttab_release_mappings(
         }
 
         rgt = rd->grant_table;
-        spin_lock(&rgt->lock);
-
         act = active_entry_acquire(rgt, ref);
         sha = shared_entry_header(rgt, ref);
         if (rgt->gt_version == 1)
@@ -3172,7 +3135,6 @@ gnttab_release_mappings(
             gnttab_clear_flag(_GTF_reading, status);
 
         active_entry_release(act);
-        spin_unlock(&rgt->lock);
 
         rcu_unlock_domain(rd);
 
@@ -3224,8 +3186,6 @@ static void gnttab_usage_print(struct domain *rd)
     printk("      -------- active --------       -------- shared --------\n");
     printk("[ref] localdom mfn      pin          localdom gmfn     flags\n");
 
-    spin_lock(&gt->lock);
-
     if ( gt->gt_version == 0 )
         goto out;
 
@@ -3277,8 +3237,6 @@ static void gnttab_usage_print(struct domain *rd)
     }
 
  out:
-    spin_unlock(&gt->lock);
-
     if ( first )
         printk("grant-table for remote domain:%5d ... "
                "no active grant table entries\n", rd->domain_id);
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index 8ff1883..d78c381 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -65,7 +65,7 @@ struct grant_mapping {
 /* Per-domain grant information. */
 struct grant_table {
     /* Table size. Number of frames shared with guest */
-    unsigned int          nr_grant_frames;
+    atomic_t nr_grant_frames;
     /* Shared grant table (see include/public/grant_table.h). */
     union {
         void **shared_raw;
@@ -73,7 +73,7 @@ struct grant_table {
         union grant_entry_v2 **shared_v2;
     };
     /* Number of grant status frames shared with guest (for version 2) */
-    unsigned int          nr_status_frames;
+    atomic_t nr_status_frames;
     /* State grant table (see include/public/grant_table.h). */
     grant_status_t       **status;
     /* Active grant table. */
@@ -114,13 +114,13 @@ gnttab_grow_table(struct domain *d, unsigned int req_nr_frames);
 /* Number of grant table frames. Caller must hold d's grant table lock. */
 static inline unsigned int nr_grant_frames(struct grant_table *gt)
 {
-    return gt->nr_grant_frames;
+    return atomic_read(&gt->nr_grant_frames);
 }
 
 /* Number of status grant table frames. Caller must hold d's gr. table lock.*/
 static inline unsigned int nr_status_frames(struct grant_table *gt)
 {
-    return gt->nr_status_frames;
+    return atomic_read(&gt->nr_status_frames);
 }
 
 #define GRANT_STATUS_PER_PAGE (PAGE_SIZE / sizeof(grant_status_t))
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
  2015-04-22 16:00 [PATCHv6 0/6] gnttab: Improve scaleability David Vrabel
                   ` (3 preceding siblings ...)
  2015-04-22 16:00 ` [PATCHv6 4/5] gnttab: remove unnecessary grant table locks David Vrabel
@ 2015-04-22 16:00 ` David Vrabel
  2015-04-23 16:11   ` Jan Beulich
  4 siblings, 1 reply; 21+ messages in thread
From: David Vrabel @ 2015-04-22 16:00 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Jan Beulich, Christoph Egger, Tim Deegan,
	David Vrabel, Matt Wilson, Malcolm Crossley, Ian Campbell

From: Malcolm Crossley <malcolm.crossley@citrix.com>

Performance analysis of aggregate network throughput with many VMs
shows that performance is signficantly limited by contention on the
maptrack lock when obtaining/releasing maptrack handles from the free
list.

Instead of a single free list use a per-VCPU list. This avoids any
contention when obtaining a handle.  Handles must be released back to
their original list and since this may occur on a different VCPU there
is some contention on the destination VCPU's free list tail pointer
(but this is much better than a per-domain lock).

A domain's maptrack limit is multiplied by the number of VCPUs.  This
ensures that a worst case domain that only performs grant table
operations via one VCPU will not see a lower map track limit.

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/grant_table.c      |  129 ++++++++++++++++++++++++-----------------
 xen/include/xen/grant_table.h |    8 ++-
 xen/include/xen/sched.h       |    5 ++
 3 files changed, 86 insertions(+), 56 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 8f3d125..a237d11 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -118,9 +118,9 @@ struct gnttab_unmap_common {
     ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE])
 
 static inline unsigned int
-nr_maptrack_frames(struct grant_table *t)
+nr_vcpu_maptrack_frames(struct vcpu *v)
 {
-    return t->maptrack_limit / MAPTRACK_PER_PAGE;
+    return v->maptrack_limit / MAPTRACK_PER_PAGE;
 }
 
 #define MAPTRACK_TAIL (~0u)
@@ -249,66 +249,91 @@ static int __get_paged_frame(unsigned long gfn, unsigned long *frame, struct pag
 
 static inline int
 __get_maptrack_handle(
-    struct grant_table *t)
+    struct grant_table *t,
+    struct vcpu *v)
 {
-    unsigned int h;
-    if ( unlikely((h = t->maptrack_head) == MAPTRACK_TAIL) )
+    unsigned int head, next;
+
+    head = v->maptrack_head;
+    if ( unlikely(head == MAPTRACK_TAIL) )
+        return -1;
+
+    next = maptrack_entry(t, head).ref;
+    if ( unlikely(next == MAPTRACK_TAIL) )
         return -1;
-    t->maptrack_head = maptrack_entry(t, h).ref;
-    return h;
+
+    v->maptrack_head = next;
+
+    return head;
 }
 
 static inline void
 put_maptrack_handle(
     struct grant_table *t, int handle)
 {
-    spin_lock(&t->maptrack_lock);
-    maptrack_entry(t, handle).ref = t->maptrack_head;
-    t->maptrack_head = handle;
-    spin_unlock(&t->maptrack_lock);
+    struct domain *d = current->domain;
+    struct vcpu *v;
+    u32 prev_tail, cur_tail;
+
+    /* Set current hande to have TAIL reference */
+    maptrack_entry(t, handle).ref = MAPTRACK_TAIL;
+    v = d->vcpu[maptrack_entry(t,handle).vcpu];
+    cur_tail = v->maptrack_tail;
+    do {
+        prev_tail = cur_tail;
+        cur_tail = cmpxchg((u32 *)&v->maptrack_tail, prev_tail, handle);
+    } while ( cur_tail != prev_tail );
+    maptrack_entry(t, prev_tail).ref = handle;
 }
 
 static inline int
 get_maptrack_handle(
     struct grant_table *lgt)
 {
+    struct vcpu          *v = current;
     int                   i;
     grant_handle_t        handle;
     struct grant_mapping *new_mt;
     unsigned int          new_mt_limit, nr_frames;
 
-    spin_lock(&lgt->maptrack_lock);
-
-    while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
-    {
-        nr_frames = nr_maptrack_frames(lgt);
-        if ( nr_frames >= max_maptrack_frames )
-            break;
-
-        new_mt = alloc_xenheap_page();
-        if ( !new_mt )
-            break;
+    handle = __get_maptrack_handle(lgt, v);
+    if ( likely(handle != -1) )
+        return handle;
 
-        clear_page(new_mt);
+    nr_frames = nr_vcpu_maptrack_frames(v);
+    if ( nr_frames >= max_maptrack_frames )
+        return -1;
 
-        new_mt_limit = lgt->maptrack_limit + MAPTRACK_PER_PAGE;
+    new_mt = alloc_xenheap_page();
+    if ( !new_mt )
+        return -1;
 
-        for ( i = 1; i < MAPTRACK_PER_PAGE; i++ )
-            new_mt[i - 1].ref = lgt->maptrack_limit + i;
-        new_mt[i - 1].ref = lgt->maptrack_head;
-        lgt->maptrack_head = lgt->maptrack_limit;
+    clear_page(new_mt);
 
-        lgt->maptrack[nr_frames] = new_mt;
-        smp_wmb();
-        lgt->maptrack_limit      = new_mt_limit;
+    new_mt_limit = v->maptrack_limit + MAPTRACK_PER_PAGE;
 
-        gdprintk(XENLOG_INFO, "Increased maptrack size to %u frames\n",
-                 nr_frames + 1);
+    for ( i = 1; i < MAPTRACK_PER_PAGE; i++ ){
+        new_mt[i - 1].ref = (lgt->maptrack_pages * MAPTRACK_PER_PAGE) + i;
+        new_mt[i - 1].vcpu = v->vcpu_id;
+    }
+    /* Set last entry vcpu and ref */
+    new_mt[i - 1].ref = v->maptrack_head;
+    new_mt[i - 1].vcpu = v->vcpu_id;
+    v->maptrack_head = lgt->maptrack_pages * MAPTRACK_PER_PAGE;
+    if (v->maptrack_tail == MAPTRACK_TAIL)
+    {
+        v->maptrack_tail = (lgt->maptrack_pages * MAPTRACK_PER_PAGE)
+            + MAPTRACK_PER_PAGE - 1;
+        new_mt[i - 1].ref = MAPTRACK_TAIL;
     }
 
+    spin_lock(&lgt->maptrack_lock);
+    lgt->maptrack[lgt->maptrack_pages++] = new_mt;
     spin_unlock(&lgt->maptrack_lock);
 
-    return handle;
+    v->maptrack_limit = new_mt_limit;
+
+    return __get_maptrack_handle(lgt, v);
 }
 
 /* 
@@ -558,7 +583,7 @@ static void mapcount(
     ASSERT(spin_is_locked(&rd->grant_table->lock));
     spin_lock(&lgt->maptrack_lock);
 
-    for ( handle = 0; handle < lgt->maptrack_limit; handle++ )
+    for ( handle = 0; handle < lgt->maptrack_pages * MAPTRACK_PER_PAGE; handle++ )
     {
         struct active_grant_entry *act;
 
@@ -906,7 +931,7 @@ __gnttab_unmap_common(
 
     op->frame = (unsigned long)(op->dev_bus_addr >> PAGE_SHIFT);
 
-    if ( unlikely(op->handle >= lgt->maptrack_limit) )
+    if ( unlikely(op->handle >= lgt->maptrack_pages * MAPTRACK_PER_PAGE) )
     {
         gdprintk(XENLOG_INFO, "Bad handle (%d).\n", op->handle);
         op->status = GNTST_bad_handle;
@@ -1371,6 +1396,7 @@ gnttab_setup_table(
     struct gnttab_setup_table op;
     struct domain *d;
     struct grant_table *gt;
+    struct vcpu *v;
     int            i;
     xen_pfn_t  gmfn;
 
@@ -1422,6 +1448,17 @@ gnttab_setup_table(
     }
     gt->maptrack_pages = 0;
 
+    /* Tracking of mapped foreign frames table */
+    if ( (gt->maptrack = xzalloc_array(struct grant_mapping *,
+                                       max_maptrack_frames * d->max_vcpus)) == NULL )
+        goto out2;
+    for_each_vcpu( d, v )
+    {
+        v->maptrack_head = MAPTRACK_TAIL;
+        v->maptrack_tail = MAPTRACK_TAIL;
+    }
+    gt->maptrack_pages = 0;
+
     if ( gt->gt_version == 0 )
         gt->gt_version = 1;
 
@@ -2984,18 +3021,6 @@ grant_table_create(
             spin_lock_init(&t->active[i][j].lock);
     }
 
-    /* Tracking of mapped foreign frames table */
-    if ( (t->maptrack = xzalloc_array(struct grant_mapping *,
-                                      max_maptrack_frames)) == NULL )
-        goto no_mem_2;
-    if ( (t->maptrack[0] = alloc_xenheap_page()) == NULL )
-        goto no_mem_3;
-    clear_page(t->maptrack[0]);
-    t->maptrack_limit = MAPTRACK_PER_PAGE;
-    for ( i = 1; i < MAPTRACK_PER_PAGE; i++ )
-        t->maptrack[0][i - 1].ref = i;
-    t->maptrack[0][i - 1].ref = MAPTRACK_TAIL;
-
     /* Shared grant table. */
     if ( (t->shared_raw = xzalloc_array(void *, max_grant_frames)) == NULL )
         goto no_mem_3;
@@ -3026,12 +3051,10 @@ grant_table_create(
         free_xenheap_page(t->shared_raw[i]);
     xfree(t->shared_raw);
  no_mem_3:
-    free_xenheap_page(t->maptrack[0]);
-    xfree(t->maptrack);
- no_mem_2:
     for ( i = 0;
           i < num_act_frames_from_sha_frames(INITIAL_NR_GRANT_FRAMES); i++ )
         free_xenheap_page(t->active[i]);
+ no_mem_2:
     xfree(t->active);
  no_mem_1:
     xfree(t);
@@ -3055,7 +3078,7 @@ gnttab_release_mappings(
 
     BUG_ON(!d->is_dying);
 
-    for ( handle = 0; handle < gt->maptrack_limit; handle++ )
+    for ( handle = 0; handle < (gt->maptrack_pages * MAPTRACK_PER_PAGE); handle++ )
     {
         map = &maptrack_entry(gt, handle);
         if ( !(map->flags & (GNTMAP_device_map|GNTMAP_host_map)) )
@@ -3159,7 +3182,7 @@ grant_table_destroy(
         free_xenheap_page(t->shared_raw[i]);
     xfree(t->shared_raw);
 
-    for ( i = 0; i < nr_maptrack_frames(t); i++ )
+    for ( i = 0; i < t->maptrack_pages; i++ )
         free_xenheap_page(t->maptrack[i]);
     xfree(t->maptrack);
 
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index d78c381..feb435d 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -60,6 +60,8 @@ struct grant_mapping {
     u32      ref;           /* grant ref */
     u16      flags;         /* 0-4: GNTMAP_* ; 5-15: unused */
     domid_t  domid;         /* granting domain */
+    u32      vcpu;          /* vcpu which created the grant mapping */
+    u16      pad[2];
 };
 
 /* Per-domain grant information. */
@@ -78,10 +80,10 @@ struct grant_table {
     grant_status_t       **status;
     /* Active grant table. */
     struct active_grant_entry **active;
-    /* Mapping tracking table. */
+    /* Mapping tracking table per vcpu. */
     struct grant_mapping **maptrack;
-    unsigned int          maptrack_head;
-    unsigned int          maptrack_limit;
+    /* Total pages used for mapping tracking table */
+    unsigned int          maptrack_pages;
     /* Lock protecting the maptrack page list, head, and limit */
     spinlock_t            maptrack_lock;
     /* 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 80c6f62..7f8422e 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -219,6 +219,11 @@ struct vcpu
     /* VCPU paused by system controller. */
     int              controller_pause_count;
 
+    /* Maptrack */
+    unsigned int     maptrack_head;
+    unsigned int     maptrack_tail;
+    unsigned int     maptrack_limit;
+
     /* IRQ-safe virq_lock protects against delivering VIRQ to stale evtchn. */
     evtchn_port_t    virq_to_evtchn[NR_VIRQS];
     spinlock_t       virq_lock;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCHv6 4/5] gnttab: remove unnecessary grant table locks
  2015-04-22 16:00 ` [PATCHv6 4/5] gnttab: remove unnecessary grant table locks David Vrabel
@ 2015-04-23 11:23   ` Tim Deegan
  2015-04-23 15:31   ` Jan Beulich
  1 sibling, 0 replies; 21+ messages in thread
From: Tim Deegan @ 2015-04-23 11:23 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Christoph Egger, Jan Beulich,
	Matt Wilson, xen-devel, Malcolm Crossley

At 17:00 +0100 on 22 Apr (1429722035), David Vrabel wrote:
> @@ -2642,8 +2615,6 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
>           */
>          return rc;
>  
> -    spin_lock(&gt->lock);
> -

Now that this lock is gone, is anything stopping us racing against
another CPU with ref_a and ref_b the other way around?

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv6 2/5] gnttab: introduce per-active entry locks
  2015-04-22 16:00 ` [PATCHv6 2/5] gnttab: introduce per-active entry locks David Vrabel
@ 2015-04-23 12:42   ` Jan Beulich
  2015-04-23 13:49     ` David Vrabel
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2015-04-23 12:42 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Christoph Egger, Tim Deegan,
	Matt Wilson, xen-devel

>>> On 22.04.15 at 18:00, <david.vrabel@citrix.com> wrote:
> @@ -548,11 +542,14 @@ static void mapcount(
>  
>      for ( handle = 0; handle < lgt->maptrack_limit; handle++ )
>      {
> +        struct active_grant_entry *act;

const please (wherever possible, to document the intent to not
modify).

> -        if ( active_entry(rd->grant_table, map->ref).frame == mfn )
> +        act = &_active_entry(rd->grant_table, map->ref);
> +        if ( act->frame == mfn )

Or, as suggested before, avoid the intermediate variable in the
first place.

> @@ -698,12 +694,9 @@ __gnttab_map_grant_ref(
>              GNTPIN_hstr_inc : GNTPIN_hstw_inc;
>  
>      frame = act->frame;
> -    act_pin = act->pin;
>  
>      cache_flags = (shah->flags & (GTF_PAT | GTF_PWT | GTF_PCD) );
>  
> -    spin_unlock(&rgt->lock);
> -
>      /* pg may be set, with a refcount included, from __get_paged_frame */
>      if ( !pg )
>      {
> @@ -778,8 +771,6 @@ __gnttab_map_grant_ref(
>          goto undo_out;
>      }
>  
> -    double_gt_lock(lgt, rgt);
> -
>      if ( gnttab_need_iommu_mapping(ld) )
>      {
>          unsigned int wrc, rdc;

I continue to think that using lgt unlocked here is wrong (i.e. while
the next patch may fix this, it still represents intermediate breakage).
At the very least this should be called out explicitly in the commit
message, making it clear that this patch shouldn't be applied without
the next one immediately following. But maybe the better approach
would be to avoid the breakage in the first place by keeping 
double_gt_lock() in place until the next patch, moving a spin_unlock()
right ahead of it to document the intentions, or even keeping that
one in place too until the next patch.

> @@ -939,18 +929,19 @@ __gnttab_unmap_common(
>      TRACE_1D(TRC_MEM_PAGE_GRANT_UNMAP, dom);
>  
>      rgt = rd->grant_table;
> -    double_gt_lock(lgt, rgt);
> +
> +    spin_lock(&rgt->lock);
> +    act = active_entry_acquire(rgt, op->map->ref);
>  
>      op->flags = op->map->flags;
>      if ( unlikely(!op->flags) || unlikely(op->map->domid != dom) )
>      {
>          gdprintk(XENLOG_WARNING, "Unstable handle %u\n", op->handle);
>          rc = GNTST_bad_handle;
> -        goto unmap_out;
> +        goto act_release_out;
>      }
>  
>      op->rd = rd;
> -    act = &active_entry(rgt, op->map->ref);

Why does this get moved up? "act" isn't need anywhere above.

> @@ -1290,13 +1287,17 @@ gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
>      gt->nr_status_frames = 0;
>  }
>  
> +/* 
> + * Grow the grant table. The caller must hold the grant table's
> + * write lock before calling this function.
> + */
>  int
>  gnttab_grow_table(struct domain *d, unsigned int req_nr_frames)

The "write" in the comment appears to be stale.

> @@ -1972,6 +1976,13 @@ __acquire_grant_for_copy(
>                  PIN_FAIL(unlock_out_clear, GNTST_general_error,
>                           "transitive grant referenced bad domain %d\n",
>                           trans_domid);
> +
> +            /* 
> +             * __acquire_grant_for_copy() could take the read lock on
> +             * the right table (if rd == td), so we have to drop the
> +             * lock here and reacquire
> +             */
> +            active_entry_release(act);

Same for the "read" here.

> @@ -2939,7 +2973,7 @@ grant_table_create(
>      struct domain *d)
>  {
>      struct grant_table *t;
> -    int                 i;
> +    int                 i, j;

Could you take the opportunity to make these unsigned? And perhaps
also drop the odd looking blank padding?

> @@ -3130,7 +3167,9 @@ grant_table_destroy(
>  
>      if ( t == NULL )
>          return;
> -    
> +
> +    spin_lock(&t->lock);
> +
>      for ( i = 0; i < nr_grant_frames(t); i++ )
>          free_xenheap_page(t->shared_raw[i]);
>      xfree(t->shared_raw);
> @@ -3147,6 +3186,8 @@ grant_table_destroy(
>          free_xenheap_page(t->status[i]);
>      xfree(t->status);
>  
> +    spin_unlock(&t->lock);

Iirc I asked this before - do you really think these are needed? Because
again - if they are, that would seem to be a separate bug fix.

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv6 2/5] gnttab: introduce per-active entry locks
  2015-04-23 12:42   ` Jan Beulich
@ 2015-04-23 13:49     ` David Vrabel
  0 siblings, 0 replies; 21+ messages in thread
From: David Vrabel @ 2015-04-23 13:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Ian Campbell, Christoph Egger, Tim Deegan,
	Matt Wilson, xen-devel

On 23/04/15 13:42, Jan Beulich wrote:
>
>> @@ -3130,7 +3167,9 @@ grant_table_destroy(
>>  
>>      if ( t == NULL )
>>          return;
>> -    
>> +
>> +    spin_lock(&t->lock);
>> +
>>      for ( i = 0; i < nr_grant_frames(t); i++ )
>>          free_xenheap_page(t->shared_raw[i]);
>>      xfree(t->shared_raw);
>> @@ -3147,6 +3186,8 @@ grant_table_destroy(
>>          free_xenheap_page(t->status[i]);
>>      xfree(t->status);
>>  
>> +    spin_unlock(&t->lock);
> 
> Iirc I asked this before - do you really think these are needed? Because
> again - if they are, that would seem to be a separate bug fix.

It looks unnecessary to me.  Unless Matt can explain why he added it,
I'll drop it.

David

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv6 3/5] gnttab: split grant table lock into table and maptrack locks
  2015-04-22 16:00 ` [PATCHv6 3/5] gnttab: split grant table lock into table and maptrack locks David Vrabel
@ 2015-04-23 15:04   ` Jan Beulich
  2015-04-29 10:53     ` David Vrabel
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2015-04-23 15:04 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Christoph Egger, Tim Deegan,
	Matt Wilson, xen-devel

>>> On 22.04.15 at 18:00, <david.vrabel@citrix.com> wrote:
> Subsequent changes make both these locks uncontented.  Is this patch
> really necessary? -- dvrabel

Yeah, particularly if this ...

> @@ -540,6 +550,16 @@ static void mapcount(
>  
>      *wrc = *rdc = 0;
>  
> +    /* 
> +     * While taking the local maptrack spinlock prevents any mapping
> +     * changes, the remote active entries could be changing while we
> +     * are counting. The caller has to hold the grant table lock, or
> +     * some other mechanism should be used to prevent concurrent
> +     * changes during this operation.
> +     */
> +    ASSERT(spin_is_locked(&rd->grant_table->lock));
> +    spin_lock(&lgt->maptrack_lock);
> +
>      for ( handle = 0; handle < lgt->maptrack_limit; handle++ )
>      {
>          struct active_grant_entry *act;
> @@ -552,6 +572,8 @@ static void mapcount(
>          if ( act->frame == mfn )
>              (map->flags & GNTMAP_readonly) ? (*rdc)++ : (*wrc)++;
>      }
> +
> +    spin_unlock(&lgt->maptrack_lock);
>  }

... is not resulting in contention anymore, I'd prefer avoiding a
change like this without measurable benefit. And when still an
issue, I'd wonder whether an rwlock wouldn't be the better
choice here.

> --- a/xen/include/xen/grant_table.h
> +++ b/xen/include/xen/grant_table.h
> @@ -82,7 +82,12 @@ struct grant_table {
>      struct grant_mapping **maptrack;
>      unsigned int          maptrack_head;
>      unsigned int          maptrack_limit;
> -    /* Lock protecting updates to active and shared grant tables. */
> +    /* Lock protecting the maptrack page list, head, and limit */
> +    spinlock_t            maptrack_lock;
> +    /* 
> +     * Lock protecting updates to grant table state (version, active
> +     * entry list, etc.)
> +     */
>      spinlock_t            lock;

If the patch still was to be applied, these two locks should be put
on separate cache lines, to avoid unnecessary bouncing.

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv6 4/5] gnttab: remove unnecessary grant table locks
  2015-04-22 16:00 ` [PATCHv6 4/5] gnttab: remove unnecessary grant table locks David Vrabel
  2015-04-23 11:23   ` Tim Deegan
@ 2015-04-23 15:31   ` Jan Beulich
  1 sibling, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2015-04-23 15:31 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Christoph Egger, Tim Deegan,
	Matt Wilson, xen-devel, Malcolm Crossley

>>> On 22.04.15 at 18:00, <david.vrabel@citrix.com> wrote:
> This is safe because: a) the grant table version only changes once
> from 0 to 1 or 2;

gnttab_set_version() also allows it to transition between 1 and 2
afaics.

> @@ -919,18 +914,15 @@ __gnttab_unmap_common(
>      }
>  
>      op->map = &maptrack_entry(lgt, op->handle);
> -    spin_lock(&lgt->lock);
>  
>      if ( unlikely(!op->map->flags) )
>      {
> -        spin_unlock(&lgt->lock);
>          gdprintk(XENLOG_INFO, "Zero flags for handle (%d).\n", op->handle);
>          op->status = GNTST_bad_handle;
>          return;
>      }
>  
>      dom = op->map->domid;
> -    spin_unlock(&lgt->lock);

Don't you need a lock here to see a consistent
op->map->{dom,ref,flags} tuple (with the writing side also holding
a lock while updating them)?

> @@ -952,7 +944,6 @@ __gnttab_unmap_common(
>  
>      rgt = rd->grant_table;
>  
> -    spin_lock(&rgt->lock);
>      act = active_entry_acquire(rgt, op->map->ref);
>  
>      op->flags = op->map->flags;

Same here then.

> @@ -1423,7 +1410,17 @@ gnttab_setup_table(
>      }
>  
>      gt = d->grant_table;
> -    spin_lock(&gt->lock);
> +
> +    /* Tracking of mapped foreign frames table */
> +    if ( (gt->maptrack = xzalloc_array(struct grant_mapping *,
> +                                       max_maptrack_frames * d->max_vcpus)) == NULL )
> +        goto out2;

This provides no error indication to the caller. And is this allocation
guaranteed to be no larger than a page? Finally - does this belong
here (and not instead into the last patch)?

> @@ -114,13 +114,13 @@ gnttab_grow_table(struct domain *d, unsigned int req_nr_frames);
>  /* Number of grant table frames. Caller must hold d's grant table lock. */
>  static inline unsigned int nr_grant_frames(struct grant_table *gt)
>  {
> -    return gt->nr_grant_frames;
> +    return atomic_read(&gt->nr_grant_frames);
>  }
>  
>  /* Number of status grant table frames. Caller must hold d's gr. table lock.*/
>  static inline unsigned int nr_status_frames(struct grant_table *gt)
>  {
> -    return gt->nr_status_frames;
> +    return atomic_read(&gt->nr_status_frames);
>  }

Both comments need changing.

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
  2015-04-22 16:00 ` [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists David Vrabel
@ 2015-04-23 16:11   ` Jan Beulich
  2015-04-23 16:29     ` David Vrabel
  2015-04-24  9:09     ` Malcolm Crossley
  0 siblings, 2 replies; 21+ messages in thread
From: Jan Beulich @ 2015-04-23 16:11 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Christoph Egger, Tim Deegan,
	Matt Wilson, xen-devel, Malcolm Crossley

>>> On 22.04.15 at 18:00, <david.vrabel@citrix.com> wrote:
>  static inline int
>  get_maptrack_handle(
>      struct grant_table *lgt)
>  {
> +    struct vcpu          *v = current;
>      int                   i;
>      grant_handle_t        handle;
>      struct grant_mapping *new_mt;
>      unsigned int          new_mt_limit, nr_frames;
>  
> -    spin_lock(&lgt->maptrack_lock);
> -
> -    while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
> -    {
> -        nr_frames = nr_maptrack_frames(lgt);
> -        if ( nr_frames >= max_maptrack_frames )
> -            break;
> -
> -        new_mt = alloc_xenheap_page();
> -        if ( !new_mt )
> -            break;
> +    handle = __get_maptrack_handle(lgt, v);
> +    if ( likely(handle != -1) )
> +        return handle;
>  
> -        clear_page(new_mt);
> +    nr_frames = nr_vcpu_maptrack_frames(v);
> +    if ( nr_frames >= max_maptrack_frames )
> +        return -1;
>  
> -        new_mt_limit = lgt->maptrack_limit + MAPTRACK_PER_PAGE;
> +    new_mt = alloc_xenheap_page();
> +    if ( !new_mt )
> +        return -1;
>  
> -        for ( i = 1; i < MAPTRACK_PER_PAGE; i++ )
> -            new_mt[i - 1].ref = lgt->maptrack_limit + i;
> -        new_mt[i - 1].ref = lgt->maptrack_head;
> -        lgt->maptrack_head = lgt->maptrack_limit;
> +    clear_page(new_mt);
>  
> -        lgt->maptrack[nr_frames] = new_mt;
> -        smp_wmb();
> -        lgt->maptrack_limit      = new_mt_limit;
> +    new_mt_limit = v->maptrack_limit + MAPTRACK_PER_PAGE;
>  
> -        gdprintk(XENLOG_INFO, "Increased maptrack size to %u frames\n",
> -                 nr_frames + 1);
> +    for ( i = 1; i < MAPTRACK_PER_PAGE; i++ ){
> +        new_mt[i - 1].ref = (lgt->maptrack_pages * MAPTRACK_PER_PAGE) + i;
> +        new_mt[i - 1].vcpu = v->vcpu_id;
> +    }
> +    /* Set last entry vcpu and ref */
> +    new_mt[i - 1].ref = v->maptrack_head;
> +    new_mt[i - 1].vcpu = v->vcpu_id;
> +    v->maptrack_head = lgt->maptrack_pages * MAPTRACK_PER_PAGE;
> +    if (v->maptrack_tail == MAPTRACK_TAIL)
> +    {
> +        v->maptrack_tail = (lgt->maptrack_pages * MAPTRACK_PER_PAGE)
> +            + MAPTRACK_PER_PAGE - 1;
> +        new_mt[i - 1].ref = MAPTRACK_TAIL;
>      }
>  
> +    spin_lock(&lgt->maptrack_lock);
> +    lgt->maptrack[lgt->maptrack_pages++] = new_mt;
>      spin_unlock(&lgt->maptrack_lock);

The uses of ->maptrack_pages ahead of taking the lock can race
with updates inside the lock. And with locking elsewhere dropped
by the previous patch it looks like you can't update ->maptrack[]
like you do (you'd need a barrier between the pointer store and
the increment, and with that I think the lock would become
superfluous if it was needed only for this update).

Also note the coding style issues in the changes above^(and also
in code further down).

> -    return handle;
> +    v->maptrack_limit = new_mt_limit;
> +
> +    return __get_maptrack_handle(lgt, v);

With the lock dropped, nothing guarantees this to succeed, which it
ought to unless the table size reached its allowed maximum.

> @@ -1422,6 +1448,17 @@ gnttab_setup_table(
>      }
>      gt->maptrack_pages = 0;
>  
> +    /* Tracking of mapped foreign frames table */
> +    if ( (gt->maptrack = xzalloc_array(struct grant_mapping *,
> +                                       max_maptrack_frames * d->max_vcpus)) == NULL )
> +        goto out2;

See the comments on the similar misplaced hunk in the previous
patch.

> --- a/xen/include/xen/grant_table.h
> +++ b/xen/include/xen/grant_table.h
> @@ -60,6 +60,8 @@ struct grant_mapping {
>      u32      ref;           /* grant ref */
>      u16      flags;         /* 0-4: GNTMAP_* ; 5-15: unused */
>      domid_t  domid;         /* granting domain */
> +    u32      vcpu;          /* vcpu which created the grant mapping */
> +    u16      pad[2];
>  };

What is this pad[] good for?

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
  2015-04-23 16:11   ` Jan Beulich
@ 2015-04-23 16:29     ` David Vrabel
  2015-04-24  6:44       ` Jan Beulich
  2015-04-24  9:09     ` Malcolm Crossley
  1 sibling, 1 reply; 21+ messages in thread
From: David Vrabel @ 2015-04-23 16:29 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel
  Cc: Keir Fraser, Ian Campbell, Christoph Egger, Tim Deegan,
	Matt Wilson, xen-devel, Malcolm Crossley

On 23/04/15 17:11, Jan Beulich wrote:
>>>> On 22.04.15 at 18:00, <david.vrabel@citrix.com> wrote:
>>  
>> +    v->maptrack_head = lgt->maptrack_pages * MAPTRACK_PER_PAGE;
>> +    if (v->maptrack_tail == MAPTRACK_TAIL)
>> +    {
>> +        v->maptrack_tail = (lgt->maptrack_pages * MAPTRACK_PER_PAGE)
>> +            + MAPTRACK_PER_PAGE - 1;
>> +        new_mt[i - 1].ref = MAPTRACK_TAIL;
>>      }
>>  
>> +    spin_lock(&lgt->maptrack_lock);
>> +    lgt->maptrack[lgt->maptrack_pages++] = new_mt;
>>      spin_unlock(&lgt->maptrack_lock);
> 
> The uses of ->maptrack_pages ahead of taking the lock can race
> with updates inside the lock. And with locking elsewhere dropped
> by the previous patch it looks like you can't update ->maptrack[]
> like you do (you'd need a barrier between the pointer store and
> the increment, and with that I think the lock would become
> superfluous if it was needed only for this update).

This was my mistake.  Malcolm's original patch had correct locking here.

> Also note the coding style issues in the changes above^(and also
> in code further down).
> 
>> -    return handle;
>> +    v->maptrack_limit = new_mt_limit;
>> +
>> +    return __get_maptrack_handle(lgt, v);
> 
> With the lock dropped, nothing guarantees this to succeed, which it
> ought to unless the table size reached its allowed maximum.

We've just added a new page of handles for this VCPU.  This will succeed.

David

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
  2015-04-23 16:29     ` David Vrabel
@ 2015-04-24  6:44       ` Jan Beulich
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2015-04-24  6:44 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Christoph Egger, Tim Deegan,
	Matt Wilson, xen-devel, Malcolm Crossley

>>> On 23.04.15 at 18:29, <david.vrabel@citrix.com> wrote:
> On 23/04/15 17:11, Jan Beulich wrote:
>>>>> On 22.04.15 at 18:00, <david.vrabel@citrix.com> wrote:
>>> -    return handle;
>>> +    v->maptrack_limit = new_mt_limit;
>>> +
>>> +    return __get_maptrack_handle(lgt, v);
>> 
>> With the lock dropped, nothing guarantees this to succeed, which it
>> ought to unless the table size reached its allowed maximum.
> 
> We've just added a new page of handles for this VCPU.  This will succeed.

Ah, right, this is per-vCPU now.

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
  2015-04-23 16:11   ` Jan Beulich
  2015-04-23 16:29     ` David Vrabel
@ 2015-04-24  9:09     ` Malcolm Crossley
  2015-04-24  9:50       ` Jan Beulich
  1 sibling, 1 reply; 21+ messages in thread
From: Malcolm Crossley @ 2015-04-24  9:09 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel
  Cc: Keir Fraser, Ian Campbell, Christoph Egger, Tim Deegan,
	Matt Wilson, xen-devel

On 23/04/15 17:11, Jan Beulich wrote:
>>>> On 22.04.15 at 18:00, <david.vrabel@citrix.com> wrote:
>>  static inline int
>>  get_maptrack_handle(
>>      struct grant_table *lgt)
>>  {
>> +    struct vcpu          *v = current;
>>      int                   i;
>>      grant_handle_t        handle;
>>      struct grant_mapping *new_mt;
>>      unsigned int          new_mt_limit, nr_frames;
>>  
>> -    spin_lock(&lgt->maptrack_lock);
>> -
>> -    while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
>> -    {
>> -        nr_frames = nr_maptrack_frames(lgt);
>> -        if ( nr_frames >= max_maptrack_frames )
>> -            break;
>> -
>> -        new_mt = alloc_xenheap_page();
>> -        if ( !new_mt )
>> -            break;
>> +    handle = __get_maptrack_handle(lgt, v);
>> +    if ( likely(handle != -1) )
>> +        return handle;
>>  
>> -        clear_page(new_mt);
>> +    nr_frames = nr_vcpu_maptrack_frames(v);
>> +    if ( nr_frames >= max_maptrack_frames )
>> +        return -1;
>>  
>> -        new_mt_limit = lgt->maptrack_limit + MAPTRACK_PER_PAGE;
>> +    new_mt = alloc_xenheap_page();
>> +    if ( !new_mt )
>> +        return -1;
>>  
>> -        for ( i = 1; i < MAPTRACK_PER_PAGE; i++ )
>> -            new_mt[i - 1].ref = lgt->maptrack_limit + i;
>> -        new_mt[i - 1].ref = lgt->maptrack_head;
>> -        lgt->maptrack_head = lgt->maptrack_limit;
>> +    clear_page(new_mt);
>>  
>> -        lgt->maptrack[nr_frames] = new_mt;
>> -        smp_wmb();
>> -        lgt->maptrack_limit      = new_mt_limit;
>> +    new_mt_limit = v->maptrack_limit + MAPTRACK_PER_PAGE;
>>  
>> -        gdprintk(XENLOG_INFO, "Increased maptrack size to %u frames\n",
>> -                 nr_frames + 1);
>> +    for ( i = 1; i < MAPTRACK_PER_PAGE; i++ ){
>> +        new_mt[i - 1].ref = (lgt->maptrack_pages * MAPTRACK_PER_PAGE) + i;
>> +        new_mt[i - 1].vcpu = v->vcpu_id;
>> +    }
>> +    /* Set last entry vcpu and ref */
>> +    new_mt[i - 1].ref = v->maptrack_head;
>> +    new_mt[i - 1].vcpu = v->vcpu_id;
>> +    v->maptrack_head = lgt->maptrack_pages * MAPTRACK_PER_PAGE;
>> +    if (v->maptrack_tail == MAPTRACK_TAIL)
>> +    {
>> +        v->maptrack_tail = (lgt->maptrack_pages * MAPTRACK_PER_PAGE)
>> +            + MAPTRACK_PER_PAGE - 1;
>> +        new_mt[i - 1].ref = MAPTRACK_TAIL;
>>      }
>>  
>> +    spin_lock(&lgt->maptrack_lock);
>> +    lgt->maptrack[lgt->maptrack_pages++] = new_mt;
>>      spin_unlock(&lgt->maptrack_lock);
> 
> The uses of ->maptrack_pages ahead of taking the lock can race
> with updates inside the lock. And with locking elsewhere dropped
> by the previous patch it looks like you can't update ->maptrack[]
> like you do (you'd need a barrier between the pointer store and
> the increment, and with that I think the lock would become
> superfluous if it was needed only for this update).
> 
> Also note the coding style issues in the changes above^(and also
> in code further down).

This was a last minute optimisation, this isn't on the hot patch so
we'll expand the spin_lock to cover all users of maptrack_pages.

Sorry about the coding style problems.

> 
>> -    return handle;
>> +    v->maptrack_limit = new_mt_limit;
>> +
>> +    return __get_maptrack_handle(lgt, v);
> 
> With the lock dropped, nothing guarantees this to succeed, which it
> ought to unless the table size reached its allowed maximum.
> 
>> @@ -1422,6 +1448,17 @@ gnttab_setup_table(
>>      }
>>      gt->maptrack_pages = 0;
>>  
>> +    /* Tracking of mapped foreign frames table */
>> +    if ( (gt->maptrack = xzalloc_array(struct grant_mapping *,
>> +                                       max_maptrack_frames * d->max_vcpus)) == NULL )
>> +        goto out2;
> 
> See the comments on the similar misplaced hunk in the previous
> patch.
> 
>> --- a/xen/include/xen/grant_table.h
>> +++ b/xen/include/xen/grant_table.h
>> @@ -60,6 +60,8 @@ struct grant_mapping {
>>      u32      ref;           /* grant ref */
>>      u16      flags;         /* 0-4: GNTMAP_* ; 5-15: unused */
>>      domid_t  domid;         /* granting domain */
>> +    u32      vcpu;          /* vcpu which created the grant mapping */
>> +    u16      pad[2];
>>  };
> 
> What is this pad[] good for?

The pad is to keep the struct power of 2 sized because this allows the
compiler to optimise these macro's to right and left shifts:

#define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping))
#define maptrack_entry(t, e) \
    ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE])


Malcolm
> 
> Jan
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
  2015-04-24  9:09     ` Malcolm Crossley
@ 2015-04-24  9:50       ` Jan Beulich
  2015-04-24 10:02         ` Andrew Cooper
  2015-04-24 10:21         ` Malcolm Crossley
  0 siblings, 2 replies; 21+ messages in thread
From: Jan Beulich @ 2015-04-24  9:50 UTC (permalink / raw)
  To: David Vrabel, Malcolm Crossley
  Cc: KeirFraser, IanCampbell, Christoph Egger, Tim Deegan, Matt Wilson,
	xen-devel

>>> On 24.04.15 at 11:09, <malcolm.crossley@citrix.com> wrote:
> On 23/04/15 17:11, Jan Beulich wrote:
>>>>> On 22.04.15 at 18:00, <david.vrabel@citrix.com> wrote:
>>> --- a/xen/include/xen/grant_table.h
>>> +++ b/xen/include/xen/grant_table.h
>>> @@ -60,6 +60,8 @@ struct grant_mapping {
>>>      u32      ref;           /* grant ref */
>>>      u16      flags;         /* 0-4: GNTMAP_* ; 5-15: unused */
>>>      domid_t  domid;         /* granting domain */
>>> +    u32      vcpu;          /* vcpu which created the grant mapping */
>>> +    u16      pad[2];
>>>  };
>> 
>> What is this pad[] good for?
> 
> The pad is to keep the struct power of 2 sized because this allows the
> compiler to optimise these macro's to right and left shifts:
> 
> #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping))
> #define maptrack_entry(t, e) \
>     ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE])

Okay, then why u16[2] instead of u32? And please add a brief
comment explaining the reason.

Apart from that I wonder whether fitting vcpu in the 10 unused
flags bits (not 11, as the comment on the field suggests) would be
an option. That would require limiting vCPU count to 4k, which I
don't think would really be a problem for anyone.

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
  2015-04-24  9:50       ` Jan Beulich
@ 2015-04-24 10:02         ` Andrew Cooper
  2015-04-24 10:21         ` Malcolm Crossley
  1 sibling, 0 replies; 21+ messages in thread
From: Andrew Cooper @ 2015-04-24 10:02 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel, Malcolm Crossley
  Cc: KeirFraser, IanCampbell, Christoph Egger, Tim Deegan, Matt Wilson,
	xen-devel

On 24/04/15 10:50, Jan Beulich wrote:
>>>> On 24.04.15 at 11:09, <malcolm.crossley@citrix.com> wrote:
>> On 23/04/15 17:11, Jan Beulich wrote:
>>>>>> On 22.04.15 at 18:00, <david.vrabel@citrix.com> wrote:
>>>> --- a/xen/include/xen/grant_table.h
>>>> +++ b/xen/include/xen/grant_table.h
>>>> @@ -60,6 +60,8 @@ struct grant_mapping {
>>>>      u32      ref;           /* grant ref */
>>>>      u16      flags;         /* 0-4: GNTMAP_* ; 5-15: unused */
>>>>      domid_t  domid;         /* granting domain */
>>>> +    u32      vcpu;          /* vcpu which created the grant mapping */
>>>> +    u16      pad[2];
>>>>  };
>>> What is this pad[] good for?
>> The pad is to keep the struct power of 2 sized because this allows the
>> compiler to optimise these macro's to right and left shifts:
>>
>> #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping))
>> #define maptrack_entry(t, e) \
>>     ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE])
> Okay, then why u16[2] instead of u32? And please add a brief
> comment explaining the reason.

Most likely because vcpu was a u16 in the first draft, and got bumped
during internal review.

>
> Apart from that I wonder whether fitting vcpu in the 10 unused
> flags bits (not 11, as the comment on the field suggests) would be
> an option. That would require limiting vCPU count to 4k, which I
> don't think would really be a problem for anyone.

8k VCPU PV guests do function, and are very good at finding scalability
limits.

It would be nice not to break this.

~Andrew

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
  2015-04-24  9:50       ` Jan Beulich
  2015-04-24 10:02         ` Andrew Cooper
@ 2015-04-24 10:21         ` Malcolm Crossley
  2015-04-24 12:52           ` Jan Beulich
  1 sibling, 1 reply; 21+ messages in thread
From: Malcolm Crossley @ 2015-04-24 10:21 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel
  Cc: KeirFraser, IanCampbell, Christoph Egger, Tim Deegan, Matt Wilson,
	xen-devel

On 24/04/15 10:50, Jan Beulich wrote:
>>>> On 24.04.15 at 11:09, <malcolm.crossley@citrix.com> wrote:
>> On 23/04/15 17:11, Jan Beulich wrote:
>>>>>> On 22.04.15 at 18:00, <david.vrabel@citrix.com> wrote:
>>>> --- a/xen/include/xen/grant_table.h
>>>> +++ b/xen/include/xen/grant_table.h
>>>> @@ -60,6 +60,8 @@ struct grant_mapping {
>>>>      u32      ref;           /* grant ref */
>>>>      u16      flags;         /* 0-4: GNTMAP_* ; 5-15: unused */
>>>>      domid_t  domid;         /* granting domain */
>>>> +    u32      vcpu;          /* vcpu which created the grant mapping */
>>>> +    u16      pad[2];
>>>>  };
>>>
>>> What is this pad[] good for?
>>
>> The pad is to keep the struct power of 2 sized because this allows the
>> compiler to optimise these macro's to right and left shifts:
>>
>> #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping))
>> #define maptrack_entry(t, e) \
>>     ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE])
> 
> Okay, then why u16[2] instead of u32? And please add a brief
> comment explaining the reason.
> 
> Apart from that I wonder whether fitting vcpu in the 10 unused
> flags bits (not 11, as the comment on the field suggests) would be
> an option. That would require limiting vCPU count to 4k, which I
> don't think would really be a problem for anyone.

I'd like to not have the overhead of bit operations on a hot path.

We're going from 512 grant entries per page to 256 grant entries per
page. This is taking the Xen memory overhead from 0.2% to 0.4% for grant
mapped pages.

Is the extra 0.2% memory overhead that concerning?

Malcolm
> 
> Jan
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists
  2015-04-24 10:21         ` Malcolm Crossley
@ 2015-04-24 12:52           ` Jan Beulich
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2015-04-24 12:52 UTC (permalink / raw)
  To: Malcolm Crossley
  Cc: KeirFraser, IanCampbell, Christoph Egger, Tim Deegan,
	David Vrabel, Matt Wilson, xen-devel

>>> On 24.04.15 at 12:21, <malcolm.crossley@citrix.com> wrote:
> On 24/04/15 10:50, Jan Beulich wrote:
>>>>> On 24.04.15 at 11:09, <malcolm.crossley@citrix.com> wrote:
>>> On 23/04/15 17:11, Jan Beulich wrote:
>>>>>>> On 22.04.15 at 18:00, <david.vrabel@citrix.com> wrote:
>>>>> --- a/xen/include/xen/grant_table.h
>>>>> +++ b/xen/include/xen/grant_table.h
>>>>> @@ -60,6 +60,8 @@ struct grant_mapping {
>>>>>      u32      ref;           /* grant ref */
>>>>>      u16      flags;         /* 0-4: GNTMAP_* ; 5-15: unused */
>>>>>      domid_t  domid;         /* granting domain */
>>>>> +    u32      vcpu;          /* vcpu which created the grant mapping */
>>>>> +    u16      pad[2];
>>>>>  };
>>>>
>>>> What is this pad[] good for?
>>>
>>> The pad is to keep the struct power of 2 sized because this allows the
>>> compiler to optimise these macro's to right and left shifts:
>>>
>>> #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping))
>>> #define maptrack_entry(t, e) \
>>>     ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE])
>> 
>> Okay, then why u16[2] instead of u32? And please add a brief
>> comment explaining the reason.
>> 
>> Apart from that I wonder whether fitting vcpu in the 10 unused
>> flags bits (not 11, as the comment on the field suggests) would be
>> an option. That would require limiting vCPU count to 4k, which I
>> don't think would really be a problem for anyone.
> 
> I'd like to not have the overhead of bit operations on a hot path.
> 
> We're going from 512 grant entries per page to 256 grant entries per
> page. This is taking the Xen memory overhead from 0.2% to 0.4% for grant
> mapped pages.
> 
> Is the extra 0.2% memory overhead that concerning?

Perhaps not so much the memory overhead but the doubled
cache bandwidth needed. But anyway, this was only a
question, and with both you and Andrew being opposed let's
just stay with what you submitted.

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv6 3/5] gnttab: split grant table lock into table and maptrack locks
  2015-04-23 15:04   ` Jan Beulich
@ 2015-04-29 10:53     ` David Vrabel
  2015-04-29 11:12       ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: David Vrabel @ 2015-04-29 10:53 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel
  Cc: Keir Fraser, Ian Campbell, Christoph Egger, Tim Deegan,
	Matt Wilson, xen-devel

On 23/04/15 16:04, Jan Beulich wrote:
>
>> --- a/xen/include/xen/grant_table.h
>> +++ b/xen/include/xen/grant_table.h
>> @@ -82,7 +82,12 @@ struct grant_table {
>>      struct grant_mapping **maptrack;
>>      unsigned int          maptrack_head;
>>      unsigned int          maptrack_limit;
>> -    /* Lock protecting updates to active and shared grant tables. */
>> +    /* Lock protecting the maptrack page list, head, and limit */
>> +    spinlock_t            maptrack_lock;
>> +    /* 
>> +     * Lock protecting updates to grant table state (version, active
>> +     * entry list, etc.)
>> +     */
>>      spinlock_t            lock;
> 
> If the patch still was to be applied, these two locks should be put
> on separate cache lines, to avoid unnecessary bouncing.

I was shuffling the fields around to make this happen and I ended up
shrinking the structure to fit it in a single cache line instead.

Shall I add some explicit padding to get the alignment you want?  Note
that with the per-CPU maptrack free lists the maptrack lock isn't
heavily contended.

David

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv6 3/5] gnttab: split grant table lock into table and maptrack locks
  2015-04-29 10:53     ` David Vrabel
@ 2015-04-29 11:12       ` Jan Beulich
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2015-04-29 11:12 UTC (permalink / raw)
  To: david.vrabel; +Cc: keir, ian.campbell, chegger, tim, msw, xen-devel

>>> David Vrabel <david.vrabel@citrix.com> 04/29/15 12:54 PM >>>
>On 23/04/15 16:04, Jan Beulich wrote:
>>> --- a/xen/include/xen/grant_table.h
>>> +++ b/xen/include/xen/grant_table.h
>>> @@ -82,7 +82,12 @@ struct grant_table {
>>>      struct grant_mapping **maptrack;
>>>      unsigned int          maptrack_head;
>>>      unsigned int          maptrack_limit;
>>> -    /* Lock protecting updates to active and shared grant tables. */
>>> +    /* Lock protecting the maptrack page list, head, and limit */
>>> +    spinlock_t            maptrack_lock;
>>> +    /* 
>>> +     * Lock protecting updates to grant table state (version, active
>>> +     * entry list, etc.)
>>> +     */
>>>      spinlock_t            lock;
>> 
>> If the patch still was to be applied, these two locks should be put
>> on separate cache lines, to avoid unnecessary bouncing.
>
>I was shuffling the fields around to make this happen and I ended up
>shrinking the structure to fit it in a single cache line instead.
>
>Shall I add some explicit padding to get the alignment you want?  Note
>that with the per-CPU maptrack free lists the maptrack lock isn't
>heavily contended.

Ah, right. In that case artificially bloating the structure is likely not worth
it.

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2015-04-29 11:12 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-04-22 16:00 [PATCHv6 0/6] gnttab: Improve scaleability David Vrabel
2015-04-22 16:00 ` [PATCHv6 1/5] gnttab: add locking documentation David Vrabel
2015-04-22 16:00 ` [PATCHv6 2/5] gnttab: introduce per-active entry locks David Vrabel
2015-04-23 12:42   ` Jan Beulich
2015-04-23 13:49     ` David Vrabel
2015-04-22 16:00 ` [PATCHv6 3/5] gnttab: split grant table lock into table and maptrack locks David Vrabel
2015-04-23 15:04   ` Jan Beulich
2015-04-29 10:53     ` David Vrabel
2015-04-29 11:12       ` Jan Beulich
2015-04-22 16:00 ` [PATCHv6 4/5] gnttab: remove unnecessary grant table locks David Vrabel
2015-04-23 11:23   ` Tim Deegan
2015-04-23 15:31   ` Jan Beulich
2015-04-22 16:00 ` [PATCHv6 5/5] gnttab: use per-VCPU maptrack free lists David Vrabel
2015-04-23 16:11   ` Jan Beulich
2015-04-23 16:29     ` David Vrabel
2015-04-24  6:44       ` Jan Beulich
2015-04-24  9:09     ` Malcolm Crossley
2015-04-24  9:50       ` Jan Beulich
2015-04-24 10:02         ` Andrew Cooper
2015-04-24 10:21         ` Malcolm Crossley
2015-04-24 12:52           ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.