* [PATCH v4 0/2] gnttab: Improve scaleability
@ 2015-01-09 15:12 Christoph Egger
2015-01-09 15:12 ` [PATCH v4 1/2] gnttab: Introduce rwlock to protect updates to grant table state Christoph Egger
2015-01-09 15:12 ` [PATCH v4 2/2] gnttab: refactor locking for scalability Christoph Egger
0 siblings, 2 replies; 7+ messages in thread
From: Christoph Egger @ 2015-01-09 15:12 UTC (permalink / raw)
To: xen-devel; +Cc: Christoph Egger
This patch series changes the grant table locking to
a more fain grained locking protocol. The result is
a performance boost measured with blkfront/blkback.
Document the locking protocol.
v4:
* Coding style nits from Jan Beulich
* Fixup read locks pointed out by Jan Beulich
* renamed double_gt_(un)lock to double_maptrack_(un)lock
per request from Jan Beulich
* Addressed ASSERT()'s from Jan Beulich
v3:
* Addressed gnttab_swap_grant_ref() comment from Andrew Cooper
v2:
* Add arm part per request from Julien Grall
Christoph Egger (1):
gnttab: Introduce rwlock to protect updates to grant table state
Matt Wilson (1):
gnttab: refactor locking for scalability
docs/misc/grant-tables.txt | 49 ++++++-
xen/arch/arm/mm.c | 4 +-
xen/arch/x86/mm.c | 4 +-
xen/common/grant_table.c | 321 +++++++++++++++++++++++++----------------
xen/include/xen/grant_table.h | 9 +-
5 files changed, 258 insertions(+), 129 deletions(-)
--
1.7.9.5
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v4 1/2] gnttab: Introduce rwlock to protect updates to grant table state
2015-01-09 15:12 [PATCH v4 0/2] gnttab: Improve scaleability Christoph Egger
@ 2015-01-09 15:12 ` Christoph Egger
2015-01-12 15:09 ` Jan Beulich
2015-01-09 15:12 ` [PATCH v4 2/2] gnttab: refactor locking for scalability Christoph Egger
1 sibling, 1 reply; 7+ messages in thread
From: Christoph Egger @ 2015-01-09 15:12 UTC (permalink / raw)
To: xen-devel
Cc: Christoph Egger, Keir Fraser, Jan Beulich, Matt Wilson,
Julien Grall
Split grant table lock into two separate locks. One to protect
maptrack state and change the other into a rwlock.
The rwlock is used to prevent readers from accessing
inconsistent grant table state such as current
version, partially initialized active table pages,
etc.
Signed-off-by: Matt Wilson <msw@amazon.com>
[chegger: ported to xen-staging, split into multiple commits]
v4:
* Coding style nits from Jan Beulich
* Fixup read locks pointed out by Jan Beulich
* renamed double_gt_(un)lock to double_maptrack_(un)lock
per request from Jan Beulich
v3:
* Addressed gnttab_swap_grant_ref() comment from Andrew Cooper
v2:
* Add arm part per request from Julien Grall
Signed-off-by: Christoph Egger <chegger@amazon.de>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Julien Grall <julien.grall@linaro.org>
---
docs/misc/grant-tables.txt | 28 ++++++++-
xen/arch/arm/mm.c | 4 +-
xen/arch/x86/mm.c | 4 +-
xen/common/grant_table.c | 134 +++++++++++++++++++++++------------------
xen/include/xen/grant_table.h | 17 ++++--
5 files changed, 117 insertions(+), 70 deletions(-)
diff --git a/docs/misc/grant-tables.txt b/docs/misc/grant-tables.txt
index 19db4ec..c9ae8f2 100644
--- a/docs/misc/grant-tables.txt
+++ b/docs/misc/grant-tables.txt
@@ -74,7 +74,33 @@ is complete.
matching map track entry is then removed, as if unmap had been invoked.
These are not used by the transfer mechanism.
map->domid : owner of the mapped frame
- map->ref_and_flags : grant reference, ro/rw, mapped for host or device access
+ map->ref : grant reference
+ map->flags : ro/rw, mapped for host or device access
+
+********************************************************************************
+ Locking
+ ~~~~~~~
+ Xen uses several locks to serialize access to the internal grant table state.
+
+ grant_table->lock : rwlock used to prevent readers from accessing
+ inconsistent grant table state such as current
+ version, partially initialized active table pages,
+ etc.
+ grant_table->maptrack_lock : spinlock used to protect the maptrack state
+
+ The primary lock for the grant table is a read/write spinlock. All
+ functions that access members of struct grant_table must acquire a
+ read lock around critical sections. Any modification to the members
+ of struct grant_table (e.g., nr_status_frames, nr_grant_frames,
+ active frames, etc.) must only be made if the write lock is
+ held. These elements are read-mostly, and read critical sections can
+ be large, which makes a rwlock a good choice.
+
+ The maptrack state is protected by its own spinlock. Any access (read
+ or write) of struct grant_table members that have a "maptrack_"
+ prefix must be made while holding the maptrack lock. The maptrack
+ state can be rapidly modified under some workloads, and the critical
+ sections are very small, thus we use a spinlock to protect them.
********************************************************************************
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 7d4ba0c..2765683 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1037,7 +1037,7 @@ int xenmem_add_to_physmap_one(
switch ( space )
{
case XENMAPSPACE_grant_table:
- spin_lock(&d->grant_table->lock);
+ write_lock(&d->grant_table->lock);
if ( d->grant_table->gt_version == 0 )
d->grant_table->gt_version = 1;
@@ -1067,7 +1067,7 @@ int xenmem_add_to_physmap_one(
t = p2m_ram_rw;
- spin_unlock(&d->grant_table->lock);
+ write_unlock(&d->grant_table->lock);
break;
case XENMAPSPACE_shared_info:
if ( idx != 0 )
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 6e9c2c0..d371ff9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4565,7 +4565,7 @@ int xenmem_add_to_physmap_one(
mfn = virt_to_mfn(d->shared_info);
break;
case XENMAPSPACE_grant_table:
- spin_lock(&d->grant_table->lock);
+ write_lock(&d->grant_table->lock);
if ( d->grant_table->gt_version == 0 )
d->grant_table->gt_version = 1;
@@ -4587,7 +4587,7 @@ int xenmem_add_to_physmap_one(
mfn = virt_to_mfn(d->grant_table->shared_raw[idx]);
}
- spin_unlock(&d->grant_table->lock);
+ write_unlock(&d->grant_table->lock);
break;
case XENMAPSPACE_gmfn_range:
case XENMAPSPACE_gmfn:
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index fe52b63..586ca94 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -229,27 +229,27 @@ static int __get_paged_frame(unsigned long gfn, unsigned long *frame, struct pag
}
static inline void
-double_gt_lock(struct grant_table *lgt, struct grant_table *rgt)
+double_maptrack_lock(struct grant_table *lgt, struct grant_table *rgt)
{
if ( lgt < rgt )
{
- spin_lock(&lgt->lock);
- spin_lock(&rgt->lock);
+ spin_lock(&lgt->maptrack_lock);
+ spin_lock(&rgt->maptrack_lock);
}
else
{
if ( lgt != rgt )
- spin_lock(&rgt->lock);
- spin_lock(&lgt->lock);
+ spin_lock(&rgt->maptrack_lock);
+ spin_lock(&lgt->maptrack_lock);
}
}
static inline void
-double_gt_unlock(struct grant_table *lgt, struct grant_table *rgt)
+double_maptrack_unlock(struct grant_table *lgt, struct grant_table *rgt)
{
- spin_unlock(&lgt->lock);
+ spin_unlock(&lgt->maptrack_lock);
if ( lgt != rgt )
- spin_unlock(&rgt->lock);
+ spin_unlock(&rgt->maptrack_lock);
}
static inline int
@@ -267,10 +267,10 @@ static inline void
put_maptrack_handle(
struct grant_table *t, int handle)
{
- spin_lock(&t->lock);
+ spin_lock(&t->maptrack_lock);
maptrack_entry(t, handle).ref = t->maptrack_head;
t->maptrack_head = handle;
- spin_unlock(&t->lock);
+ spin_unlock(&t->maptrack_lock);
}
static inline int
@@ -282,7 +282,7 @@ get_maptrack_handle(
struct grant_mapping *new_mt;
unsigned int new_mt_limit, nr_frames;
- spin_lock(&lgt->lock);
+ spin_lock(&lgt->maptrack_lock);
while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
{
@@ -311,7 +311,7 @@ get_maptrack_handle(
nr_frames + 1);
}
- spin_unlock(&lgt->lock);
+ spin_unlock(&lgt->maptrack_lock);
return handle;
}
@@ -508,7 +508,7 @@ static int grant_map_exists(const struct domain *ld,
const struct active_grant_entry *act;
unsigned int ref, max_iter;
- ASSERT(spin_is_locked(&rgt->lock));
+ ASSERT(rw_is_locked(&rgt->lock));
max_iter = min(*ref_count + (1 << GNTTABOP_CONTINUATION_ARG_SHIFT),
nr_grant_entries(rgt));
@@ -629,7 +629,7 @@ __gnttab_map_grant_ref(
}
rgt = rd->grant_table;
- spin_lock(&rgt->lock);
+ read_lock(&rgt->lock);
if ( rgt->gt_version == 0 )
PIN_FAIL(unlock_out, GNTST_general_error,
@@ -702,7 +702,7 @@ __gnttab_map_grant_ref(
cache_flags = (shah->flags & (GTF_PAT | GTF_PWT | GTF_PCD) );
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
/* pg may be set, with a refcount included, from __get_paged_frame */
if ( !pg )
@@ -778,7 +778,7 @@ __gnttab_map_grant_ref(
goto undo_out;
}
- double_gt_lock(lgt, rgt);
+ double_maptrack_lock(lgt, rgt);
if ( gnttab_need_iommu_mapping(ld) )
{
@@ -801,7 +801,7 @@ __gnttab_map_grant_ref(
}
if ( err )
{
- double_gt_unlock(lgt, rgt);
+ double_maptrack_unlock(lgt, rgt);
rc = GNTST_general_error;
goto undo_out;
}
@@ -814,7 +814,7 @@ __gnttab_map_grant_ref(
mt->ref = op->ref;
mt->flags = op->flags;
- double_gt_unlock(lgt, rgt);
+ double_maptrack_unlock(lgt, rgt);
op->dev_bus_addr = (u64)frame << PAGE_SHIFT;
op->handle = handle;
@@ -837,7 +837,7 @@ __gnttab_map_grant_ref(
put_page(pg);
}
- spin_lock(&rgt->lock);
+ read_lock(&rgt->lock);
act = &active_entry(rgt, op->ref);
@@ -857,7 +857,7 @@ __gnttab_map_grant_ref(
gnttab_clear_flag(_GTF_reading, status);
unlock_out:
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
op->status = rc;
put_maptrack_handle(lgt, handle);
rcu_unlock_domain(rd);
@@ -899,26 +899,27 @@ __gnttab_unmap_common(
op->frame = (unsigned long)(op->dev_bus_addr >> PAGE_SHIFT);
+ read_lock(&lgt->lock);
if ( unlikely(op->handle >= lgt->maptrack_limit) )
{
+ read_unlock(&lgt->lock);
gdprintk(XENLOG_INFO, "Bad handle (%d).\n", op->handle);
op->status = GNTST_bad_handle;
return;
}
op->map = &maptrack_entry(lgt, op->handle);
- spin_lock(&lgt->lock);
if ( unlikely(!op->map->flags) )
{
- spin_unlock(&lgt->lock);
+ read_unlock(&lgt->lock);
gdprintk(XENLOG_INFO, "Zero flags for handle (%d).\n", op->handle);
op->status = GNTST_bad_handle;
return;
}
dom = op->map->domid;
- spin_unlock(&lgt->lock);
+ read_unlock(&lgt->lock);
if ( unlikely((rd = rcu_lock_domain_by_id(dom)) == NULL) )
{
@@ -939,7 +940,8 @@ __gnttab_unmap_common(
TRACE_1D(TRC_MEM_PAGE_GRANT_UNMAP, dom);
rgt = rd->grant_table;
- double_gt_lock(lgt, rgt);
+ read_lock(&rgt->lock);
+ double_maptrack_lock(lgt, rgt);
op->flags = op->map->flags;
if ( unlikely(!op->flags) || unlikely(op->map->domid != dom) )
@@ -1009,7 +1011,8 @@ __gnttab_unmap_common(
gnttab_mark_dirty(rd, op->frame);
unmap_out:
- double_gt_unlock(lgt, rgt);
+ double_maptrack_unlock(lgt, rgt);
+ read_unlock(&rgt->lock);
op->status = rc;
rcu_unlock_domain(rd);
}
@@ -1039,8 +1042,8 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
rcu_lock_domain(rd);
rgt = rd->grant_table;
- spin_lock(&rgt->lock);
+ read_lock(&rgt->lock);
if ( rgt->gt_version == 0 )
goto unmap_out;
@@ -1104,7 +1107,7 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
gnttab_clear_flag(_GTF_reading, status);
unmap_out:
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
if ( put_handle )
{
op->map->flags = 0;
@@ -1290,10 +1293,12 @@ gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
gt->nr_status_frames = 0;
}
+/* Grow the grant table. The caller must hold the grant table's
+ write lock before calling this function. */
int
gnttab_grow_table(struct domain *d, unsigned int req_nr_frames)
{
- /* d's grant table lock must be held by the caller */
+ /* d's grant table write lock must be held by the caller */
struct grant_table *gt = d->grant_table;
unsigned int i;
@@ -1398,7 +1403,7 @@ gnttab_setup_table(
}
gt = d->grant_table;
- spin_lock(>->lock);
+ write_lock(>->lock);
if ( gt->gt_version == 0 )
gt->gt_version = 1;
@@ -1426,7 +1431,7 @@ gnttab_setup_table(
}
out3:
- spin_unlock(>->lock);
+ write_unlock(>->lock);
out2:
rcu_unlock_domain(d);
out1:
@@ -1468,13 +1473,13 @@ gnttab_query_size(
goto query_out_unlock;
}
- spin_lock(&d->grant_table->lock);
+ read_lock(&d->grant_table->lock);
op.nr_frames = nr_grant_frames(d->grant_table);
op.max_nr_frames = max_grant_frames;
op.status = GNTST_okay;
- spin_unlock(&d->grant_table->lock);
+ read_unlock(&d->grant_table->lock);
query_out_unlock:
@@ -1500,7 +1505,7 @@ gnttab_prepare_for_transfer(
union grant_combo scombo, prev_scombo, new_scombo;
int retries = 0;
- spin_lock(&rgt->lock);
+ read_lock(&rgt->lock);
if ( rgt->gt_version == 0 )
{
@@ -1551,11 +1556,11 @@ gnttab_prepare_for_transfer(
scombo = prev_scombo;
}
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
return 1;
fail:
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
return 0;
}
@@ -1747,7 +1752,7 @@ gnttab_transfer(
TRACE_1D(TRC_MEM_PAGE_GRANT_TRANSFER, e->domain_id);
/* Tell the guest about its new page frame. */
- spin_lock(&e->grant_table->lock);
+ read_lock(&e->grant_table->lock);
if ( e->grant_table->gt_version == 1 )
{
@@ -1765,7 +1770,7 @@ gnttab_transfer(
shared_entry_header(e->grant_table, gop.ref)->flags |=
GTF_transfer_completed;
- spin_unlock(&e->grant_table->lock);
+ read_unlock(&e->grant_table->lock);
rcu_unlock_domain(e);
@@ -1803,7 +1808,7 @@ __release_grant_for_copy(
released_read = 0;
released_write = 0;
- spin_lock(&rgt->lock);
+ read_lock(&rgt->lock);
act = &active_entry(rgt, gref);
sha = shared_entry_header(rgt, gref);
@@ -1844,7 +1849,7 @@ __release_grant_for_copy(
released_read = 1;
}
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
if ( td != rd )
{
@@ -1902,7 +1907,7 @@ __acquire_grant_for_copy(
*page = NULL;
- spin_lock(&rgt->lock);
+ read_lock(&rgt->lock);
if ( rgt->gt_version == 0 )
PIN_FAIL(unlock_out, GNTST_general_error,
@@ -1971,17 +1976,21 @@ __acquire_grant_for_copy(
PIN_FAIL(unlock_out_clear, GNTST_general_error,
"transitive grant referenced bad domain %d\n",
trans_domid);
- spin_unlock(&rgt->lock);
+
+ /* __acquire_grant_for_copy() could take the read lock on
+ the right table (if rd == td), so we have to drop the
+ lock here and reacquire */
+ read_unlock(&rgt->lock);
rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
readonly, &grant_frame, page,
&trans_page_off, &trans_length, 0);
- spin_lock(&rgt->lock);
+ read_lock(&rgt->lock);
if ( rc != GNTST_okay ) {
__fixup_status_for_copy_pin(act, status);
+ read_unlock(&rgt->lock);
rcu_unlock_domain(td);
- spin_unlock(&rgt->lock);
return rc;
}
@@ -1993,7 +2002,7 @@ __acquire_grant_for_copy(
{
__fixup_status_for_copy_pin(act, status);
rcu_unlock_domain(td);
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
put_page(*page);
return __acquire_grant_for_copy(rd, gref, ldom, readonly,
frame, page, page_off, length,
@@ -2061,7 +2070,7 @@ __acquire_grant_for_copy(
*length = act->length;
*frame = act->frame;
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
return rc;
unlock_out_clear:
@@ -2073,7 +2082,7 @@ __acquire_grant_for_copy(
gnttab_clear_flag(_GTF_reading, status);
unlock_out:
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
return rc;
}
@@ -2247,7 +2256,7 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
if ( gt->gt_version == op.version )
goto out;
- spin_lock(>->lock);
+ write_lock(>->lock);
/* Make sure that the grant table isn't currently in use when we
change the version number, except for the first 8 entries which
are allowed to be in use (xenstore/xenconsole keeps them mapped).
@@ -2333,7 +2342,7 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
gt->gt_version = op.version;
out_unlock:
- spin_unlock(>->lock);
+ write_unlock(>->lock);
out:
op.version = gt->gt_version;
@@ -2389,7 +2398,7 @@ gnttab_get_status_frames(XEN_GUEST_HANDLE_PARAM(gnttab_get_status_frames_t) uop,
op.status = GNTST_okay;
- spin_lock(>->lock);
+ read_lock(>->lock);
for ( i = 0; i < op.nr_frames; i++ )
{
@@ -2398,7 +2407,7 @@ gnttab_get_status_frames(XEN_GUEST_HANDLE_PARAM(gnttab_get_status_frames_t) uop,
op.status = GNTST_bad_virt_addr;
}
- spin_unlock(>->lock);
+ read_unlock(>->lock);
out2:
rcu_unlock_domain(d);
out1:
@@ -2447,7 +2456,7 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
struct active_grant_entry *act;
s16 rc = GNTST_okay;
- spin_lock(>->lock);
+ read_lock(>->lock);
/* Bounds check on the grant refs */
if ( unlikely(ref_a >= nr_grant_entries(d->grant_table)))
@@ -2487,7 +2496,7 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
}
out:
- spin_unlock(>->lock);
+ read_unlock(>->lock);
rcu_unlock_domain(d);
@@ -2558,12 +2567,12 @@ static int __gnttab_cache_flush(gnttab_cache_flush_t *cflush,
if ( d != owner )
{
- spin_lock(&owner->grant_table->lock);
+ read_lock(&owner->grant_table->lock);
ret = grant_map_exists(d, owner->grant_table, mfn, ref_count);
if ( ret != 0 )
{
- spin_unlock(&owner->grant_table->lock);
+ read_unlock(&owner->grant_table->lock);
rcu_unlock_domain(d);
put_page(page);
return ret;
@@ -2583,7 +2592,7 @@ static int __gnttab_cache_flush(gnttab_cache_flush_t *cflush,
ret = 0;
if ( d != owner )
- spin_unlock(&owner->grant_table->lock);
+ read_unlock(&owner->grant_table->lock);
unmap_domain_page(v);
put_page(page);
@@ -2802,7 +2811,8 @@ grant_table_create(
goto no_mem_0;
/* Simple stuff. */
- spin_lock_init(&t->lock);
+ rwlock_init(&t->lock);
+ spin_lock_init(&t->maptrack_lock);
t->nr_grant_frames = INITIAL_NR_GRANT_FRAMES;
/* Active grant table. */
@@ -2909,7 +2919,7 @@ gnttab_release_mappings(
}
rgt = rd->grant_table;
- spin_lock(&rgt->lock);
+ read_lock(&rgt->lock);
act = &active_entry(rgt, ref);
sha = shared_entry_header(rgt, ref);
@@ -2969,7 +2979,7 @@ gnttab_release_mappings(
if ( act->pin == 0 )
gnttab_clear_flag(_GTF_reading, status);
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
rcu_unlock_domain(rd);
@@ -2987,6 +2997,8 @@ grant_table_destroy(
if ( t == NULL )
return;
+
+ write_lock(&t->lock);
for ( i = 0; i < nr_grant_frames(t); i++ )
free_xenheap_page(t->shared_raw[i]);
@@ -3004,6 +3016,8 @@ grant_table_destroy(
free_xenheap_page(t->status[i]);
xfree(t->status);
+ write_unlock(&t->lock);
+
xfree(t);
d->grant_table = NULL;
}
@@ -3017,7 +3031,7 @@ static void gnttab_usage_print(struct domain *rd)
printk(" -------- active -------- -------- shared --------\n");
printk("[ref] localdom mfn pin localdom gmfn flags\n");
- spin_lock(>->lock);
+ read_lock(>->lock);
if ( gt->gt_version == 0 )
goto out;
@@ -3066,7 +3080,7 @@ static void gnttab_usage_print(struct domain *rd)
}
out:
- spin_unlock(>->lock);
+ read_unlock(>->lock);
if ( first )
printk("grant-table for remote domain:%5d ... "
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index 32f5786..298031e 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -82,10 +82,17 @@ struct grant_table {
struct grant_mapping **maptrack;
unsigned int maptrack_head;
unsigned int maptrack_limit;
- /* Lock protecting updates to active and shared grant tables. */
- spinlock_t lock;
- /* The defined versions are 1 and 2. Set to 0 if we don't know
- what version to use yet. */
+ /* Lock protecting the maptrack page list, head, and limit */
+ spinlock_t maptrack_lock;
+ /*
+ * Lock protecting updates to grant table state (version, active
+ * entry list, etc.)
+ */
+ rwlock_t lock;
+ /*
+ * The defined versions are 1 and 2. Set to 0 if we don't know
+ * what version to use yet.
+ */
unsigned gt_version;
};
@@ -101,7 +108,7 @@ gnttab_release_mappings(
struct domain *d);
/* Increase the size of a domain's grant table.
- * Caller must hold d's grant table lock.
+ * Caller must hold d's grant table write lock.
*/
int
gnttab_grow_table(struct domain *d, unsigned int req_nr_frames);
--
1.7.9.5
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v4 2/2] gnttab: refactor locking for scalability
2015-01-09 15:12 [PATCH v4 0/2] gnttab: Improve scaleability Christoph Egger
2015-01-09 15:12 ` [PATCH v4 1/2] gnttab: Introduce rwlock to protect updates to grant table state Christoph Egger
@ 2015-01-09 15:12 ` Christoph Egger
2015-01-12 15:25 ` Jan Beulich
1 sibling, 1 reply; 7+ messages in thread
From: Christoph Egger @ 2015-01-09 15:12 UTC (permalink / raw)
To: xen-devel
Cc: Jan Beulich, Christoph Egger, Keir Fraser, Anthony Liguori,
Matt Wilson
From: Matt Wilson <msw@amazon.com>
This patch refactors grant table locking. It splits the previous
single spinlock per grant table into multiple locks. The heavily
modified components of the grant table (the maptrack state and the
active entries) are now protected by their own spinlocks. The
remaining elements of the grant table are read-mostly, so the main
grant table lock is modified to be a rwlock to improve concurrency.
Workloads with high grant table operation rates, especially map/unmap
operations as used by blkback/blkfront when persistent grants are not
supported, show marked improvement with these changes. A domU with 24
VBDs in a streaming 2M write workload achieved 1,400 MB/sec before
this change. Performance more than doubles with this patch, reaching
3,000 MB/sec before tuning and 3,600 MB/sec after adjusting event
channel vCPU bindings.
Signed-off-by: Matt Wilson <msw@amazon.com>
[chegger: ported to xen-staging, split into multiple commits]
v4:
* Addressed ASSERT()'s from Jan Beulich
* Addressed locking issues in unmap_common pointed out
by Jan Beulich
v3:
* Addressed gnttab_swap_grant_ref() comment from Andrew Cooper
Signed-off-by: Christoph Egger <chegger@amazon.de>
Cc: Anthony Liguori <aliguori@amazon.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
---
docs/misc/grant-tables.txt | 21 +++++
xen/common/grant_table.c | 219 +++++++++++++++++++++++++++-----------------
2 files changed, 158 insertions(+), 82 deletions(-)
diff --git a/docs/misc/grant-tables.txt b/docs/misc/grant-tables.txt
index c9ae8f2..1ada018 100644
--- a/docs/misc/grant-tables.txt
+++ b/docs/misc/grant-tables.txt
@@ -63,6 +63,7 @@ is complete.
act->domid : remote domain being granted rights
act->frame : machine frame being granted
act->pin : used to hold reference counts
+ act->lock : spinlock used to serialize access to active entry state
Map tracking
~~~~~~~~~~~~
@@ -87,6 +88,8 @@ is complete.
version, partially initialized active table pages,
etc.
grant_table->maptrack_lock : spinlock used to protect the maptrack state
+ active_grant_entry->lock : spinlock used to serialize modifications to
+ active entries
The primary lock for the grant table is a read/write spinlock. All
functions that access members of struct grant_table must acquire a
@@ -102,6 +105,24 @@ is complete.
state can be rapidly modified under some workloads, and the critical
sections are very small, thus we use a spinlock to protect them.
+ Active entries are obtained by calling active_entry_acquire(gt, ref).
+ This function returns a pointer to the active entry after locking its
+ spinlock. The caller must hold the rwlock for the gt in question
+ before calling active_entry_acquire(). This is because the grant
+ table can be dynamically extended via gnttab_grow_table() while a
+ domain is running and must be fully initialized. Once all access to
+ the active entry is complete, release the lock by calling
+ active_entry_release(act).
+
+ Summary of rules for locking:
+ active_entry_acquire() and active_entry_release() can only be
+ called when holding the relevant grant table's lock. I.e.:
+ read_lock(>->lock);
+ act = active_entry_acquire(gt, ref);
+ ...
+ active_entry_release(act);
+ read_unlock(>->lock);
+
********************************************************************************
Granting a foreign domain access to frames
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 586ca94..92b3cb1 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -157,10 +157,13 @@ struct active_grant_entry {
in the page. */
unsigned length:16; /* For sub-page grants, the length of the
grant. */
+ spinlock_t lock; /* lock to protect access of this entry.
+ see docs/misc/grant-tables.txt for
+ locking protocol */
};
#define ACGNT_PER_PAGE (PAGE_SIZE / sizeof(struct active_grant_entry))
-#define active_entry(t, e) \
+#define _active_entry(t, e) \
((t)->active[(e)/ACGNT_PER_PAGE][(e)%ACGNT_PER_PAGE])
static inline void gnttab_flush_tlb(const struct domain *d)
@@ -188,6 +191,26 @@ nr_active_grant_frames(struct grant_table *gt)
return num_act_frames_from_sha_frames(nr_grant_frames(gt));
}
+static inline struct active_grant_entry *
+active_entry_acquire(struct grant_table *t, grant_ref_t e)
+{
+ struct active_grant_entry *act;
+
+ /* not perfect, but better than nothing for a debug build
+ * sanity check */
+ ASSERT(rw_is_locked(&t->lock));
+
+ act = &_active_entry(t, e);
+ spin_lock(&act->lock);
+
+ return act;
+}
+
+static inline void active_entry_release(struct active_grant_entry *act)
+{
+ spin_unlock(&act->lock);
+}
+
/* Check if the page has been paged out, or needs unsharing.
If rc == GNTST_okay, *page contains the page struct with a ref taken.
Caller must do put_page(*page).
@@ -228,30 +251,6 @@ static int __get_paged_frame(unsigned long gfn, unsigned long *frame, struct pag
return rc;
}
-static inline void
-double_maptrack_lock(struct grant_table *lgt, struct grant_table *rgt)
-{
- if ( lgt < rgt )
- {
- spin_lock(&lgt->maptrack_lock);
- spin_lock(&rgt->maptrack_lock);
- }
- else
- {
- if ( lgt != rgt )
- spin_lock(&rgt->maptrack_lock);
- spin_lock(&lgt->maptrack_lock);
- }
-}
-
-static inline void
-double_maptrack_unlock(struct grant_table *lgt, struct grant_table *rgt)
-{
- spin_unlock(&lgt->maptrack_lock);
- if ( lgt != rgt )
- spin_unlock(&rgt->maptrack_lock);
-}
-
static inline int
__get_maptrack_handle(
struct grant_table *t)
@@ -316,7 +315,8 @@ get_maptrack_handle(
return handle;
}
-/* Number of grant table entries. Caller must hold d's grant table lock. */
+/* Number of grant table entries. Caller must hold d's grant table
+ read lock. */
static unsigned int nr_grant_entries(struct grant_table *gt)
{
ASSERT(gt->gt_version != 0);
@@ -505,7 +505,6 @@ static int grant_map_exists(const struct domain *ld,
unsigned long mfn,
unsigned int *ref_count)
{
- const struct active_grant_entry *act;
unsigned int ref, max_iter;
ASSERT(rw_is_locked(&rgt->lock));
@@ -514,17 +513,22 @@ static int grant_map_exists(const struct domain *ld,
nr_grant_entries(rgt));
for ( ref = *ref_count; ref < max_iter; ref++ )
{
- act = &active_entry(rgt, ref);
+ struct active_grant_entry *act;
+ act = active_entry_acquire(rgt, ref);
- if ( !act->pin )
- continue;
+#define CONTINUE_IF(cond) \
+if ((cond)) { \
+ active_entry_release(act); \
+ continue; \
+}
- if ( act->domid != ld->domain_id )
- continue;
+ CONTINUE_IF(!act->pin);
+ CONTINUE_IF(act->domid != ld->domain_id);
+ CONTINUE_IF(act->frame != mfn);
- if ( act->frame != mfn )
- continue;
+#undef CONTINUE_IF
+ active_entry_release(act);
return 0;
}
@@ -537,6 +541,12 @@ static int grant_map_exists(const struct domain *ld,
return -EINVAL;
}
+/* Count the number of mapped ro or rw entries tracked in the left
+ * grant table given a provided mfn provided by a foreign domain.
+ *
+ * This function takes the maptrack lock from the left grant table and
+ * must be called with the right grant table's rwlock held.
+ */
static void mapcount(
struct grant_table *lgt, struct domain *rd, unsigned long mfn,
unsigned int *wrc, unsigned int *rdc)
@@ -545,16 +555,32 @@ static void mapcount(
grant_handle_t handle;
*wrc = *rdc = 0;
+ ASSERT(rw_is_locked(&rd->grant_table->lock));
+
+ /* N.B.: while taking the left side maptrack spinlock prevents
+ * any mapping changes, the right side active entries could be
+ * changing while we are counting. To be perfectly correct, the
+ * caller should hold the grant table write lock, or some other
+ * mechanism should be used to prevent concurrent changes during
+ * this operation. This is tricky because we can't promote a read
+ * lock into a write lock.
+ */
+ spin_lock(&lgt->maptrack_lock);
for ( handle = 0; handle < lgt->maptrack_limit; handle++ )
{
+ struct active_grant_entry *act;
+
map = &maptrack_entry(lgt, handle);
if ( !(map->flags & (GNTMAP_device_map|GNTMAP_host_map)) ||
map->domid != rd->domain_id )
continue;
- if ( active_entry(rd->grant_table, map->ref).frame == mfn )
+ act = &_active_entry(rd->grant_table, map->ref);
+ if ( act->frame == mfn )
(map->flags & GNTMAP_readonly) ? (*rdc)++ : (*wrc)++;
}
+
+ spin_unlock(&lgt->maptrack_lock);
}
/*
@@ -576,7 +602,6 @@ __gnttab_map_grant_ref(
struct page_info *pg = NULL;
int rc = GNTST_okay;
u32 old_pin;
- u32 act_pin;
unsigned int cache_flags;
struct active_grant_entry *act = NULL;
struct grant_mapping *mt;
@@ -639,7 +664,7 @@ __gnttab_map_grant_ref(
if ( unlikely(op->ref >= nr_grant_entries(rgt)))
PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref (%d).\n", op->ref);
- act = &active_entry(rgt, op->ref);
+ act = active_entry_acquire(rgt, op->ref);
shah = shared_entry_header(rgt, op->ref);
if (rgt->gt_version == 1) {
sha1 = &shared_entry_v1(rgt, op->ref);
@@ -656,7 +681,7 @@ __gnttab_map_grant_ref(
((act->domid != ld->domain_id) ||
(act->pin & 0x80808080U) != 0 ||
(act->is_sub_page)) )
- PIN_FAIL(unlock_out, GNTST_general_error,
+ PIN_FAIL(act_release_out, GNTST_general_error,
"Bad domain (%d != %d), or risk of counter overflow %08x, or subpage %d\n",
act->domid, ld->domain_id, act->pin, act->is_sub_page);
@@ -667,7 +692,7 @@ __gnttab_map_grant_ref(
if ( (rc = _set_status(rgt->gt_version, ld->domain_id,
op->flags & GNTMAP_readonly,
1, shah, act, status) ) != GNTST_okay )
- goto unlock_out;
+ goto act_release_out;
if ( !act->pin )
{
@@ -698,12 +723,9 @@ __gnttab_map_grant_ref(
GNTPIN_hstr_inc : GNTPIN_hstw_inc;
frame = act->frame;
- act_pin = act->pin;
cache_flags = (shah->flags & (GTF_PAT | GTF_PWT | GTF_PCD) );
- read_unlock(&rgt->lock);
-
/* pg may be set, with a refcount included, from __get_paged_frame */
if ( !pg )
{
@@ -778,8 +800,6 @@ __gnttab_map_grant_ref(
goto undo_out;
}
- double_maptrack_lock(lgt, rgt);
-
if ( gnttab_need_iommu_mapping(ld) )
{
unsigned int wrc, rdc;
@@ -787,21 +807,20 @@ __gnttab_map_grant_ref(
/* We're not translated, so we know that gmfns and mfns are
the same things, so the IOMMU entry is always 1-to-1. */
mapcount(lgt, rd, frame, &wrc, &rdc);
- if ( (act_pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) &&
+ if ( (act->pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) &&
!(old_pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) )
{
if ( wrc == 0 )
err = iommu_map_page(ld, frame, frame,
IOMMUF_readable|IOMMUF_writable);
}
- else if ( act_pin && !old_pin )
+ else if ( act->pin && !old_pin )
{
if ( (wrc + rdc) == 0 )
err = iommu_map_page(ld, frame, frame, IOMMUF_readable);
}
if ( err )
{
- double_maptrack_unlock(lgt, rgt);
rc = GNTST_general_error;
goto undo_out;
}
@@ -814,7 +833,8 @@ __gnttab_map_grant_ref(
mt->ref = op->ref;
mt->flags = op->flags;
- double_maptrack_unlock(lgt, rgt);
+ active_entry_release(act);
+ read_unlock(&rgt->lock);
op->dev_bus_addr = (u64)frame << PAGE_SHIFT;
op->handle = handle;
@@ -837,10 +857,6 @@ __gnttab_map_grant_ref(
put_page(pg);
}
- read_lock(&rgt->lock);
-
- act = &active_entry(rgt, op->ref);
-
if ( op->flags & GNTMAP_device_map )
act->pin -= (op->flags & GNTMAP_readonly) ?
GNTPIN_devr_inc : GNTPIN_devw_inc;
@@ -856,6 +872,9 @@ __gnttab_map_grant_ref(
if ( !act->pin )
gnttab_clear_flag(_GTF_reading, status);
+ act_release_out:
+ active_entry_release(act);
+
unlock_out:
read_unlock(&rgt->lock);
op->status = rc;
@@ -941,18 +960,19 @@ __gnttab_unmap_common(
rgt = rd->grant_table;
read_lock(&rgt->lock);
- double_maptrack_lock(lgt, rgt);
+
+ read_lock(&rgt->lock);
+ act = active_entry_acquire(rgt, op->map->ref);
op->flags = op->map->flags;
if ( unlikely(!op->flags) || unlikely(op->map->domid != dom) )
{
gdprintk(XENLOG_WARNING, "Unstable handle %u\n", op->handle);
rc = GNTST_bad_handle;
- goto unmap_out;
+ goto act_release_out;
}
op->rd = rd;
- act = &active_entry(rgt, op->map->ref);
if ( op->frame == 0 )
{
@@ -961,7 +981,7 @@ __gnttab_unmap_common(
else
{
if ( unlikely(op->frame != act->frame) )
- PIN_FAIL(unmap_out, GNTST_general_error,
+ PIN_FAIL(act_release_out, GNTST_general_error,
"Bad frame number doesn't match gntref. (%lx != %lx)\n",
op->frame, act->frame);
if ( op->flags & GNTMAP_device_map )
@@ -980,7 +1000,7 @@ __gnttab_unmap_common(
if ( (rc = replace_grant_host_mapping(op->host_addr,
op->frame, op->new_addr,
op->flags)) < 0 )
- goto unmap_out;
+ goto act_release_out;
ASSERT(act->pin & (GNTPIN_hstw_mask | GNTPIN_hstr_mask));
op->map->flags &= ~GNTMAP_host_map;
@@ -1002,7 +1022,7 @@ __gnttab_unmap_common(
if ( err )
{
rc = GNTST_general_error;
- goto unmap_out;
+ goto act_release_out;
}
}
@@ -1010,9 +1030,10 @@ __gnttab_unmap_common(
if ( !(op->flags & GNTMAP_readonly) )
gnttab_mark_dirty(rd, op->frame);
- unmap_out:
- double_maptrack_unlock(lgt, rgt);
+ act_release_out:
+ active_entry_release(act);
read_unlock(&rgt->lock);
+
op->status = rc;
rcu_unlock_domain(rd);
}
@@ -1045,9 +1066,9 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
read_lock(&rgt->lock);
if ( rgt->gt_version == 0 )
- goto unmap_out;
+ goto unlock_out;
- act = &active_entry(rgt, op->map->ref);
+ act = active_entry_acquire(rgt, op->map->ref);
sha = shared_entry_header(rgt, op->map->ref);
if ( rgt->gt_version == 1 )
@@ -1061,7 +1082,7 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
* Suggests that __gntab_unmap_common failed early and so
* nothing further to do
*/
- goto unmap_out;
+ goto act_release_out;
}
pg = mfn_to_page(op->frame);
@@ -1085,7 +1106,7 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
* Suggests that __gntab_unmap_common failed in
* replace_grant_host_mapping() so nothing further to do
*/
- goto unmap_out;
+ goto act_release_out;
}
if ( !is_iomem_page(op->frame) )
@@ -1106,8 +1127,12 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
if ( act->pin == 0 )
gnttab_clear_flag(_GTF_reading, status);
- unmap_out:
+ act_release_out:
+ active_entry_release(act);
+
+ unlock_out:
read_unlock(&rgt->lock);
+
if ( put_handle )
{
op->map->flags = 0;
@@ -1301,7 +1326,7 @@ gnttab_grow_table(struct domain *d, unsigned int req_nr_frames)
/* d's grant table write lock must be held by the caller */
struct grant_table *gt = d->grant_table;
- unsigned int i;
+ unsigned int i, j;
ASSERT(req_nr_frames <= max_grant_frames);
@@ -1316,6 +1341,8 @@ gnttab_grow_table(struct domain *d, unsigned int req_nr_frames)
if ( (gt->active[i] = alloc_xenheap_page()) == NULL )
goto active_alloc_failed;
clear_page(gt->active[i]);
+ for ( j = 0; j < ACGNT_PER_PAGE; j++ )
+ spin_lock_init(>->active[i][j].lock);
}
/* Shared */
@@ -1810,7 +1837,7 @@ __release_grant_for_copy(
read_lock(&rgt->lock);
- act = &active_entry(rgt, gref);
+ act = active_entry_acquire(rgt, gref);
sha = shared_entry_header(rgt, gref);
r_frame = act->frame;
@@ -1849,6 +1876,7 @@ __release_grant_for_copy(
released_read = 1;
}
+ active_entry_release(act);
read_unlock(&rgt->lock);
if ( td != rd )
@@ -1910,14 +1938,14 @@ __acquire_grant_for_copy(
read_lock(&rgt->lock);
if ( rgt->gt_version == 0 )
- PIN_FAIL(unlock_out, GNTST_general_error,
+ PIN_FAIL(gnt_unlock_out, GNTST_general_error,
"remote grant table not ready\n");
if ( unlikely(gref >= nr_grant_entries(rgt)) )
- PIN_FAIL(unlock_out, GNTST_bad_gntref,
+ PIN_FAIL(gnt_unlock_out, GNTST_bad_gntref,
"Bad grant reference %ld\n", gref);
- act = &active_entry(rgt, gref);
+ act = active_entry_acquire(rgt, gref);
shah = shared_entry_header(rgt, gref);
if ( rgt->gt_version == 1 )
{
@@ -1980,6 +2008,7 @@ __acquire_grant_for_copy(
/* __acquire_grant_for_copy() could take the read lock on
the right table (if rd == td), so we have to drop the
lock here and reacquire */
+ active_entry_release(act);
read_unlock(&rgt->lock);
rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
@@ -1987,8 +2016,11 @@ __acquire_grant_for_copy(
&trans_page_off, &trans_length, 0);
read_lock(&rgt->lock);
+ act = active_entry_acquire(rgt, gref);
+
if ( rc != GNTST_okay ) {
__fixup_status_for_copy_pin(act, status);
+ active_entry_release(act);
read_unlock(&rgt->lock);
rcu_unlock_domain(td);
return rc;
@@ -2002,6 +2034,7 @@ __acquire_grant_for_copy(
{
__fixup_status_for_copy_pin(act, status);
rcu_unlock_domain(td);
+ active_entry_release(act);
read_unlock(&rgt->lock);
put_page(*page);
return __acquire_grant_for_copy(rd, gref, ldom, readonly,
@@ -2070,6 +2103,7 @@ __acquire_grant_for_copy(
*length = act->length;
*frame = act->frame;
+ active_entry_release(act);
read_unlock(&rgt->lock);
return rc;
@@ -2082,6 +2116,9 @@ __acquire_grant_for_copy(
gnttab_clear_flag(_GTF_reading, status);
unlock_out:
+ active_entry_release(act);
+
+ gnt_unlock_out:
read_unlock(&rgt->lock);
return rc;
}
@@ -2265,16 +2302,18 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
{
for ( i = GNTTAB_NR_RESERVED_ENTRIES; i < nr_grant_entries(gt); i++ )
{
- act = &active_entry(gt, i);
+ act = active_entry_acquire(gt, i);
if ( act->pin != 0 )
{
gdprintk(XENLOG_WARNING,
"tried to change grant table version from %d to %d, but some grant entries still in use\n",
gt->gt_version,
op.version);
+ active_entry_release(act);
res = -EBUSY;
goto out_unlock;
}
+ active_entry_release(act);
}
}
@@ -2453,9 +2492,16 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
{
struct domain *d = rcu_lock_current_domain();
struct grant_table *gt = d->grant_table;
- struct active_grant_entry *act;
+ struct active_grant_entry *act_a = NULL;
+ struct active_grant_entry *act_b = NULL;
s16 rc = GNTST_okay;
+ if ( ref_a == ref_b )
+ /* noop, so avoid acquiring the same active entry
+ * twice which would case a deadlock.
+ */
+ return rc;
+
read_lock(>->lock);
/* Bounds check on the grant refs */
@@ -2464,12 +2510,12 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
if ( unlikely(ref_b >= nr_grant_entries(d->grant_table)))
PIN_FAIL(out, GNTST_bad_gntref, "Bad ref-b (%d).\n", ref_b);
- act = &active_entry(gt, ref_a);
- if ( act->pin )
+ act_a = active_entry_acquire(gt, ref_a);
+ if ( act_a->pin )
PIN_FAIL(out, GNTST_eagain, "ref a %ld busy\n", (long)ref_a);
- act = &active_entry(gt, ref_b);
- if ( act->pin )
+ act_b = active_entry_acquire(gt, ref_b);
+ if ( act_b->pin )
PIN_FAIL(out, GNTST_eagain, "ref b %ld busy\n", (long)ref_b);
if ( gt->gt_version == 1 )
@@ -2496,8 +2542,11 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
}
out:
+ if ( act_b != NULL )
+ active_entry_release(act_b);
+ if ( act_a != NULL )
+ active_entry_release(act_a);
read_unlock(>->lock);
-
rcu_unlock_domain(d);
return rc;
@@ -2805,7 +2854,7 @@ grant_table_create(
struct domain *d)
{
struct grant_table *t;
- int i;
+ int i, j;
if ( (t = xzalloc(struct grant_table)) == NULL )
goto no_mem_0;
@@ -2825,6 +2874,8 @@ grant_table_create(
if ( (t->active[i] = alloc_xenheap_page()) == NULL )
goto no_mem_2;
clear_page(t->active[i]);
+ for ( j = 0; j < ACGNT_PER_PAGE; j++ )
+ spin_lock_init(&t->active[i][j].lock);
}
/* Tracking of mapped foreign frames table */
@@ -2921,7 +2972,7 @@ gnttab_release_mappings(
rgt = rd->grant_table;
read_lock(&rgt->lock);
- act = &active_entry(rgt, ref);
+ act = active_entry_acquire(rgt, ref);
sha = shared_entry_header(rgt, ref);
if (rgt->gt_version == 1)
status = &sha->flags;
@@ -2979,6 +3030,7 @@ gnttab_release_mappings(
if ( act->pin == 0 )
gnttab_clear_flag(_GTF_reading, status);
+ active_entry_release(act);
read_unlock(&rgt->lock);
rcu_unlock_domain(rd);
@@ -2999,7 +3051,7 @@ grant_table_destroy(
return;
write_lock(&t->lock);
-
+
for ( i = 0; i < nr_grant_frames(t); i++ )
free_xenheap_page(t->shared_raw[i]);
xfree(t->shared_raw);
@@ -3045,9 +3097,11 @@ static void gnttab_usage_print(struct domain *rd)
uint16_t status;
uint64_t frame;
- act = &active_entry(gt, ref);
- if ( !act->pin )
+ act = active_entry_acquire(gt, ref);
+ if ( !act->pin ) {
+ active_entry_release(act);
continue;
+ }
sha = shared_entry_header(gt, ref);
@@ -3077,6 +3131,7 @@ static void gnttab_usage_print(struct domain *rd)
printk("[%3d] %5d 0x%06lx 0x%08x %5d 0x%06"PRIx64" 0x%02x\n",
ref, act->domid, act->frame, act->pin,
sha->domid, frame, status);
+ active_entry_release(act);
}
out:
--
1.7.9.5
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v4 1/2] gnttab: Introduce rwlock to protect updates to grant table state
2015-01-09 15:12 ` [PATCH v4 1/2] gnttab: Introduce rwlock to protect updates to grant table state Christoph Egger
@ 2015-01-12 15:09 ` Jan Beulich
2015-01-12 16:03 ` Egger, Christoph
0 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2015-01-12 15:09 UTC (permalink / raw)
To: Christoph Egger; +Cc: Julien Grall, Keir Fraser, Matt Wilson, xen-devel
>>> On 09.01.15 at 16:12, <chegger@amazon.de> wrote:
> @@ -899,26 +899,27 @@ __gnttab_unmap_common(
>
> op->frame = (unsigned long)(op->dev_bus_addr >> PAGE_SHIFT);
>
> + read_lock(&lgt->lock);
> if ( unlikely(op->handle >= lgt->maptrack_limit) )
> {
> + read_unlock(&lgt->lock);
> gdprintk(XENLOG_INFO, "Bad handle (%d).\n", op->handle);
> op->status = GNTST_bad_handle;
> return;
> }
>
> op->map = &maptrack_entry(lgt, op->handle);
> - spin_lock(&lgt->lock);
The extension of the locked region is still not being mentioned in the
description - as said for v3, if this is really needed, it should then be
fixed by a separate, much smaller change. (The main reason why
the first if() doesn't need to happen under lock is - afair - that
lgt->maptrack_limit can only ever increase.)
> @@ -939,7 +940,8 @@ __gnttab_unmap_common(
> TRACE_1D(TRC_MEM_PAGE_GRANT_UNMAP, dom);
>
> rgt = rd->grant_table;
> - double_gt_lock(lgt, rgt);
> + read_lock(&rgt->lock);
> + double_maptrack_lock(lgt, rgt);
Repeating what I said for v3: "The nesting of the two locks should
be mentioned in the doc change" (at the very least).
> @@ -1290,10 +1293,12 @@ gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
> gt->nr_status_frames = 0;
> }
>
> +/* Grow the grant table. The caller must hold the grant table's
> + write lock before calling this function. */
Above you said you fixed the coding style issues, but at least here
you didn't.
> @@ -1971,17 +1976,21 @@ __acquire_grant_for_copy(
> PIN_FAIL(unlock_out_clear, GNTST_general_error,
> "transitive grant referenced bad domain %d\n",
> trans_domid);
> - spin_unlock(&rgt->lock);
> +
> + /* __acquire_grant_for_copy() could take the read lock on
> + the right table (if rd == td), so we have to drop the
> + lock here and reacquire */
Nor here.
> @@ -2447,7 +2456,7 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t
> ref_b)
> struct active_grant_entry *act;
> s16 rc = GNTST_okay;
>
> - spin_lock(>->lock);
> + read_lock(>->lock);
Considering the purpose of this function, wouldn't this need to be a
write lock?
> @@ -2909,7 +2919,7 @@ gnttab_release_mappings(
> }
>
> rgt = rd->grant_table;
> - spin_lock(&rgt->lock);
> + read_lock(&rgt->lock);
>
> act = &active_entry(rgt, ref);
> sha = shared_entry_header(rgt, ref);
> @@ -2969,7 +2979,7 @@ gnttab_release_mappings(
> if ( act->pin == 0 )
> gnttab_clear_flag(_GTF_reading, status);
>
> - spin_unlock(&rgt->lock);
> + read_unlock(&rgt->lock);
Repeating the question on v3: "The maptrack entries are being
accessed between these two - don't you need both locks here?"
Overall I find it quite unfriendly (wasting reviewing bandwidth) to
submit a new version with a meaningful number of comments on the
previous version un-addressed.
Jan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v4 2/2] gnttab: refactor locking for scalability
2015-01-09 15:12 ` [PATCH v4 2/2] gnttab: refactor locking for scalability Christoph Egger
@ 2015-01-12 15:25 ` Jan Beulich
0 siblings, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2015-01-12 15:25 UTC (permalink / raw)
To: Christoph Egger; +Cc: Keir Fraser, Matt Wilson, Anthony Liguori, xen-devel
>>> On 09.01.15 at 16:12, <chegger@amazon.de> wrote:
> @@ -188,6 +191,26 @@ nr_active_grant_frames(struct grant_table *gt)
> return num_act_frames_from_sha_frames(nr_grant_frames(gt));
> }
>
> +static inline struct active_grant_entry *
> +active_entry_acquire(struct grant_table *t, grant_ref_t e)
> +{
> + struct active_grant_entry *act;
> +
> + /* not perfect, but better than nothing for a debug build
> + * sanity check */
When coding style issues are being pointed out, please make sure
you go through your patch series and address _all_ of them, even
if not each and every violation was explicitly named.
> @@ -514,17 +513,22 @@ static int grant_map_exists(const struct domain *ld,
> nr_grant_entries(rgt));
> for ( ref = *ref_count; ref < max_iter; ref++ )
> {
> - act = &active_entry(rgt, ref);
> + struct active_grant_entry *act;
> + act = active_entry_acquire(rgt, ref);
>
> - if ( !act->pin )
> - continue;
> +#define CONTINUE_IF(cond) \
> +if ((cond)) { \
> + active_entry_release(act); \
> + continue; \
> +}
>
> - if ( act->domid != ld->domain_id )
> - continue;
> + CONTINUE_IF(!act->pin);
> + CONTINUE_IF(act->domid != ld->domain_id);
> + CONTINUE_IF(act->frame != mfn);
I highly question this is in any way better than the v3 code. Is there
a clear reason not to follow the advice of putting the
active_entry_release(act) into the third of the for() expressions?
And if you _really_ want/need it this way, please again get the coding
style right (and drop the pointless redundant parentheses).
> @@ -545,16 +555,32 @@ static void mapcount(
> grant_handle_t handle;
>
> *wrc = *rdc = 0;
> + ASSERT(rw_is_locked(&rd->grant_table->lock));
> +
> + /* N.B.: while taking the left side maptrack spinlock prevents
> + * any mapping changes, the right side active entries could be
> + * changing while we are counting. To be perfectly correct, the
> + * caller should hold the grant table write lock, or some other
> + * mechanism should be used to prevent concurrent changes during
> + * this operation. This is tricky because we can't promote a read
> + * lock into a write lock.
> + */
> + spin_lock(&lgt->maptrack_lock);
So you added the suggested ASSERT(), but you didn't adjust the
comment (which actually should precede the ASSERT() imo) in any
way.
> @@ -698,12 +723,9 @@ __gnttab_map_grant_ref(
> GNTPIN_hstr_inc : GNTPIN_hstw_inc;
>
> frame = act->frame;
> - act_pin = act->pin;
>
> cache_flags = (shah->flags & (GTF_PAT | GTF_PWT | GTF_PCD) );
>
> - read_unlock(&rgt->lock);
> -
> /* pg may be set, with a refcount included, from __get_paged_frame */
> if ( !pg )
> {
> @@ -778,8 +800,6 @@ __gnttab_map_grant_ref(
> goto undo_out;
> }
>
> - double_maptrack_lock(lgt, rgt);
Again an un-addressed comment from v3: "Taking these two together
- isn't patch 1 wrong then, in that it acquires both maptrack locks in
double_gt_lock()?"
> @@ -941,18 +960,19 @@ __gnttab_unmap_common(
>
> rgt = rd->grant_table;
> read_lock(&rgt->lock);
> - double_maptrack_lock(lgt, rgt);
> +
> + read_lock(&rgt->lock);
Acquiring the same read lock twice in a row?
> @@ -3045,9 +3097,11 @@ static void gnttab_usage_print(struct domain *rd)
> uint16_t status;
> uint64_t frame;
>
> - act = &active_entry(gt, ref);
> - if ( !act->pin )
> + act = active_entry_acquire(gt, ref);
> + if ( !act->pin ) {
Coding style again.
Jan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v4 1/2] gnttab: Introduce rwlock to protect updates to grant table state
2015-01-12 15:09 ` Jan Beulich
@ 2015-01-12 16:03 ` Egger, Christoph
2015-01-12 16:22 ` Jan Beulich
0 siblings, 1 reply; 7+ messages in thread
From: Egger, Christoph @ 2015-01-12 16:03 UTC (permalink / raw)
To: Jan Beulich; +Cc: Julien Grall, Keir Fraser, Matt Wilson, xen-devel
On 2015/01/12 16:09, Jan Beulich wrote:
>>>> On 09.01.15 at 16:12, <chegger@amazon.de> wrote:
>> @@ -899,26 +899,27 @@ __gnttab_unmap_common(
>>
>> op->frame = (unsigned long)(op->dev_bus_addr >> PAGE_SHIFT);
>>
>> + read_lock(&lgt->lock);
>> if ( unlikely(op->handle >= lgt->maptrack_limit) )
>> {
>> + read_unlock(&lgt->lock);
>> gdprintk(XENLOG_INFO, "Bad handle (%d).\n", op->handle);
>> op->status = GNTST_bad_handle;
>> return;
>> }
>>
>> op->map = &maptrack_entry(lgt, op->handle);
>> - spin_lock(&lgt->lock);
>
> The extension of the locked region is still not being mentioned in the
> description - as said for v3, if this is really needed, it should then be
> fixed by a separate, much smaller change. (The main reason why
> the first if() doesn't need to happen under lock is - afair - that
> lgt->maptrack_limit can only ever increase.)
It is mentioned in the doc change to docs/misc/grant-tables.txt:
+ The maptrack state is protected by its own spinlock. Any access (read
+ or write) of struct grant_table members that have a "maptrack_"
+ prefix must be made while holding the maptrack lock. The maptrack
+ state can be rapidly modified under some workloads, and the critical
+ sections are very small, thus we use a spinlock to protect them.
>> @@ -939,7 +940,8 @@ __gnttab_unmap_common(
>> TRACE_1D(TRC_MEM_PAGE_GRANT_UNMAP, dom);
>>
>> rgt = rd->grant_table;
>> - double_gt_lock(lgt, rgt);
>> + read_lock(&rgt->lock);
>> + double_maptrack_lock(lgt, rgt);
>
> Repeating what I said for v3: "The nesting of the two locks should
> be mentioned in the doc change" (at the very least).
... to be removed again in the second patch?
double_maptrack_lock() goes away in the second patch.
>> @@ -1290,10 +1293,12 @@ gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
>> gt->nr_status_frames = 0;
>> }
>>
>> +/* Grow the grant table. The caller must hold the grant table's
>> + write lock before calling this function. */
>
> Above you said you fixed the coding style issues, but at least here
> you didn't.
The comments match with the style as done everywhere in that file.
>> @@ -2447,7 +2456,7 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t
>> ref_b)
>> struct active_grant_entry *act;
>> s16 rc = GNTST_okay;
>>
>> - spin_lock(>->lock);
>> + read_lock(>->lock);
>
> Considering the purpose of this function, wouldn't this need to be a
> write lock?
Yes, right.
>> @@ -2909,7 +2919,7 @@ gnttab_release_mappings(
>> }
>>
>> rgt = rd->grant_table;
>> - spin_lock(&rgt->lock);
>> + read_lock(&rgt->lock);
>>
>> act = &active_entry(rgt, ref);
>> sha = shared_entry_header(rgt, ref);
>> @@ -2969,7 +2979,7 @@ gnttab_release_mappings(
>> if ( act->pin == 0 )
>> gnttab_clear_flag(_GTF_reading, status);
>>
>> - spin_unlock(&rgt->lock);
>> + read_unlock(&rgt->lock);
>
> Repeating the question on v3: "The maptrack entries are being
> accessed between these two - don't you need both locks here?"
yes, and be removed in the second patch again.
> Overall I find it quite unfriendly (wasting reviewing bandwidth) to
> submit a new version with a meaningful number of comments on the
> previous version un-addressed.
Sorry, I lost track of what I did and what I did not.
Christoph
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v4 1/2] gnttab: Introduce rwlock to protect updates to grant table state
2015-01-12 16:03 ` Egger, Christoph
@ 2015-01-12 16:22 ` Jan Beulich
0 siblings, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2015-01-12 16:22 UTC (permalink / raw)
To: Christoph Egger; +Cc: Julien Grall, Keir Fraser, Matt Wilson, xen-devel
>>> On 12.01.15 at 17:03, <chegger@amazon.de> wrote:
> On 2015/01/12 16:09, Jan Beulich wrote:
>>>>> On 09.01.15 at 16:12, <chegger@amazon.de> wrote:
>>> @@ -899,26 +899,27 @@ __gnttab_unmap_common(
>>>
>>> op->frame = (unsigned long)(op->dev_bus_addr >> PAGE_SHIFT);
>>>
>>> + read_lock(&lgt->lock);
>>> if ( unlikely(op->handle >= lgt->maptrack_limit) )
>>> {
>>> + read_unlock(&lgt->lock);
>>> gdprintk(XENLOG_INFO, "Bad handle (%d).\n", op->handle);
>>> op->status = GNTST_bad_handle;
>>> return;
>>> }
>>>
>>> op->map = &maptrack_entry(lgt, op->handle);
>>> - spin_lock(&lgt->lock);
>>
>> The extension of the locked region is still not being mentioned in the
>> description - as said for v3, if this is really needed, it should then be
>> fixed by a separate, much smaller change. (The main reason why
>> the first if() doesn't need to happen under lock is - afair - that
>> lgt->maptrack_limit can only ever increase.)
>
> It is mentioned in the doc change to docs/misc/grant-tables.txt:
>
> + The maptrack state is protected by its own spinlock. Any access (read
> + or write) of struct grant_table members that have a "maptrack_"
> + prefix must be made while holding the maptrack lock. The maptrack
> + state can be rapidly modified under some workloads, and the critical
> + sections are very small, thus we use a spinlock to protect them.
That's only a very remote connection. If there are (perceived)
locking deficiencies, they should be called out explicitly (and
fixed separately). But as mentioned - I think you should not be
altering locking here.
>>> @@ -939,7 +940,8 @@ __gnttab_unmap_common(
>>> TRACE_1D(TRC_MEM_PAGE_GRANT_UNMAP, dom);
>>>
>>> rgt = rd->grant_table;
>>> - double_gt_lock(lgt, rgt);
>>> + read_lock(&rgt->lock);
>>> + double_maptrack_lock(lgt, rgt);
>>
>> Repeating what I said for v3: "The nesting of the two locks should
>> be mentioned in the doc change" (at the very least).
>
> ... to be removed again in the second patch?
> double_maptrack_lock() goes away in the second patch.
Yes, sorry, any patch series needs to be fully bisectable (or at
the very least no intentionally be broken, knowing that a later
patch will fix the brokenness). If a patch really depends on a
later one to fix it, the patches shouldn't have been split.
>>> @@ -1290,10 +1293,12 @@ gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
>>> gt->nr_status_frames = 0;
>>> }
>>>
>>> +/* Grow the grant table. The caller must hold the grant table's
>>> + write lock before calling this function. */
>>
>> Above you said you fixed the coding style issues, but at least here
>> you didn't.
>
> The comments match with the style as done everywhere in that file.
The existing style in the file is not the reference; please see
./CODING_STYLE.
>>> @@ -2909,7 +2919,7 @@ gnttab_release_mappings(
>>> }
>>>
>>> rgt = rd->grant_table;
>>> - spin_lock(&rgt->lock);
>>> + read_lock(&rgt->lock);
>>>
>>> act = &active_entry(rgt, ref);
>>> sha = shared_entry_header(rgt, ref);
>>> @@ -2969,7 +2979,7 @@ gnttab_release_mappings(
>>> if ( act->pin == 0 )
>>> gnttab_clear_flag(_GTF_reading, status);
>>>
>>> - spin_unlock(&rgt->lock);
>>> + read_unlock(&rgt->lock);
>>
>> Repeating the question on v3: "The maptrack entries are being
>> accessed between these two - don't you need both locks here?"
>
> yes, and be removed in the second patch again.
>
>> Overall I find it quite unfriendly (wasting reviewing bandwidth) to
>> submit a new version with a meaningful number of comments on the
>> previous version un-addressed.
>
> Sorry, I lost track of what I did and what I did not.
But it can't be that difficult to go through the comments you got
on the previous version again before you send out the new one.
Please appreciate that reviewing bandwidth is limited.
Jan
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-01-12 16:22 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-09 15:12 [PATCH v4 0/2] gnttab: Improve scaleability Christoph Egger
2015-01-09 15:12 ` [PATCH v4 1/2] gnttab: Introduce rwlock to protect updates to grant table state Christoph Egger
2015-01-12 15:09 ` Jan Beulich
2015-01-12 16:03 ` Egger, Christoph
2015-01-12 16:22 ` Jan Beulich
2015-01-09 15:12 ` [PATCH v4 2/2] gnttab: refactor locking for scalability Christoph Egger
2015-01-12 15:25 ` Jan Beulich
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.