* [PATCHv11 1/4] gnttab: per-active entry locking
2015-06-02 16:26 [PATCHv11 0/4] gnttab: Improve scaleability David Vrabel
@ 2015-06-02 16:26 ` David Vrabel
2015-06-05 13:32 ` Jan Beulich
2015-06-02 16:26 ` [PATCHv11 2/4] gnttab: introduce maptrack lock David Vrabel
` (2 subsequent siblings)
3 siblings, 1 reply; 22+ messages in thread
From: David Vrabel @ 2015-06-02 16:26 UTC (permalink / raw)
To: xen-devel
Cc: Keir Fraser, Ian Campbell, Tim Deegan, David Vrabel, Jan Beulich,
Malcolm Crossley
Introduce a per-active entry spin lock to protect active entry state
The grant table lock must be locked before acquiring (locking) an
active entry.
This is a step in reducing contention on the grant table lock, but
will only do so once the grant table lock is turned into a read-write
lock.
Based on a patch originally by Matt Wilson <msw@amazon.com>.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes in v11:
- Fix exists test in gnttab_map_exists().
Changes in v10:
- Reduce scope of act in grant_map_exists().
- Make unlock sequence in error paths consistent in
__acquire_grant_for_copy().
- gnt_unlock_out -> to gt_unlock_out in __acquire_grant_for_copy().
---
docs/misc/grant-tables.txt | 42 ++++++++++++-
xen/common/grant_table.c | 146 ++++++++++++++++++++++++++++++++------------
2 files changed, 147 insertions(+), 41 deletions(-)
diff --git a/docs/misc/grant-tables.txt b/docs/misc/grant-tables.txt
index 19db4ec..2a3f591 100644
--- a/docs/misc/grant-tables.txt
+++ b/docs/misc/grant-tables.txt
@@ -63,6 +63,7 @@ is complete.
act->domid : remote domain being granted rights
act->frame : machine frame being granted
act->pin : used to hold reference counts
+ act->lock : spinlock used to serialize access to active entry state
Map tracking
~~~~~~~~~~~~
@@ -74,7 +75,46 @@ is complete.
matching map track entry is then removed, as if unmap had been invoked.
These are not used by the transfer mechanism.
map->domid : owner of the mapped frame
- map->ref_and_flags : grant reference, ro/rw, mapped for host or device access
+ map->ref : grant reference
+ map->flags : ro/rw, mapped for host or device access
+
+********************************************************************************
+ Locking
+ ~~~~~~~
+ Xen uses several locks to serialize access to the internal grant table state.
+
+ grant_table->lock : lock used to prevent readers from accessing
+ inconsistent grant table state such as current
+ version, partially initialized active table pages,
+ etc.
+ active_grant_entry->lock : spinlock used to serialize modifications to
+ active entries
+
+ The primary lock for the grant table is a spinlock. All functions
+ that access members of struct grant_table must acquire the lock
+ around critical sections.
+
+ Active entries are obtained by calling active_entry_acquire(gt, ref).
+ This function returns a pointer to the active entry after locking its
+ spinlock. The caller must hold the grant table lock for the gt in
+ question before calling active_entry_acquire(). This is because the
+ grant table can be dynamically extended via gnttab_grow_table() while
+ a domain is running and must be fully initialized. Once all access to
+ the active entry is complete, release the lock by calling
+ active_entry_release(act).
+
+ Summary of rules for locking:
+ active_entry_acquire() and active_entry_release() can only be
+ called when holding the relevant grant table's lock. I.e.:
+ spin_lock(>->lock);
+ act = active_entry_acquire(gt, ref);
+ ...
+ active_entry_release(act);
+ spin_unlock(>->lock);
+
+ Active entries cannot be acquired while holding the maptrack lock.
+ Multiple active entries can be acquired while holding the grant table
+ lock.
********************************************************************************
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index dfb45f8..d91661b 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -157,10 +157,13 @@ struct active_grant_entry {
in the page. */
unsigned length:16; /* For sub-page grants, the length of the
grant. */
+ spinlock_t lock; /* lock to protect access of this entry.
+ see docs/misc/grant-tables.txt for
+ locking protocol */
};
#define ACGNT_PER_PAGE (PAGE_SIZE / sizeof(struct active_grant_entry))
-#define active_entry(t, e) \
+#define _active_entry(t, e) \
((t)->active[(e)/ACGNT_PER_PAGE][(e)%ACGNT_PER_PAGE])
static inline void gnttab_flush_tlb(const struct domain *d)
@@ -188,6 +191,24 @@ nr_active_grant_frames(struct grant_table *gt)
return num_act_frames_from_sha_frames(nr_grant_frames(gt));
}
+static inline struct active_grant_entry *
+active_entry_acquire(struct grant_table *t, grant_ref_t e)
+{
+ struct active_grant_entry *act;
+
+ ASSERT(spin_is_locked(&t->lock));
+
+ act = &_active_entry(t, e);
+ spin_lock(&act->lock);
+
+ return act;
+}
+
+static inline void active_entry_release(struct active_grant_entry *act)
+{
+ spin_unlock(&act->lock);
+}
+
/* Check if the page has been paged out, or needs unsharing.
If rc == GNTST_okay, *page contains the page struct with a ref taken.
Caller must do put_page(*page).
@@ -505,7 +526,6 @@ static int grant_map_exists(const struct domain *ld,
unsigned long mfn,
unsigned int *ref_count)
{
- const struct active_grant_entry *act;
unsigned int ref, max_iter;
ASSERT(spin_is_locked(&rgt->lock));
@@ -514,18 +534,19 @@ static int grant_map_exists(const struct domain *ld,
nr_grant_entries(rgt));
for ( ref = *ref_count; ref < max_iter; ref++ )
{
- act = &active_entry(rgt, ref);
+ struct active_grant_entry *act;
+ bool_t exists;
- if ( !act->pin )
- continue;
+ act = active_entry_acquire(rgt, ref);
- if ( act->domid != ld->domain_id )
- continue;
+ exists = act->pin
+ && act->domid == ld->domain_id
+ && act->frame == mfn;
- if ( act->frame != mfn )
- continue;
+ active_entry_release(act);
- return 0;
+ if ( exists )
+ return 0;
}
if ( ref < nr_grant_entries(rgt) )
@@ -546,13 +567,19 @@ static void mapcount(
*wrc = *rdc = 0;
+ /*
+ * Must have the remote domain's grant table lock while counting
+ * its active entries.
+ */
+ ASSERT(spin_is_locked(&rd->grant_table->lock));
+
for ( handle = 0; handle < lgt->maptrack_limit; handle++ )
{
map = &maptrack_entry(lgt, handle);
if ( !(map->flags & (GNTMAP_device_map|GNTMAP_host_map)) ||
map->domid != rd->domain_id )
continue;
- if ( active_entry(rd->grant_table, map->ref).frame == mfn )
+ if ( _active_entry(rd->grant_table, map->ref).frame == mfn )
(map->flags & GNTMAP_readonly) ? (*rdc)++ : (*wrc)++;
}
}
@@ -639,7 +666,7 @@ __gnttab_map_grant_ref(
if ( unlikely(op->ref >= nr_grant_entries(rgt)))
PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref (%d).\n", op->ref);
- act = &active_entry(rgt, op->ref);
+ act = active_entry_acquire(rgt, op->ref);
shah = shared_entry_header(rgt, op->ref);
if (rgt->gt_version == 1) {
sha1 = &shared_entry_v1(rgt, op->ref);
@@ -656,7 +683,7 @@ __gnttab_map_grant_ref(
((act->domid != ld->domain_id) ||
(act->pin & 0x80808080U) != 0 ||
(act->is_sub_page)) )
- PIN_FAIL(unlock_out, GNTST_general_error,
+ PIN_FAIL(act_release_out, GNTST_general_error,
"Bad domain (%d != %d), or risk of counter overflow %08x, or subpage %d\n",
act->domid, ld->domain_id, act->pin, act->is_sub_page);
@@ -667,7 +694,7 @@ __gnttab_map_grant_ref(
if ( (rc = _set_status(rgt->gt_version, ld->domain_id,
op->flags & GNTMAP_readonly,
1, shah, act, status) ) != GNTST_okay )
- goto unlock_out;
+ goto act_release_out;
if ( !act->pin )
{
@@ -702,6 +729,7 @@ __gnttab_map_grant_ref(
cache_flags = (shah->flags & (GTF_PAT | GTF_PWT | GTF_PCD) );
+ active_entry_release(act);
spin_unlock(&rgt->lock);
/* pg may be set, with a refcount included, from __get_paged_frame */
@@ -839,7 +867,7 @@ __gnttab_map_grant_ref(
spin_lock(&rgt->lock);
- act = &active_entry(rgt, op->ref);
+ act = active_entry_acquire(rgt, op->ref);
if ( op->flags & GNTMAP_device_map )
act->pin -= (op->flags & GNTMAP_readonly) ?
@@ -856,6 +884,9 @@ __gnttab_map_grant_ref(
if ( !act->pin )
gnttab_clear_flag(_GTF_reading, status);
+ act_release_out:
+ active_entry_release(act);
+
unlock_out:
spin_unlock(&rgt->lock);
op->status = rc;
@@ -950,7 +981,7 @@ __gnttab_unmap_common(
}
op->rd = rd;
- act = &active_entry(rgt, op->map->ref);
+ act = active_entry_acquire(rgt, op->map->ref);
if ( op->frame == 0 )
{
@@ -959,7 +990,7 @@ __gnttab_unmap_common(
else
{
if ( unlikely(op->frame != act->frame) )
- PIN_FAIL(unmap_out, GNTST_general_error,
+ PIN_FAIL(act_release_out, GNTST_general_error,
"Bad frame number doesn't match gntref. (%lx != %lx)\n",
op->frame, act->frame);
if ( op->flags & GNTMAP_device_map )
@@ -978,7 +1009,7 @@ __gnttab_unmap_common(
if ( (rc = replace_grant_host_mapping(op->host_addr,
op->frame, op->new_addr,
op->flags)) < 0 )
- goto unmap_out;
+ goto act_release_out;
ASSERT(act->pin & (GNTPIN_hstw_mask | GNTPIN_hstr_mask));
op->map->flags &= ~GNTMAP_host_map;
@@ -1000,7 +1031,7 @@ __gnttab_unmap_common(
if ( err )
{
rc = GNTST_general_error;
- goto unmap_out;
+ goto act_release_out;
}
}
@@ -1008,8 +1039,11 @@ __gnttab_unmap_common(
if ( !(op->flags & GNTMAP_readonly) )
gnttab_mark_dirty(rd, op->frame);
+ act_release_out:
+ active_entry_release(act);
unmap_out:
double_gt_unlock(lgt, rgt);
+
op->status = rc;
rcu_unlock_domain(rd);
}
@@ -1042,9 +1076,9 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
spin_lock(&rgt->lock);
if ( rgt->gt_version == 0 )
- goto unmap_out;
+ goto unlock_out;
- act = &active_entry(rgt, op->map->ref);
+ act = active_entry_acquire(rgt, op->map->ref);
sha = shared_entry_header(rgt, op->map->ref);
if ( rgt->gt_version == 1 )
@@ -1058,7 +1092,7 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
* Suggests that __gntab_unmap_common failed early and so
* nothing further to do
*/
- goto unmap_out;
+ goto act_release_out;
}
pg = mfn_to_page(op->frame);
@@ -1082,7 +1116,7 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
* Suggests that __gntab_unmap_common failed in
* replace_grant_host_mapping() so nothing further to do
*/
- goto unmap_out;
+ goto act_release_out;
}
if ( !is_iomem_page(op->frame) )
@@ -1103,8 +1137,11 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
if ( act->pin == 0 )
gnttab_clear_flag(_GTF_reading, status);
- unmap_out:
+ act_release_out:
+ active_entry_release(act);
+ unlock_out:
spin_unlock(&rgt->lock);
+
if ( put_handle )
{
op->map->flags = 0;
@@ -1296,7 +1333,7 @@ gnttab_grow_table(struct domain *d, unsigned int req_nr_frames)
/* d's grant table lock must be held by the caller */
struct grant_table *gt = d->grant_table;
- unsigned int i;
+ unsigned int i, j;
ASSERT(req_nr_frames <= max_grant_frames);
@@ -1311,6 +1348,8 @@ gnttab_grow_table(struct domain *d, unsigned int req_nr_frames)
if ( (gt->active[i] = alloc_xenheap_page()) == NULL )
goto active_alloc_failed;
clear_page(gt->active[i]);
+ for ( j = 0; j < ACGNT_PER_PAGE; j++ )
+ spin_lock_init(>->active[i][j].lock);
}
/* Shared */
@@ -1806,7 +1845,7 @@ __release_grant_for_copy(
spin_lock(&rgt->lock);
- act = &active_entry(rgt, gref);
+ act = active_entry_acquire(rgt, gref);
sha = shared_entry_header(rgt, gref);
r_frame = act->frame;
@@ -1845,6 +1884,7 @@ __release_grant_for_copy(
released_read = 1;
}
+ active_entry_release(act);
spin_unlock(&rgt->lock);
if ( td != rd )
@@ -1906,14 +1946,14 @@ __acquire_grant_for_copy(
spin_lock(&rgt->lock);
if ( rgt->gt_version == 0 )
- PIN_FAIL(unlock_out, GNTST_general_error,
+ PIN_FAIL(gt_unlock_out, GNTST_general_error,
"remote grant table not ready\n");
if ( unlikely(gref >= nr_grant_entries(rgt)) )
- PIN_FAIL(unlock_out, GNTST_bad_gntref,
+ PIN_FAIL(gt_unlock_out, GNTST_bad_gntref,
"Bad grant reference %ld\n", gref);
- act = &active_entry(rgt, gref);
+ act = active_entry_acquire(rgt, gref);
shah = shared_entry_header(rgt, gref);
if ( rgt->gt_version == 1 )
{
@@ -1972,6 +2012,13 @@ __acquire_grant_for_copy(
PIN_FAIL(unlock_out_clear, GNTST_general_error,
"transitive grant referenced bad domain %d\n",
trans_domid);
+
+ /*
+ * __acquire_grant_for_copy() could take the lock on the
+ * remote table (if rd == td), so we have to drop the lock
+ * here and reacquire
+ */
+ active_entry_release(act);
spin_unlock(&rgt->lock);
rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
@@ -1979,9 +2026,12 @@ __acquire_grant_for_copy(
&trans_page_off, &trans_length, 0);
spin_lock(&rgt->lock);
+ act = active_entry_acquire(rgt, gref);
+
if ( rc != GNTST_okay ) {
__fixup_status_for_copy_pin(act, status);
rcu_unlock_domain(td);
+ active_entry_release(act);
spin_unlock(&rgt->lock);
return rc;
}
@@ -1994,6 +2044,7 @@ __acquire_grant_for_copy(
{
__fixup_status_for_copy_pin(act, status);
rcu_unlock_domain(td);
+ active_entry_release(act);
spin_unlock(&rgt->lock);
put_page(*page);
return __acquire_grant_for_copy(rd, gref, ldom, readonly,
@@ -2062,6 +2113,7 @@ __acquire_grant_for_copy(
*length = act->length;
*frame = act->frame;
+ active_entry_release(act);
spin_unlock(&rgt->lock);
return rc;
@@ -2074,7 +2126,11 @@ __acquire_grant_for_copy(
gnttab_clear_flag(_GTF_reading, status);
unlock_out:
+ active_entry_release(act);
+
+ gt_unlock_out:
spin_unlock(&rgt->lock);
+
return rc;
}
@@ -2374,7 +2430,6 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
gnttab_set_version_t op;
struct domain *d = current->domain;
struct grant_table *gt = d->grant_table;
- struct active_grant_entry *act;
grant_entry_v1_t reserved_entries[GNTTAB_NR_RESERVED_ENTRIES];
long res;
int i;
@@ -2399,8 +2454,7 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
{
for ( i = GNTTAB_NR_RESERVED_ENTRIES; i < nr_grant_entries(gt); i++ )
{
- act = &active_entry(gt, i);
- if ( act->pin != 0 )
+ if ( read_atomic(&_active_entry(gt, i).pin) != 0 )
{
gdprintk(XENLOG_WARNING,
"tried to change grant table version from %d to %d, but some grant entries still in use\n",
@@ -2587,7 +2641,8 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
{
struct domain *d = rcu_lock_current_domain();
struct grant_table *gt = d->grant_table;
- struct active_grant_entry *act;
+ struct active_grant_entry *act_a = NULL;
+ struct active_grant_entry *act_b = NULL;
s16 rc = GNTST_okay;
spin_lock(>->lock);
@@ -2598,12 +2653,12 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
if ( unlikely(ref_b >= nr_grant_entries(d->grant_table)))
PIN_FAIL(out, GNTST_bad_gntref, "Bad ref-b (%d).\n", ref_b);
- act = &active_entry(gt, ref_a);
- if ( act->pin )
+ act_a = active_entry_acquire(gt, ref_a);
+ if ( act_a->pin )
PIN_FAIL(out, GNTST_eagain, "ref a %ld busy\n", (long)ref_a);
- act = &active_entry(gt, ref_b);
- if ( act->pin )
+ act_b = active_entry_acquire(gt, ref_b);
+ if ( act_b->pin )
PIN_FAIL(out, GNTST_eagain, "ref b %ld busy\n", (long)ref_b);
if ( gt->gt_version == 1 )
@@ -2630,6 +2685,10 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
}
out:
+ if ( act_b != NULL )
+ active_entry_release(act_b);
+ if ( act_a != NULL )
+ active_entry_release(act_a);
spin_unlock(>->lock);
rcu_unlock_domain(d);
@@ -2939,7 +2998,7 @@ grant_table_create(
struct domain *d)
{
struct grant_table *t;
- int i;
+ unsigned int i, j;
if ( (t = xzalloc(struct grant_table)) == NULL )
goto no_mem_0;
@@ -2958,6 +3017,8 @@ grant_table_create(
if ( (t->active[i] = alloc_xenheap_page()) == NULL )
goto no_mem_2;
clear_page(t->active[i]);
+ for ( j = 0; j < ACGNT_PER_PAGE; j++ )
+ spin_lock_init(&t->active[i][j].lock);
}
/* Tracking of mapped foreign frames table */
@@ -3054,7 +3115,7 @@ gnttab_release_mappings(
rgt = rd->grant_table;
spin_lock(&rgt->lock);
- act = &active_entry(rgt, ref);
+ act = active_entry_acquire(rgt, ref);
sha = shared_entry_header(rgt, ref);
if (rgt->gt_version == 1)
status = &sha->flags;
@@ -3112,6 +3173,7 @@ gnttab_release_mappings(
if ( act->pin == 0 )
gnttab_clear_flag(_GTF_reading, status);
+ active_entry_release(act);
spin_unlock(&rgt->lock);
rcu_unlock_domain(rd);
@@ -3174,9 +3236,12 @@ static void gnttab_usage_print(struct domain *rd)
uint16_t status;
uint64_t frame;
- act = &active_entry(gt, ref);
+ act = active_entry_acquire(gt, ref);
if ( !act->pin )
+ {
+ active_entry_release(act);
continue;
+ }
sha = shared_entry_header(gt, ref);
@@ -3206,6 +3271,7 @@ static void gnttab_usage_print(struct domain *rd)
printk("[%3d] %5d 0x%06lx 0x%08x %5d 0x%06"PRIx64" 0x%02x\n",
ref, act->domid, act->frame, act->pin,
sha->domid, frame, status);
+ active_entry_release(act);
}
out:
--
1.7.10.4
^ permalink raw reply related [flat|nested] 22+ messages in thread* Re: [PATCHv11 1/4] gnttab: per-active entry locking
2015-06-02 16:26 ` [PATCHv11 1/4] gnttab: per-active entry locking David Vrabel
@ 2015-06-05 13:32 ` Jan Beulich
2015-06-05 13:41 ` David Vrabel
0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2015-06-05 13:32 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Malcolm Crossley, Keir Fraser, Ian Campbell,
Tim Deegan
>>> On 02.06.15 at 18:26, <david.vrabel@citrix.com> wrote:
> @@ -546,13 +567,19 @@ static void mapcount(
>
> *wrc = *rdc = 0;
>
> + /*
> + * Must have the remote domain's grant table lock while counting
> + * its active entries.
> + */
> + ASSERT(spin_is_locked(&rd->grant_table->lock));
> +
> for ( handle = 0; handle < lgt->maptrack_limit; handle++ )
The comment and the use of lgt->maptrack_limit contradict the
documentation you add in patch 2, and it indeed is unclear why
at this point the local domain's lock (its maptrack lock starting
with patch 2) doesn't also need to be held.
Jan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCHv11 1/4] gnttab: per-active entry locking
2015-06-05 13:32 ` Jan Beulich
@ 2015-06-05 13:41 ` David Vrabel
2015-06-05 14:37 ` Jan Beulich
0 siblings, 1 reply; 22+ messages in thread
From: David Vrabel @ 2015-06-05 13:41 UTC (permalink / raw)
To: Jan Beulich
Cc: xen-devel, Malcolm Crossley, Keir Fraser, Ian Campbell,
Tim Deegan
On 05/06/15 14:32, Jan Beulich wrote:
>>>> On 02.06.15 at 18:26, <david.vrabel@citrix.com> wrote:
>> @@ -546,13 +567,19 @@ static void mapcount(
>>
>> *wrc = *rdc = 0;
>>
>> + /*
>> + * Must have the remote domain's grant table lock while counting
>> + * its active entries.
>> + */
>> + ASSERT(spin_is_locked(&rd->grant_table->lock));
>> +
>> for ( handle = 0; handle < lgt->maptrack_limit; handle++ )
>
> The comment and the use of lgt->maptrack_limit contradict the
> documentation you add in patch 2, and it indeed is unclear why
> at this point the local domain's lock (its maptrack lock starting
> with patch 2) doesn't also need to be held.
Maybe it's a a bit subtle, but holding the local (write) lock means no
new mappings can be created and thus lgt->maptrack_limit cannot change.
Should I add
ASSERT(spin_is_locked(&lgt->lock));
here as well?
David
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCHv11 1/4] gnttab: per-active entry locking
2015-06-05 13:41 ` David Vrabel
@ 2015-06-05 14:37 ` Jan Beulich
0 siblings, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2015-06-05 14:37 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Malcolm Crossley, Keir Fraser, Ian Campbell,
Tim Deegan
>>> On 05.06.15 at 15:41, <david.vrabel@citrix.com> wrote:
> On 05/06/15 14:32, Jan Beulich wrote:
>>>>> On 02.06.15 at 18:26, <david.vrabel@citrix.com> wrote:
>>> @@ -546,13 +567,19 @@ static void mapcount(
>>>
>>> *wrc = *rdc = 0;
>>>
>>> + /*
>>> + * Must have the remote domain's grant table lock while counting
>>> + * its active entries.
>>> + */
>>> + ASSERT(spin_is_locked(&rd->grant_table->lock));
>>> +
>>> for ( handle = 0; handle < lgt->maptrack_limit; handle++ )
>>
>> The comment and the use of lgt->maptrack_limit contradict the
>> documentation you add in patch 2, and it indeed is unclear why
>> at this point the local domain's lock (its maptrack lock starting
>> with patch 2) doesn't also need to be held.
>
> Maybe it's a a bit subtle, but holding the local (write) lock means no
> new mappings can be created and thus lgt->maptrack_limit cannot change.
>
> Should I add
>
> ASSERT(spin_is_locked(&lgt->lock));
>
> here as well?
Yes, that's what I was basically asking about. And the subtlety
should probably be spelled out in a comment.
Jan
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCHv11 2/4] gnttab: introduce maptrack lock
2015-06-02 16:26 [PATCHv11 0/4] gnttab: Improve scaleability David Vrabel
2015-06-02 16:26 ` [PATCHv11 1/4] gnttab: per-active entry locking David Vrabel
@ 2015-06-02 16:26 ` David Vrabel
2015-06-05 13:35 ` Jan Beulich
2015-06-02 16:26 ` [PATCHv11 3/4] gnttab: make the grant table lock a read-write lock David Vrabel
2015-06-02 16:26 ` [PATCHv11 4/4] gnttab: use per-VCPU maptrack free lists David Vrabel
3 siblings, 1 reply; 22+ messages in thread
From: David Vrabel @ 2015-06-02 16:26 UTC (permalink / raw)
To: xen-devel
Cc: Keir Fraser, Ian Campbell, Tim Deegan, David Vrabel, Jan Beulich,
Malcolm Crossley
Split grant table lock into two separate locks. One to protect
maptrack state (maptrack_lock) and one for everything else (lock).
Based on a patch originally by Matt Wilson <msw@amazon.com>.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
docs/misc/grant-tables.txt | 9 +++++++++
xen/common/grant_table.c | 9 +++++----
xen/include/xen/grant_table.h | 2 ++
3 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/docs/misc/grant-tables.txt b/docs/misc/grant-tables.txt
index 2a3f591..83b3454 100644
--- a/docs/misc/grant-tables.txt
+++ b/docs/misc/grant-tables.txt
@@ -87,6 +87,7 @@ is complete.
inconsistent grant table state such as current
version, partially initialized active table pages,
etc.
+ grant_table->maptrack_lock : spinlock used to protect the maptrack state
active_grant_entry->lock : spinlock used to serialize modifications to
active entries
@@ -94,6 +95,14 @@ is complete.
that access members of struct grant_table must acquire the lock
around critical sections.
+ The maptrack state is protected by its own spinlock. Any access (read
+ or write) of struct grant_table members that have a "maptrack_"
+ prefix must be made while holding the maptrack lock. The maptrack
+ state can be rapidly modified under some workloads, and the critical
+ sections are very small, thus we use a spinlock to protect them.
+
+ The maptrack lock may be locked while holding the grant table lock.
+
Active entries are obtained by calling active_entry_acquire(gt, ref).
This function returns a pointer to the active entry after locking its
spinlock. The caller must hold the grant table lock for the gt in
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index d91661b..e0ac16f 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -288,10 +288,10 @@ static inline void
put_maptrack_handle(
struct grant_table *t, int handle)
{
- spin_lock(&t->lock);
+ spin_lock(&t->maptrack_lock);
maptrack_entry(t, handle).ref = t->maptrack_head;
t->maptrack_head = handle;
- spin_unlock(&t->lock);
+ spin_unlock(&t->maptrack_lock);
}
static inline int
@@ -303,7 +303,7 @@ get_maptrack_handle(
struct grant_mapping *new_mt;
unsigned int new_mt_limit, nr_frames;
- spin_lock(&lgt->lock);
+ spin_lock(&lgt->maptrack_lock);
while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
{
@@ -332,7 +332,7 @@ get_maptrack_handle(
nr_frames + 1);
}
- spin_unlock(&lgt->lock);
+ spin_unlock(&lgt->maptrack_lock);
return handle;
}
@@ -3005,6 +3005,7 @@ grant_table_create(
/* Simple stuff. */
spin_lock_init(&t->lock);
+ spin_lock_init(&t->maptrack_lock);
t->nr_grant_frames = INITIAL_NR_GRANT_FRAMES;
/* Active grant table. */
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index 32f5786..0b35a5e 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -82,6 +82,8 @@ struct grant_table {
struct grant_mapping **maptrack;
unsigned int maptrack_head;
unsigned int maptrack_limit;
+ /* Lock protecting the maptrack page list, head, and limit */
+ spinlock_t maptrack_lock;
/* Lock protecting updates to active and shared grant tables. */
spinlock_t lock;
/* The defined versions are 1 and 2. Set to 0 if we don't know
--
1.7.10.4
^ permalink raw reply related [flat|nested] 22+ messages in thread* Re: [PATCHv11 2/4] gnttab: introduce maptrack lock
2015-06-02 16:26 ` [PATCHv11 2/4] gnttab: introduce maptrack lock David Vrabel
@ 2015-06-05 13:35 ` Jan Beulich
2015-06-05 13:44 ` David Vrabel
0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2015-06-05 13:35 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Malcolm Crossley, Keir Fraser, Ian Campbell,
Tim Deegan
>>> On 02.06.15 at 18:26, <david.vrabel@citrix.com> wrote:
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -288,10 +288,10 @@ static inline void put_maptrack_handle(
> struct grant_table *t, int handle)
> {
> - spin_lock(&t->lock);
> + spin_lock(&t->maptrack_lock);
> maptrack_entry(t, handle).ref = t->maptrack_head;
> t->maptrack_head = handle;
> - spin_unlock(&t->lock);
> + spin_unlock(&t->maptrack_lock);
Starting with this change it becomes unclear what protects the
maptrack entries - the grant table lock or the maptrack one?
Jan
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCHv11 2/4] gnttab: introduce maptrack lock
2015-06-05 13:35 ` Jan Beulich
@ 2015-06-05 13:44 ` David Vrabel
2015-06-05 14:38 ` Jan Beulich
0 siblings, 1 reply; 22+ messages in thread
From: David Vrabel @ 2015-06-05 13:44 UTC (permalink / raw)
To: Jan Beulich
Cc: xen-devel, Malcolm Crossley, Keir Fraser, Ian Campbell,
Tim Deegan
On 05/06/15 14:35, Jan Beulich wrote:
>>>> On 02.06.15 at 18:26, <david.vrabel@citrix.com> wrote:
>> --- a/xen/common/grant_table.c
>> +++ b/xen/common/grant_table.c
>> @@ -288,10 +288,10 @@ static inline void put_maptrack_handle(
>> struct grant_table *t, int handle)
>> {
>> - spin_lock(&t->lock);
>> + spin_lock(&t->maptrack_lock);
>> maptrack_entry(t, handle).ref = t->maptrack_head;
>> t->maptrack_head = handle;
>> - spin_unlock(&t->lock);
>> + spin_unlock(&t->maptrack_lock);
>
> Starting with this change it becomes unclear what protects the
> maptrack entries - the grant table lock or the maptrack one?
The grant table lock. The docs aren't very good here -- the maptrack
lock is only for the maptrack free list.
David
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCHv11 2/4] gnttab: introduce maptrack lock
2015-06-05 13:44 ` David Vrabel
@ 2015-06-05 14:38 ` Jan Beulich
2015-06-05 14:41 ` David Vrabel
0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2015-06-05 14:38 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Malcolm Crossley, Keir Fraser, Ian Campbell,
Tim Deegan
>>> On 05.06.15 at 15:44, <david.vrabel@citrix.com> wrote:
> On 05/06/15 14:35, Jan Beulich wrote:
>>>>> On 02.06.15 at 18:26, <david.vrabel@citrix.com> wrote:
>>> --- a/xen/common/grant_table.c
>>> +++ b/xen/common/grant_table.c
>>> @@ -288,10 +288,10 @@ static inline void put_maptrack_handle(
>>> struct grant_table *t, int handle)
>>> {
>>> - spin_lock(&t->lock);
>>> + spin_lock(&t->maptrack_lock);
>>> maptrack_entry(t, handle).ref = t->maptrack_head;
>>> t->maptrack_head = handle;
>>> - spin_unlock(&t->lock);
>>> + spin_unlock(&t->maptrack_lock);
>>
>> Starting with this change it becomes unclear what protects the
>> maptrack entries - the grant table lock or the maptrack one?
>
> The grant table lock. The docs aren't very good here -- the maptrack
> lock is only for the maptrack free list.
I don't think the grant table lock serves that purpose, at least not
anymore after being converted to RW. See my reply on patch 3.
Jan
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCHv11 2/4] gnttab: introduce maptrack lock
2015-06-05 14:38 ` Jan Beulich
@ 2015-06-05 14:41 ` David Vrabel
0 siblings, 0 replies; 22+ messages in thread
From: David Vrabel @ 2015-06-05 14:41 UTC (permalink / raw)
To: Jan Beulich
Cc: xen-devel, Malcolm Crossley, Keir Fraser, Ian Campbell,
Tim Deegan
On 05/06/15 15:38, Jan Beulich wrote:
>>>> On 05.06.15 at 15:44, <david.vrabel@citrix.com> wrote:
>> On 05/06/15 14:35, Jan Beulich wrote:
>>>>>> On 02.06.15 at 18:26, <david.vrabel@citrix.com> wrote:
>>>> --- a/xen/common/grant_table.c
>>>> +++ b/xen/common/grant_table.c
>>>> @@ -288,10 +288,10 @@ static inline void put_maptrack_handle(
>>>> struct grant_table *t, int handle)
>>>> {
>>>> - spin_lock(&t->lock);
>>>> + spin_lock(&t->maptrack_lock);
>>>> maptrack_entry(t, handle).ref = t->maptrack_head;
>>>> t->maptrack_head = handle;
>>>> - spin_unlock(&t->lock);
>>>> + spin_unlock(&t->maptrack_lock);
>>>
>>> Starting with this change it becomes unclear what protects the
>>> maptrack entries - the grant table lock or the maptrack one?
>>
>> The grant table lock. The docs aren't very good here -- the maptrack
>> lock is only for the maptrack free list.
>
> I don't think the grant table lock serves that purpose, at least not
> anymore after being converted to RW. See my reply on patch 3.
Sorry for not being clear, after this patch #2 it is the grant table
lock that serves this purpose. After #3 it's as you said there.
David
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCHv11 3/4] gnttab: make the grant table lock a read-write lock
2015-06-02 16:26 [PATCHv11 0/4] gnttab: Improve scaleability David Vrabel
2015-06-02 16:26 ` [PATCHv11 1/4] gnttab: per-active entry locking David Vrabel
2015-06-02 16:26 ` [PATCHv11 2/4] gnttab: introduce maptrack lock David Vrabel
@ 2015-06-02 16:26 ` David Vrabel
2015-06-05 14:34 ` Jan Beulich
2015-06-02 16:26 ` [PATCHv11 4/4] gnttab: use per-VCPU maptrack free lists David Vrabel
3 siblings, 1 reply; 22+ messages in thread
From: David Vrabel @ 2015-06-02 16:26 UTC (permalink / raw)
To: xen-devel
Cc: Keir Fraser, Ian Campbell, Tim Deegan, David Vrabel, Jan Beulich,
Malcolm Crossley
In combination with the per-active entry locks, the grant table lock
can be made a read-write lock since the majority of cases only the
read lock is required. The grant table read lock protects against
changes to the table version or size (which are done with the write
lock held).
The write lock is also required when two active entries must be
acquired.
The double lock is still required when updating IOMMU page tables.
With the lock contention being only on the maptrack lock (unless IOMMU
updates are required), performance and scalability is improved.
Based on a patch originally by Matt Wilson <msw@amazon.com>.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
v11:
- Call gnttab_need_iommu_mapping() once for each lock/unlock.
- Take active entry lock in gnttab_transfer().
- Use read_atomic() when checking maptrack flags for validity (note:
all other reads of map->flags are either after it is validated as
stable or protected by the active entry lock and thus don't need
read_atomic()).
- Add comment to double_gt_lock().
v10:
- In gnttab_map_grant_ref(), keep double lock around maptrack update
if gnttab_need_iommu_mapping(). Use a wmb(), otherwise.
---
docs/misc/grant-tables.txt | 30 ++++----
xen/arch/arm/mm.c | 4 +-
xen/arch/x86/mm.c | 4 +-
xen/common/grant_table.c | 171 ++++++++++++++++++++++++-----------------
xen/include/xen/grant_table.h | 9 ++-
5 files changed, 126 insertions(+), 92 deletions(-)
diff --git a/docs/misc/grant-tables.txt b/docs/misc/grant-tables.txt
index 83b3454..9d9d01f 100644
--- a/docs/misc/grant-tables.txt
+++ b/docs/misc/grant-tables.txt
@@ -83,7 +83,7 @@ is complete.
~~~~~~~
Xen uses several locks to serialize access to the internal grant table state.
- grant_table->lock : lock used to prevent readers from accessing
+ grant_table->lock : rwlock used to prevent readers from accessing
inconsistent grant table state such as current
version, partially initialized active table pages,
etc.
@@ -91,9 +91,13 @@ is complete.
active_grant_entry->lock : spinlock used to serialize modifications to
active entries
- The primary lock for the grant table is a spinlock. All functions
- that access members of struct grant_table must acquire the lock
- around critical sections.
+ The primary lock for the grant table is a read/write spinlock. All
+ functions that access members of struct grant_table must acquire a
+ read lock around critical sections. Any modification to the members
+ of struct grant_table (e.g., nr_status_frames, nr_grant_frames,
+ active frames, etc.) must only be made if the write lock is
+ held. These elements are read-mostly, and read critical sections can
+ be large, which makes a rwlock a good choice.
The maptrack state is protected by its own spinlock. Any access (read
or write) of struct grant_table members that have a "maptrack_"
@@ -105,25 +109,25 @@ is complete.
Active entries are obtained by calling active_entry_acquire(gt, ref).
This function returns a pointer to the active entry after locking its
- spinlock. The caller must hold the grant table lock for the gt in
- question before calling active_entry_acquire(). This is because the
- grant table can be dynamically extended via gnttab_grow_table() while
- a domain is running and must be fully initialized. Once all access to
- the active entry is complete, release the lock by calling
+ spinlock. The caller must hold the grant table read lock before
+ calling active_entry_acquire(). This is because the grant table can
+ be dynamically extended via gnttab_grow_table() while a domain is
+ running and must be fully initialized. Once all access to the active
+ entry is complete, release the lock by calling
active_entry_release(act).
Summary of rules for locking:
active_entry_acquire() and active_entry_release() can only be
- called when holding the relevant grant table's lock. I.e.:
- spin_lock(>->lock);
+ called when holding the relevant grant table's read lock. I.e.:
+ read_lock(>->lock);
act = active_entry_acquire(gt, ref);
...
active_entry_release(act);
- spin_unlock(>->lock);
+ read_unlock(>->lock);
Active entries cannot be acquired while holding the maptrack lock.
Multiple active entries can be acquired while holding the grant table
- lock.
+ _write_ lock.
********************************************************************************
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index a91ea77..b305041 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1048,7 +1048,7 @@ int xenmem_add_to_physmap_one(
switch ( space )
{
case XENMAPSPACE_grant_table:
- spin_lock(&d->grant_table->lock);
+ write_lock(&d->grant_table->lock);
if ( d->grant_table->gt_version == 0 )
d->grant_table->gt_version = 1;
@@ -1078,7 +1078,7 @@ int xenmem_add_to_physmap_one(
t = p2m_ram_rw;
- spin_unlock(&d->grant_table->lock);
+ write_unlock(&d->grant_table->lock);
break;
case XENMAPSPACE_shared_info:
if ( idx != 0 )
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 7a7a854..83639e7 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4589,7 +4589,7 @@ int xenmem_add_to_physmap_one(
mfn = virt_to_mfn(d->shared_info);
break;
case XENMAPSPACE_grant_table:
- spin_lock(&d->grant_table->lock);
+ write_lock(&d->grant_table->lock);
if ( d->grant_table->gt_version == 0 )
d->grant_table->gt_version = 1;
@@ -4611,7 +4611,7 @@ int xenmem_add_to_physmap_one(
mfn = virt_to_mfn(d->grant_table->shared_raw[idx]);
}
- spin_unlock(&d->grant_table->lock);
+ write_unlock(&d->grant_table->lock);
break;
case XENMAPSPACE_gmfn_range:
case XENMAPSPACE_gmfn:
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index e0ac16f..b16e696 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -196,7 +196,7 @@ active_entry_acquire(struct grant_table *t, grant_ref_t e)
{
struct active_grant_entry *act;
- ASSERT(spin_is_locked(&t->lock));
+ ASSERT(rw_is_locked(&t->lock));
act = &_active_entry(t, e);
spin_lock(&act->lock);
@@ -252,25 +252,29 @@ static int __get_paged_frame(unsigned long gfn, unsigned long *frame, struct pag
static inline void
double_gt_lock(struct grant_table *lgt, struct grant_table *rgt)
{
+ /*
+ * See mapcount() for why the write lock is also required for the
+ * remote domain.
+ */
if ( lgt < rgt )
{
- spin_lock(&lgt->lock);
- spin_lock(&rgt->lock);
+ write_lock(&lgt->lock);
+ write_lock(&rgt->lock);
}
else
{
if ( lgt != rgt )
- spin_lock(&rgt->lock);
- spin_lock(&lgt->lock);
+ write_lock(&rgt->lock);
+ write_lock(&lgt->lock);
}
}
static inline void
double_gt_unlock(struct grant_table *lgt, struct grant_table *rgt)
{
- spin_unlock(&lgt->lock);
+ write_unlock(&lgt->lock);
if ( lgt != rgt )
- spin_unlock(&rgt->lock);
+ write_unlock(&rgt->lock);
}
static inline int
@@ -528,7 +532,7 @@ static int grant_map_exists(const struct domain *ld,
{
unsigned int ref, max_iter;
- ASSERT(spin_is_locked(&rgt->lock));
+ ASSERT(rw_is_locked(&rgt->lock));
max_iter = min(*ref_count + (1 << GNTTABOP_CONTINUATION_ARG_SHIFT),
nr_grant_entries(rgt));
@@ -568,10 +572,10 @@ static void mapcount(
*wrc = *rdc = 0;
/*
- * Must have the remote domain's grant table lock while counting
- * its active entries.
+ * Must have the remote domain's grant table write lock while
+ * counting its active entries.
*/
- ASSERT(spin_is_locked(&rd->grant_table->lock));
+ ASSERT(rw_is_write_locked(&rd->grant_table->lock));
for ( handle = 0; handle < lgt->maptrack_limit; handle++ )
{
@@ -611,6 +615,7 @@ __gnttab_map_grant_ref(
grant_entry_v2_t *sha2;
grant_entry_header_t *shah;
uint16_t *status;
+ bool_t need_iommu;
led = current;
ld = led->domain;
@@ -656,7 +661,7 @@ __gnttab_map_grant_ref(
}
rgt = rd->grant_table;
- spin_lock(&rgt->lock);
+ read_lock(&rgt->lock);
if ( rgt->gt_version == 0 )
PIN_FAIL(unlock_out, GNTST_general_error,
@@ -730,7 +735,7 @@ __gnttab_map_grant_ref(
cache_flags = (shah->flags & (GTF_PAT | GTF_PWT | GTF_PCD) );
active_entry_release(act);
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
/* pg may be set, with a refcount included, from __get_paged_frame */
if ( !pg )
@@ -806,12 +811,14 @@ __gnttab_map_grant_ref(
goto undo_out;
}
- double_gt_lock(lgt, rgt);
-
- if ( gnttab_need_iommu_mapping(ld) )
+ need_iommu = gnttab_need_iommu_mapping(ld);
+ if ( need_iommu )
{
unsigned int wrc, rdc;
int err = 0;
+
+ double_gt_lock(lgt, rgt);
+
/* We're not translated, so we know that gmfns and mfns are
the same things, so the IOMMU entry is always 1-to-1. */
mapcount(lgt, rd, frame, &wrc, &rdc);
@@ -837,12 +844,22 @@ __gnttab_map_grant_ref(
TRACE_1D(TRC_MEM_PAGE_GRANT_MAP, op->dom);
+ /*
+ * All maptrack entry users check mt->flags first before using the
+ * other fields so just ensure the flags field is stored last.
+ *
+ * However, if gnttab_need_iommu_mapping() then this would race
+ * with a concurrent mapcount() call (on an unmap, for example)
+ * and a lock is required.
+ */
mt = &maptrack_entry(lgt, handle);
mt->domid = op->dom;
mt->ref = op->ref;
- mt->flags = op->flags;
+ wmb();
+ write_atomic(&mt->flags, op->flags);
- double_gt_unlock(lgt, rgt);
+ if ( need_iommu )
+ double_gt_unlock(lgt, rgt);
op->dev_bus_addr = (u64)frame << PAGE_SHIFT;
op->handle = handle;
@@ -865,7 +882,7 @@ __gnttab_map_grant_ref(
put_page(pg);
}
- spin_lock(&rgt->lock);
+ read_lock(&rgt->lock);
act = active_entry_acquire(rgt, op->ref);
@@ -888,7 +905,7 @@ __gnttab_map_grant_ref(
active_entry_release(act);
unlock_out:
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
op->status = rc;
put_maptrack_handle(lgt, handle);
rcu_unlock_domain(rd);
@@ -938,18 +955,19 @@ __gnttab_unmap_common(
}
op->map = &maptrack_entry(lgt, op->handle);
- spin_lock(&lgt->lock);
- if ( unlikely(!op->map->flags) )
+ read_lock(&lgt->lock);
+
+ if ( unlikely(!read_atomic(&op->map->flags)) )
{
- spin_unlock(&lgt->lock);
+ read_unlock(&lgt->lock);
gdprintk(XENLOG_INFO, "Zero flags for handle (%d).\n", op->handle);
op->status = GNTST_bad_handle;
return;
}
dom = op->map->domid;
- spin_unlock(&lgt->lock);
+ read_unlock(&lgt->lock);
if ( unlikely((rd = rcu_lock_domain_by_id(dom)) == NULL) )
{
@@ -970,9 +988,10 @@ __gnttab_unmap_common(
TRACE_1D(TRC_MEM_PAGE_GRANT_UNMAP, dom);
rgt = rd->grant_table;
- double_gt_lock(lgt, rgt);
- op->flags = op->map->flags;
+ read_lock(&rgt->lock);
+
+ op->flags = read_atomic(&op->map->flags);
if ( unlikely(!op->flags) || unlikely(op->map->domid != dom) )
{
gdprintk(XENLOG_WARNING, "Unstable handle %u\n", op->handle);
@@ -1019,31 +1038,34 @@ __gnttab_unmap_common(
act->pin -= GNTPIN_hstw_inc;
}
- if ( gnttab_need_iommu_mapping(ld) )
+ act_release_out:
+ active_entry_release(act);
+ unmap_out:
+ read_unlock(&rgt->lock);
+
+ if ( rc == GNTST_okay && gnttab_need_iommu_mapping(ld) )
{
unsigned int wrc, rdc;
int err = 0;
+
+ double_gt_lock(lgt, rgt);
+
mapcount(lgt, rd, op->frame, &wrc, &rdc);
if ( (wrc + rdc) == 0 )
err = iommu_unmap_page(ld, op->frame);
else if ( wrc == 0 )
err = iommu_map_page(ld, op->frame, op->frame, IOMMUF_readable);
+
+ double_gt_unlock(lgt, rgt);
+
if ( err )
- {
rc = GNTST_general_error;
- goto act_release_out;
- }
}
/* If just unmapped a writable mapping, mark as dirtied */
- if ( !(op->flags & GNTMAP_readonly) )
+ if ( rc == GNTST_okay && !(op->flags & GNTMAP_readonly) )
gnttab_mark_dirty(rd, op->frame);
- act_release_out:
- active_entry_release(act);
- unmap_out:
- double_gt_unlock(lgt, rgt);
-
op->status = rc;
rcu_unlock_domain(rd);
}
@@ -1073,8 +1095,8 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
rcu_lock_domain(rd);
rgt = rd->grant_table;
- spin_lock(&rgt->lock);
+ read_lock(&rgt->lock);
if ( rgt->gt_version == 0 )
goto unlock_out;
@@ -1140,7 +1162,7 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
act_release_out:
active_entry_release(act);
unlock_out:
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
if ( put_handle )
{
@@ -1327,11 +1349,13 @@ gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
gt->nr_status_frames = 0;
}
+/*
+ * Grow the grant table. The caller must hold the grant table's
+ * write lock before calling this function.
+ */
int
gnttab_grow_table(struct domain *d, unsigned int req_nr_frames)
{
- /* d's grant table lock must be held by the caller */
-
struct grant_table *gt = d->grant_table;
unsigned int i, j;
@@ -1437,7 +1461,7 @@ gnttab_setup_table(
}
gt = d->grant_table;
- spin_lock(>->lock);
+ write_lock(>->lock);
if ( gt->gt_version == 0 )
gt->gt_version = 1;
@@ -1465,7 +1489,7 @@ gnttab_setup_table(
}
out3:
- spin_unlock(>->lock);
+ write_unlock(>->lock);
out2:
rcu_unlock_domain(d);
out1:
@@ -1507,13 +1531,13 @@ gnttab_query_size(
goto query_out_unlock;
}
- spin_lock(&d->grant_table->lock);
+ read_lock(&d->grant_table->lock);
op.nr_frames = nr_grant_frames(d->grant_table);
op.max_nr_frames = max_grant_frames;
op.status = GNTST_okay;
- spin_unlock(&d->grant_table->lock);
+ read_unlock(&d->grant_table->lock);
query_out_unlock:
@@ -1539,7 +1563,7 @@ gnttab_prepare_for_transfer(
union grant_combo scombo, prev_scombo, new_scombo;
int retries = 0;
- spin_lock(&rgt->lock);
+ read_lock(&rgt->lock);
if ( rgt->gt_version == 0 )
{
@@ -1590,11 +1614,11 @@ gnttab_prepare_for_transfer(
scombo = prev_scombo;
}
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
return 1;
fail:
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
return 0;
}
@@ -1609,6 +1633,7 @@ gnttab_transfer(
struct gnttab_transfer gop;
unsigned long mfn;
unsigned int max_bitsize;
+ struct active_grant_entry *act;
for ( i = 0; i < count; i++ )
{
@@ -1787,7 +1812,8 @@ gnttab_transfer(
TRACE_1D(TRC_MEM_PAGE_GRANT_TRANSFER, e->domain_id);
/* Tell the guest about its new page frame. */
- spin_lock(&e->grant_table->lock);
+ read_lock(&e->grant_table->lock);
+ act = active_entry_acquire(e->grant_table, gop.ref);
if ( e->grant_table->gt_version == 1 )
{
@@ -1805,7 +1831,8 @@ gnttab_transfer(
shared_entry_header(e->grant_table, gop.ref)->flags |=
GTF_transfer_completed;
- spin_unlock(&e->grant_table->lock);
+ active_entry_release(act);
+ read_unlock(&e->grant_table->lock);
rcu_unlock_domain(e);
@@ -1843,7 +1870,7 @@ __release_grant_for_copy(
released_read = 0;
released_write = 0;
- spin_lock(&rgt->lock);
+ read_lock(&rgt->lock);
act = active_entry_acquire(rgt, gref);
sha = shared_entry_header(rgt, gref);
@@ -1885,7 +1912,7 @@ __release_grant_for_copy(
}
active_entry_release(act);
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
if ( td != rd )
{
@@ -1943,7 +1970,7 @@ __acquire_grant_for_copy(
*page = NULL;
- spin_lock(&rgt->lock);
+ read_lock(&rgt->lock);
if ( rgt->gt_version == 0 )
PIN_FAIL(gt_unlock_out, GNTST_general_error,
@@ -2019,20 +2046,20 @@ __acquire_grant_for_copy(
* here and reacquire
*/
active_entry_release(act);
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
readonly, &grant_frame, page,
&trans_page_off, &trans_length, 0);
- spin_lock(&rgt->lock);
+ read_lock(&rgt->lock);
act = active_entry_acquire(rgt, gref);
if ( rc != GNTST_okay ) {
__fixup_status_for_copy_pin(act, status);
rcu_unlock_domain(td);
active_entry_release(act);
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
return rc;
}
@@ -2045,7 +2072,7 @@ __acquire_grant_for_copy(
__fixup_status_for_copy_pin(act, status);
rcu_unlock_domain(td);
active_entry_release(act);
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
put_page(*page);
return __acquire_grant_for_copy(rd, gref, ldom, readonly,
frame, page, page_off, length,
@@ -2114,7 +2141,7 @@ __acquire_grant_for_copy(
*frame = act->frame;
active_entry_release(act);
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
return rc;
unlock_out_clear:
@@ -2129,7 +2156,7 @@ __acquire_grant_for_copy(
active_entry_release(act);
gt_unlock_out:
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
return rc;
}
@@ -2445,7 +2472,7 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
if ( gt->gt_version == op.version )
goto out;
- spin_lock(>->lock);
+ write_lock(>->lock);
/* Make sure that the grant table isn't currently in use when we
change the version number, except for the first 8 entries which
are allowed to be in use (xenstore/xenconsole keeps them mapped).
@@ -2530,7 +2557,7 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
gt->gt_version = op.version;
out_unlock:
- spin_unlock(>->lock);
+ write_unlock(>->lock);
out:
op.version = gt->gt_version;
@@ -2586,7 +2613,7 @@ gnttab_get_status_frames(XEN_GUEST_HANDLE_PARAM(gnttab_get_status_frames_t) uop,
op.status = GNTST_okay;
- spin_lock(>->lock);
+ read_lock(>->lock);
for ( i = 0; i < op.nr_frames; i++ )
{
@@ -2595,7 +2622,7 @@ gnttab_get_status_frames(XEN_GUEST_HANDLE_PARAM(gnttab_get_status_frames_t) uop,
op.status = GNTST_bad_virt_addr;
}
- spin_unlock(>->lock);
+ read_unlock(>->lock);
out2:
rcu_unlock_domain(d);
out1:
@@ -2645,7 +2672,7 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
struct active_grant_entry *act_b = NULL;
s16 rc = GNTST_okay;
- spin_lock(>->lock);
+ write_lock(>->lock);
/* Bounds check on the grant refs */
if ( unlikely(ref_a >= nr_grant_entries(d->grant_table)))
@@ -2689,7 +2716,7 @@ out:
active_entry_release(act_b);
if ( act_a != NULL )
active_entry_release(act_a);
- spin_unlock(>->lock);
+ write_unlock(>->lock);
rcu_unlock_domain(d);
@@ -2760,12 +2787,12 @@ static int __gnttab_cache_flush(gnttab_cache_flush_t *cflush,
if ( d != owner )
{
- spin_lock(&owner->grant_table->lock);
+ read_lock(&owner->grant_table->lock);
ret = grant_map_exists(d, owner->grant_table, mfn, ref_count);
if ( ret != 0 )
{
- spin_unlock(&owner->grant_table->lock);
+ read_unlock(&owner->grant_table->lock);
rcu_unlock_domain(d);
put_page(page);
return ret;
@@ -2785,7 +2812,7 @@ static int __gnttab_cache_flush(gnttab_cache_flush_t *cflush,
ret = 0;
if ( d != owner )
- spin_unlock(&owner->grant_table->lock);
+ read_unlock(&owner->grant_table->lock);
unmap_domain_page(v);
put_page(page);
@@ -3004,7 +3031,7 @@ grant_table_create(
goto no_mem_0;
/* Simple stuff. */
- spin_lock_init(&t->lock);
+ rwlock_init(&t->lock);
spin_lock_init(&t->maptrack_lock);
t->nr_grant_frames = INITIAL_NR_GRANT_FRAMES;
@@ -3114,7 +3141,7 @@ gnttab_release_mappings(
}
rgt = rd->grant_table;
- spin_lock(&rgt->lock);
+ read_lock(&rgt->lock);
act = active_entry_acquire(rgt, ref);
sha = shared_entry_header(rgt, ref);
@@ -3175,7 +3202,7 @@ gnttab_release_mappings(
gnttab_clear_flag(_GTF_reading, status);
active_entry_release(act);
- spin_unlock(&rgt->lock);
+ read_unlock(&rgt->lock);
rcu_unlock_domain(rd);
@@ -3223,7 +3250,7 @@ static void gnttab_usage_print(struct domain *rd)
printk(" -------- active -------- -------- shared --------\n");
printk("[ref] localdom mfn pin localdom gmfn flags\n");
- spin_lock(>->lock);
+ read_lock(>->lock);
if ( gt->gt_version == 0 )
goto out;
@@ -3276,7 +3303,7 @@ static void gnttab_usage_print(struct domain *rd)
}
out:
- spin_unlock(>->lock);
+ read_unlock(>->lock);
if ( first )
printk("grant-table for remote domain:%5d ... "
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index 0b35a5e..f22ebd0 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -64,6 +64,11 @@ struct grant_mapping {
/* Per-domain grant information. */
struct grant_table {
+ /*
+ * Lock protecting updates to grant table state (version, active
+ * entry list, etc.)
+ */
+ rwlock_t lock;
/* Table size. Number of frames shared with guest */
unsigned int nr_grant_frames;
/* Shared grant table (see include/public/grant_table.h). */
@@ -84,8 +89,6 @@ struct grant_table {
unsigned int maptrack_limit;
/* Lock protecting the maptrack page list, head, and limit */
spinlock_t maptrack_lock;
- /* Lock protecting updates to active and shared grant tables. */
- spinlock_t lock;
/* The defined versions are 1 and 2. Set to 0 if we don't know
what version to use yet. */
unsigned gt_version;
@@ -103,7 +106,7 @@ gnttab_release_mappings(
struct domain *d);
/* Increase the size of a domain's grant table.
- * Caller must hold d's grant table lock.
+ * Caller must hold d's grant table write lock.
*/
int
gnttab_grow_table(struct domain *d, unsigned int req_nr_frames);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 22+ messages in thread* Re: [PATCHv11 3/4] gnttab: make the grant table lock a read-write lock
2015-06-02 16:26 ` [PATCHv11 3/4] gnttab: make the grant table lock a read-write lock David Vrabel
@ 2015-06-05 14:34 ` Jan Beulich
2015-06-05 14:39 ` David Vrabel
0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2015-06-05 14:34 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Malcolm Crossley, Keir Fraser, Ian Campbell,
Tim Deegan
>>> On 02.06.15 at 18:26, <david.vrabel@citrix.com> wrote:
> @@ -970,9 +988,10 @@ __gnttab_unmap_common(
> TRACE_1D(TRC_MEM_PAGE_GRANT_UNMAP, dom);
>
> rgt = rd->grant_table;
> - double_gt_lock(lgt, rgt);
>
> - op->flags = op->map->flags;
> + read_lock(&rgt->lock);
> +
> + op->flags = read_atomic(&op->map->flags);
> if ( unlikely(!op->flags) || unlikely(op->map->domid != dom) )
> {
> gdprintk(XENLOG_WARNING, "Unstable handle %u\n", op->handle);
> @@ -1019,31 +1038,34 @@ __gnttab_unmap_common(
> act->pin -= GNTPIN_hstw_inc;
> }
>
> - if ( gnttab_need_iommu_mapping(ld) )
> + act_release_out:
> + active_entry_release(act);
> + unmap_out:
> + read_unlock(&rgt->lock);
Trying to answer the question on what protects the maptrack
entries: With the flags check done first and, after initial setup, the
ref field never changing, all modifications of flags as well as the
decision whether to drop the maptrack handle appear to be guarded
by ref's active entry lock. I think this is implicit enough to require
being properly spelled out somewhere.
Together with struct grant_mapping not being used elsewhere (I
just now created a patch to make this more explicit by moving its
declaration to the C file) I think this addresses that particular
concern. If you agree, feel free to add
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCHv11 3/4] gnttab: make the grant table lock a read-write lock
2015-06-05 14:34 ` Jan Beulich
@ 2015-06-05 14:39 ` David Vrabel
0 siblings, 0 replies; 22+ messages in thread
From: David Vrabel @ 2015-06-05 14:39 UTC (permalink / raw)
To: Jan Beulich
Cc: xen-devel, Malcolm Crossley, Keir Fraser, Ian Campbell,
Tim Deegan
On 05/06/15 15:34, Jan Beulich wrote:
>>>> On 02.06.15 at 18:26, <david.vrabel@citrix.com> wrote:
>> @@ -970,9 +988,10 @@ __gnttab_unmap_common(
>> TRACE_1D(TRC_MEM_PAGE_GRANT_UNMAP, dom);
>>
>> rgt = rd->grant_table;
>> - double_gt_lock(lgt, rgt);
>>
>> - op->flags = op->map->flags;
>> + read_lock(&rgt->lock);
>> +
>> + op->flags = read_atomic(&op->map->flags);
>> if ( unlikely(!op->flags) || unlikely(op->map->domid != dom) )
>> {
>> gdprintk(XENLOG_WARNING, "Unstable handle %u\n", op->handle);
>> @@ -1019,31 +1038,34 @@ __gnttab_unmap_common(
>> act->pin -= GNTPIN_hstw_inc;
>> }
>>
>> - if ( gnttab_need_iommu_mapping(ld) )
>> + act_release_out:
>> + active_entry_release(act);
>> + unmap_out:
>> + read_unlock(&rgt->lock);
>
> Trying to answer the question on what protects the maptrack
> entries: With the flags check done first and, after initial setup, the
> ref field never changing, all modifications of flags as well as the
> decision whether to drop the maptrack handle appear to be guarded
> by ref's active entry lock. I think this is implicit enough to require
> being properly spelled out somewhere.
>
> Together with struct grant_mapping not being used elsewhere (I
> just now created a patch to make this more explicit by moving its
> declaration to the C file) I think this addresses that particular
> concern. If you agree, feel free to add
Yes, this was exactly my reasoning. I shall improve the docs/comments
to make this clearer in v12.
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Thanks!
David
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCHv11 4/4] gnttab: use per-VCPU maptrack free lists
2015-06-02 16:26 [PATCHv11 0/4] gnttab: Improve scaleability David Vrabel
` (2 preceding siblings ...)
2015-06-02 16:26 ` [PATCHv11 3/4] gnttab: make the grant table lock a read-write lock David Vrabel
@ 2015-06-02 16:26 ` David Vrabel
2015-06-03 9:00 ` David Vrabel
2015-06-05 14:51 ` Jan Beulich
3 siblings, 2 replies; 22+ messages in thread
From: David Vrabel @ 2015-06-02 16:26 UTC (permalink / raw)
To: xen-devel
Cc: Keir Fraser, Ian Campbell, Tim Deegan, David Vrabel, Jan Beulich,
Malcolm Crossley
Performance analysis of aggregate network throughput with many VMs
shows that performance is signficantly limited by contention on the
maptrack lock when obtaining/releasing maptrack handles from the free
list.
Instead of a single free list use a per-VCPU list. This avoids any
contention when obtaining a handle. Handles must be released back to
their original list and since this may occur on a different VCPU there
is some contention on the destination VCPU's free list tail pointer
(but this is much better than a per-domain lock).
Increase the default maximum number of maptrack frames by 4 times
because: a) struct grant_mapping is now 16 bytes (instead of 8); and
b) a guest may not evenly distribute all the grant map operations
across the VCPUs (meaning some VCPUs need more maptrack entries than
others).
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
---
v11:
- Allocate maptrack array with vzalloc() since it may be >1 page.
v10:
- Divide max_maptrack_frames evenly amongst the VCPUs.
- Increase default max_maptrack_frames to compensate.
---
xen/common/grant_table.c | 149 +++++++++++++++++++++++++----------------
xen/include/xen/grant_table.h | 8 ++-
xen/include/xen/sched.h | 5 ++
3 files changed, 101 insertions(+), 61 deletions(-)
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index b16e696..b35fe12 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -37,6 +37,7 @@
#include <xen/iommu.h>
#include <xen/paging.h>
#include <xen/keyhandler.h>
+#include <xen/vmap.h>
#include <xsm/xsm.h>
#include <asm/flushtlb.h>
@@ -57,7 +58,7 @@ integer_param("gnttab_max_frames", max_grant_frames);
* New options allow to set max_maptrack_frames and
* map_grant_table_frames independently.
*/
-#define DEFAULT_MAX_MAPTRACK_FRAMES 256
+#define DEFAULT_MAX_MAPTRACK_FRAMES 1024
static unsigned int __read_mostly max_maptrack_frames;
integer_param("gnttab_max_maptrack_frames", max_maptrack_frames);
@@ -117,12 +118,6 @@ struct gnttab_unmap_common {
#define maptrack_entry(t, e) \
((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE])
-static inline unsigned int
-nr_maptrack_frames(struct grant_table *t)
-{
- return t->maptrack_limit / MAPTRACK_PER_PAGE;
-}
-
#define MAPTRACK_TAIL (~0u)
#define SHGNT_PER_PAGE_V1 (PAGE_SIZE / sizeof(grant_entry_v1_t))
@@ -279,66 +274,104 @@ double_gt_unlock(struct grant_table *lgt, struct grant_table *rgt)
static inline int
__get_maptrack_handle(
- struct grant_table *t)
+ struct grant_table *t,
+ struct vcpu *v)
{
- unsigned int h;
- if ( unlikely((h = t->maptrack_head) == MAPTRACK_TAIL) )
+ unsigned int head, next;
+
+ /* No maptrack pages allocated for this VCPU yet? */
+ head = v->maptrack_head;
+ if ( unlikely(head == MAPTRACK_TAIL) )
return -1;
- t->maptrack_head = maptrack_entry(t, h).ref;
- return h;
+
+ /*
+ * Always keep one entry in the free list to make it easier to add
+ * free entries to the tail.
+ */
+ next = read_atomic(&maptrack_entry(t, head).ref);
+ if ( unlikely(next == MAPTRACK_TAIL) )
+ return -1;
+
+ v->maptrack_head = next;
+
+ return head;
}
static inline void
put_maptrack_handle(
struct grant_table *t, int handle)
{
- spin_lock(&t->maptrack_lock);
- maptrack_entry(t, handle).ref = t->maptrack_head;
- t->maptrack_head = handle;
- spin_unlock(&t->maptrack_lock);
+ struct domain *d = current->domain;
+ struct vcpu *v;
+ unsigned int prev_tail, cur_tail;
+
+ /* 1. Set entry to be a tail. */
+ maptrack_entry(t, handle).ref = MAPTRACK_TAIL;
+
+ /* 2. Add entry to the tail of the list on the original VCPU. */
+ v = d->vcpu[maptrack_entry(t,handle).vcpu];
+
+ cur_tail = read_atomic(&v->maptrack_tail);
+ do {
+ prev_tail = cur_tail;
+ cur_tail = cmpxchg(&v->maptrack_tail, prev_tail, handle);
+ } while ( cur_tail != prev_tail );
+
+ /* 3. Update the old tail entry to point to the new entry. */
+ write_atomic(&maptrack_entry(t, prev_tail).ref, handle);
}
static inline int
get_maptrack_handle(
struct grant_table *lgt)
{
+ struct vcpu *v = current;
int i;
grant_handle_t handle;
struct grant_mapping *new_mt;
- unsigned int new_mt_limit, nr_frames;
- spin_lock(&lgt->maptrack_lock);
+ handle = __get_maptrack_handle(lgt, v);
+ if ( likely(handle != -1) )
+ return handle;
- while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
- {
- nr_frames = nr_maptrack_frames(lgt);
- if ( nr_frames >= max_maptrack_frames )
- break;
+ /*
+ * max_maptrack_frames is per domain so each VCPU gets a share of
+ * the maximum, but allow at least one frame per VCPU.
+ */
+ if ( v->maptrack_frames
+ && v->maptrack_frames >= max_maptrack_frames / v->domain->max_vcpus )
+ return -1;
- new_mt = alloc_xenheap_page();
- if ( !new_mt )
- break;
+ new_mt = alloc_xenheap_page();
+ if ( !new_mt )
+ return -1;
+ clear_page(new_mt);
- clear_page(new_mt);
+ spin_lock(&lgt->maptrack_lock);
- new_mt_limit = lgt->maptrack_limit + MAPTRACK_PER_PAGE;
+ for ( i = 1; i < MAPTRACK_PER_PAGE; i++ )
+ {
+ new_mt[i - 1].ref = (lgt->maptrack_pages * MAPTRACK_PER_PAGE) + i;
+ new_mt[i - 1].vcpu = v->vcpu_id;
+ }
+ /* Set last entry vcpu and ref */
+ new_mt[i - 1].ref = v->maptrack_head;
+ new_mt[i - 1].vcpu = v->vcpu_id;
- for ( i = 1; i < MAPTRACK_PER_PAGE; i++ )
- new_mt[i - 1].ref = lgt->maptrack_limit + i;
- new_mt[i - 1].ref = lgt->maptrack_head;
- lgt->maptrack_head = lgt->maptrack_limit;
+ v->maptrack_head = lgt->maptrack_pages * MAPTRACK_PER_PAGE;
- lgt->maptrack[nr_frames] = new_mt;
- smp_wmb();
- lgt->maptrack_limit = new_mt_limit;
+ /* Set tail directly if this is the first page for this VCPU. */
+ if ( v->maptrack_tail == MAPTRACK_TAIL )
+ v->maptrack_tail = (lgt->maptrack_pages * MAPTRACK_PER_PAGE)
+ + MAPTRACK_PER_PAGE - 1;
- gdprintk(XENLOG_INFO, "Increased maptrack size to %u frames\n",
- nr_frames + 1);
- }
+ lgt->maptrack[lgt->maptrack_pages++] = new_mt;
spin_unlock(&lgt->maptrack_lock);
- return handle;
+ v->maptrack_frames++;
+
+ return __get_maptrack_handle(lgt, v);
}
/* Number of grant table entries. Caller must hold d's grant table lock. */
@@ -577,7 +610,8 @@ static void mapcount(
*/
ASSERT(rw_is_write_locked(&rd->grant_table->lock));
- for ( handle = 0; handle < lgt->maptrack_limit; handle++ )
+ for ( handle = 0; handle < lgt->maptrack_pages * MAPTRACK_PER_PAGE;
+ handle++ )
{
map = &maptrack_entry(lgt, handle);
if ( !(map->flags & (GNTMAP_device_map|GNTMAP_host_map)) ||
@@ -947,7 +981,7 @@ __gnttab_unmap_common(
op->frame = (unsigned long)(op->dev_bus_addr >> PAGE_SHIFT);
- if ( unlikely(op->handle >= lgt->maptrack_limit) )
+ if ( unlikely(op->handle >= lgt->maptrack_pages * MAPTRACK_PER_PAGE) )
{
gdprintk(XENLOG_INFO, "Bad handle (%d).\n", op->handle);
op->status = GNTST_bad_handle;
@@ -1422,6 +1456,7 @@ gnttab_setup_table(
struct gnttab_setup_table op;
struct domain *d;
struct grant_table *gt;
+ struct vcpu *v;
int i;
xen_pfn_t gmfn;
@@ -1463,6 +1498,17 @@ gnttab_setup_table(
gt = d->grant_table;
write_lock(>->lock);
+ /* Tracking of mapped foreign frames table */
+ gt->maptrack = vzalloc(max_maptrack_frames * sizeof(*gt->maptrack));
+ if ( gt->maptrack == NULL )
+ goto out3;
+ for_each_vcpu( d, v )
+ {
+ v->maptrack_head = MAPTRACK_TAIL;
+ v->maptrack_tail = MAPTRACK_TAIL;
+ }
+ gt->maptrack_pages = 0;
+
if ( gt->gt_version == 0 )
gt->gt_version = 1;
@@ -3049,18 +3095,6 @@ grant_table_create(
spin_lock_init(&t->active[i][j].lock);
}
- /* Tracking of mapped foreign frames table */
- if ( (t->maptrack = xzalloc_array(struct grant_mapping *,
- max_maptrack_frames)) == NULL )
- goto no_mem_2;
- if ( (t->maptrack[0] = alloc_xenheap_page()) == NULL )
- goto no_mem_3;
- clear_page(t->maptrack[0]);
- t->maptrack_limit = MAPTRACK_PER_PAGE;
- for ( i = 1; i < MAPTRACK_PER_PAGE; i++ )
- t->maptrack[0][i - 1].ref = i;
- t->maptrack[0][i - 1].ref = MAPTRACK_TAIL;
-
/* Shared grant table. */
if ( (t->shared_raw = xzalloc_array(void *, max_grant_frames)) == NULL )
goto no_mem_3;
@@ -3091,8 +3125,6 @@ grant_table_create(
free_xenheap_page(t->shared_raw[i]);
xfree(t->shared_raw);
no_mem_3:
- free_xenheap_page(t->maptrack[0]);
- xfree(t->maptrack);
no_mem_2:
for ( i = 0;
i < num_act_frames_from_sha_frames(INITIAL_NR_GRANT_FRAMES); i++ )
@@ -3120,7 +3152,8 @@ gnttab_release_mappings(
BUG_ON(!d->is_dying);
- for ( handle = 0; handle < gt->maptrack_limit; handle++ )
+ for ( handle = 0; handle < gt->maptrack_pages * MAPTRACK_PER_PAGE;
+ handle++ )
{
map = &maptrack_entry(gt, handle);
if ( !(map->flags & (GNTMAP_device_map|GNTMAP_host_map)) )
@@ -3225,9 +3258,9 @@ grant_table_destroy(
free_xenheap_page(t->shared_raw[i]);
xfree(t->shared_raw);
- for ( i = 0; i < nr_maptrack_frames(t); i++ )
+ for ( i = 0; i < t->maptrack_pages; i++ )
free_xenheap_page(t->maptrack[i]);
- xfree(t->maptrack);
+ vfree(t->maptrack);
for ( i = 0; i < nr_active_grant_frames(t); i++ )
free_xenheap_page(t->active[i]);
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index f22ebd0..c6e4ebf 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -60,6 +60,8 @@ struct grant_mapping {
u32 ref; /* grant ref */
u16 flags; /* 0-4: GNTMAP_* ; 5-15: unused */
domid_t domid; /* granting domain */
+ u32 vcpu; /* vcpu which created the grant mapping */
+ u32 pad; /* round size to a power of 2 */
};
/* Per-domain grant information. */
@@ -83,10 +85,10 @@ struct grant_table {
grant_status_t **status;
/* Active grant table. */
struct active_grant_entry **active;
- /* Mapping tracking table. */
+ /* Mapping tracking table per vcpu. */
struct grant_mapping **maptrack;
- unsigned int maptrack_head;
- unsigned int maptrack_limit;
+ /* Total pages used for mapping tracking table */
+ unsigned int maptrack_pages;
/* Lock protecting the maptrack page list, head, and limit */
spinlock_t maptrack_lock;
/* The defined versions are 1 and 2. Set to 0 if we don't know
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 80c6f62..d46a561 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -219,6 +219,11 @@ struct vcpu
/* VCPU paused by system controller. */
int controller_pause_count;
+ /* Maptrack */
+ unsigned int maptrack_head;
+ unsigned int maptrack_tail;
+ unsigned int maptrack_frames;
+
/* IRQ-safe virq_lock protects against delivering VIRQ to stale evtchn. */
evtchn_port_t virq_to_evtchn[NR_VIRQS];
spinlock_t virq_lock;
--
1.7.10.4
^ permalink raw reply related [flat|nested] 22+ messages in thread* Re: [PATCHv11 4/4] gnttab: use per-VCPU maptrack free lists
2015-06-02 16:26 ` [PATCHv11 4/4] gnttab: use per-VCPU maptrack free lists David Vrabel
@ 2015-06-03 9:00 ` David Vrabel
2015-06-05 14:51 ` Jan Beulich
1 sibling, 0 replies; 22+ messages in thread
From: David Vrabel @ 2015-06-03 9:00 UTC (permalink / raw)
To: David Vrabel, xen-devel
Cc: Tim Deegan, Malcolm Crossley, Keir Fraser, Ian Campbell,
Jan Beulich
On 02/06/15 17:26, David Vrabel wrote:
> Performance analysis of aggregate network throughput with many VMs
> shows that performance is signficantly limited by contention on the
> maptrack lock when obtaining/releasing maptrack handles from the free
> list.
Sometime during all this refactoring, I dropped the authorship of this
patch. Please ensure this is
From: Malcolm Crossley <malcolm.crossley@citrix.com>
Thanks.
David
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCHv11 4/4] gnttab: use per-VCPU maptrack free lists
2015-06-02 16:26 ` [PATCHv11 4/4] gnttab: use per-VCPU maptrack free lists David Vrabel
2015-06-03 9:00 ` David Vrabel
@ 2015-06-05 14:51 ` Jan Beulich
2015-06-05 15:55 ` David Vrabel
2015-06-11 15:28 ` Tim Deegan
1 sibling, 2 replies; 22+ messages in thread
From: Jan Beulich @ 2015-06-05 14:51 UTC (permalink / raw)
To: David Vrabel, Tim Deegan
Cc: xen-devel, Malcolm Crossley, Keir Fraser, Ian Campbell
>>> On 02.06.15 at 18:26, <david.vrabel@citrix.com> wrote:
> Performance analysis of aggregate network throughput with many VMs
> shows that performance is signficantly limited by contention on the
> maptrack lock when obtaining/releasing maptrack handles from the free
> list.
>
> Instead of a single free list use a per-VCPU list. This avoids any
> contention when obtaining a handle. Handles must be released back to
> their original list and since this may occur on a different VCPU there
> is some contention on the destination VCPU's free list tail pointer
> (but this is much better than a per-domain lock).
>
> Increase the default maximum number of maptrack frames by 4 times
> because: a) struct grant_mapping is now 16 bytes (instead of 8); and
> b) a guest may not evenly distribute all the grant map operations
> across the VCPUs (meaning some VCPUs need more maptrack entries than
> others).
>
> Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Acked-by: Tim Deegan <tim@xen.org>
What version was that ack given for?
> ---
> v11:
> - Allocate maptrack array with vzalloc() since it may be >1 page.
>
> v10:
> - Divide max_maptrack_frames evenly amongst the VCPUs.
> - Increase default max_maptrack_frames to compensate.
Iirc before both of these changes, and the v10 ones imo should
have invalidated it. Tim, I'm particularly trying to understand
whether you're okay with the original's (potentially even heavier)
resource use and/or this version's (risking to run out of maptrack
entries _much_ earlier than currently).
> static inline void
> put_maptrack_handle(
> struct grant_table *t, int handle)
> {
> - spin_lock(&t->maptrack_lock);
> - maptrack_entry(t, handle).ref = t->maptrack_head;
> - t->maptrack_head = handle;
> - spin_unlock(&t->maptrack_lock);
I think the dropping of the lock here makes it necessary to add a
write barrier between __gnttab_unmap_common_complete()
zeroing op->map->flags and the call of this function.
> static inline int
> get_maptrack_handle(
> struct grant_table *lgt)
> {
> + struct vcpu *v = current;
> int i;
> grant_handle_t handle;
> struct grant_mapping *new_mt;
> - unsigned int new_mt_limit, nr_frames;
>
> - spin_lock(&lgt->maptrack_lock);
> + handle = __get_maptrack_handle(lgt, v);
> + if ( likely(handle != -1) )
> + return handle;
>
> - while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
> - {
> - nr_frames = nr_maptrack_frames(lgt);
> - if ( nr_frames >= max_maptrack_frames )
> - break;
> + /*
> + * max_maptrack_frames is per domain so each VCPU gets a share of
> + * the maximum, but allow at least one frame per VCPU.
> + */
> + if ( v->maptrack_frames
> + && v->maptrack_frames >= max_maptrack_frames / v->domain->max_vcpus )
> + return -1;
So with e.g. max_maptrack_frames being 256 and ->max_vcpus
being 129 you'd potentially allow each vCPU to only have exactly
one page despite there being 127 more to use.
Jan
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCHv11 4/4] gnttab: use per-VCPU maptrack free lists
2015-06-05 14:51 ` Jan Beulich
@ 2015-06-05 15:55 ` David Vrabel
2015-06-05 16:11 ` Jan Beulich
2015-06-11 15:28 ` Tim Deegan
1 sibling, 1 reply; 22+ messages in thread
From: David Vrabel @ 2015-06-05 15:55 UTC (permalink / raw)
To: Jan Beulich, Tim Deegan
Cc: xen-devel, Malcolm Crossley, Keir Fraser, Ian Campbell
On 05/06/15 15:51, Jan Beulich wrote:
>>>> On 02.06.15 at 18:26, <david.vrabel@citrix.com> wrote:
>> Performance analysis of aggregate network throughput with many VMs
>> shows that performance is signficantly limited by contention on the
>> maptrack lock when obtaining/releasing maptrack handles from the free
>> list.
>>
>> Instead of a single free list use a per-VCPU list. This avoids any
>> contention when obtaining a handle. Handles must be released back to
>> their original list and since this may occur on a different VCPU there
>> is some contention on the destination VCPU's free list tail pointer
>> (but this is much better than a per-domain lock).
>>
>> Increase the default maximum number of maptrack frames by 4 times
>> because: a) struct grant_mapping is now 16 bytes (instead of 8); and
>> b) a guest may not evenly distribute all the grant map operations
>> across the VCPUs (meaning some VCPUs need more maptrack entries than
>> others).
>>
>> Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>> Acked-by: Tim Deegan <tim@xen.org>
>
> What version was that ack given for?
v9
>> + /*
>> + * max_maptrack_frames is per domain so each VCPU gets a share of
>> + * the maximum, but allow at least one frame per VCPU.
>> + */
>> + if ( v->maptrack_frames
>> + && v->maptrack_frames >= max_maptrack_frames / v->domain->max_vcpus )
>> + return -1;
>
> So with e.g. max_maptrack_frames being 256 and ->max_vcpus
> being 129 you'd potentially allow each vCPU to only have exactly
> one page despite there being 127 more to use.
There's a limit to how many wacky combinations we can support with a
single default limit.
With the standard defaults and 129 VCPUs:
Before
131072 entries (256 * 4096 / 8)
After
231168 entries (1024 / 129 * 129 * 4096 / 16)
1792 entries per vcpu.
David
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCHv11 4/4] gnttab: use per-VCPU maptrack free lists
2015-06-05 15:55 ` David Vrabel
@ 2015-06-05 16:11 ` Jan Beulich
2015-06-05 16:42 ` David Vrabel
0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2015-06-05 16:11 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Malcolm Crossley, Keir Fraser, Ian Campbell,
Tim Deegan
>>> On 05.06.15 at 17:55, <david.vrabel@citrix.com> wrote:
> On 05/06/15 15:51, Jan Beulich wrote:
>>>>> On 02.06.15 at 18:26, <david.vrabel@citrix.com> wrote:
>>> + /*
>>> + * max_maptrack_frames is per domain so each VCPU gets a share of
>>> + * the maximum, but allow at least one frame per VCPU.
>>> + */
>>> + if ( v->maptrack_frames
>>> + && v->maptrack_frames >= max_maptrack_frames / v->domain->max_vcpus )
>>> + return -1;
>>
>> So with e.g. max_maptrack_frames being 256 and ->max_vcpus
>> being 129 you'd potentially allow each vCPU to only have exactly
>> one page despite there being 127 more to use.
>
> There's a limit to how many wacky combinations we can support with a
> single default limit.
>
> With the standard defaults and 129 VCPUs:
>
> Before
>
> 131072 entries (256 * 4096 / 8)
>
> After
>
> 231168 entries (1024 / 129 * 129 * 4096 / 16)
> 1792 entries per vcpu.
And that's why I'm putting the currently proposed resource
management model under question.
Jan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCHv11 4/4] gnttab: use per-VCPU maptrack free lists
2015-06-05 16:11 ` Jan Beulich
@ 2015-06-05 16:42 ` David Vrabel
2015-06-08 6:54 ` Jan Beulich
0 siblings, 1 reply; 22+ messages in thread
From: David Vrabel @ 2015-06-05 16:42 UTC (permalink / raw)
To: Jan Beulich
Cc: xen-devel, Malcolm Crossley, Keir Fraser, Ian Campbell,
Tim Deegan
On 05/06/15 17:11, Jan Beulich wrote:
>>>> On 05.06.15 at 17:55, <david.vrabel@citrix.com> wrote:
>> On 05/06/15 15:51, Jan Beulich wrote:
>>>>>> On 02.06.15 at 18:26, <david.vrabel@citrix.com> wrote:
>>>> + /*
>>>> + * max_maptrack_frames is per domain so each VCPU gets a share of
>>>> + * the maximum, but allow at least one frame per VCPU.
>>>> + */
>>>> + if ( v->maptrack_frames
>>>> + && v->maptrack_frames >= max_maptrack_frames / v->domain->max_vcpus )
>>>> + return -1;
>>>
>>> So with e.g. max_maptrack_frames being 256 and ->max_vcpus
>>> being 129 you'd potentially allow each vCPU to only have exactly
>>> one page despite there being 127 more to use.
>>
>> There's a limit to how many wacky combinations we can support with a
>> single default limit.
>>
>> With the standard defaults and 129 VCPUs:
>>
>> Before
>>
>> 131072 entries (256 * 4096 / 8)
>>
>> After
>>
>> 231168 entries (1024 / 129 * 129 * 4096 / 16)
>> 1792 entries per vcpu.
>
> And that's why I'm putting the currently proposed resource
> management model under question.
The new default of 1024 frames ensures that with any number of VCPUs
results in the domain having /more/ entries than then old default (256
frames).
It's not at all clear what you want here. Can you provide a proposal?
David
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCHv11 4/4] gnttab: use per-VCPU maptrack free lists
2015-06-05 16:42 ` David Vrabel
@ 2015-06-08 6:54 ` Jan Beulich
0 siblings, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2015-06-08 6:54 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Malcolm Crossley, Keir Fraser, Ian Campbell,
Tim Deegan
>>> On 05.06.15 at 18:42, <david.vrabel@citrix.com> wrote:
> On 05/06/15 17:11, Jan Beulich wrote:
>>>>> On 05.06.15 at 17:55, <david.vrabel@citrix.com> wrote:
>>> On 05/06/15 15:51, Jan Beulich wrote:
>>>>>>> On 02.06.15 at 18:26, <david.vrabel@citrix.com> wrote:
>>>>> + /*
>>>>> + * max_maptrack_frames is per domain so each VCPU gets a share of
>>>>> + * the maximum, but allow at least one frame per VCPU.
>>>>> + */
>>>>> + if ( v->maptrack_frames
>>>>> + && v->maptrack_frames >= max_maptrack_frames / v->domain->max_vcpus )
>>>>> + return -1;
>>>>
>>>> So with e.g. max_maptrack_frames being 256 and ->max_vcpus
>>>> being 129 you'd potentially allow each vCPU to only have exactly
>>>> one page despite there being 127 more to use.
>>>
>>> There's a limit to how many wacky combinations we can support with a
>>> single default limit.
>>>
>>> With the standard defaults and 129 VCPUs:
>>>
>>> Before
>>>
>>> 131072 entries (256 * 4096 / 8)
>>>
>>> After
>>>
>>> 231168 entries (1024 / 129 * 129 * 4096 / 16)
>>> 1792 entries per vcpu.
>>
>> And that's why I'm putting the currently proposed resource
>> management model under question.
>
> The new default of 1024 frames ensures that with any number of VCPUs
> results in the domain having /more/ entries than then old default (256
> frames).
>
> It's not at all clear what you want here. Can you provide a proposal?
There being more frames per domain doesn't help a guest e.g.
first doing one or more mappings on each vCPU and then wanting
to do very many mappings on a single vCPU. I think there needs to
be a (slow) fallback path where using "foreign" vCPU-s' maptrack
entries is possible. Or something else to avoid regressing in
scenarios that work prior to your change.
Jan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCHv11 4/4] gnttab: use per-VCPU maptrack free lists
2015-06-05 14:51 ` Jan Beulich
2015-06-05 15:55 ` David Vrabel
@ 2015-06-11 15:28 ` Tim Deegan
2015-06-12 7:00 ` Jan Beulich
1 sibling, 1 reply; 22+ messages in thread
From: Tim Deegan @ 2015-06-11 15:28 UTC (permalink / raw)
To: Jan Beulich
Cc: xen-devel, Malcolm Crossley, Keir Fraser, David Vrabel,
Ian Campbell
At 15:51 +0100 on 05 Jun (1433519478), Jan Beulich wrote:
> >>> On 02.06.15 at 18:26, <david.vrabel@citrix.com> wrote:
> > Performance analysis of aggregate network throughput with many VMs
> > shows that performance is signficantly limited by contention on the
> > maptrack lock when obtaining/releasing maptrack handles from the free
> > list.
> >
> > Instead of a single free list use a per-VCPU list. This avoids any
> > contention when obtaining a handle. Handles must be released back to
> > their original list and since this may occur on a different VCPU there
> > is some contention on the destination VCPU's free list tail pointer
> > (but this is much better than a per-domain lock).
> >
> > Increase the default maximum number of maptrack frames by 4 times
> > because: a) struct grant_mapping is now 16 bytes (instead of 8); and
> > b) a guest may not evenly distribute all the grant map operations
> > across the VCPUs (meaning some VCPUs need more maptrack entries than
> > others).
> >
> > Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
> > Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> > Acked-by: Tim Deegan <tim@xen.org>
>
> What version was that ack given for?
v7, IIRC.
> Iirc before both of these changes, and the v10 ones imo should
> have invalidated it. Tim, I'm particularly trying to understand
> whether you're okay with the original's (potentially even heavier)
> resource use and/or this version's (risking to run out of maptrack
> entries _much_ earlier than currently).
The concern with the earlier version being that the maximum maptrack
limit gets a lot higher with many vcpus? I was OK with that. There
are a lot of things that scale with #vcpus, and xenheap pages are not
particularly scarce any more. So let's say I don't find one 128-vcpu
guest much different from 128 1-vcpu guests for this purpose.
Cheers,
Tim.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCHv11 4/4] gnttab: use per-VCPU maptrack free lists
2015-06-11 15:28 ` Tim Deegan
@ 2015-06-12 7:00 ` Jan Beulich
0 siblings, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2015-06-12 7:00 UTC (permalink / raw)
To: Tim Deegan
Cc: xen-devel, Malcolm Crossley, Keir Fraser, David Vrabel,
Ian Campbell
>>> On 11.06.15 at 17:28, <tim@xen.org> wrote:
> At 15:51 +0100 on 05 Jun (1433519478), Jan Beulich wrote:
>> Iirc before both of these changes, and the v10 ones imo should
>> have invalidated it. Tim, I'm particularly trying to understand
>> whether you're okay with the original's (potentially even heavier)
>> resource use and/or this version's (risking to run out of maptrack
>> entries _much_ earlier than currently).
>
> The concern with the earlier version being that the maximum maptrack
> limit gets a lot higher with many vcpus? I was OK with that. There
> are a lot of things that scale with #vcpus, and xenheap pages are not
> particularly scarce any more. So let's say I don't find one 128-vcpu
> guest much different from 128 1-vcpu guests for this purpose.
I certainly can see that resource consumption may scale with the
number of vCPU-s a guest has. But whether that ought to apply to
everything, i.e. including all per-domain (rather than per-vCPU)
resources (which the maptrack table is only an example of) I'm not
sure about. In the end, if we were to talk about v9 and the default
of 256 frames, a 1024-vCPU DomU could consume 1Gb of memory
for its maptrack even if (quite likely) it doesn't serve any other
guest. IOW to me it would seem that a necessary prereq to this
change would be to make the limit a per-domain setting, with only
Dom0 getting its limit bumped by default (and even then the
at-least-one-page-per-vCPU requirement undermines that, i.e. a
slow path to avoid going over the set limit would still seem
necessary).
Additionally, with the maptrack pages no longer shared across
vCPU-s, running into an allocation failure namely on ballooned
setups (where Xen typically doesn't have much memory available,
since the ballooning ordinarily only accounts for the immediate
domain needs) is going to become quite a bit more likely. Yes, I
already hear David saying "I don't care about the ballooning case",
but no, I don't think we can simply ignore this case (unless we
want to declare ballooning a deprecated, no longer supported
feature).
Jan
^ permalink raw reply [flat|nested] 22+ messages in thread