From mboxrd@z Thu Jan  1 00:00:00 1970
From: David Vrabel <david.vrabel@citrix.com>
Subject: Re: Possible Xen grant table locking improvements
Date: Fri, 7 Nov 2014 15:41:37 +0000
Message-ID: <545CE831.7070409@citrix.com>
References: <544547F3.4000506@citrix.com>
	<20141023094619.GA89807@deinos.phlegethon.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <20141023094619.GA89807@deinos.phlegethon.org>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Tim Deegan <tim@xen.org>, David Vrabel <david.vrabel@citrix.com>
Cc: Keir Fraser <keir@xen.org>, Matt Wilson <msw@amazon.com>, Jan Beulich <jbeulich@suse.com>, "Xen-devel@lists.xen.org" <Xen-devel@lists.xen.org>
List-Id: xen-devel@lists.xenproject.org

On 23/10/14 10:46, Tim Deegan wrote:
> Hi,
> =

> At 18:35 +0100 on 20 Oct (1413826547), David Vrabel wrote:
>> Most guests do not map a grant reference more than twice (Linux, for
>> example, will typically map a gref once in the kernel address space, or
>> twice if a userspace mapping is required).  The maptrack entries for
>> these two mappings can be stored in the active entry (the "fast"
>> entries).  If more than two mappings are required, the existing maptrack
>> table can be used (the "slow" entries).
> =

> Sounds good, as long as the hit rate is indeed high.  Do you know if
> the BSD/windows client code behaves this way too?

I don't know about BSD but XenServer's Windows PV drivers don't support
mapping grants (only granting access).

>> A maptrack handle for a "fast" entry is encoded as:
>>
>>     31 30          16  15            0
>>   +---+---------------+---------------+
>>   | F | domid         | gref          |
>>   +---+---------------+---------------+
>>
>> F is set for a "fast" entry, and clear for a "slow" one. Grant
>> references above 2^16 will have to be tracked with "slow" entries.
> =

> How restricting is that limit?  Would 2^15=BD and also encoding
> which of the two entries to look at be good?

Oh,  I forgot about the bit for the entry index.

2^15 entries allows for (e.g.,) 8 multiqueue VIFs in a 8 VCPU guest.
Which is not a huge number and not a limit I would like to introduce.

One possibility would be guests wanting to use the fast path have to use
a new grant unmap hypercall that also passes the original grant ref and
domain.

>> We can omit taking the grant table lock to check the validity of a grant
>> ref or maptrack handle since these tables only grow and do not shrink.
> =

> Can you also avoid the lock for accessing the entry itself, with a bit
> of RCU magic?  Maybe that's overengineering things.

I don't think this will be necessary -- the active entry lock won't be
contended.

>> If strict IOMMU mode is used, IOMMU mappings are updated on every grant
>> map/unmap.  These are currently setup such that BFN =3D=3D MFN which
>> requires reference counting the IOMMU mappings so they are only torn
>> down when all grefs for that MFN are unmapped.  This requires an
>> expensive mapcount() operation that iterates over the whole maptrack tab=
le.
>>
>> There is no requirement for BFN =3D=3D MFN so each grant map can create =
its
>> own IOMMU mapping.  This will require a region of bus address space that
>> does not overlap with RAM.
> =

> Hrmn.  That could be tricky to arrange.  And the reference counting
> might end up being cheaper than the extra IOMMU flush operations.
> (Also, how much would you bet that clients actually use the returned
> BFN correctly?)

Hmm. Yes, if a guest has assumed that BFN =3D=3D MFN, it could read the PTE
and get the BFN that way.

> Would it be enough to optimise mapcount() a bit?  We could organise the
> in-use maptrack entries as a hash table instead of (or as well as) a
> single linked list.
> =

> On similar lines, would it be worth fragmenting the maptrack itself
> (e.g. with per-page locks) to reduce locking contention instead of
> moving maptrack entries into the active entry?  If might be Good
> Enough[tm], and simpler to build/maintain than this proposal.

A per maptrack page lock you would still need a per-domain maptrack lock
protecting the maptrack free list.

A better idea may be to hash the domain and grant ref and have a
maptrack table per-hash bucket.  Each maptrack table would have its own
maptrack lock.

The maptrack handle could be:

    31               16  15            0
    +-------------------+---------------+
    | bucket            | index         |
    +-------------------+---------------+

We should probably try something simpler like this before getting
carried away...

David