From mboxrd@z Thu Jan 1 00:00:00 1970 From: David Vrabel Subject: Re: Possible Xen grant table locking improvements Date: Fri, 7 Nov 2014 15:41:37 +0000 Message-ID: <545CE831.7070409@citrix.com> References: <544547F3.4000506@citrix.com> <20141023094619.GA89807@deinos.phlegethon.org> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Return-path: In-Reply-To: <20141023094619.GA89807@deinos.phlegethon.org> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Tim Deegan , David Vrabel Cc: Keir Fraser , Matt Wilson , Jan Beulich , "Xen-devel@lists.xen.org" List-Id: xen-devel@lists.xenproject.org On 23/10/14 10:46, Tim Deegan wrote: > Hi, > = > At 18:35 +0100 on 20 Oct (1413826547), David Vrabel wrote: >> Most guests do not map a grant reference more than twice (Linux, for >> example, will typically map a gref once in the kernel address space, or >> twice if a userspace mapping is required). The maptrack entries for >> these two mappings can be stored in the active entry (the "fast" >> entries). If more than two mappings are required, the existing maptrack >> table can be used (the "slow" entries). > = > Sounds good, as long as the hit rate is indeed high. Do you know if > the BSD/windows client code behaves this way too? I don't know about BSD but XenServer's Windows PV drivers don't support mapping grants (only granting access). >> A maptrack handle for a "fast" entry is encoded as: >> >> 31 30 16 15 0 >> +---+---------------+---------------+ >> | F | domid | gref | >> +---+---------------+---------------+ >> >> F is set for a "fast" entry, and clear for a "slow" one. Grant >> references above 2^16 will have to be tracked with "slow" entries. > = > How restricting is that limit? Would 2^15=BD and also encoding > which of the two entries to look at be good? Oh, I forgot about the bit for the entry index. 2^15 entries allows for (e.g.,) 8 multiqueue VIFs in a 8 VCPU guest. Which is not a huge number and not a limit I would like to introduce. One possibility would be guests wanting to use the fast path have to use a new grant unmap hypercall that also passes the original grant ref and domain. >> We can omit taking the grant table lock to check the validity of a grant >> ref or maptrack handle since these tables only grow and do not shrink. > = > Can you also avoid the lock for accessing the entry itself, with a bit > of RCU magic? Maybe that's overengineering things. I don't think this will be necessary -- the active entry lock won't be contended. >> If strict IOMMU mode is used, IOMMU mappings are updated on every grant >> map/unmap. These are currently setup such that BFN =3D=3D MFN which >> requires reference counting the IOMMU mappings so they are only torn >> down when all grefs for that MFN are unmapped. This requires an >> expensive mapcount() operation that iterates over the whole maptrack tab= le. >> >> There is no requirement for BFN =3D=3D MFN so each grant map can create = its >> own IOMMU mapping. This will require a region of bus address space that >> does not overlap with RAM. > = > Hrmn. That could be tricky to arrange. And the reference counting > might end up being cheaper than the extra IOMMU flush operations. > (Also, how much would you bet that clients actually use the returned > BFN correctly?) Hmm. Yes, if a guest has assumed that BFN =3D=3D MFN, it could read the PTE and get the BFN that way. > Would it be enough to optimise mapcount() a bit? We could organise the > in-use maptrack entries as a hash table instead of (or as well as) a > single linked list. > = > On similar lines, would it be worth fragmenting the maptrack itself > (e.g. with per-page locks) to reduce locking contention instead of > moving maptrack entries into the active entry? If might be Good > Enough[tm], and simpler to build/maintain than this proposal. A per maptrack page lock you would still need a per-domain maptrack lock protecting the maptrack free list. A better idea may be to hash the domain and grant ref and have a maptrack table per-hash bucket. Each maptrack table would have its own maptrack lock. The maptrack handle could be: 31 16 15 0 +-------------------+---------------+ | bucket | index | +-------------------+---------------+ We should probably try something simpler like this before getting carried away... David