From: "Roger Pau Monné" <roger.pau@citrix.com>
To: James Harper <james.harper@bendigoit.com.au>
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH RFC] Persistent grant maps for xen blk drivers
Date: Fri, 19 Oct 2012 13:28:46 +0200
Message-ID: <5081396E.4040409@citrix.com>
In-Reply-To: <6035A0D088A63A46850C3988ED045A4B32C11932@BITCOM1.int.sbss.com.au>
On 19/10/12 12:46, James Harper wrote:
>>
>> On 19/10/12 03:34, James Harper wrote:
>>>>
>>>> This patch implements persistent grants for the xen-blk{front,back}
>>>> mechanism. The effect of this change is to reduce the number of unmap
>>>> operations performed, since they cause a (costly) TLB shootdown. This
>>>> allows the I/O performance to scale better when a large number of VMs
>>>> are performing I/O.
>>>>
>>>> Previously, the blkfront driver was supplied a bvec[] from the
>>>> request queue. This was granted to dom0; dom0 performed the I/O and
>>>> wrote directly into the grant-mapped memory and unmapped it; blkfront
>>>> then removed foreign access for that grant. The cost of unmapping
>>>> scales badly with the number of CPUs in Dom0. An experiment showed
>>>> that when
>>>> Dom0 has 24 VCPUs, and guests are performing parallel I/O to a
>>>> ramdisk, the IPIs from performing unmap's is a bottleneck at 5 guests
>>>> (at which point
>>>> 650,000 IOPS are being performed in total). If more than 5 guests are
>>>> used, the performance declines. By 10 guests, only
>>>> 400,000 IOPS are being performed.
>>>>
>>>> This patch improves performance by only unmapping when the
>> connection
>>>> between blkfront and back is broken.
>>>
>>> I assume network drivers would suffer from the same affliction... Would a
>> more general persistent map solution be worth considering (or be possible)?
>> So a common interface to this persistent mapping allowing the persistent
>> pool to be shared between all drivers in the DomU?
>>
>> Yes, there are plans to implement the same for network drivers. I would
>> generally avoid having a shared pool of grants for all the devices of a DomU,
>> as said in the description of the patch:
>>
>> Blkback stores a mapping of grefs=>{page mapped to by gref} in a red-black
>> tree. As the grefs are not known apriori, and provide no guarantees on their
>> ordering, we have to perform a search through this tree to find the page, for
>> every gref we receive. This operation takes O(log n) time in the worst case.
>>
>> Having a shared pool with all grants would mean that n will become much
>> higher, and so the search time for a grant would increase.
>
> I'm asking because I vaguely started a similar project a while back, but didn't get much further than investigating data structures. I had something like the following:
>
> . redefined gref so that high bit indicates a persistent mapping (on the basis that no DomU is ever going to have >2^31 grants). High bit set indicates a persistent grant which is handled differently.
I don't understand why you need to change the way a gref is passed
around. This will break compatibility with non-persistent backends
unless you negotiate the use of persistent grants before actually
starting the data transfer, and if you do that you already know you are
using persistent grants, so there's no need to set any bit in the gref.
> . New hypercall mem-op's to allocate/deallocate a persistent grant, returning a handle from Dom0 (with high bit set). Dom0 maintains a table of mapped grants with the handle being the index. Ref counting tracks usage so that an unmap won't be allowed when ref>0. I was taking the approach that a chunk of persistent grants would be allocated at boot time and so the actual map/unmap is not done often so the requirement of a hypercall wasn't a big deal. I hadn't figured out how to manage the size of this table yet.
The so-called persistent grants are no different from normal grants;
it's just that we agree in blk{front/back} that the same set of grants
will be used for all transactions. There's no need to introduce any new
hypercalls, since they are just "regular" grants.
I agree that we could allocate them when initializing blkfront, but I
prefer to allocate them on request, since we probably won't use the
maximum number (RING_SIZE * SEGMENTS_PER_REQUEST).
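To illustrate allocating on request, here is a rough sketch of a frontend-side pool that creates a grant only when no free one exists and never goes past RING_SIZE * SEGMENTS_PER_REQUEST. This is not the code from the patch: the struct and function names (gnt_pool, pool_get, pool_put) are made up for the example, the two constants are placeholder values, and the counter next_gref stands in for the real grant allocation.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder values, assumed only for this illustration. */
#define RING_SIZE            32
#define SEGMENTS_PER_REQUEST 11
#define MAX_PERSISTENT       (RING_SIZE * SEGMENTS_PER_REQUEST)

typedef uint32_t grant_ref_t;

/* One grant that stays granted for the lifetime of the connection. */
struct persistent_gnt {
    grant_ref_t gref;
    int in_use;
};

/* Hypothetical frontend pool; zero-initialise before first use. */
struct gnt_pool {
    struct persistent_gnt gnts[MAX_PERSISTENT];
    size_t allocated;       /* grants created so far */
    grant_ref_t next_gref;  /* stand-in for the real grant allocator */
};

/*
 * Get a grant for one segment: reuse a free one if possible, otherwise
 * create a new one on demand. Returns NULL when all MAX_PERSISTENT
 * grants are currently in use.
 */
static struct persistent_gnt *pool_get(struct gnt_pool *p)
{
    for (size_t i = 0; i < p->allocated; i++) {
        if (!p->gnts[i].in_use) {
            p->gnts[i].in_use = 1;
            return &p->gnts[i];
        }
    }
    if (p->allocated == MAX_PERSISTENT)
        return NULL;

    struct persistent_gnt *g = &p->gnts[p->allocated++];
    g->gref = p->next_gref++;  /* real code would grant foreign access here */
    g->in_use = 1;
    return g;
}

/* Release a grant after the request completes. Foreign access is not
 * revoked, so the backend can keep the page mapped. */
static void pool_put(struct persistent_gnt *g)
{
    g->in_use = 0;
}
```

With this shape the pool grows only as far as the workload needs, and a grant released by pool_put is handed out again by the next pool_get instead of a new one being created.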
> . Mapping a gref with the high bit set in Dom0 becomes a lookup into the persistent table and a ref++ rather than an actual mapping operation. Unmapping becomes a ref--.
>
>> Also, if the pool is
>> shared some kind of concurrency control should be added, which will make it
>> even slower.
>>
>
> Yes, but I think I only needed to worry about that for the actual alloc/dealloc of the persistent map entry which would be an infrequent event. As I said, I never got much further than the above concept so I hadn't fully explored that - at the time I was chasing an imaginary problem with grant tables which turned out to be freelist contention in DomU.
As far as I can see (correct me if I'm wrong), you are proposing a
solution that involves changes to both the guests and the hypervisor.
I think this introduces unnecessary complexity to a problem that can
be solved by merely changing the way blk{front/back} behaves, without
requiring the hypervisor to know whether we are using persistent grants.
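As a rough illustration of the backend side, the gref=>page lookup quoted above from the patch description could look something like the sketch below. It is not the blkback code: a plain (unbalanced) binary search tree stands in for the red-black tree to keep the example short, malloc() stands in for the grant map operation, and all names (gnt_node, gnt_find, gnt_insert, get_page) are invented for the example.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t grant_ref_t;

/* One cached mapping: gref => page mapped by that gref. */
struct gnt_node {
    grant_ref_t gref;
    void *page;
    struct gnt_node *left, *right;
};

/* Search the tree for a gref; returns NULL if it was never mapped. */
static struct gnt_node *gnt_find(struct gnt_node *root, grant_ref_t gref)
{
    while (root && root->gref != gref)
        root = gref < root->gref ? root->left : root->right;
    return root;
}

/* Insert a new mapping; returns the existing node if gref is present. */
static struct gnt_node *gnt_insert(struct gnt_node **root, grant_ref_t gref,
                                   void *page)
{
    while (*root) {
        if ((*root)->gref == gref)
            return *root;
        root = gref < (*root)->gref ? &(*root)->left : &(*root)->right;
    }
    struct gnt_node *n = calloc(1, sizeof(*n));
    if (!n)
        return NULL;
    n->gref = gref;
    n->page = page;
    *root = n;
    return n;
}

/*
 * Return the page for a gref, mapping it only the first time it is seen.
 * map_count counts the simulated map operations. Nothing is unmapped
 * until the connection is torn down (teardown is omitted here).
 */
static void *get_page(struct gnt_node **cache, grant_ref_t gref,
                      unsigned *map_count)
{
    struct gnt_node *n = gnt_find(*cache, gref);
    if (n)
        return n->page;

    void *page = malloc(4096);  /* stand-in for the real grant map */
    if (!page)
        return NULL;
    (*map_count)++;
    n = gnt_insert(cache, gref, page);
    if (!n) {
        free(page);
        return NULL;
    }
    return page;
}
```

Everything here lives in the backend: a repeated gref costs one tree search and no map or unmap, and the hypervisor sees nothing but ordinary grant operations.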
> James
>