xen-devel.lists.xenproject.org archive mirror
* [Hackathon minutes] PV block improvements
@ 2013-05-24 15:06 Roger Pau Monné
  2013-06-21 17:10 ` Roger Pau Monné
  0 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monné @ 2013-05-24 15:06 UTC (permalink / raw)
  To: xen-devel

Hello,

These are the notes about the block improvements discussed at the
Hackathon; some of them, if not all, have already been incorporated into:

https://docs.google.com/document/d/1Vh5T8Z3Tx3sUEhVB0DnNDKBNiqB_ZA8Z5YVqAsCIjuI/edit

Here is a list of future work items, more measurable and limited:

A) Separate request and response rings. This has several benefits: we
will be able to reduce the size of the response struct, since it no
longer has to be the same size as the request. Also, we could increase
the number of in-flight requests, since we are no longer limited by the
size of the request ring. We still need to make sure that all in-flight
requests can be written to the response ring once they are finished, or
added to a queue that writes them to the response ring when a free slot
becomes available.

B) Clean up the size differences between 32/64bit structs, and while
there reduce the size of a request so it is aligned to a cache line
(64 bytes); see the sketch after this list.

C) Investigate the interrupt rate between blkfront/blkback and, if
needed, add support for polling; switching between polling and events
could be done automatically by blkfront/blkback when a high interrupt
rate is detected.

D) Multipage ring support.
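
As an illustration of item B, here is a rough sketch (in C) of a request
layout built only from fixed-width types, so that the 32-bit and 64-bit
variants are identical and the whole struct fits in a single 64-byte
cache line. The struct and field names are hypothetical and are not the
existing blkif protocol definitions:

#include <stdint.h>

/* Hypothetical fixed-layout request: identical on 32-bit and 64-bit
 * guests (no pointers, no long, explicit padding) and sized to exactly
 * one 64-byte cache line. Not the actual blkif_request definition. */
struct blkif_request_v2 {
    uint8_t  operation;        /* read/write/barrier/... */
    uint8_t  nr_segments;      /* number of segments in this request */
    uint16_t handle;           /* virtual device handle */
    uint32_t _pad0;
    uint64_t id;               /* echoed back in the response */
    uint64_t sector_number;    /* start sector on the device */
    struct {
        uint32_t gref;         /* grant reference of the data page */
        uint8_t  first_sect;   /* first sector used within the page */
        uint8_t  last_sect;    /* last sector used within the page */
        uint16_t _pad;
    } seg[4];                  /* 4 * 8 = 32 bytes of segments */
    uint8_t  _pad1[8];         /* pad the struct up to 64 bytes */
};

/* Compile-time check that the layout really is one cache line. */
typedef char blkif_request_v2_size_check
    [(sizeof(struct blkif_request_v2) == 64) ? 1 : -1];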


* Re: [Hackathon minutes] PV block improvements
  2013-05-24 15:06 [Hackathon minutes] PV block improvements Roger Pau Monné
@ 2013-06-21 17:10 ` Roger Pau Monné
  2013-06-21 18:07   ` Matt Wilson
  2013-06-21 20:16   ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 24+ messages in thread
From: Roger Pau Monné @ 2013-06-21 17:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Konrad Rzeszutek Wilk

Hello,

While working on further block improvements I've found an issue with
persistent grants in blkfront.

With persistent grants, grants are allocated once and then never
released, so both blkfront and blkback keep reusing the same memory
pages for all the transactions.

This is not a problem in blkback, because we can dynamically choose how
many grants we want to map. On the other hand, blkfront cannot revoke
access to those grants at any point, because blkfront doesn't know
whether blkback has those grants mapped persistently or not.

So if, for example, we start expanding the number of segments in indirect
requests to a value like 512 segments per request, blkfront will
probably try to persistently map 512*32+512 = 16896 grants per device,
which is far more grants than the current default of 32*256 = 8192
(if using grant tables v2). This can cause serious problems for other
interfaces inside the DomU, since blkfront basically starts hoarding all
possible grants, leaving the other interfaces completely locked out.
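
As a purely illustrative back-of-the-envelope check of those numbers
(the constants below are the ones quoted in this mail, not values read
from the code):

/* Grants one device might pin with large indirect requests: 32
 * requests in the ring, 512 data segments each, plus the extra 512
 * grants from the calculation above. */
unsigned int grants_per_device = 32 * 512 + 512;  /* = 16896 */

/* Assumed default capacity with grant tables v2: 32 grant frames,
 * each 4K frame holding 256 16-byte v2 entries. */
unsigned int default_grant_entries = 32 * 256;    /* = 8192 */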

I've been thinking about different ways to solve this, but so far I
haven't been able to find a nice solution:

1. Limit the number of persistent grants a blkfront instance can use:
say that only the first X grants used will be persistently mapped
by both blkfront and blkback, and if more grants are needed the previous
map/unmap scheme will be used (see the sketch at the end of this message).

2. Switch to grant copy in blkback, and get rid of persistent grants (I
have not benchmarked this solution, but I'm quite sure it will involve a
performance regression, especially when scaling to a high number of domains).

3. Increase the size of the grant_table, or the size of a single grant
(from 4k to 2M); this suggestion is from Stefano Stabellini.

4. Introduce a new request type that we can use to request blkback to
unmap certain grefs so we can free them in blkfront.

So far none of them looks like a suitable solution.
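
To make option 1 a bit more concrete, here is a minimal sketch of what
the grant selection in blkfront could look like, assuming a hypothetical
per-device cap (max_persistent) negotiated with blkback. The struct and
helper names are made up, and error handling plus the bookkeeping that
returns grants to the free list on request completion are omitted:

#include <linux/list.h>
#include <linux/types.h>
#include <xen/grant_table.h>

/* Hypothetical per-device grant pool (not the real blkfront state). */
struct blkfront_gnt_pool {
    struct list_head free_persistent; /* granted pages ready for reuse */
    unsigned int persistent_cnt;      /* persistent grants handed out */
    unsigned int max_persistent;      /* cap agreed with blkback */
};

struct pers_gnt {
    grant_ref_t gref;
    struct list_head node;
};

/* Pick a grant for one segment of a request. The first max_persistent
 * grants are reused persistently; above the cap we fall back to the
 * classic grant-per-request scheme and revoke access on completion. */
static grant_ref_t get_grant(struct blkfront_gnt_pool *pool,
                             domid_t backend_id, unsigned long gfn,
                             bool *persistent)
{
    struct pers_gnt *g;

    if (!list_empty(&pool->free_persistent)) {
        /* Reuse a page that is already granted (and mapped by blkback). */
        g = list_first_entry(&pool->free_persistent, struct pers_gnt, node);
        list_del(&g->node);
        *persistent = true;
        return g->gref;
    }

    *persistent = pool->persistent_cnt < pool->max_persistent;
    if (*persistent)
        pool->persistent_cnt++;

    /* readonly = 0 so the backend can write the page (for reads). */
    return gnttab_grant_foreign_access(backend_id, gfn, 0);
}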


* Re: [Hackathon minutes] PV block improvements
  2013-06-21 17:10 ` Roger Pau Monné
@ 2013-06-21 18:07   ` Matt Wilson
  2013-06-22  7:11     ` Roger Pau Monné
  2013-06-27 15:12     ` Roger Pau Monné
  2013-06-21 20:16   ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 24+ messages in thread
From: Matt Wilson @ 2013-06-21 18:07 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Konrad Rzeszutek Wilk, xen-devel

On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
> Hello,
> 
> While working on further block improvements I've found an issue with
> persistent grants in blkfront.
> 
> Persistent grants basically allocate grants and then they are never
> released, so both blkfront and blkback keep using the same memory pages
> for all the transactions.
> 
> This is not a problem in blkback, because we can dynamically choose how
> many grants we want to map. On the other hand, blkfront cannot remove
> the access to those grants at any point, because blkfront doesn't know
> if blkback has this grants mapped persistently or not.
> 
> So if for example we start expanding the number of segments in indirect
> requests, to a value like 512 segments per requests, blkfront will
> probably try to persistently map 512*32+512 = 16896 grants per device,
> that's much more grants that the current default, which is 32*256 = 8192
> (if using grant tables v2). This can cause serious problems to other
> interfaces inside the DomU, since blkfront basically starts hoarding all
> possible grants, leaving other interfaces completely locked.

Yikes.

> I've been thinking about different ways to solve this, but so far I
> haven't been able to found a nice solution:
> 
> 1. Limit the number of persistent grants a blkfront instance can use,
> let's say that only the first X used grants will be persistently mapped
> by both blkfront and blkback, and if more grants are needed the previous
> map/unmap will be used.

I'm not thrilled with this option. It would likely introduce some
significant performance variability, wouldn't it?

> 2. Switch to grant copy in blkback, and get rid of persistent grants (I
> have not benchmarked this solution, but I'm quite sure it will involve a
> performance regression, specially when scaling to a high number of domains).

Why do you think so?

> 3. Increase the size of the grant_table or the size of a single grant
> (from 4k to 2M) (this is from Stefano Stabellini).

Seems like a bit of a bigger hammer approach.

> 4. Introduce a new request type that we can use to request blkback to
> unmap certain grefs so we can free them in blkfront.

Sounds complex.

> So far none of them looks like a suitable solution.

I agree. Of these, I think #2 is worth a little more attention.

--msw


* Re: [Hackathon minutes] PV block improvements
  2013-06-21 17:10 ` Roger Pau Monné
  2013-06-21 18:07   ` Matt Wilson
@ 2013-06-21 20:16   ` Konrad Rzeszutek Wilk
  2013-06-21 23:17     ` Wei Liu
  2013-06-22  7:17     ` Roger Pau Monné
  1 sibling, 2 replies; 24+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-06-21 20:16 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel

On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
> Hello,
> 
> While working on further block improvements I've found an issue with
> persistent grants in blkfront.
> 
> Persistent grants basically allocate grants and then they are never
> released, so both blkfront and blkback keep using the same memory pages
> for all the transactions.
> 
> This is not a problem in blkback, because we can dynamically choose how
> many grants we want to map. On the other hand, blkfront cannot remove
> the access to those grants at any point, because blkfront doesn't know
> if blkback has this grants mapped persistently or not.
> 
> So if for example we start expanding the number of segments in indirect
> requests, to a value like 512 segments per requests, blkfront will
> probably try to persistently map 512*32+512 = 16896 grants per device,
> that's much more grants that the current default, which is 32*256 = 8192
> (if using grant tables v2). This can cause serious problems to other
> interfaces inside the DomU, since blkfront basically starts hoarding all
> possible grants, leaving other interfaces completely locked.
> 
> I've been thinking about different ways to solve this, but so far I
> haven't been able to found a nice solution:
> 
> 1. Limit the number of persistent grants a blkfront instance can use,
> let's say that only the first X used grants will be persistently mapped
> by both blkfront and blkback, and if more grants are needed the previous
> map/unmap will be used.
> 
> 2. Switch to grant copy in blkback, and get rid of persistent grants (I
> have not benchmarked this solution, but I'm quite sure it will involve a
> performance regression, specially when scaling to a high number of domains).
> 
> 3. Increase the size of the grant_table or the size of a single grant
> (from 4k to 2M) (this is from Stefano Stabellini).
> 
> 4. Introduce a new request type that we can use to request blkback to
> unmap certain grefs so we can free them in blkfront.


5). Lift the limit on the number of grant pages a domain can have.

6). Have an outstanding pool of grants that are mapped to a guest and
recycle them? That way both netfront and blkfront could use them as needed?
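
One possible reading of 6), sketched very roughly below: a single
guest-wide pool of pre-granted pages that blkfront, netfront and any
other frontend borrow from and return to, instead of each driver
hoarding its own grants. All names are hypothetical; no such API exists
today.

#include <linux/list.h>
#include <linux/spinlock.h>
#include <xen/grant_table.h>

struct pooled_grant {
    grant_ref_t gref;
    unsigned long gfn;
    struct list_head node;
};

static LIST_HEAD(grant_pool);
static DEFINE_SPINLOCK(grant_pool_lock);

/* Borrow a pre-granted page; NULL means the caller has to fall back to
 * a normal grant/map/unmap (or copy) cycle. */
static struct pooled_grant *grant_pool_get(void)
{
    struct pooled_grant *g = NULL;

    spin_lock(&grant_pool_lock);
    if (!list_empty(&grant_pool)) {
        g = list_first_entry(&grant_pool, struct pooled_grant, node);
        list_del(&g->node);
    }
    spin_unlock(&grant_pool_lock);
    return g;
}

/* Return a page to the pool so another frontend can recycle it. */
static void grant_pool_put(struct pooled_grant *g)
{
    spin_lock(&grant_pool_lock);
    list_add(&g->node, &grant_pool);
    spin_unlock(&grant_pool_lock);
}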

> 
> So far none of them looks like a suitable solution.
> 


* Re: [Hackathon minutes] PV block improvements
  2013-06-21 20:16   ` Konrad Rzeszutek Wilk
@ 2013-06-21 23:17     ` Wei Liu
  2013-06-24 11:06       ` Stefano Stabellini
  2013-06-22  7:17     ` Roger Pau Monné
  1 sibling, 1 reply; 24+ messages in thread
From: Wei Liu @ 2013-06-21 23:17 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, wei.liu2, Roger Pau Monné

On Fri, Jun 21, 2013 at 04:16:25PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
> > Hello,
> > 
> > While working on further block improvements I've found an issue with
> > persistent grants in blkfront.
> > 
> > Persistent grants basically allocate grants and then they are never
> > released, so both blkfront and blkback keep using the same memory pages
> > for all the transactions.
> > 
> > This is not a problem in blkback, because we can dynamically choose how
> > many grants we want to map. On the other hand, blkfront cannot remove
> > the access to those grants at any point, because blkfront doesn't know
> > if blkback has this grants mapped persistently or not.
> > 
> > So if for example we start expanding the number of segments in indirect
> > requests, to a value like 512 segments per requests, blkfront will
> > probably try to persistently map 512*32+512 = 16896 grants per device,
> > that's much more grants that the current default, which is 32*256 = 8192
> > (if using grant tables v2). This can cause serious problems to other
> > interfaces inside the DomU, since blkfront basically starts hoarding all
> > possible grants, leaving other interfaces completely locked.
> > 
> > I've been thinking about different ways to solve this, but so far I
> > haven't been able to found a nice solution:
> > 
> > 1. Limit the number of persistent grants a blkfront instance can use,
> > let's say that only the first X used grants will be persistently mapped
> > by both blkfront and blkback, and if more grants are needed the previous
> > map/unmap will be used.
> > 
> > 2. Switch to grant copy in blkback, and get rid of persistent grants (I
> > have not benchmarked this solution, but I'm quite sure it will involve a
> > performance regression, specially when scaling to a high number of domains).
> > 

Any chance that the speed of copying is fast enough for block devices?

> > 3. Increase the size of the grant_table or the size of a single grant
> > (from 4k to 2M) (this is from Stefano Stabellini).
> > 
> > 4. Introduce a new request type that we can use to request blkback to
> > unmap certain grefs so we can free them in blkfront.
> 
> 
> 5). Lift the limit of grant pages a domain can have.

If I'm not mistaken, this is basically the same as "increase the size of
the grant_table" in #3.

> 
> 6). Have an outstanding of grant pools that are mapped to a guest and
> recycle them? That way both netfront and blkfront could use them as needed?
> 

Is there an easy way to instrument the network stack to use those pages
only?


Wei.

> > 
> > So far none of them looks like a suitable solution.
> > 
> 

* Re: [Hackathon minutes] PV block improvements
  2013-06-21 18:07   ` Matt Wilson
@ 2013-06-22  7:11     ` Roger Pau Monné
  2013-06-25  6:09       ` Matt Wilson
                         ` (2 more replies)
  2013-06-27 15:12     ` Roger Pau Monné
  1 sibling, 3 replies; 24+ messages in thread
From: Roger Pau Monné @ 2013-06-22  7:11 UTC (permalink / raw)
  To: Matt Wilson; +Cc: Konrad Rzeszutek Wilk, xen-devel

On 21/06/13 20:07, Matt Wilson wrote:
> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
>> Hello,
>>
>> While working on further block improvements I've found an issue with
>> persistent grants in blkfront.
>>
>> Persistent grants basically allocate grants and then they are never
>> released, so both blkfront and blkback keep using the same memory pages
>> for all the transactions.
>>
>> This is not a problem in blkback, because we can dynamically choose how
>> many grants we want to map. On the other hand, blkfront cannot remove
>> the access to those grants at any point, because blkfront doesn't know
>> if blkback has this grants mapped persistently or not.
>>
>> So if for example we start expanding the number of segments in indirect
>> requests, to a value like 512 segments per requests, blkfront will
>> probably try to persistently map 512*32+512 = 16896 grants per device,
>> that's much more grants that the current default, which is 32*256 = 8192
>> (if using grant tables v2). This can cause serious problems to other
>> interfaces inside the DomU, since blkfront basically starts hoarding all
>> possible grants, leaving other interfaces completely locked.
> 
> Yikes.
> 
>> I've been thinking about different ways to solve this, but so far I
>> haven't been able to found a nice solution:
>>
>> 1. Limit the number of persistent grants a blkfront instance can use,
>> let's say that only the first X used grants will be persistently mapped
>> by both blkfront and blkback, and if more grants are needed the previous
>> map/unmap will be used.
> 
> I'm not thrilled with this option. It would likely introduce some
> significant performance variability, wouldn't it?

Probably, and it will also be hard to distribute the number of available
grants across the different interfaces in a performance-sensible way,
especially given the fact that once a grant is assigned to an interface
it cannot be returned to the pool of grants.

So if we had two interfaces with very different usage (one very busy and
another one almost idle), and equally distributed the grants amongst
them, one would have a lot of unused grants while the other would suffer
from starvation.

> 
>> 2. Switch to grant copy in blkback, and get rid of persistent grants (I
>> have not benchmarked this solution, but I'm quite sure it will involve a
>> performance regression, specially when scaling to a high number of domains).
> 
> Why do you think so?

First, because grant_copy is done by the hypervisor, while with
persistent grants the copy is done by the guest. Also, grant_copy takes
the grant lock, so when scaling to a large number of domains there's
going to be contention around this lock. Persistent grants don't need
any shared lock, and thus scale better.
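
For reference, the per-segment operation that option 2 implies looks
roughly like the sketch below (a read completion path; the names and
plumbing are illustrative, not an actual blkback patch, and error
handling is omitted):

#include <linux/errno.h>
#include <xen/interface/xen.h>
#include <xen/interface/grant_table.h>
#include <asm/xen/hypercall.h>

/* Copy one segment of completed read data from a backend-local page
 * into the page granted by the frontend, instead of keeping that page
 * persistently mapped. */
static int copy_segment_to_frontend(domid_t frontend_id,
                                    grant_ref_t gref,
                                    unsigned long local_gfn,
                                    unsigned int offset,
                                    unsigned int len)
{
    struct gnttab_copy op = {};

    op.source.domid  = DOMID_SELF;
    op.source.u.gmfn = local_gfn;   /* backend's own data page */
    op.source.offset = 0;
    op.dest.domid    = frontend_id;
    op.dest.u.ref    = gref;        /* frontend's grant reference */
    op.dest.offset   = offset;
    op.len           = len;
    op.flags         = GNTCOPY_dest_gref;

    /* The hypervisor performs the actual data copy on behalf of the
     * backend. */
    HYPERVISOR_grant_table_op(GNTTABOP_copy, &op, 1);

    return op.status == GNTST_okay ? 0 : -EIO;
}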

> 
>> 3. Increase the size of the grant_table or the size of a single grant
>> (from 4k to 2M) (this is from Stefano Stabellini).
> 
> Seems like a bit of a bigger hammer approach.
> 
>> 4. Introduce a new request type that we can use to request blkback to
>> unmap certain grefs so we can free them in blkfront.
> 
> Sounds complex.
> 
>> So far none of them looks like a suitable solution.
> 
> I agree. Of these, I think #2 is worth a little more attention.
> 
> --msw
> 


* Re: [Hackathon minutes] PV block improvements
  2013-06-21 20:16   ` Konrad Rzeszutek Wilk
  2013-06-21 23:17     ` Wei Liu
@ 2013-06-22  7:17     ` Roger Pau Monné
  1 sibling, 0 replies; 24+ messages in thread
From: Roger Pau Monné @ 2013-06-22  7:17 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

On 21/06/13 22:16, Konrad Rzeszutek Wilk wrote:
> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
>> Hello,
>>
>> While working on further block improvements I've found an issue with
>> persistent grants in blkfront.
>>
>> Persistent grants basically allocate grants and then they are never
>> released, so both blkfront and blkback keep using the same memory pages
>> for all the transactions.
>>
>> This is not a problem in blkback, because we can dynamically choose how
>> many grants we want to map. On the other hand, blkfront cannot remove
>> the access to those grants at any point, because blkfront doesn't know
>> if blkback has this grants mapped persistently or not.
>>
>> So if for example we start expanding the number of segments in indirect
>> requests, to a value like 512 segments per requests, blkfront will
>> probably try to persistently map 512*32+512 = 16896 grants per device,
>> that's much more grants that the current default, which is 32*256 = 8192
>> (if using grant tables v2). This can cause serious problems to other
>> interfaces inside the DomU, since blkfront basically starts hoarding all
>> possible grants, leaving other interfaces completely locked.
>>
>> I've been thinking about different ways to solve this, but so far I
>> haven't been able to found a nice solution:
>>
>> 1. Limit the number of persistent grants a blkfront instance can use,
>> let's say that only the first X used grants will be persistently mapped
>> by both blkfront and blkback, and if more grants are needed the previous
>> map/unmap will be used.
>>
>> 2. Switch to grant copy in blkback, and get rid of persistent grants (I
>> have not benchmarked this solution, but I'm quite sure it will involve a
>> performance regression, specially when scaling to a high number of domains).
>>
>> 3. Increase the size of the grant_table or the size of a single grant
>> (from 4k to 2M) (this is from Stefano Stabellini).
>>
>> 4. Introduce a new request type that we can use to request blkback to
>> unmap certain grefs so we can free them in blkfront.
> 
> 
> 5). Lift the limit of grant pages a domain can have.
> 
> 6). Have an outstanding of grant pools that are mapped to a guest and
> recycle them? That way both netfront and blkfront could use them as needed?

If all the backends run in the same guest that could be a viable option,
but if we have backends running in different domains we will end up with
a separate pool for each backend domain, and thus the scenario is going
to be quite similar to what we have now (one pool can hoard all
available grants and leave the others starving).


* Re: [Hackathon minutes] PV block improvements
  2013-06-21 23:17     ` Wei Liu
@ 2013-06-24 11:06       ` Stefano Stabellini
  2013-07-02 11:49         ` Roger Pau Monné
  0 siblings, 1 reply; 24+ messages in thread
From: Stefano Stabellini @ 2013-06-24 11:06 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Roger Pau Monné, Konrad Rzeszutek Wilk

On Sat, 22 Jun 2013, Wei Liu wrote:
> On Fri, Jun 21, 2013 at 04:16:25PM -0400, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
> > > Hello,
> > > 
> > > While working on further block improvements I've found an issue with
> > > persistent grants in blkfront.
> > > 
> > > Persistent grants basically allocate grants and then they are never
> > > released, so both blkfront and blkback keep using the same memory pages
> > > for all the transactions.
> > > 
> > > This is not a problem in blkback, because we can dynamically choose how
> > > many grants we want to map. On the other hand, blkfront cannot remove
> > > the access to those grants at any point, because blkfront doesn't know
> > > if blkback has this grants mapped persistently or not.
> > > 
> > > So if for example we start expanding the number of segments in indirect
> > > requests, to a value like 512 segments per requests, blkfront will
> > > probably try to persistently map 512*32+512 = 16896 grants per device,
> > > that's much more grants that the current default, which is 32*256 = 8192
> > > (if using grant tables v2). This can cause serious problems to other
> > > interfaces inside the DomU, since blkfront basically starts hoarding all
> > > possible grants, leaving other interfaces completely locked.
> > > 
> > > I've been thinking about different ways to solve this, but so far I
> > > haven't been able to found a nice solution:
> > > 
> > > 1. Limit the number of persistent grants a blkfront instance can use,
> > > let's say that only the first X used grants will be persistently mapped
> > > by both blkfront and blkback, and if more grants are needed the previous
> > > map/unmap will be used.
> > > 
> > > 2. Switch to grant copy in blkback, and get rid of persistent grants (I
> > > have not benchmarked this solution, but I'm quite sure it will involve a
> > > performance regression, specially when scaling to a high number of domains).
> > > 
> 
> Any chance that the speed of copying is fast enough for block devices?
> 
> > > 3. Increase the size of the grant_table or the size of a single grant
> > > (from 4k to 2M) (this is from Stefano Stabellini).
> > > 
> > > 4. Introduce a new request type that we can use to request blkback to
> > > unmap certain grefs so we can free them in blkfront.
> > 
> > 
> > 5). Lift the limit of grant pages a domain can have.
> 
> If I'm not mistaken, this is basically the same as "increase the size of
> the grant_table" in #3.

Yes, that was one of the things I was suggesting, but it needs
investigating: I wouldn't want increasing the number of grant frames
to run into a different scalability limit of the data structure.


* Re: [Hackathon minutes] PV block improvements
  2013-06-22  7:11     ` Roger Pau Monné
@ 2013-06-25  6:09       ` Matt Wilson
  2013-06-25 13:01         ` Wei Liu
  2013-06-25 15:53       ` Ian Campbell
  2013-06-25 15:57       ` Ian Campbell
  2 siblings, 1 reply; 24+ messages in thread
From: Matt Wilson @ 2013-06-25  6:09 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Konrad Rzeszutek Wilk, xen-devel

On Sat, Jun 22, 2013 at 09:11:20AM +0200, Roger Pau Monné wrote:
> On 21/06/13 20:07, Matt Wilson wrote:
> > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:

[...]

> >> 2. Switch to grant copy in blkback, and get rid of persistent grants (I
> >> have not benchmarked this solution, but I'm quite sure it will involve a
> >> performance regression, specially when scaling to a high number of domains).
> > 
> > Why do you think so?
> 
> First because grant_copy is done by the hypervisor, while when using
> persistent grants the copy is done by the guest. Also, grant_copy takes
> the grant lock, so when scaling to a large number of domains there's
> going to be contention around this lock. Persistent grants don't need
> any shared lock, and thus scale better.

It'd benefit xen-netback to make the locking in the copy path more
fine-grained. That would help multi-vif domUs today, and multi-queue
vifs later on.

Thoughts?

--msw


* Re: [Hackathon minutes] PV block improvements
  2013-06-25  6:09       ` Matt Wilson
@ 2013-06-25 13:01         ` Wei Liu
  2013-06-25 15:39           ` Matt Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: Wei Liu @ 2013-06-25 13:01 UTC (permalink / raw)
  To: Matt Wilson
  Cc: wei.liu2, Konrad Rzeszutek Wilk, xen-devel, Roger Pau Monné

On Mon, Jun 24, 2013 at 11:09:19PM -0700, Matt Wilson wrote:
> On Sat, Jun 22, 2013 at 09:11:20AM +0200, Roger Pau Monné wrote:
> > On 21/06/13 20:07, Matt Wilson wrote:
> > > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
> 
> [...]
> 
> > >> 2. Switch to grant copy in blkback, and get rid of persistent grants (I
> > >> have not benchmarked this solution, but I'm quite sure it will involve a
> > >> performance regression, specially when scaling to a high number of domains).
> > > 
> > > Why do you think so?
> > 
> > First because grant_copy is done by the hypervisor, while when using
> > persistent grants the copy is done by the guest. Also, grant_copy takes
> > the grant lock, so when scaling to a large number of domains there's
> > going to be contention around this lock. Persistent grants don't need
> > any shared lock, and thus scale better.
> 
> It'd benefit xen-netback to make the locking in the copy path more
> fine grained. That would help multi-vif domUs today, and multi-queue
> vifs later on.
> 

I'm not sure I follow. I presume you mean using persistent grants in
xen-netback to help it scale better?


Wei.

> Thoughts?
> 
> --msw
> 

* Re: [Hackathon minutes] PV block improvements
  2013-06-25 13:01         ` Wei Liu
@ 2013-06-25 15:39           ` Matt Wilson
  0 siblings, 0 replies; 24+ messages in thread
From: Matt Wilson @ 2013-06-25 15:39 UTC (permalink / raw)
  To: Wei Liu; +Cc: Konrad Rzeszutek Wilk, xen-devel, Roger Pau Monné

On Tue, Jun 25, 2013 at 02:01:30PM +0100, Wei Liu wrote:
> On Mon, Jun 24, 2013 at 11:09:19PM -0700, Matt Wilson wrote:
> > On Sat, Jun 22, 2013 at 09:11:20AM +0200, Roger Pau Monné wrote:
> > > On 21/06/13 20:07, Matt Wilson wrote:
> > > > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
> > 
> > [...]
> > 
> > > >> 2. Switch to grant copy in blkback, and get rid of persistent grants (I
> > > >> have not benchmarked this solution, but I'm quite sure it will involve a
> > > >> performance regression, specially when scaling to a high number of domains).
> > > > 
> > > > Why do you think so?
> > > 
> > > First because grant_copy is done by the hypervisor, while when using
> > > persistent grants the copy is done by the guest. Also, grant_copy takes
> > > the grant lock, so when scaling to a large number of domains there's
> > > going to be contention around this lock. Persistent grants don't need
> > > any shared lock, and thus scale better.
> > 
> > It'd benefit xen-netback to make the locking in the copy path more
> > fine grained. That would help multi-vif domUs today, and multi-queue
> > vifs later on.
> > 
> 
> I'm not sure I follow. I presume you mean using persistent grant in
> xen-netback to help scale better?

No, I mean that further scaling improvements in the GNTTABOP_copy path
would benefit xen-netback performance when a single guest has multiple
vifs, and will be needed for good multi-queue performance. Given that we
might need to do some work there, would it make sense to change
blkback to use GNTTABOP_copy to avoid the problem Roger has identified
with persistent grants?

--msw


* Re: [Hackathon minutes] PV block improvements
  2013-06-22  7:11     ` Roger Pau Monné
  2013-06-25  6:09       ` Matt Wilson
@ 2013-06-25 15:53       ` Ian Campbell
  2013-06-25 18:04         ` Stefano Stabellini
  2013-06-25 15:57       ` Ian Campbell
  2 siblings, 1 reply; 24+ messages in thread
From: Ian Campbell @ 2013-06-25 15:53 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Matt Wilson, Konrad Rzeszutek Wilk

On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:
> On 21/06/13 20:07, Matt Wilson wrote:
> > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
> >> Hello,
> >>
> >> While working on further block improvements I've found an issue with
> >> persistent grants in blkfront.
> >>
> >> Persistent grants basically allocate grants and then they are never
> >> released, so both blkfront and blkback keep using the same memory pages
> >> for all the transactions.
> >>
> >> This is not a problem in blkback, because we can dynamically choose how
> >> many grants we want to map. On the other hand, blkfront cannot remove
> >> the access to those grants at any point, because blkfront doesn't know
> >> if blkback has this grants mapped persistently or not.
> >>
> >> So if for example we start expanding the number of segments in indirect
> >> requests, to a value like 512 segments per requests, blkfront will
> >> probably try to persistently map 512*32+512 = 16896 grants per device,
> >> that's much more grants that the current default, which is 32*256 = 8192
> >> (if using grant tables v2). This can cause serious problems to other
> >> interfaces inside the DomU, since blkfront basically starts hoarding all
> >> possible grants, leaving other interfaces completely locked.
> > 
> > Yikes.
> > 
> >> I've been thinking about different ways to solve this, but so far I
> >> haven't been able to found a nice solution:
> >>
> >> 1. Limit the number of persistent grants a blkfront instance can use,
> >> let's say that only the first X used grants will be persistently mapped
> >> by both blkfront and blkback, and if more grants are needed the previous
> >> map/unmap will be used.
> > 
> > I'm not thrilled with this option. It would likely introduce some
> > significant performance variability, wouldn't it?
> 
> Probably, and also it will be hard to distribute the number of available
> grant across the different interfaces in a performance sensible way,
> specially given the fact that once a grant is assigned to a interface it
> cannot be returned back to the pool of grants.
> 
> So if we had two interfaces with very different usage (one very busy and
> another one almost idle), and equally distribute the grants amongst
> them, one will have a lot of unused grants while the other will suffer
> from starvation.

I do think we need to implement some sort of reclaim scheme, which
probably does mean a specific request (per your #4). We simply can't
have a device which once upon a time had high throughput but is now
mostly idle continue to tie up all those grants.

If you make the reuse of grants use an MRU scheme and reclaim the
currently unused tail fairly infrequently and in large batches then the
perf overhead should be minimal, I think.

I also don't think I would discount the idea of using ephemeral grants
to cover bursts so easily either, in fact it might fall out quite
naturally from an MRU scheme? In that scheme bursting up is pretty cheap
since grant map is relatively inexpensive, and recovering from the burst
shouldn't be too expensive if you batch it. If it turns out to be not a
burst but a sustained level of I/O then the MRU scheme would mean you
wouldn't be recovering them.

I also think there probably needs to be some tunable per-device limit on
the maximum persistent grants, perhaps with minimum and maximum pool sizes
tied in with an MRU scheme? If nothing else it gives the admin the
ability to prioritise devices.
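
A very rough sketch of what such an MRU/LRU reclaim pass might look like
on the blkfront side follows; all the names, the thresholds and the
reclaim request itself are hypothetical (the "unmap these grefs" request
is the new request type from option 4 and does not exist today):

#include <linux/list.h>
#include <linux/jiffies.h>
#include <xen/grant_table.h>

/* Hypothetical bookkeeping: one entry per persistently granted page,
 * kept in LRU order (most recently used at the head of the list). */
struct pgrant {
    grant_ref_t gref;
    unsigned long last_used;          /* jiffies of last reuse */
    struct list_head lru;
};

#define PGRANT_MIN_POOL     256       /* never shrink below this */
#define PGRANT_IDLE_PERIOD  (60 * HZ) /* "unused for a while" */
#define PGRANT_BATCH        64        /* reclaim in large batches */

/* Hypothetical new blkif request asking blkback to unmap these grefs,
 * so that blkfront can safely end foreign access on them afterwards. */
static void send_unmap_request(grant_ref_t *refs, unsigned int count);

/* Called infrequently (e.g. from a timer or every N requests): collect
 * up to one batch of long-unused grants from the LRU tail and hand them
 * to blkback for unmapping. */
static void reclaim_idle_grants(struct list_head *lru_list,
                                unsigned int *pool_size)
{
    grant_ref_t batch[PGRANT_BATCH];
    unsigned int n = 0;
    struct pgrant *g, *tmp;

    list_for_each_entry_safe_reverse(g, tmp, lru_list, lru) {
        if (*pool_size - n <= PGRANT_MIN_POOL || n == PGRANT_BATCH)
            break;
        if (time_before(jiffies, g->last_used + PGRANT_IDLE_PERIOD))
            break;  /* list is LRU ordered: the rest are newer */
        list_del(&g->lru);
        batch[n++] = g->gref;
    }

    if (n) {
        *pool_size -= n;
        send_unmap_request(batch, n);
    }
}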

Ian.



* Re: [Hackathon minutes] PV block improvements
  2013-06-22  7:11     ` Roger Pau Monné
  2013-06-25  6:09       ` Matt Wilson
  2013-06-25 15:53       ` Ian Campbell
@ 2013-06-25 15:57       ` Ian Campbell
  2013-06-25 16:05         ` Jan Beulich
  2013-06-25 16:30         ` Roger Pau Monné
  2 siblings, 2 replies; 24+ messages in thread
From: Ian Campbell @ 2013-06-25 15:57 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Jan Beulich, Matt Wilson, Konrad Rzeszutek Wilk

On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:
> First because grant_copy is done by the hypervisor, while when using
> persistent grants the copy is done by the guest.

This is true and a reasonable concern.

> Also, grant_copy takes
> the grant lock, so when scaling to a large number of domains there's
> going to be contention around this lock.

Does grant copy really take the lock for the duration of the copy,
preventing any other grant ops from the source and/or target domain?

If true then that sounds like an area which is ripe for optimisation!

However I am hopeful that you are mistaken... __acquire_grant_for_copy()
takes the grant lock while it pins the entry into the active grant entry
list and not for the actual duration of the copy (and likewise
__release_grant_for_copy). I hope Jan can confirm this!

Ian.



* Re: [Hackathon minutes] PV block improvements
  2013-06-25 15:57       ` Ian Campbell
@ 2013-06-25 16:05         ` Jan Beulich
  2013-06-25 16:30         ` Roger Pau Monné
  1 sibling, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2013-06-25 16:05 UTC (permalink / raw)
  To: Ian Campbell, roger.pau; +Cc: Konrad Rzeszutek Wilk, Matt Wilson, xen-devel

>>> On 25.06.13 at 17:57, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:
>> Also, grant_copy takes
>> the grant lock, so when scaling to a large number of domains there's
>> going to be contention around this lock.
> 
> Does grant copy really take the lock for the duration of the copy,
> preventing any other grant ops from the source and/or target domain?
> 
> If true then that sounds like an area which is ripe for optimisation!
> 
> However I am hopeful that you are mistaken... __acquire_grant_for_copy()
> takes the grant lock while it pins the entry into the active grant entry
> list and not for the actual duration of the copy (and likewise
> __release_grant_for-copy). I hope Jan can confirm this!

Yes, that's how I recall it working, since the uses of the per-domain
lock were removed from those paths.

Jan


* Re: [Hackathon minutes] PV block improvements
  2013-06-25 15:57       ` Ian Campbell
  2013-06-25 16:05         ` Jan Beulich
@ 2013-06-25 16:30         ` Roger Pau Monné
  1 sibling, 0 replies; 24+ messages in thread
From: Roger Pau Monné @ 2013-06-25 16:30 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Jan Beulich, Matt Wilson, Konrad Rzeszutek Wilk

On 25/06/13 17:57, Ian Campbell wrote:
> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:
>> First because grant_copy is done by the hypervisor, while when using
>> persistent grants the copy is done by the guest.
> 
> This is true and a reasonable concern.
> 
>> Also, grant_copy takes
>> the grant lock, so when scaling to a large number of domains there's
>> going to be contention around this lock.
> 
> Does grant copy really take the lock for the duration of the copy,
> preventing any other grant ops from the source and/or target domain?
> 
> If true then that sounds like an area which is ripe for optimisation!
> 
> However I am hopeful that you are mistaken... __acquire_grant_for_copy()
> takes the grant lock while it pins the entry into the active grant entry
> list and not for the actual duration of the copy (and likewise
> __release_grant_for-copy). I hope Jan can confirm this!

Sorry, I probably haven't given enough detail here. I didn't mean that it
takes the lock for the duration of the whole copy, but it is used in
some places during the grant copy operation, so it might introduce
contention when the number of domains is high (although I have not
measured it).


* Re: [Hackathon minutes] PV block improvements
  2013-06-25 15:53       ` Ian Campbell
@ 2013-06-25 18:04         ` Stefano Stabellini
  2013-06-26  9:37           ` George Dunlap
  0 siblings, 1 reply; 24+ messages in thread
From: Stefano Stabellini @ 2013-06-25 18:04 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Konrad Rzeszutek Wilk, xen-devel, Matt Wilson,
	Roger Pau Monné

On Tue, 25 Jun 2013, Ian Campbell wrote:
> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:
> > On 21/06/13 20:07, Matt Wilson wrote:
> > > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
> > >> Hello,
> > >>
> > >> While working on further block improvements I've found an issue with
> > >> persistent grants in blkfront.
> > >>
> > >> Persistent grants basically allocate grants and then they are never
> > >> released, so both blkfront and blkback keep using the same memory pages
> > >> for all the transactions.
> > >>
> > >> This is not a problem in blkback, because we can dynamically choose how
> > >> many grants we want to map. On the other hand, blkfront cannot remove
> > >> the access to those grants at any point, because blkfront doesn't know
> > >> if blkback has this grants mapped persistently or not.
> > >>
> > >> So if for example we start expanding the number of segments in indirect
> > >> requests, to a value like 512 segments per requests, blkfront will
> > >> probably try to persistently map 512*32+512 = 16896 grants per device,
> > >> that's much more grants that the current default, which is 32*256 = 8192
> > >> (if using grant tables v2). This can cause serious problems to other
> > >> interfaces inside the DomU, since blkfront basically starts hoarding all
> > >> possible grants, leaving other interfaces completely locked.
> > > 
> > > Yikes.
> > > 
> > >> I've been thinking about different ways to solve this, but so far I
> > >> haven't been able to found a nice solution:
> > >>
> > >> 1. Limit the number of persistent grants a blkfront instance can use,
> > >> let's say that only the first X used grants will be persistently mapped
> > >> by both blkfront and blkback, and if more grants are needed the previous
> > >> map/unmap will be used.
> > > 
> > > I'm not thrilled with this option. It would likely introduce some
> > > significant performance variability, wouldn't it?
> > 
> > Probably, and also it will be hard to distribute the number of available
> > grant across the different interfaces in a performance sensible way,
> > specially given the fact that once a grant is assigned to a interface it
> > cannot be returned back to the pool of grants.
> > 
> > So if we had two interfaces with very different usage (one very busy and
> > another one almost idle), and equally distribute the grants amongst
> > them, one will have a lot of unused grants while the other will suffer
> > from starvation.
> 
> I do think we need to implement some sort of reclaim scheme, which
> probably does mean a specific request (per your #4). We simply can't
> have a device which once upon a time had high throughput but is no
> mostly ideal continue to tie up all those grants.
> 
> If you make the reuse of grants use an MRU scheme and reclaim the
> currently unused tail fairly infrequently and in large batches then the
> perf overhead should be minimal, I think.
> 
> I also don't think I would discount the idea of using ephemeral grants
> to cover bursts so easily either, in fact it might fall out quite
> naturally from an MRU scheme? In that scheme bursting up is pretty cheap
> since grant map is relative inexpensive, and recovering from the burst
> shouldn't be too expensive if you batch it. If it turns out to be not a
> burst but a sustained level of I/O then the MRU scheme would mean you
> wouldn't be recovering them.
> 
> I also think there probably needs to be some tunable per device limit on
> the maximum persistent grants, perhaps minimum and maximum pool sizes
> ties in with an MRU scheme? If nothing else it gives the admin the
> ability to prioritise devices.

If we introduce a reclaim call we have to be careful not to fall back
to a map/unmap scheme like we had before.

The way I see it, either these additional grants are useful or they are not.
In the first case we could just limit the maximum number of persistent
grants and be done with it.
If they are not useful (they have been allocated for one very large
request and not used much after that), could we find a way to identify
unusually large requests and avoid using persistent grants for those?


* Re: [Hackathon minutes] PV block improvements
  2013-06-25 18:04         ` Stefano Stabellini
@ 2013-06-26  9:37           ` George Dunlap
  2013-06-26 11:37             ` Ian Campbell
  0 siblings, 1 reply; 24+ messages in thread
From: George Dunlap @ 2013-06-26  9:37 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Roger Pau Monné, xen-devel, Ian Campbell, Matt Wilson,
	Konrad Rzeszutek Wilk

On Tue, Jun 25, 2013 at 7:04 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Tue, 25 Jun 2013, Ian Campbell wrote:
>> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:
>> > On 21/06/13 20:07, Matt Wilson wrote:
>> > > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
>> > >> Hello,
>> > >>
>> > >> While working on further block improvements I've found an issue with
>> > >> persistent grants in blkfront.
>> > >>
>> > >> Persistent grants basically allocate grants and then they are never
>> > >> released, so both blkfront and blkback keep using the same memory pages
>> > >> for all the transactions.
>> > >>
>> > >> This is not a problem in blkback, because we can dynamically choose how
>> > >> many grants we want to map. On the other hand, blkfront cannot remove
>> > >> the access to those grants at any point, because blkfront doesn't know
>> > >> if blkback has this grants mapped persistently or not.
>> > >>
>> > >> So if for example we start expanding the number of segments in indirect
>> > >> requests, to a value like 512 segments per requests, blkfront will
>> > >> probably try to persistently map 512*32+512 = 16896 grants per device,
>> > >> that's much more grants that the current default, which is 32*256 = 8192
>> > >> (if using grant tables v2). This can cause serious problems to other
>> > >> interfaces inside the DomU, since blkfront basically starts hoarding all
>> > >> possible grants, leaving other interfaces completely locked.
>> > >
>> > > Yikes.
>> > >
>> > >> I've been thinking about different ways to solve this, but so far I
>> > >> haven't been able to found a nice solution:
>> > >>
>> > >> 1. Limit the number of persistent grants a blkfront instance can use,
>> > >> let's say that only the first X used grants will be persistently mapped
>> > >> by both blkfront and blkback, and if more grants are needed the previous
>> > >> map/unmap will be used.
>> > >
>> > > I'm not thrilled with this option. It would likely introduce some
>> > > significant performance variability, wouldn't it?
>> >
>> > Probably, and also it will be hard to distribute the number of available
>> > grant across the different interfaces in a performance sensible way,
>> > specially given the fact that once a grant is assigned to a interface it
>> > cannot be returned back to the pool of grants.
>> >
>> > So if we had two interfaces with very different usage (one very busy and
>> > another one almost idle), and equally distribute the grants amongst
>> > them, one will have a lot of unused grants while the other will suffer
>> > from starvation.
>>
>> I do think we need to implement some sort of reclaim scheme, which
>> probably does mean a specific request (per your #4). We simply can't
>> have a device which once upon a time had high throughput but is no
>> mostly ideal continue to tie up all those grants.
>>
>> If you make the reuse of grants use an MRU scheme and reclaim the
>> currently unused tail fairly infrequently and in large batches then the
>> perf overhead should be minimal, I think.
>>
>> I also don't think I would discount the idea of using ephemeral grants
>> to cover bursts so easily either, in fact it might fall out quite
>> naturally from an MRU scheme? In that scheme bursting up is pretty cheap
>> since grant map is relative inexpensive, and recovering from the burst
>> shouldn't be too expensive if you batch it. If it turns out to be not a
>> burst but a sustained level of I/O then the MRU scheme would mean you
>> wouldn't be recovering them.
>>
>> I also think there probably needs to be some tunable per device limit on
>> the maximum persistent grants, perhaps minimum and maximum pool sizes
>> ties in with an MRU scheme? If nothing else it gives the admin the
>> ability to prioritise devices.
>
> If we introduce a reclaim call we have to be careful not to fall back
> to a map/unmap scheme like we had before.
>
> The way I see it either these additional grants are useful or not.
> In the first case we could just limit the maximum amount of persistent
> grants and be done with it.
> If they are not useful (they have been allocated for one very large
> request and not used much after that), could we find a way to identify
> unusually large requests and avoid using persistent grants for those?

Isn't it possible that these grants are useful for some periods of
time, but not for others?  You wouldn't say, "Caching the disk data in
main memory is either useful or not; if it is not useful (if it was
allocated for one very large request and not used much after that), we
should find a way to identify unusually large requests and avoid
caching it."  If you're playing a movie, sure; but in most cases, the
cache was useful for a time, then stopped being useful.  Treating the
persistent grants the same way makes sense to me.

 -George


* Re: [Hackathon minutes] PV block improvements
  2013-06-26  9:37           ` George Dunlap
@ 2013-06-26 11:37             ` Ian Campbell
  2013-06-27 13:58               ` George Dunlap
  0 siblings, 1 reply; 24+ messages in thread
From: Ian Campbell @ 2013-06-26 11:37 UTC (permalink / raw)
  To: George Dunlap
  Cc: Konrad Rzeszutek Wilk, xen-devel, Roger Pau Monné,
	Matt Wilson, Stefano Stabellini

On Wed, 2013-06-26 at 10:37 +0100, George Dunlap wrote:
> On Tue, Jun 25, 2013 at 7:04 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > On Tue, 25 Jun 2013, Ian Campbell wrote:
> >> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:
> >> > On 21/06/13 20:07, Matt Wilson wrote:
> >> > > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
> >> > >> Hello,
> >> > >>
> >> > >> While working on further block improvements I've found an issue with
> >> > >> persistent grants in blkfront.
> >> > >>
> >> > >> Persistent grants basically allocate grants and then they are never
> >> > >> released, so both blkfront and blkback keep using the same memory pages
> >> > >> for all the transactions.
> >> > >>
> >> > >> This is not a problem in blkback, because we can dynamically choose how
> >> > >> many grants we want to map. On the other hand, blkfront cannot remove
> >> > >> the access to those grants at any point, because blkfront doesn't know
> >> > >> if blkback has this grants mapped persistently or not.
> >> > >>
> >> > >> So if for example we start expanding the number of segments in indirect
> >> > >> requests, to a value like 512 segments per requests, blkfront will
> >> > >> probably try to persistently map 512*32+512 = 16896 grants per device,
> >> > >> that's much more grants that the current default, which is 32*256 = 8192
> >> > >> (if using grant tables v2). This can cause serious problems to other
> >> > >> interfaces inside the DomU, since blkfront basically starts hoarding all
> >> > >> possible grants, leaving other interfaces completely locked.
> >> > >
> >> > > Yikes.
> >> > >
> >> > >> I've been thinking about different ways to solve this, but so far I
> >> > >> haven't been able to found a nice solution:
> >> > >>
> >> > >> 1. Limit the number of persistent grants a blkfront instance can use,
> >> > >> let's say that only the first X used grants will be persistently mapped
> >> > >> by both blkfront and blkback, and if more grants are needed the previous
> >> > >> map/unmap will be used.
> >> > >
> >> > > I'm not thrilled with this option. It would likely introduce some
> >> > > significant performance variability, wouldn't it?
> >> >
> >> > Probably, and also it will be hard to distribute the number of available
> >> > grant across the different interfaces in a performance sensible way,
> >> > specially given the fact that once a grant is assigned to a interface it
> >> > cannot be returned back to the pool of grants.
> >> >
> >> > So if we had two interfaces with very different usage (one very busy and
> >> > another one almost idle), and equally distribute the grants amongst
> >> > them, one will have a lot of unused grants while the other will suffer
> >> > from starvation.
> >>
> >> I do think we need to implement some sort of reclaim scheme, which
> >> probably does mean a specific request (per your #4). We simply can't
> >> have a device which once upon a time had high throughput but is no
> >> mostly ideal continue to tie up all those grants.
> >>
> >> If you make the reuse of grants use an MRU scheme and reclaim the
> >> currently unused tail fairly infrequently and in large batches then the
> >> perf overhead should be minimal, I think.
> >>
> >> I also don't think I would discount the idea of using ephemeral grants
> >> to cover bursts so easily either, in fact it might fall out quite
> >> naturally from an MRU scheme? In that scheme bursting up is pretty cheap
> >> since grant map is relative inexpensive, and recovering from the burst
> >> shouldn't be too expensive if you batch it. If it turns out to be not a
> >> burst but a sustained level of I/O then the MRU scheme would mean you
> >> wouldn't be recovering them.
> >>
> >> I also think there probably needs to be some tunable per device limit on
> >> the maximum persistent grants, perhaps minimum and maximum pool sizes
> >> ties in with an MRU scheme? If nothing else it gives the admin the
> >> ability to prioritise devices.
> >
> > If we introduce a reclaim call we have to be careful not to fall back
> > to a map/unmap scheme like we had before.
> >
> > The way I see it either these additional grants are useful or not.
> > In the first case we could just limit the maximum amount of persistent
> > grants and be done with it.
> > If they are not useful (they have been allocated for one very large
> > request and not used much after that), could we find a way to identify
> > unusually large requests and avoid using persistent grants for those?
> 
> Isn't it possible that these grants are useful for some periods of
> time, but not for others?  You wouldn't say, "Caching the disk data in
> main memory is either useful or not; if it is not useful (if it was
> allocated for one very large request and not used much after that), we
> should find a way to identify unusually large requests and avoid
> caching it."  If you're playing a movie, sure; but in most cases, the
> cache was useful for a time, then stopped being useful.  Treating the
> persistent grants the same way makes sense to me.

Right, this is what I was trying to suggest with the MRU scheme. If you
are using lots of grants and you keep on reusing them then they remain
persistent and don't get reclaimed. If you are not reusing them for a
while then they get reclaimed. If you make "for a while" big enough then
you should find you aren't unintentionally falling back to a map/unmap
scheme.


Ian.



* Re: [Hackathon minutes] PV block improvements
  2013-06-26 11:37             ` Ian Campbell
@ 2013-06-27 13:58               ` George Dunlap
  2013-06-27 14:21                 ` Ian Campbell
  0 siblings, 1 reply; 24+ messages in thread
From: George Dunlap @ 2013-06-27 13:58 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Konrad Rzeszutek Wilk, xen-devel, Roger Pau Monné,
	Matt Wilson, Stefano Stabellini

On 26/06/13 12:37, Ian Campbell wrote:
> On Wed, 2013-06-26 at 10:37 +0100, George Dunlap wrote:
>> On Tue, Jun 25, 2013 at 7:04 PM, Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>>> On Tue, 25 Jun 2013, Ian Campbell wrote:
>>>> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:
>>>>> On 21/06/13 20:07, Matt Wilson wrote:
>>>>>> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> While working on further block improvements I've found an issue with
>>>>>>> persistent grants in blkfront.
>>>>>>>
>>>>>>> Persistent grants basically allocate grants and then they are never
>>>>>>> released, so both blkfront and blkback keep using the same memory pages
>>>>>>> for all the transactions.
>>>>>>>
>>>>>>> This is not a problem in blkback, because we can dynamically choose how
>>>>>>> many grants we want to map. On the other hand, blkfront cannot remove
>>>>>>> the access to those grants at any point, because blkfront doesn't know
>>>>>>> if blkback has this grants mapped persistently or not.
>>>>>>>
>>>>>>> So if for example we start expanding the number of segments in indirect
>>>>>>> requests, to a value like 512 segments per requests, blkfront will
>>>>>>> probably try to persistently map 512*32+512 = 16896 grants per device,
>>>>>>> that's much more grants that the current default, which is 32*256 = 8192
>>>>>>> (if using grant tables v2). This can cause serious problems to other
>>>>>>> interfaces inside the DomU, since blkfront basically starts hoarding all
>>>>>>> possible grants, leaving other interfaces completely locked.
>>>>>> Yikes.
>>>>>>
>>>>>>> I've been thinking about different ways to solve this, but so far I
>>>>>>> haven't been able to found a nice solution:
>>>>>>>
>>>>>>> 1. Limit the number of persistent grants a blkfront instance can use,
>>>>>>> let's say that only the first X used grants will be persistently mapped
>>>>>>> by both blkfront and blkback, and if more grants are needed the previous
>>>>>>> map/unmap will be used.
>>>>>> I'm not thrilled with this option. It would likely introduce some
>>>>>> significant performance variability, wouldn't it?
>>>>> Probably, and also it will be hard to distribute the number of available
>>>>> grant across the different interfaces in a performance sensible way,
>>>>> specially given the fact that once a grant is assigned to a interface it
>>>>> cannot be returned back to the pool of grants.
>>>>>
>>>>> So if we had two interfaces with very different usage (one very busy and
>>>>> another one almost idle), and equally distribute the grants amongst
>>>>> them, one will have a lot of unused grants while the other will suffer
>>>>> from starvation.
>>>> I do think we need to implement some sort of reclaim scheme, which
>>>> probably does mean a specific request (per your #4). We simply can't
>>>> have a device which once upon a time had high throughput but is no
>>>> mostly ideal continue to tie up all those grants.
>>>>
>>>> If you make the reuse of grants use an MRU scheme and reclaim the
>>>> currently unused tail fairly infrequently and in large batches then the
>>>> perf overhead should be minimal, I think.
>>>>
>>>> I also don't think I would discount the idea of using ephemeral grants
>>>> to cover bursts so easily either, in fact it might fall out quite
>>>> naturally from an MRU scheme? In that scheme bursting up is pretty cheap
>>>> since grant map is relative inexpensive, and recovering from the burst
>>>> shouldn't be too expensive if you batch it. If it turns out to be not a
>>>> burst but a sustained level of I/O then the MRU scheme would mean you
>>>> wouldn't be recovering them.
>>>>
>>>> I also think there probably needs to be some tunable per device limit on
>>>> the maximum persistent grants, perhaps minimum and maximum pool sizes
>>>> ties in with an MRU scheme? If nothing else it gives the admin the
>>>> ability to prioritise devices.
>>> If we introduce a reclaim call we have to be careful not to fall back
>>> to a map/unmap scheme like we had before.
>>>
>>> The way I see it either these additional grants are useful or not.
>>> In the first case we could just limit the maximum amount of persistent
>>> grants and be done with it.
>>> If they are not useful (they have been allocated for one very large
>>> request and not used much after that), could we find a way to identify
>>> unusually large requests and avoid using persistent grants for those?
>> Isn't it possible that these grants are useful for some periods of
>> time, but not for others?  You wouldn't say, "Caching the disk data in
>> main memory is either useful or not; if it is not useful (if it was
>> allocated for one very large request and not used much after that), we
>> should find a way to identify unusually large requests and avoid
>> caching it."  If you're playing a movie, sure; but in most cases, the
>> cache was useful for a time, then stopped being useful.  Treating the
>> persistent grants the same way makes sense to me.
> Right, this is what I was trying to suggest with the MRU scheme. If you
> are using lots of grants and you keep on reusing them then they remain
> persistent and don't get reclaimed. If you are not reusing them for a
> while then they get reclaimed. If you make "for a while" big enough then
> you should find you aren't unintentionally falling back to a map/unmap
> scheme.

And I was trying to say that I agreed with you. :-)

BTW, I presume "MRU" stands for "Most Recently Used", and means "Keep 
the most recently used"; is there a practical difference between that 
and "LRU" ("Discard the Least Recently Used")?

Presumably we could implement the clock algorithm pretty reasonably...
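
Something like the sketch below is what I'm picturing -- completely
untested, and none of these structures exist in blkfront today, so all
the names are placeholders. The point is just that a single
"referenced" bit per grant, cleared on every sweep, is enough to find
the grants that haven't been touched for a full revolution of the hand
(actually revoking them would of course still need the backend's
cooperation, per the new request discussed elsewhere in this thread):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for a persistently granted page tracked by blkfront. */
struct persistent_gnt {
	uint32_t gref;        /* grant reference handed to the backend */
	bool     referenced;  /* set every time the grant is reused    */
	bool     in_flight;   /* part of a request not yet completed   */
};

/* Called whenever a request reuses a persistent grant. */
static void persistent_gnt_touch(struct persistent_gnt *g)
{
	g->referenced = true;
}

/*
 * One sweep of the clock hand over the pool.  Grants that were not
 * touched since the previous sweep (and are not in flight) are
 * collected into 'victims' so the caller can reclaim them in a single
 * batch.  Returns the number of victims found.
 */
static size_t persistent_gnt_sweep(struct persistent_gnt *pool, size_t nr,
				   size_t *hand,
				   struct persistent_gnt **victims,
				   size_t max_victims)
{
	size_t found = 0, scanned;

	for (scanned = 0; scanned < nr && found < max_victims; scanned++) {
		struct persistent_gnt *g = &pool[*hand];

		*hand = (*hand + 1) % nr;

		if (g->in_flight)
			continue;
		if (g->referenced) {
			g->referenced = false;  /* give it one more round */
			continue;
		}
		victims[found++] = g;   /* idle for a whole revolution */
	}
	return found;
}

Whoever drives the sweep (a timer, or every N completed requests) can
then hand the whole batch of victims to one reclaim operation, so the
per-request fast path stays untouched.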

  -George



* Re: [Hackathon minutes] PV block improvements
  2013-06-27 13:58               ` George Dunlap
@ 2013-06-27 14:21                 ` Ian Campbell
  2013-06-27 15:20                   ` Roger Pau Monné
  0 siblings, 1 reply; 24+ messages in thread
From: Ian Campbell @ 2013-06-27 14:21 UTC (permalink / raw)
  To: George Dunlap
  Cc: Konrad Rzeszutek Wilk, xen-devel, Roger Pau Monné,
	Matt Wilson, Stefano Stabellini

On Thu, 2013-06-27 at 14:58 +0100, George Dunlap wrote:
> On 26/06/13 12:37, Ian Campbell wrote:
> > On Wed, 2013-06-26 at 10:37 +0100, George Dunlap wrote:
> >> On Tue, Jun 25, 2013 at 7:04 PM, Stefano Stabellini
> >> <stefano.stabellini@eu.citrix.com> wrote:
> >>> On Tue, 25 Jun 2013, Ian Campbell wrote:
> >>>> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:
> >>>>> On 21/06/13 20:07, Matt Wilson wrote:
> >>>>>> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> While working on further block improvements I've found an issue with
> >>>>>>> persistent grants in blkfront.
> >>>>>>>
> >>>>>>> Persistent grants basically allocate grants and then they are never
> >>>>>>> released, so both blkfront and blkback keep using the same memory pages
> >>>>>>> for all the transactions.
> >>>>>>>
> >>>>>>> This is not a problem in blkback, because we can dynamically choose how
> >>>>>>> many grants we want to map. On the other hand, blkfront cannot remove
> >>>>>>> the access to those grants at any point, because blkfront doesn't know
> >>>>>>> if blkback has this grants mapped persistently or not.
> >>>>>>>
> >>>>>>> So if for example we start expanding the number of segments in indirect
> >>>>>>> requests, to a value like 512 segments per requests, blkfront will
> >>>>>>> probably try to persistently map 512*32+512 = 16896 grants per device,
> >>>>>>> that's much more grants that the current default, which is 32*256 = 8192
> >>>>>>> (if using grant tables v2). This can cause serious problems to other
> >>>>>>> interfaces inside the DomU, since blkfront basically starts hoarding all
> >>>>>>> possible grants, leaving other interfaces completely locked.
> >>>>>> Yikes.
> >>>>>>
> >>>>>>> I've been thinking about different ways to solve this, but so far I
> >>>>>>> haven't been able to found a nice solution:
> >>>>>>>
> >>>>>>> 1. Limit the number of persistent grants a blkfront instance can use,
> >>>>>>> let's say that only the first X used grants will be persistently mapped
> >>>>>>> by both blkfront and blkback, and if more grants are needed the previous
> >>>>>>> map/unmap will be used.
> >>>>>> I'm not thrilled with this option. It would likely introduce some
> >>>>>> significant performance variability, wouldn't it?
> >>>>> Probably, and also it will be hard to distribute the number of available
> >>>>> grant across the different interfaces in a performance sensible way,
> >>>>> specially given the fact that once a grant is assigned to a interface it
> >>>>> cannot be returned back to the pool of grants.
> >>>>>
> >>>>> So if we had two interfaces with very different usage (one very busy and
> >>>>> another one almost idle), and equally distribute the grants amongst
> >>>>> them, one will have a lot of unused grants while the other will suffer
> >>>>> from starvation.
> >>>> I do think we need to implement some sort of reclaim scheme, which
> >>>> probably does mean a specific request (per your #4). We simply can't
> >>>> have a device which once upon a time had high throughput but is no
> >>>> mostly ideal continue to tie up all those grants.
> >>>>
> >>>> If you make the reuse of grants use an MRU scheme and reclaim the
> >>>> currently unused tail fairly infrequently and in large batches then the
> >>>> perf overhead should be minimal, I think.
> >>>>
> >>>> I also don't think I would discount the idea of using ephemeral grants
> >>>> to cover bursts so easily either, in fact it might fall out quite
> >>>> naturally from an MRU scheme? In that scheme bursting up is pretty cheap
> >>>> since grant map is relative inexpensive, and recovering from the burst
> >>>> shouldn't be too expensive if you batch it. If it turns out to be not a
> >>>> burst but a sustained level of I/O then the MRU scheme would mean you
> >>>> wouldn't be recovering them.
> >>>>
> >>>> I also think there probably needs to be some tunable per device limit on
> >>>> the maximum persistent grants, perhaps minimum and maximum pool sizes
> >>>> ties in with an MRU scheme? If nothing else it gives the admin the
> >>>> ability to prioritise devices.
> >>> If we introduce a reclaim call we have to be careful not to fall back
> >>> to a map/unmap scheme like we had before.
> >>>
> >>> The way I see it either these additional grants are useful or not.
> >>> In the first case we could just limit the maximum amount of persistent
> >>> grants and be done with it.
> >>> If they are not useful (they have been allocated for one very large
> >>> request and not used much after that), could we find a way to identify
> >>> unusually large requests and avoid using persistent grants for those?
> >> Isn't it possible that these grants are useful for some periods of
> >> time, but not for others?  You wouldn't say, "Caching the disk data in
> >> main memory is either useful or not; if it is not useful (if it was
> >> allocated for one very large request and not used much after that), we
> >> should find a way to identify unusually large requests and avoid
> >> caching it."  If you're playing a movie, sure; but in most cases, the
> >> cache was useful for a time, then stopped being useful.  Treating the
> >> persistent grants the same way makes sense to me.
> > Right, this is what I was trying to suggest with the MRU scheme. If you
> > are using lots of grants and you keep on reusing them then they remain
> > persistent and don't get reclaimed. If you are not reusing them for a
> > while then they get reclaimed. If you make "for a while" big enough then
> > you should find you aren't unintentionally falling back to a map/unmap
> > scheme.
> 
> And I was trying to say that I agreed with you. :-)

Excellent ;-)

> BTW, I presume "MRU" stands for "Most Recently Used", and means "Keep 
> the most recently used"; is there a practical difference between that 
> and "LRU" ("Discard the Least Recently Used")?

I started off with LRU and then got myself confused and changed it
everywhere. Yes, I mean keep Most Recently Used == discard Least
Recently Used.

> Presumably we could implement the clock algorithm pretty reasonably...

That's the sort of approach I was imagining...

Ian.





* Re: [Hackathon minutes] PV block improvements
  2013-06-21 18:07   ` Matt Wilson
  2013-06-22  7:11     ` Roger Pau Monné
@ 2013-06-27 15:12     ` Roger Pau Monné
  2013-06-27 15:26       ` Stefano Stabellini
  1 sibling, 1 reply; 24+ messages in thread
From: Roger Pau Monné @ 2013-06-27 15:12 UTC (permalink / raw)
  To: Matt Wilson; +Cc: Konrad Rzeszutek Wilk, xen-devel

On 21/06/13 20:07, Matt Wilson wrote:
> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
>> Hello,
>>
>> While working on further block improvements I've found an issue with
>> persistent grants in blkfront.
>>
>> Persistent grants basically allocate grants and then they are never
>> released, so both blkfront and blkback keep using the same memory pages
>> for all the transactions.
>>
>> This is not a problem in blkback, because we can dynamically choose how
>> many grants we want to map. On the other hand, blkfront cannot remove
>> the access to those grants at any point, because blkfront doesn't know
>> if blkback has this grants mapped persistently or not.
>>
>> So if for example we start expanding the number of segments in indirect
>> requests, to a value like 512 segments per requests, blkfront will
>> probably try to persistently map 512*32+512 = 16896 grants per device,
>> that's much more grants that the current default, which is 32*256 = 8192
>> (if using grant tables v2). This can cause serious problems to other
>> interfaces inside the DomU, since blkfront basically starts hoarding all
>> possible grants, leaving other interfaces completely locked.
> 
> Yikes.
> 
>> I've been thinking about different ways to solve this, but so far I
>> haven't been able to found a nice solution:
>>
>> 1. Limit the number of persistent grants a blkfront instance can use,
>> let's say that only the first X used grants will be persistently mapped
>> by both blkfront and blkback, and if more grants are needed the previous
>> map/unmap will be used.
> 
> I'm not thrilled with this option. It would likely introduce some
> significant performance variability, wouldn't it?
> 
>> 2. Switch to grant copy in blkback, and get rid of persistent grants (I
>> have not benchmarked this solution, but I'm quite sure it will involve a
>> performance regression, specially when scaling to a high number of domains).
> 
> Why do you think so?

I've hacked up a prototype blkback that uses grant_copy instead of
persistent grants, and removed persistent grants support from blkfront.
The performance of grant_copy is indeed lower than that of persistent
grants, and it seems to scale much worse. I've run several fio
read/write benchmarks, using 512 segments per request on a ramdisk; the
results are here:

http://xenbits.xen.org/people/royger/grant_copy/
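
For reference, the read-completion path with grant copy looks roughly
like the sketch below (heavily simplified and not the actual prototype
code; the helper and its parameters are made up for illustration, only
the grant-table calls are the real Linux/Xen interfaces, and sector
offsets within each page are omitted for brevity):

#include <linux/mm.h>
#include <linux/printk.h>
#include <xen/grant_table.h>
#include <xen/interface/grant_table.h>
#include <xen/page.h>

/*
 * Complete a read by copying the data from blkback's own pages straight
 * into the pages the frontend granted for the request, instead of
 * mapping those grants.  'ops' is a caller-provided array with at least
 * nr_segs entries.
 */
static void copy_read_segments_to_frontend(domid_t otherend,
					   struct page **local_pages,
					   grant_ref_t *grefs,
					   unsigned int *lens,
					   struct gnttab_copy *ops,
					   unsigned int nr_segs)
{
	unsigned int i;

	for (i = 0; i < nr_segs; i++) {
		ops[i].flags = GNTCOPY_dest_gref;

		/* Source: the local page the data was read into. */
		ops[i].source.domid = DOMID_SELF;
		ops[i].source.u.gmfn = virt_to_mfn(page_address(local_pages[i]));
		ops[i].source.offset = 0;

		/* Destination: the grant the frontend put in the request. */
		ops[i].dest.domid = otherend;
		ops[i].dest.u.ref = grefs[i];
		ops[i].dest.offset = 0;

		ops[i].len = lens[i];
	}

	/* One GNTTABOP_copy hypercall for the whole batch. */
	gnttab_batch_copy(ops, nr_segs);

	for (i = 0; i < nr_segs; i++)
		if (ops[i].status != GNTST_okay)
			pr_warn("grant copy of segment %u failed: %d\n",
				i, ops[i].status);
}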

Roger.


* Re: [Hackathon minutes] PV block improvements
  2013-06-27 14:21                 ` Ian Campbell
@ 2013-06-27 15:20                   ` Roger Pau Monné
  0 siblings, 0 replies; 24+ messages in thread
From: Roger Pau Monné @ 2013-06-27 15:20 UTC (permalink / raw)
  To: Ian Campbell
  Cc: George Dunlap, Konrad Rzeszutek Wilk, xen-devel, Matt Wilson,
	Stefano Stabellini

On 27/06/13 16:21, Ian Campbell wrote:
> On Thu, 2013-06-27 at 14:58 +0100, George Dunlap wrote:
>> On 26/06/13 12:37, Ian Campbell wrote:
>>> On Wed, 2013-06-26 at 10:37 +0100, George Dunlap wrote:
>>>> On Tue, Jun 25, 2013 at 7:04 PM, Stefano Stabellini
>>>> <stefano.stabellini@eu.citrix.com> wrote:
>>>>> On Tue, 25 Jun 2013, Ian Campbell wrote:
>>>>>> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:
>>>>>>> On 21/06/13 20:07, Matt Wilson wrote:
>>>>>>>> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> While working on further block improvements I've found an issue with
>>>>>>>>> persistent grants in blkfront.
>>>>>>>>>
>>>>>>>>> Persistent grants basically allocate grants and then they are never
>>>>>>>>> released, so both blkfront and blkback keep using the same memory pages
>>>>>>>>> for all the transactions.
>>>>>>>>>
>>>>>>>>> This is not a problem in blkback, because we can dynamically choose how
>>>>>>>>> many grants we want to map. On the other hand, blkfront cannot remove
>>>>>>>>> the access to those grants at any point, because blkfront doesn't know
>>>>>>>>> if blkback has this grants mapped persistently or not.
>>>>>>>>>
>>>>>>>>> So if for example we start expanding the number of segments in indirect
>>>>>>>>> requests, to a value like 512 segments per requests, blkfront will
>>>>>>>>> probably try to persistently map 512*32+512 = 16896 grants per device,
>>>>>>>>> that's much more grants that the current default, which is 32*256 = 8192
>>>>>>>>> (if using grant tables v2). This can cause serious problems to other
>>>>>>>>> interfaces inside the DomU, since blkfront basically starts hoarding all
>>>>>>>>> possible grants, leaving other interfaces completely locked.
>>>>>>>> Yikes.
>>>>>>>>
>>>>>>>>> I've been thinking about different ways to solve this, but so far I
>>>>>>>>> haven't been able to found a nice solution:
>>>>>>>>>
>>>>>>>>> 1. Limit the number of persistent grants a blkfront instance can use,
>>>>>>>>> let's say that only the first X used grants will be persistently mapped
>>>>>>>>> by both blkfront and blkback, and if more grants are needed the previous
>>>>>>>>> map/unmap will be used.
>>>>>>>> I'm not thrilled with this option. It would likely introduce some
>>>>>>>> significant performance variability, wouldn't it?
>>>>>>> Probably, and also it will be hard to distribute the number of available
>>>>>>> grant across the different interfaces in a performance sensible way,
>>>>>>> specially given the fact that once a grant is assigned to a interface it
>>>>>>> cannot be returned back to the pool of grants.
>>>>>>>
>>>>>>> So if we had two interfaces with very different usage (one very busy and
>>>>>>> another one almost idle), and equally distribute the grants amongst
>>>>>>> them, one will have a lot of unused grants while the other will suffer
>>>>>>> from starvation.
>>>>>> I do think we need to implement some sort of reclaim scheme, which
>>>>>> probably does mean a specific request (per your #4). We simply can't
>>>>>> have a device which once upon a time had high throughput but is no
>>>>>> mostly ideal continue to tie up all those grants.
>>>>>>
>>>>>> If you make the reuse of grants use an MRU scheme and reclaim the
>>>>>> currently unused tail fairly infrequently and in large batches then the
>>>>>> perf overhead should be minimal, I think.
>>>>>>
>>>>>> I also don't think I would discount the idea of using ephemeral grants
>>>>>> to cover bursts so easily either, in fact it might fall out quite
>>>>>> naturally from an MRU scheme? In that scheme bursting up is pretty cheap
>>>>>> since grant map is relative inexpensive, and recovering from the burst
>>>>>> shouldn't be too expensive if you batch it. If it turns out to be not a
>>>>>> burst but a sustained level of I/O then the MRU scheme would mean you
>>>>>> wouldn't be recovering them.
>>>>>>
>>>>>> I also think there probably needs to be some tunable per device limit on
>>>>>> the maximum persistent grants, perhaps minimum and maximum pool sizes
>>>>>> ties in with an MRU scheme? If nothing else it gives the admin the
>>>>>> ability to prioritise devices.
>>>>> If we introduce a reclaim call we have to be careful not to fall back
>>>>> to a map/unmap scheme like we had before.
>>>>>
>>>>> The way I see it either these additional grants are useful or not.
>>>>> In the first case we could just limit the maximum amount of persistent
>>>>> grants and be done with it.
>>>>> If they are not useful (they have been allocated for one very large
>>>>> request and not used much after that), could we find a way to identify
>>>>> unusually large requests and avoid using persistent grants for those?
>>>> Isn't it possible that these grants are useful for some periods of
>>>> time, but not for others?  You wouldn't say, "Caching the disk data in
>>>> main memory is either useful or not; if it is not useful (if it was
>>>> allocated for one very large request and not used much after that), we
>>>> should find a way to identify unusually large requests and avoid
>>>> caching it."  If you're playing a movie, sure; but in most cases, the
>>>> cache was useful for a time, then stopped being useful.  Treating the
>>>> persistent grants the same way makes sense to me.
>>> Right, this is what I was trying to suggest with the MRU scheme. If you
>>> are using lots of grants and you keep on reusing them then they remain
>>> persistent and don't get reclaimed. If you are not reusing them for a
>>> while then they get reclaimed. If you make "for a while" big enough then
>>> you should find you aren't unintentionally falling back to a map/unmap
>>> scheme.
>>
>> And I was trying to say that I agreed with you. :-)
> 
> Excellent ;-)

I also agree that this is the best solution; I will start looking into
implementing it.

>> BTW, I presume "MRU" stands for "Most Recently Used", and means "Keep 
>> the most recently used"; is there a practical difference between that 
>> and "LRU" ("Discard the Least Recently Used")?
> 
> I started off with LRU and then got my self confused and changed it
> everywhere. Yes I mean keep Most Recently Used == discard Least Recently
> Used.

This will help if the disk is only doing intermittent bursts of I/O,
but if the disk is under high I/O for a long time we might still end up
in the same situation (all grants hoarded by a single disk). We should
make sure that there's always a buffer of unused grants, so that other
disks or NIC interfaces can continue to work as expected.
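
To make that a bit more concrete, the kind of check I have in mind
before blkfront decides to keep a grant persistently mapped would be
something along these lines (neither the helper nor the counters exist
today, and the numbers are arbitrary examples):

#include <stdbool.h>

#define GRANT_RESERVE		512	/* refs to always keep free      */
#define PERSISTENT_PER_DEV_MAX	1056	/* cap per virtual block device  */

/* Hypothetical helper: the grant table code would need to expose how
 * many grant references are still free in this domain. */
extern unsigned int xen_free_grant_refs(void);

struct blkfront_gnt_pool {
	unsigned int persistent;	/* grants this device holds on to */
};

/*
 * Decide whether a grant used by the current request may stay
 * persistently mapped, or should be ended and freed once the request
 * completes.
 */
static bool may_keep_persistent(struct blkfront_gnt_pool *pool)
{
	if (pool->persistent >= PERSISTENT_PER_DEV_MAX)
		return false;	/* per-device cap reached */
	if (xen_free_grant_refs() <= GRANT_RESERVE)
		return false;	/* leave headroom for netfront and others */
	pool->persistent++;
	return true;
}

That way a single busy vbd can never take the last free grant
references away from the other frontends in the guest.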




* Re: [Hackathon minutes] PV block improvements
  2013-06-27 15:12     ` Roger Pau Monné
@ 2013-06-27 15:26       ` Stefano Stabellini
  0 siblings, 0 replies; 24+ messages in thread
From: Stefano Stabellini @ 2013-06-27 15:26 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Matt Wilson, Konrad Rzeszutek Wilk

On Thu, 27 Jun 2013, Roger Pau Monné wrote:
> On 21/06/13 20:07, Matt Wilson wrote:
> > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
> >> Hello,
> >>
> >> While working on further block improvements I've found an issue with
> >> persistent grants in blkfront.
> >>
> >> Persistent grants basically allocate grants and then they are never
> >> released, so both blkfront and blkback keep using the same memory pages
> >> for all the transactions.
> >>
> >> This is not a problem in blkback, because we can dynamically choose how
> >> many grants we want to map. On the other hand, blkfront cannot remove
> >> the access to those grants at any point, because blkfront doesn't know
> >> if blkback has this grants mapped persistently or not.
> >>
> >> So if for example we start expanding the number of segments in indirect
> >> requests, to a value like 512 segments per requests, blkfront will
> >> probably try to persistently map 512*32+512 = 16896 grants per device,
> >> that's much more grants that the current default, which is 32*256 = 8192
> >> (if using grant tables v2). This can cause serious problems to other
> >> interfaces inside the DomU, since blkfront basically starts hoarding all
> >> possible grants, leaving other interfaces completely locked.
> > 
> > Yikes.
> > 
> >> I've been thinking about different ways to solve this, but so far I
> >> haven't been able to found a nice solution:
> >>
> >> 1. Limit the number of persistent grants a blkfront instance can use,
> >> let's say that only the first X used grants will be persistently mapped
> >> by both blkfront and blkback, and if more grants are needed the previous
> >> map/unmap will be used.
> > 
> > I'm not thrilled with this option. It would likely introduce some
> > significant performance variability, wouldn't it?
> > 
> >> 2. Switch to grant copy in blkback, and get rid of persistent grants (I
> >> have not benchmarked this solution, but I'm quite sure it will involve a
> >> performance regression, specially when scaling to a high number of domains).
> > 
> > Why do you think so?
> 
> I've hacked a prototype blkback using grant_copy instead of persistent
> grants, and removed the persistent grants support in blkfront and indeed
> the performance of grant_copy is lower than persistent grants, and it
> seems to scale much worse. I've run several fio read/write benchmarks,
> using 512 segments per request on a ramdisk, and the output is the
> following:
> 
> http://xenbits.xen.org/people/royger/grant_copy/

Very impressive. We should consider doing the same experiment with
netfront/netback at some point.




* Re: [Hackathon minutes] PV block improvements
  2013-06-24 11:06       ` Stefano Stabellini
@ 2013-07-02 11:49         ` Roger Pau Monné
  0 siblings, 0 replies; 24+ messages in thread
From: Roger Pau Monné @ 2013-07-02 11:49 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Wei Liu

On 24/06/13 13:06, Stefano Stabellini wrote:
> On Sat, 22 Jun 2013, Wei Liu wrote:
>> On Fri, Jun 21, 2013 at 04:16:25PM -0400, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
>>>> Hello,
>>>>
>>>> While working on further block improvements I've found an issue with
>>>> persistent grants in blkfront.
>>>>
>>>> Persistent grants basically allocate grants and then they are never
>>>> released, so both blkfront and blkback keep using the same memory pages
>>>> for all the transactions.
>>>>
>>>> This is not a problem in blkback, because we can dynamically choose how
>>>> many grants we want to map. On the other hand, blkfront cannot remove
>>>> the access to those grants at any point, because blkfront doesn't know
>>>> if blkback has this grants mapped persistently or not.
>>>>
>>>> So if for example we start expanding the number of segments in indirect
>>>> requests, to a value like 512 segments per requests, blkfront will
>>>> probably try to persistently map 512*32+512 = 16896 grants per device,
>>>> that's much more grants that the current default, which is 32*256 = 8192
>>>> (if using grant tables v2). This can cause serious problems to other
>>>> interfaces inside the DomU, since blkfront basically starts hoarding all
>>>> possible grants, leaving other interfaces completely locked.
>>>>
>>>> I've been thinking about different ways to solve this, but so far I
>>>> haven't been able to found a nice solution:
>>>>
>>>> 1. Limit the number of persistent grants a blkfront instance can use,
>>>> let's say that only the first X used grants will be persistently mapped
>>>> by both blkfront and blkback, and if more grants are needed the previous
>>>> map/unmap will be used.
>>>>
>>>> 2. Switch to grant copy in blkback, and get rid of persistent grants (I
>>>> have not benchmarked this solution, but I'm quite sure it will involve a
>>>> performance regression, specially when scaling to a high number of domains).
>>>>
>>
>> Any chance that the speed of copying is fast enough for block devices?
>>
>>>> 3. Increase the size of the grant_table or the size of a single grant
>>>> (from 4k to 2M) (this is from Stefano Stabellini).
>>>>
>>>> 4. Introduce a new request type that we can use to request blkback to
>>>> unmap certain grefs so we can free them in blkfront.
>>>
>>>
>>> 5). Lift the limit of grant pages a domain can have.
>>
>> If I'm not mistaken, this is basically the same as "increase the size of
>> the grant_table" in #3.
> 
> Yes, that was one of the things I was suggesting, but it needs
> investigating: I wouldn't want that increasing the number of grant
> frames would reach a different scalability limit of the data structure.

I don't think there's any inherent scalability limit in the data
structure itself; it's just an array, and grants are indexed as
array[gref]. I've discussed with Stefano using domain pages to increase
the size of the grant table: instead of using xenheap pages we could
use domain pages and thus remove the limitation (since we would be
consuming the domain's own memory). I have a very hacky prototype that
uses domain pages instead of xenheap pages to expand the grant table,
but I think that before implementing this it would be better to
implement #4. Even if we use domain pages to grow the grant table, we
still need a way to allow blkfront to remove persistent grants, or we
will end up with a lot of unused pages in blkfront after I/O bursts.
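
Just to sketch what #4 could look like on the wire -- purely
illustrative, the operation number, limit and field layout are all made
up and nothing here is part of the blkif protocol today:

#include <stdint.h>

typedef uint32_t grant_ref_t;	/* as in the Xen public headers */

/* Made-up operation number and limit; the limit is kept small so the
 * request should still fit in the existing fixed-size ring slot. */
#define BLKIF_OP_RELEASE_GRANTS		 8
#define BLKIF_MAX_RELEASE_PER_REQUEST	16

struct blkif_request_release {
	uint8_t     operation;	/* BLKIF_OP_RELEASE_GRANTS              */
	uint8_t     nr_grefs;	/* valid entries in gref[]              */
	uint16_t    _pad0;
	uint32_t    _pad1;	/* explicit padding so the layout is the
				   same for 32- and 64-bit guests       */
	uint64_t    id;		/* echoed back in the response          */
	grant_ref_t gref[BLKIF_MAX_RELEASE_PER_REQUEST];
};

/*
 * The backend would unmap (and forget) every listed gref that it has
 * persistently mapped; the matching response tells the frontend that
 * those grants are now safe to end foreign access on and free, so the
 * pages go back to the guest and the refs back to the pool.
 */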




Thread overview: 24+ messages (end of thread; newest: 2013-07-02 11:49 UTC)
2013-05-24 15:06 [Hackathon minutes] PV block improvements Roger Pau Monné
2013-06-21 17:10 ` Roger Pau Monné
2013-06-21 18:07   ` Matt Wilson
2013-06-22  7:11     ` Roger Pau Monné
2013-06-25  6:09       ` Matt Wilson
2013-06-25 13:01         ` Wei Liu
2013-06-25 15:39           ` Matt Wilson
2013-06-25 15:53       ` Ian Campbell
2013-06-25 18:04         ` Stefano Stabellini
2013-06-26  9:37           ` George Dunlap
2013-06-26 11:37             ` Ian Campbell
2013-06-27 13:58               ` George Dunlap
2013-06-27 14:21                 ` Ian Campbell
2013-06-27 15:20                   ` Roger Pau Monné
2013-06-25 15:57       ` Ian Campbell
2013-06-25 16:05         ` Jan Beulich
2013-06-25 16:30         ` Roger Pau Monné
2013-06-27 15:12     ` Roger Pau Monné
2013-06-27 15:26       ` Stefano Stabellini
2013-06-21 20:16   ` Konrad Rzeszutek Wilk
2013-06-21 23:17     ` Wei Liu
2013-06-24 11:06       ` Stefano Stabellini
2013-07-02 11:49         ` Roger Pau Monné
2013-06-22  7:17     ` Roger Pau Monné
