* [RFC] Extending numbers of event channels
From: Wei Liu @ 2012-12-03 16:29 UTC (permalink / raw)
To: xen-devel; +Cc: wei.liu2, Jan Beulich
Hi all
There was a discussion about extending the number of event channels back in
September [0].
Regarding Jan's comment in [0], I don't think allowing the user to specify an
arbitrary number of levels is a good idea. Only the last level should be
shared among vcpus; the other levels should live in the percpu struct to
allow for quicker lookup. Letting the user pick the number of levels would be
too complicated to implement and would blow up the percpu section (since the
size grows exponentially with each extra level). Three levels should be quite
enough -- see the maths below.
Number of event channels:
* 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
* 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 512k
Since the third level is a new ABI anyway, I chose unsigned long long here to
get more event channels.
Pages occupied by the third level (if PAGE_SIZE=4k):
* 32bit: 64k / 8 / 4k = 2
* 64bit: 512k / 8 / 4k = 16
Making the second level percpu incurs some overhead. In effect we move the
array that currently lives in the shared info into the percpu struct:
* 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 bytes
* 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 bytes
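To make the lookup concrete, here is a rough sketch of how a guest might scan
the three levels. All names below are made up for illustration and are not a
proposed ABI, and the sketch uses unsigned long for every level just to keep
it uniform:

    /* Illustrative sketch only -- none of these names are part of any ABI. */
    #define EVTCHN_WORD_BITS (sizeof(unsigned long) * 8)

    extern unsigned long *evtchn_l3_pending;     /* shared 3rd-level bitmap */
    extern void handle_event(unsigned int port); /* guest-specific handler  */

    struct evtchn_percpu_levels {
        unsigned long l1;                        /* selects a word in l2    */
        unsigned long l2[EVTCHN_WORD_BITS];      /* selects a word in l3    */
    };

    static void scan_pending(struct evtchn_percpu_levels *p)
    {
        unsigned int i, j, k;
        unsigned long l2, l3;

        for (i = 0; i < EVTCHN_WORD_BITS; i++) {
            if (!(p->l1 & (1UL << i)))
                continue;
            l2 = p->l2[i];
            for (j = 0; j < EVTCHN_WORD_BITS; j++) {
                if (!(l2 & (1UL << j)))
                    continue;
                l3 = evtchn_l3_pending[i * EVTCHN_WORD_BITS + j];
                for (k = 0; k < EVTCHN_WORD_BITS; k++)
                    if (l3 & (1UL << k))
                        handle_event((i * EVTCHN_WORD_BITS + j)
                                     * EVTCHN_WORD_BITS + k);
            }
        }
        /* Real code would also clear pending/selector bits atomically. */
    }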
What concerns me is that the struct evtchn buckets are currently allocated
all at once during the initialization phase. To save memory inside Xen, the
internal allocation/free scheme for evtchn needs to be modified. Ian
suggested allocating a small number of buckets at start of day and then
dynamically faulting in more as required.
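Roughly what that could look like inside Xen (a sketch only; the bucket
layout follows the existing code, but the on-demand allocation and error
handling here are illustrative):

    /* Sketch: populate a bucket on first use instead of all at once. */
    static int evtchn_get_bucket(struct domain *d, unsigned int port)
    {
        unsigned int bucket = port / EVTCHNS_PER_BUCKET;
        struct evtchn *chn;

        if ( d->evtchn[bucket] != NULL )
            return 0;                    /* already populated */

        chn = xzalloc_array(struct evtchn, EVTCHNS_PER_BUCKET);
        if ( chn == NULL )
            return -ENOMEM;

        /* Per-channel init (state, notify vcpu, ...) would go here. */
        d->evtchn[bucket] = chn;
        return 0;
    }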
To sum up:
1. The guest should allocate pages for the third level.
2. The guest should register the third-level pages via a new hypercall op
   (see the sketch below).
3. The hypervisor should set up the third level in that hypercall op.
4. Only the last level (the third in this case) should be shared among
   vcpus.
5. We need a flexible allocation/free scheme for struct evtchn.
6. Debug dumping should use a snapshot to avoid holding the event lock
   for too long. (Jan's concern in [0])
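For illustration only, the guest side of steps 1 and 2 might look something
like the following -- the op number, struct layout and the two helpers are
all invented for this sketch, not an existing interface:

    /* Hypothetical new ABI -- nothing below exists today. */
    #define EVTCHNOP_register_3level  10          /* invented op number          */
    #define EVTCHN_L3_PAGES           16          /* 512k bits / 8 / 4k (64-bit) */

    struct evtchn_register_3level {
        uint32_t nr_pages;
        uint64_t gfn[EVTCHN_L3_PAGES];            /* frames backing 3rd level */
    };

    static int register_third_level(void)
    {
        struct evtchn_register_3level reg = { .nr_pages = EVTCHN_L3_PAGES };
        unsigned int i;

        for ( i = 0; i < reg.nr_pages; i++ )
        {
            void *page = alloc_evtchn_page();     /* invented helper */
            if ( page == NULL )
                return -ENOMEM;
            reg.gfn[i] = virt_to_gfn(page);       /* invented helper */
        }
        /* Xen maps these frames and wires up the 3rd level in this op. */
        return HYPERVISOR_event_channel_op(EVTCHNOP_register_3level, &reg);
    }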
Any comments are welcome.
Wei.
[0] http://thread.gmane.org/gmane.comp.emulators.xen.devel/139921
* Re: [RFC] Extending numbers of event channels
From: Jan Beulich @ 2012-12-03 17:35 UTC (permalink / raw)
To: Wei Liu; +Cc: xen-devel
>>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
> Regarding Jan's comment in [0], I don't think allowing user to specify
> arbitrary number of levels a good idea. Because only the last level
> should be shared among vcpus, other level should be in percpu struct to
> allow for quicker lookup. The idea to let user specify levels will be
> too complicated in implementation and blow up percpu section (since the
> size grows exponentially). Three levels should be quite enough. See
> maths below.
I didn't ask to implement more than three levels, I just asked for
the interface to establish the number of levels a guest wants to
use to allow for higher numbers (passing of which would result in
-EINVAL in your implementation).
> Number of event channels:
> * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
> * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 512k
> Basically the third level is a new ABI, so I choose to use unsigned long
> long here to get more event channels.
Please don't: This would make things less consistent to handle
at least in the guest side code. And I don't see why you would
have a need to do so anyway (or else your argument above
against further levels would become questionable).
> Pages occupied by the third level (if PAGE_SIZE=4k):
> * 32bit: 64k / 8 / 4k = 2
> * 64bit: 512k / 8 / 4k = 16
>
> Making second level percpu will incur overhead. In fact we move the
> array in shared info into percpu struct:
> * 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 byte
> * 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 byte
>
> What concerns me is that the struct evtchn buckets are allocated all at
> once during initialization phrase. To save memory inside Xen, the
> internal allocation/free scheme for evtchn needs to be modified. Ian
> suggested we do small number of buckets at start of day then dynamically
> fault in more as required.
>
> To sum up:
> 1. Guest should allocate pages for third level evtchn.
> 2. Guest should register third level pages via a new hypercall op.
Doesn't the guest also need to set up space for the 2nd level?
Jan
> 3. Hypervisor should setup third level evtchn in that hypercall op.
> 4. Only last level (third in this case) should be shared among
> vcpus.
> 5. Need a flexible allocation/free scheme of struct evtchn.
> 6. Debug dumping should use snapshot to avoid holding event lock
> for too long. (Jan's concern in [0])
>
> Any comments are welcomed.
>
>
> Wei.
>
> [0] http://thread.gmane.org/gmane.comp.emulators.xen.devel/139921
* Re: [RFC] Extending numbers of event channels
From: Ian Campbell @ 2012-12-03 17:43 UTC (permalink / raw)
To: Wei Liu; +Cc: Jan Beulich, xen-devel@lists.xen.org
On Mon, 2012-12-03 at 16:29 +0000, Wei Liu wrote:
> Hi all
>
> There has been discussion on extending number of event channels back in
> September [0].
While we are changing the ABI anyway, it would be nice to consider the
possibility of a small-ish number of per-vcpu evtchns.
These would potentially be useful for things like IPI vectors and such,
which every vcpu has.
e.g. reusing the per-vcpu space in evtchn_pending_sel would give 32 or
64, which would be more than sufficient.
Ian.
* Re: [RFC] Extending numbers of event channels
From: Jan Beulich @ 2012-12-03 17:48 UTC (permalink / raw)
To: Ian Campbell, Wei Liu; +Cc: xen-devel@lists.xen.org
>>> On 03.12.12 at 18:43, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Mon, 2012-12-03 at 16:29 +0000, Wei Liu wrote:
>> Hi all
>>
>> There has been discussion on extending number of event channels back in
>> September [0].
>
> While we are changing the ABI anyway, it would be nice to consider the
> possibility for some small-ish number of per-vcpu evtchns.
>
> These would potentially be useful for things like IPI vectors and such
> which every vcpu has.
While I agree with this, ...
> e.g. reusing the per-vcpu space in evtchn_pending_sel would give 32 or
> 64 which would be more than sufficient.
... I would have hoped for this field to retain its use as top level
selector.
Jan
* Re: [RFC] Extending numbers of event channels
From: Ian Campbell @ 2012-12-03 17:50 UTC (permalink / raw)
To: Jan Beulich; +Cc: Wei Liu, xen-devel@lists.xen.org
On Mon, 2012-12-03 at 17:48 +0000, Jan Beulich wrote:
> >>> On 03.12.12 at 18:43, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Mon, 2012-12-03 at 16:29 +0000, Wei Liu wrote:
> >> Hi all
> >>
> >> There has been discussion on extending number of event channels back in
> >> September [0].
> >
> > While we are changing the ABI anyway, it would be nice to consider the
> > possibility for some small-ish number of per-vcpu evtchns.
> >
> > These would potentially be useful for things like IPI vectors and such
> > which every vcpu has.
>
> While I agree to this, ...
>
> > e.g. reusing the per-vcpu space in evtchn_pending_sel would give 32 or
> > 64 which would be more than sufficient.
>
> ... I would have hoped for this field to retain its use as top level
> selector.
Yes, I'm not sure why I thought it became unused...
Ian.
* Re: [RFC] Extending numbers of event channels
From: Wei Liu @ 2012-12-03 17:52 UTC (permalink / raw)
To: Jan Beulich; +Cc: wei.liu2, ian.campbell, xen-devel@lists.xen.org
On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
> >>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
> > Regarding Jan's comment in [0], I don't think allowing user to specify
> > arbitrary number of levels a good idea. Because only the last level
> > should be shared among vcpus, other level should be in percpu struct to
> > allow for quicker lookup. The idea to let user specify levels will be
> > too complicated in implementation and blow up percpu section (since the
> > size grows exponentially). Three levels should be quite enough. See
> > maths below.
>
> I didn't ask to implement more than three levels, I just asked for
> the interface to establish the number of levels a guest wants to
> use to allow for higher numbers (passing of which would result in
> -EINVAL in your implementation).
>
Ah, I understand now. How about something like this:
    struct EVTCHNOP_reg_nlevel {
        int levels;
        void *level_specified_reg_struct;
    };
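On the hypervisor side the op could then just dispatch on the requested
number of levels and return -EINVAL for anything not implemented -- a rough
sketch, where setup_3level() and the 2-level case are purely illustrative:

    static long evtchn_register_nlevel(struct EVTCHNOP_reg_nlevel *reg)
    {
        switch ( reg->levels )
        {
        case 2:
            return 0;                 /* today's layout, nothing extra to do */
        case 3:
            return setup_3level(reg); /* invented helper */
        default:
            return -EINVAL;           /* interface allows it, we don't (yet) */
        }
    }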
> > Number of event channels:
> > * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
> > * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 512k
> > Basically the third level is a new ABI, so I choose to use unsigned long
> > long here to get more event channels.
>
> Please don't: This would make things less consistent to handle
> at least in the guest side code. And I don't see why you would
> have a need to do so anyway (or else your argument above
> against further levels would become questionable).
>
It was suggested by Ian to use unsigned long long. Ian, why do you
prefer unsigned long long to unsigned long?
> > Pages occupied by the third level (if PAGE_SIZE=4k):
> > * 32bit: 64k / 8 / 4k = 2
> > * 64bit: 512k / 8 / 4k = 16
> >
> > Making second level percpu will incur overhead. In fact we move the
> > array in shared info into percpu struct:
> > * 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 byte
> > * 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 byte
> >
> > What concerns me is that the struct evtchn buckets are allocated all at
> > once during initialization phrase. To save memory inside Xen, the
> > internal allocation/free scheme for evtchn needs to be modified. Ian
> > suggested we do small number of buckets at start of day then dynamically
> > fault in more as required.
> >
> > To sum up:
> > 1. Guest should allocate pages for third level evtchn.
> > 2. Guest should register third level pages via a new hypercall op.
>
> Doesn't the guest also need to set up space for the 2nd level?
>
Yes. That will be embedded in the percpu struct vcpu_info, which will also
be registered via the same hypercall op.
Wei.
> Jan
>
> > 3. Hypervisor should setup third level evtchn in that hypercall op.
> > 4. Only last level (third in this case) should be shared among
> > vcpus.
> > 5. Need a flexible allocation/free scheme of struct evtchn.
> > 6. Debug dumping should use snapshot to avoid holding event lock
> > for too long. (Jan's concern in [0])
> >
> > Any comments are welcomed.
> >
> >
> > Wei.
> >
> > [0] http://thread.gmane.org/gmane.comp.emulators.xen.devel/139921
>
>
>
* Re: [RFC] Extending numbers of event channels
From: Ian Campbell @ 2012-12-03 17:57 UTC (permalink / raw)
To: Wei Liu; +Cc: Jan Beulich, xen-devel@lists.xen.org
On Mon, 2012-12-03 at 17:52 +0000, Wei Liu wrote:
> On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
> > >>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
> > > Regarding Jan's comment in [0], I don't think allowing user to specify
> > > arbitrary number of levels a good idea. Because only the last level
> > > should be shared among vcpus, other level should be in percpu struct to
> > > allow for quicker lookup. The idea to let user specify levels will be
> > > too complicated in implementation and blow up percpu section (since the
> > > size grows exponentially). Three levels should be quite enough. See
> > > maths below.
> >
> > I didn't ask to implement more than three levels, I just asked for
> > the interface to establish the number of levels a guest wants to
> > use to allow for higher numbers (passing of which would result in
> > -EINVAL in your implementation).
> >
>
> Ah, I understand now. How about something like this:
>
> struct EVTCHNOP_reg_nlevel {
> int levels;
> void *level_specified_reg_struct;
> }
>
> > > Number of event channels:
> > > * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
> > > * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 512k
> > > Basically the third level is a new ABI, so I choose to use unsigned long
> > > long here to get more event channels.
> >
> > Please don't: This would make things less consistent to handle
> > at least in the guest side code. And I don't see why you would
> > have a need to do so anyway (or else your argument above
> > against further levels would become questionable).
> >
>
> It was suggested by Ian to use unsigned long long. Ian, why do you
> prefer unsigned long long to unsigned long?
I thought having 32 and 64 bit be the same might simplify some things,
but if not then that's fine.
Are 32k event channels going to be enough in the long run? I suppose any
system capable of running such a number of guests ought to be using 64
bit == 512k, which should at least last a bit longer.
> > > Pages occupied by the third level (if PAGE_SIZE=4k):
> > > * 32bit: 64k / 8 / 4k = 2
> > > * 64bit: 512k / 8 / 4k = 16
> > >
> > > Making second level percpu will incur overhead. In fact we move the
> > > array in shared info into percpu struct:
> > > * 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 byte
> > > * 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 byte
> > >
> > > What concerns me is that the struct evtchn buckets are allocated all at
> > > once during initialization phrase. To save memory inside Xen, the
> > > internal allocation/free scheme for evtchn needs to be modified. Ian
> > > suggested we do small number of buckets at start of day then dynamically
> > > fault in more as required.
> > >
> > > To sum up:
> > > 1. Guest should allocate pages for third level evtchn.
> > > 2. Guest should register third level pages via a new hypercall op.
> >
> > Doesn't the guest also need to set up space for the 2nd level?
> >
>
> Yes. That will be embedded in percpu struct vcpu_info, which will be
> also register via the same hypercall op.
NB that there is already a vcpu info placement hypercall. I have no
problem making this a prerequisite for this work.
Ian.
* Re: [RFC] Extending numbers of event channels
From: Jan Beulich @ 2012-12-03 18:00 UTC (permalink / raw)
To: Wei Liu; +Cc: ian.campbell, xen-devel@lists.xen.org
>>> On 03.12.12 at 18:52, Wei Liu <Wei.Liu2@citrix.com> wrote:
> On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
>> >>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
>> > Regarding Jan's comment in [0], I don't think allowing user to specify
>> > arbitrary number of levels a good idea. Because only the last level
>> > should be shared among vcpus, other level should be in percpu struct to
>> > allow for quicker lookup. The idea to let user specify levels will be
>> > too complicated in implementation and blow up percpu section (since the
>> > size grows exponentially). Three levels should be quite enough. See
>> > maths below.
>>
>> I didn't ask to implement more than three levels, I just asked for
>> the interface to establish the number of levels a guest wants to
>> use to allow for higher numbers (passing of which would result in
>> -EINVAL in your implementation).
>>
>
> Ah, I understand now. How about something like this:
>
> struct EVTCHNOP_reg_nlevel {
> int levels;
> void *level_specified_reg_struct;
> }
Yes, just "unsigned int" please.
>> > To sum up:
>> > 1. Guest should allocate pages for third level evtchn.
>> > 2. Guest should register third level pages via a new hypercall op.
>>
>> Doesn't the guest also need to set up space for the 2nd level?
>>
>
> Yes. That will be embedded in percpu struct vcpu_info, which will be
> also register via the same hypercall op.
"struct vcpu_info"? Same hypercall? Or are you mixing up types?
Jan
* Re: [RFC] Extending numbers of event channels
From: Wei Liu @ 2012-12-03 18:09 UTC (permalink / raw)
To: Jan Beulich; +Cc: wei.liu2, Ian Campbell, xen-devel@lists.xen.org
On Mon, 2012-12-03 at 18:00 +0000, Jan Beulich wrote:
> >>> On 03.12.12 at 18:52, Wei Liu <Wei.Liu2@citrix.com> wrote:
> > On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
> >> >>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
> >> > Regarding Jan's comment in [0], I don't think allowing user to specify
> >> > arbitrary number of levels a good idea. Because only the last level
> >> > should be shared among vcpus, other level should be in percpu struct to
> >> > allow for quicker lookup. The idea to let user specify levels will be
> >> > too complicated in implementation and blow up percpu section (since the
> >> > size grows exponentially). Three levels should be quite enough. See
> >> > maths below.
> >>
> >> I didn't ask to implement more than three levels, I just asked for
> >> the interface to establish the number of levels a guest wants to
> >> use to allow for higher numbers (passing of which would result in
> >> -EINVAL in your implementation).
> >>
> >
> > Ah, I understand now. How about something like this:
> >
> > struct EVTCHNOP_reg_nlevel {
> > int levels;
> > void *level_specified_reg_struct;
> > }
>
> Yes, just "unsigned int" please.
>
Right, "unsigned int".
> >> > To sum up:
> >> > 1. Guest should allocate pages for third level evtchn.
> >> > 2. Guest should register third level pages via a new hypercall op.
> >>
> >> Doesn't the guest also need to set up space for the 2nd level?
> >>
> >
> > Yes. That will be embedded in percpu struct vcpu_info, which will be
> > also register via the same hypercall op.
>
> "struct vcpu_info"? Same hypercall? Or are you mixing up types?
>
What I meant was that the second level will be embedded in struct vcpu_info,
and that the second level (not struct vcpu_info itself) will be registered
via some hypercall.
Wei.
> Jan
>
* Re: [RFC] Extending numbers of event channels
From: Wei Liu @ 2012-12-03 18:15 UTC (permalink / raw)
To: Ian Campbell; +Cc: wei.liu2, Jan Beulich, xen-devel@lists.xen.org
On Mon, 2012-12-03 at 17:57 +0000, Ian Campbell wrote:
> On Mon, 2012-12-03 at 17:52 +0000, Wei Liu wrote:
> > On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
> > > >>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
> > > > Regarding Jan's comment in [0], I don't think allowing user to specify
> > > > arbitrary number of levels a good idea. Because only the last level
> > > > should be shared among vcpus, other level should be in percpu struct to
> > > > allow for quicker lookup. The idea to let user specify levels will be
> > > > too complicated in implementation and blow up percpu section (since the
> > > > size grows exponentially). Three levels should be quite enough. See
> > > > maths below.
> > >
> > > I didn't ask to implement more than three levels, I just asked for
> > > the interface to establish the number of levels a guest wants to
> > > use to allow for higher numbers (passing of which would result in
> > > -EINVAL in your implementation).
> > >
> >
> > Ah, I understand now. How about something like this:
> >
> > struct EVTCHNOP_reg_nlevel {
> > int levels;
> > void *level_specified_reg_struct;
> > }
> >
> > > > Number of event channels:
> > > > * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
> > > > * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 512k
> > > > Basically the third level is a new ABI, so I choose to use unsigned long
> > > > long here to get more event channels.
> > >
> > > Please don't: This would make things less consistent to handle
> > > at least in the guest side code. And I don't see why you would
> > > have a need to do so anyway (or else your argument above
> > > against further levels would become questionable).
> > >
> >
> > It was suggested by Ian to use unsigned long long. Ian, why do you
> > prefer unsigned long long to unsigned long?
>
> I thought having 32 and 64 bit be the same might simplify some things,
> but if not then that's fine.
>
> Is 32k event channels going to be enough in the long run? I suppose any
> system capable of running such a number of guests ought to be using 64
> bit == 512k which should at least last a bit longer.
>
I think 32k is quite enough for 32bit machines. And I agree with "system
capable of running such a number of guests ought to be using 64 bit ==
512k" ;-)
> > > > Pages occupied by the third level (if PAGE_SIZE=4k):
> > > > * 32bit: 64k / 8 / 4k = 2
> > > > * 64bit: 512k / 8 / 4k = 16
> > > >
> > > > Making second level percpu will incur overhead. In fact we move the
> > > > array in shared info into percpu struct:
> > > > * 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 byte
> > > > * 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 byte
> > > >
> > > > What concerns me is that the struct evtchn buckets are allocated all at
> > > > once during initialization phrase. To save memory inside Xen, the
> > > > internal allocation/free scheme for evtchn needs to be modified. Ian
> > > > suggested we do small number of buckets at start of day then dynamically
> > > > fault in more as required.
> > > >
> > > > To sum up:
> > > > 1. Guest should allocate pages for third level evtchn.
> > > > 2. Guest should register third level pages via a new hypercall op.
> > >
> > > Doesn't the guest also need to set up space for the 2nd level?
> > >
> >
> > Yes. That will be embedded in percpu struct vcpu_info, which will be
> > also register via the same hypercall op.
>
> NB that there is already a vcpu info placement hypercall. I have no
> problem making this be a prerequisite for this work.
>
I saw that one, but that's an implementation detail, so I didn't go into
it here.
Wei.
* Re: [RFC] Extending numbers of event channels
From: David Vrabel @ 2012-12-03 18:52 UTC (permalink / raw)
To: Wei Liu; +Cc: Jan Beulich, xen-devel
On 03/12/12 16:29, Wei Liu wrote:
> Hi all
>
> There has been discussion on extending number of event channels back in
> September [0].
It seems that the decision has been made to go for this N-level
approach. Were any other methods considered?
Would a per-VCPU ring of pending events work? The ABI will be easier to
extend in the future for more event channels. The guest side code will
be simpler. It will be easier to fairly service the events as they will
be processed in the order they were raised.
The complexity would be in ensuring that events were not lost due to
lack of space in the ring. This may make the ring prohibitively large
or require complex or expensive tracking of pending events inside Xen.
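For concreteness, a per-VCPU ring might look something like the sketch below;
the size, field names and the (omitted) memory barriers and overflow handling
are all just illustrative:

    /* Illustrative per-VCPU event ring shared between Xen and the guest. */
    #define EVTCHN_RING_SIZE 256u                /* must be a power of two */

    struct evtchn_ring {
        uint32_t prod;                           /* written by Xen         */
        uint32_t cons;                           /* written by the guest   */
        uint32_t ring[EVTCHN_RING_SIZE];         /* pending evtchn ports   */
    };

    extern void handle_event(unsigned int port); /* guest-specific handler */

    static void process_ring(struct evtchn_ring *r)
    {
        while ( r->cons != r->prod )             /* barriers omitted       */
        {
            unsigned int port = r->ring[r->cons & (EVTCHN_RING_SIZE - 1)];
            handle_event(port);
            r->cons++;
        }
    }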
> Regarding Jan's comment in [0], I don't think allowing user to specify
> arbitrary number of levels a good idea. Because only the last level
> should be shared among vcpus, other level should be in percpu struct to
> allow for quicker lookup. The idea to let user specify levels will be
> too complicated in implementation and blow up percpu section (since the
> size grows exponentially). Three levels should be quite enough. See
> maths below.
>
> Number of event channels:
> * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
> * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 512k
> Basically the third level is a new ABI, so I choose to use unsigned long
> long here to get more event channels.
32-bit guests will have to treat the unsigned long long as two separate
words and iterate over them individually.
This is easy to do -- we have an experimental build of Xen and the
kernel that extends the number of event channels for 32-bit dom0 to 4096
by having each selector bit select a group of 4 words.
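i.e. the guest-side scan becomes roughly the following (an illustration of
the idea, not the actual patch):

    /* 32 selector bits * 4 words * 32 bits/word = 4096 event channels. */
    #define WORDS_PER_GROUP 4

    extern void handle_event(unsigned int port);

    static void scan_group(const unsigned long *pending, unsigned int sel_bit)
    {
        unsigned int w, b;

        for ( w = 0; w < WORDS_PER_GROUP; w++ )
        {
            unsigned long word = pending[sel_bit * WORDS_PER_GROUP + w];

            for ( b = 0; b < BITS_PER_LONG; b++ )
                if ( word & (1UL << b) )
                    handle_event((sel_bit * WORDS_PER_GROUP + w)
                                 * BITS_PER_LONG + b);
        }
    }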
David
>
> Pages occupied by the third level (if PAGE_SIZE=4k):
> * 32bit: 64k / 8 / 4k = 2
> * 64bit: 512k / 8 / 4k = 16
>
> Making second level percpu will incur overhead. In fact we move the
> array in shared info into percpu struct:
> * 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 byte
> * 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 byte
>
> What concerns me is that the struct evtchn buckets are allocated all at
> once during initialization phrase. To save memory inside Xen, the
> internal allocation/free scheme for evtchn needs to be modified. Ian
> suggested we do small number of buckets at start of day then dynamically
> fault in more as required.
>
> To sum up:
> 1. Guest should allocate pages for third level evtchn.
> 2. Guest should register third level pages via a new hypercall op.
> 3. Hypervisor should setup third level evtchn in that hypercall op.
> 4. Only last level (third in this case) should be shared among
> vcpus.
> 5. Need a flexible allocation/free scheme of struct evtchn.
> 6. Debug dumping should use snapshot to avoid holding event lock
> for too long. (Jan's concern in [0])
>
> Any comments are welcomed.
>
>
> Wei.
>
> [0] http://thread.gmane.org/gmane.comp.emulators.xen.devel/139921
* Re: [RFC] Extending numbers of event channels
From: Wei Liu @ 2012-12-03 19:11 UTC (permalink / raw)
To: David Vrabel; +Cc: wei.liu2, Jan Beulich, xen-devel@lists.xen.org
On Mon, 2012-12-03 at 18:52 +0000, David Vrabel wrote:
> On 03/12/12 16:29, Wei Liu wrote:
> > Hi all
> >
> > There has been discussion on extending number of event channels back in
> > September [0].
>
> It seems that the decision has been made to go for this N-level
> approach. Were any other methods considered?
>
Not yet. The discussion is still open.
> Would a per-VCPU ring of pending events work? The ABI will be easier to
> extend in the future for more event channels. The guest side code will
> be simpler. It will be easier to fairly service the events as they will
> be processed in the order they were raised.
>
Will there be scenarios where we need to raise some evtchns' priority? The
ring approach is completely fair.
> The complexity would be in ensuring that events were not lost due to
> lack of space in the ring. This may make the ring prohibitively large
> or require complex or expensive tracking of pending events inside Xen.
>
This also needs to be considered and evaluated...
> > Regarding Jan's comment in [0], I don't think allowing user to specify
> > arbitrary number of levels a good idea. Because only the last level
> > should be shared among vcpus, other level should be in percpu struct to
> > allow for quicker lookup. The idea to let user specify levels will be
> > too complicated in implementation and blow up percpu section (since the
> > size grows exponentially). Three levels should be quite enough. See
> > maths below.
> >
> > Number of event channels:
> > * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
> > * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 512k
> > Basically the third level is a new ABI, so I choose to use unsigned long
> > long here to get more event channels.
>
> 32-bit guests will have to treat the unsigned long long as two separate
> words and iterate over then individually.
>
> This is easy to do -- we have an experimental build of Xen and the
> kernel that extends the number of event channels for 32-bit dom0 to 4096
> by having each selector bit select a group of 4 words.
>
4096 is not enough, but this idea of each selector bit selecting more
words could be useful. Can you send me your patches? Then I will see what
I can do.
Wei.
* Re: [RFC] Extending numbers of event channels
From: Wei Liu @ 2012-12-03 20:56 UTC (permalink / raw)
To: David Vrabel; +Cc: wei.liu2, Jan Beulich, xen-devel@lists.xen.org
On Mon, Dec 03, 2012 at 06:52:41PM +0000, David Vrabel wrote:
> On 03/12/12 16:29, Wei Liu wrote:
> > Hi all
> >
> > There has been discussion on extending number of event channels back in
> > September [0].
>
> It seems that the decision has been made to go for this N-level
> approach. Were any other methods considered?
>
> Would a per-VCPU ring of pending events work? The ABI will be easier to
> extend in the future for more event channels. The guest side code will
> be simpler. It will be easier to fairly service the events as they will
> be processed in the order they were raised.
>
> The complexity would be in ensuring that events were not lost due to
> lack of space in the ring. This may make the ring prohibitively large
> or require complex or expensive tracking of pending events inside Xen.
>
If I understand correctly, in the ring model every raised event is queued up
for processing -- won't this be overkill? What if event generation is much
faster than processing?
In the current implementation, one event channel can be raised multiple
times but it is only processed once.
Wei.
* Re: [RFC] Extending numbers of event channels
From: Jan Beulich @ 2012-12-04 8:05 UTC (permalink / raw)
To: Wei Liu; +Cc: Ian Campbell, xen-devel@lists.xen.org
>>> On 03.12.12 at 19:09, Wei Liu <Wei.Liu2@citrix.com> wrote:
> On Mon, 2012-12-03 at 18:00 +0000, Jan Beulich wrote:
>> >>> On 03.12.12 at 18:52, Wei Liu <Wei.Liu2@citrix.com> wrote:
>> > On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
>> >> Doesn't the guest also need to set up space for the 2nd level?
>> >>
>> >
>> > Yes. That will be embedded in percpu struct vcpu_info, which will be
>> > also register via the same hypercall op.
>>
>> "struct vcpu_info"? Same hypercall? Or are you mixing up types?
>>
>
> What I meant was the second level will be embedded in struct vcpu_info,
> and the 2nd level will be registered via some hypercall (not the struct
> vcpu_info).
I would strongly recommend against embedding in
struct vcpu_info, particularly in the context of intending to
allow for having further levels in the future.
Plus I don't think you really can embed this - there's just not
enough space left (I'm sure you're aware that you can't
extend the structure in size), in fact there's no space left at
all in the architecture independent part of the structure.
The only option you have is to declare the array you need to add so that it
immediately follows the structure when the placement hypercall has been
used. That would probably be acceptable for the second-from-top level, but
you'd again run into (space) issues when wanting more than 3 levels (as you
validly said, all but the leaf level ought to be per-vCPU).
Jan
* Re: [RFC] Extending numbers of event channels
From: Ian Campbell @ 2012-12-04 9:30 UTC (permalink / raw)
To: Jan Beulich; +Cc: Wei Liu, xen-devel@lists.xen.org
On Tue, 2012-12-04 at 08:05 +0000, Jan Beulich wrote:
> >>> On 03.12.12 at 19:09, Wei Liu <Wei.Liu2@citrix.com> wrote:
> > On Mon, 2012-12-03 at 18:00 +0000, Jan Beulich wrote:
> >> >>> On 03.12.12 at 18:52, Wei Liu <Wei.Liu2@citrix.com> wrote:
> >> > On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
> >> >> Doesn't the guest also need to set up space for the 2nd level?
> >> >>
> >> >
> >> > Yes. That will be embedded in percpu struct vcpu_info, which will be
> >> > also register via the same hypercall op.
> >>
> >> "struct vcpu_info"? Same hypercall? Or are you mixing up types?
> >>
> >
> > What I meant was the second level will be embedded in struct vcpu_info,
> > and the 2nd level will be registered via some hypercall (not the struct
> > vcpu_info).
>
> I would strongly recommend against embedding in
> struct vcpu_info, particularly in the context of intending to
> allow for having further levels in the future.
>
> Plus I don't think you really can embed this - there's just not
> enough space left (I'm sure you're aware that you can't
> extend the structure in size), in fact there's no space left at
> all in the architecture independent part of the structure.
>
> The only option you have is to declare the array you need to
> add to immediately follow the structure when having used
> the placement hypercall. That would probably be acceptable
> for the second from the top level, but you'd again run into
> (space) issues when wanting more than 3 levels (as you
> validly said, all but the leaf level ought to be per-vCPU).
The amount of space needed for levels 1..N-1 ought to be pretty
formulaic for any N, so we could choose to require that enough space be
available after vcpu_info for levels 1..N-1.
With larger N that might become multiple pages; is that a problem? It
might be preferable to have an explicit call which takes potentially
non-contiguous addresses.
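(For W-bit words that is the sum over k = 1..N-1 of W^k bits per vcpu, e.g.
64 + 4096 bits = 520 bytes for N=3 on 64-bit. A trivial helper, just for
illustration:)

    /* Bytes needed per vcpu for the non-leaf levels 1..N-1 (illustrative). */
    static unsigned long nonleaf_bytes(unsigned int nr_levels,
                                       unsigned int word_bits)
    {
        unsigned long bits = 0, level_bits = word_bits;
        unsigned int k;

        for ( k = 1; k < nr_levels; k++ )
        {
            bits += level_bits;
            level_bits *= word_bits;
        }
        return bits / 8;
    }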
Ian.
* Re: [RFC] Extending numbers of event channels
From: Jan Beulich @ 2012-12-04 9:37 UTC (permalink / raw)
To: Ian Campbell; +Cc: Wei Liu, xen-devel@lists.xen.org
>>> On 04.12.12 at 10:30, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> The amount of space needed for levels 1..N-1 ought to be pretty
> formulaic for any N, so we could choose to require that enough space be
> available after vcpu_info for levels 1..N-1?
>
> With larger N that might become multiple pages, is that a problem? It
> might be preferable to have an explicit call which takes potentially
> non-contiguous addresses.
Yes, that was my point about the space restriction. The current
registration call allows only for a single page.
Additionally, if we add something variably sized to vcpu_info,
I don't think we should make it immediately follow the structure - that
would prevent fixed-size extensions in the future. Instead, we
should have the guest tell Xen at what offset from the base of the
structure the bitmaps start.
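i.e. something along these lines (the struct name and the new field are
invented; the first two fields mirror the existing vcpu_info placement
struct):

    struct vcpu_register_vcpu_info_v2 {      /* invented name               */
        uint64_t mfn;                        /* frame holding vcpu_info     */
        uint32_t offset;                     /* offset of vcpu_info in it   */
        uint32_t evtchn_offset;              /* offset, from the base of the
                                                structure, where the l1/l2
                                                bitmaps start               */
    };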
Jan
* Re: [RFC] Extending numbers of event channels
From: George Dunlap @ 2012-12-04 11:29 UTC (permalink / raw)
To: David Vrabel; +Cc: Wei Liu, Jan Beulich, xen-devel@lists.xen.org
On Mon, Dec 3, 2012 at 6:52 PM, David Vrabel <david.vrabel@citrix.com> wrote:
> On 03/12/12 16:29, Wei Liu wrote:
> > Hi all
> >
> > There has been discussion on extending number of event channels back in
> > September [0].
>
> It seems that the decision has been made to go for this N-level
> approach. Were any other methods considered?
>
> Would a per-VCPU ring of pending events work? The ABI will be easier to
> extend in the future for more event channels. The guest side code will
> be simpler. It will be easier to fairly service the events as they will
> be processed in the order they were raised.
>
Not having looked in depth at either solution, it does seem like having
something like this, rather than an ever-growing array of bits that needs
to be examined, would be simpler and more scalable.
-George
* Re: [RFC] Extending numbers of event channels
From: David Vrabel @ 2012-12-04 11:35 UTC (permalink / raw)
To: Wei Liu; +Cc: Jan Beulich, xen-devel@lists.xen.org
On 03/12/12 20:56, Wei Liu wrote:
> On Mon, Dec 03, 2012 at 06:52:41PM +0000, David Vrabel wrote:
>> On 03/12/12 16:29, Wei Liu wrote:
>>> Hi all
>>>
>>> There has been discussion on extending number of event channels back in
>>> September [0].
>>
>> It seems that the decision has been made to go for this N-level
>> approach. Were any other methods considered?
>>
>> Would a per-VCPU ring of pending events work? The ABI will be easier to
>> extend in the future for more event channels. The guest side code will
>> be simpler. It will be easier to fairly service the events as they will
>> be processed in the order they were raised.
>>
>> The complexity would be in ensuring that events were not lost due to
>> lack of space in the ring. This may make the ring prohibitively large
>> or require complex or expensive tracking of pending events inside Xen.
>>
>
> If I understand correctly, one event will always be queued up for
> processing in the ring model, will this be too overkill? What if event
> generation is much faster than processing?
>
> In the current implementation, one event channel can be raised multiple
> times but it is only processed once.
There would have to be something in Xen to ensure an event was only
added to the ring once, i.e. if it's already in the ring, it doesn't
get added again. This is the tricky bit and I can't immediately think of how
it would work.
David
* Re: [RFC] Extending numbers of event channels
From: Jan Beulich @ 2012-12-04 13:45 UTC (permalink / raw)
To: George Dunlap; +Cc: Wei Liu, David Vrabel, xen-devel@lists.xen.org
>>> On 04.12.12 at 12:29, George Dunlap <dunlapg@umich.edu> wrote:
> On Mon, Dec 3, 2012 at 6:52 PM, David Vrabel <david.vrabel@citrix.com> wrote:
>
>> On 03/12/12 16:29, Wei Liu wrote:
>> > Hi all
>> >
>> > There has been discussion on extending number of event channels back in
>> > September [0].
>>
>> It seems that the decision has been made to go for this N-level
>> approach. Were any other methods considered?
>>
>> Would a per-VCPU ring of pending events work? The ABI will be easier to
>> extend in the future for more event channels. The guest side code will
>> be simpler. It will be easier to fairly service the events as they will
>> be processed in the order they were raised.
>>
>
> Not having looked in depth at either solution, it does seem like having
> something like this, rather than an ever-growing array if bits that needs
> to be examined, would be simpler and more scalable.
Surely more scalable, but simpler?
Jan
* Re: [RFC] Extending numbers of event channels
From: Tim Deegan @ 2012-12-06 10:03 UTC (permalink / raw)
To: David Vrabel; +Cc: Wei Liu, Jan Beulich, xen-devel@lists.xen.org
At 11:35 +0000 on 04 Dec (1354620912), David Vrabel wrote:
> On 03/12/12 20:56, Wei Liu wrote:
> > On Mon, Dec 03, 2012 at 06:52:41PM +0000, David Vrabel wrote:
> >> On 03/12/12 16:29, Wei Liu wrote:
> >>> Hi all
> >>>
> >>> There has been discussion on extending number of event channels back in
> >>> September [0].
> >>
> >> It seems that the decision has been made to go for this N-level
> >> approach. Were any other methods considered?
> >>
> >> Would a per-VCPU ring of pending events work? The ABI will be easier to
> >> extend in the future for more event channels. The guest side code will
> >> be simpler. It will be easier to fairly service the events as they will
> >> be processed in the order they were raised.
> >>
> >> The complexity would be in ensuring that events were not lost due to
> >> lack of space in the ring. This may make the ring prohibitively large
> >> or require complex or expensive tracking of pending events inside Xen.
> >>
> >
> > If I understand correctly, one event will always be queued up for
> > processing in the ring model, will this be too overkill? What if event
> > generation is much faster than processing?
> >
> > In the current implementation, one event channel can be raised multiple
> > times but it is only processed once.
>
> There would have to be something in Xen to ensure an event was only
> added to the ring once. i.e., if it's already in the ring, it doesn't
> get added. This is the tricky bit and I can't immediately think of how
> this would work.
Well, Xen could keep a bitmap of which events it had inserted in the
ring, and then the guest could clear bits from the bitmap as it serviced
the events... :)
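(Roughly, on the producer side -- purely illustrative, reusing the ring
layout sketched earlier in the thread:)

    /* Only queue a port if it is not already in the ring; the guest clears
     * the bit again when it services the entry. Illustrative only. */
    static void ring_raise(struct evtchn_ring *r, unsigned long *in_ring,
                           unsigned int port)
    {
        if ( !test_and_set_bit(port, in_ring) )
            r->ring[r->prod++ & (EVTCHN_RING_SIZE - 1)] = port;
    }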
More seriously, though, I prefer the existing bitmap interface. It
handles masking and merging naturally, and it lets the guest service
events in whatever order it chooses (rather than having to process one
word per potentially uninteresting event off the ring to get to the one
it wants).
Tim.