All of lore.kernel.org
 help / color / mirror / Atom feed
* page ref/type count overflows
@ 2009-01-26 13:10 Jan Beulich
  2009-01-26 13:33 ` Keir Fraser
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2009-01-26 13:10 UTC (permalink / raw)
  To: xen-devel

With pretty trivial user mode programs being able to crash the kernel due to
the ref counter widths in Xen being more narrow than in Linux, I started an
attempt to put together a kernel side fix. While addressing the plain
hypercalls is pretty strait forward, dealing with multicalls (both when using
them for lazy mmu mode batching and when explicitly using them in e.g.
netback - the backends are susceptible to this too since a malicious guest
could pass grant references to pages that cannot have additional
references obtained) isn't. I'm therefore trying to find alternatives,
preferably ones mostly transparent to the kernel.

The way I was intending to address this on the kernel side in the general
accessors was to re-issue failed hypercalls (or page table writes) with
_PAGE_PRESENT replaced by _PAGE_PROTNONE, thus converting (non-
recoverable) kernel mode faults (under the right circumstances bringing
down the whole kernel) into user mode faults. In order to avoid the
complications of handling failed multicall elements I'm now considering to
add a mechanism for the kernel to tell Xen to
- automatically do a fallback operation like this at least on failed L1 table
writes
- continue handling mmu updates when one failed (at least in the case
where the guest obviously is not prepared to pick up the pieces, i.e. when
the 'pdone' argument is NULL, or alternatively by [dynamically] altering
the meaning of that pointer so that it could point to a bitmap).

The backend drivers in my opinion have no alternative to getting taught
to do full error checking in order to avoid the respective DomU-induced
problems.

Thanks for any thoughts or opinions,
Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-26 13:10 page ref/type count overflows Jan Beulich
@ 2009-01-26 13:33 ` Keir Fraser
  2009-01-26 13:51   ` Keir Fraser
  2009-01-26 14:10   ` Jan Beulich
  0 siblings, 2 replies; 25+ messages in thread
From: Keir Fraser @ 2009-01-26 13:33 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 26/01/2009 13:10, "Jan Beulich" <jbeulich@novell.com> wrote:

> The backend drivers in my opinion have no alternative to getting taught
> to do full error checking in order to avoid the respective DomU-induced
> problems.

Backend drivers, which get their mappings via grants, have sufficient
checking already don't they?

As for the general issue, why fudge around it when for any modern system we
can just fix it? By which I mean, x64 systems with CMPXCHG16 support
(everything but ancient Opterons?) can have a full domain pointer and a long
count_info in struct page_info, and still update both atomically.

It'd be a smaller patch, and it'd be less kludgy. Disadvantages are extra 8
bytes per page (which I think is okay, particularly to fix this nasty issue
in a clean way) and only fixes the issue for x64 (I personally don't care
about non-regressions for i386, especially when the cost of the fix in this
case would be IMO high in code complexity and ugliness).

Already there is no reason why type_info (an unsigned long) could not have a
wider count. It's just no use until count_info is widened.

What do you say to that then? :-)

 -- Keir

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-26 13:33 ` Keir Fraser
@ 2009-01-26 13:51   ` Keir Fraser
  2009-01-26 14:10   ` Jan Beulich
  1 sibling, 0 replies; 25+ messages in thread
From: Keir Fraser @ 2009-01-26 13:51 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 26/01/2009 13:33, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> As for the general issue, why fudge around it when for any modern system we
> can just fix it? By which I mean, x64 systems with CMPXCHG16 support
> (everything but ancient Opterons?) can have a full domain pointer and a long
> count_info in struct page_info, and still update both atomically.

Actually I think we could do without CX16. I'm not sure count_info and
domain need to be checked/updated at the same time. I think it would suffice
to update count_info, then check domain and undo on mismatch. That'd get rid
of the existing ugly open-coded cmpxchg8b operations too.

I think I'll look into this bit at least...

 -- Keir

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-26 13:33 ` Keir Fraser
  2009-01-26 13:51   ` Keir Fraser
@ 2009-01-26 14:10   ` Jan Beulich
  2009-01-26 14:19     ` Keir Fraser
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2009-01-26 14:10 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

>>> Keir Fraser <keir.fraser@eu.citrix.com> 26.01.09 14:33 >>>
>On 26/01/2009 13:10, "Jan Beulich" <jbeulich@novell.com> wrote:
>
>> The backend drivers in my opinion have no alternative to getting taught
>> to do full error checking in order to avoid the respective DomU-induced
>> problems.
>
>Backend drivers, which get their mappings via grants, have sufficient
>checking already don't they?

Actually, after some checking, almost. There's one leak in netback, but
that's trivial to fix.

>As for the general issue, why fudge around it when for any modern system we
>can just fix it? By which I mean, x64 systems with CMPXCHG16 support
>(everything but ancient Opterons?) can have a full domain pointer and a long
>count_info in struct page_info, and still update both atomically.
>
>It'd be a smaller patch, and it'd be less kludgy. Disadvantages are extra 8
>bytes per page (which I think is okay, particularly to fix this nasty issue
>in a clean way) and only fixes the issue for x64 (I personally don't care
>about non-regressions for i386, especially when the cost of the fix in this
>case would be IMO high in code complexity and ugliness).
>
>Already there is no reason why type_info (an unsigned long) could not have a
>wider count. It's just no use until count_info is widened.
>
>What do you say to that then? :-)

I considered all of this first, but afair it's not only ancient Opterons that
don't have cmpxchg16b.

But (just having got your second response) if we can do without that ugly
cmpxchg8b altogether, then of course this is the much preferred solution.
The growing of struct page_info of course isn't very fortunate, but pretty
much unavoidable.

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-26 14:10   ` Jan Beulich
@ 2009-01-26 14:19     ` Keir Fraser
  2009-01-26 14:38       ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Keir Fraser @ 2009-01-26 14:19 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 26/01/2009 14:10, "Jan Beulich" <jbeulich@novell.com> wrote:

> But (just having got your second response) if we can do without that ugly
> cmpxchg8b altogether, then of course this is the much preferred solution.
> The growing of struct page_info of course isn't very fortunate, but pretty
> much unavoidable.

I'm having at the cmpxchg8b right now. I'll also widen the fields I guess,
since that's pretty easy.

 -- Keir

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-26 14:19     ` Keir Fraser
@ 2009-01-26 14:38       ` Jan Beulich
  2009-01-26 14:54         ` Keir Fraser
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2009-01-26 14:38 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

>>> Keir Fraser <keir.fraser@eu.citrix.com> 26.01.09 15:19 >>>
>On 26/01/2009 14:10, "Jan Beulich" <jbeulich@novell.com> wrote:
>
>> But (just having got your second response) if we can do without that ugly
>> cmpxchg8b altogether, then of course this is the much preferred solution.
>> The growing of struct page_info of course isn't very fortunate, but pretty
>> much unavoidable.
>
>I'm having at the cmpxchg8b right now. I'll also widen the fields I guess,
>since that's pretty easy.

The count_info one only, or are you also intending to change _domain?
With struct page_info pretty large already, I'd want to avoid growing it
needlessly.

Partly related to this and partly to the recent heap changes - the need to
constrain the Xen heap to the lower 4G is purely due pickle_domptr(), right?
Wouldn't it make sense to change alloc_domain_struct() then to allocate
a page directly, have pickle_domptr() convert to pfn, and remove the
restriction? Apart from the (auto-ballooning) issues pointed out before, I
realize that such a restriction may make it impossible to create domains
altogether, since there continues to be no way to control what specific
pages can be ballooned out of Dom0 (which as we know has been limiting
the ability to create 32-bit domains).

And in order to possibly buy back the amount the structure will grow now,
wouldn't it make sense to have specialized linked list handling for the
page_info structures that also uses pfn encoding (rather than a full
pointer)?

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-26 14:38       ` Jan Beulich
@ 2009-01-26 14:54         ` Keir Fraser
  2009-01-26 16:15           ` Jan Beulich
  2009-01-26 17:01           ` Keir Fraser
  0 siblings, 2 replies; 25+ messages in thread
From: Keir Fraser @ 2009-01-26 14:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 26/01/2009 14:38, "Jan Beulich" <jbeulich@novell.com> wrote:

>> I'm having at the cmpxchg8b right now. I'll also widen the fields I guess,
>> since that's pretty easy.
> 
> The count_info one only, or are you also intending to change _domain?
> With struct page_info pretty large already, I'd want to avoid growing it
> needlessly.

I'm going to change both and do it the easy way, growing the structure from
40 to 48 bytes. You can shrink it again with the tricks you describe if
you're keen. I'm not sure how ugly the list stuff would end up, which would
be my main concern, but I suppose you can hide it behind list.h-style
macros. I don't see there's much duplication of effort to phase the work
like this.

 -- Keir

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-26 14:54         ` Keir Fraser
@ 2009-01-26 16:15           ` Jan Beulich
  2009-01-26 16:30             ` Keir Fraser
  2009-01-26 17:01           ` Keir Fraser
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2009-01-26 16:15 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

>>> Keir Fraser <keir.fraser@eu.citrix.com> 26.01.09 15:54 >>>
>On 26/01/2009 14:38, "Jan Beulich" <jbeulich@novell.com> wrote:
>
>>> I'm having at the cmpxchg8b right now. I'll also widen the fields I guess,
>>> since that's pretty easy.
>> 
>> The count_info one only, or are you also intending to change _domain?
>> With struct page_info pretty large already, I'd want to avoid growing it
>> needlessly.
>
>I'm going to change both and do it the easy way, growing the structure from
>40 to 48 bytes. You can shrink it again with the tricks you describe if
>you're keen. I'm not sure how ugly the list stuff would end up, which would
>be my main concern, but I suppose you can hide it behind list.h-style
>macros. I don't see there's much duplication of effort to phase the work
>like this.

No word on the question regarding the below-4G restriction of the Xen heap?
Also, what's the purpose of those (_domain & 1) checks in unpickle_domptr()?

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-26 16:15           ` Jan Beulich
@ 2009-01-26 16:30             ` Keir Fraser
  2009-01-27  9:34               ` Tim Deegan
  0 siblings, 1 reply; 25+ messages in thread
From: Keir Fraser @ 2009-01-26 16:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 26/01/2009 16:15, "Jan Beulich" <jbeulich@novell.com> wrote:

>> I'm going to change both and do it the easy way, growing the structure from
>> 40 to 48 bytes. You can shrink it again with the tricks you describe if
>> you're keen. I'm not sure how ugly the list stuff would end up, which would
>> be my main concern, but I suppose you can hide it behind list.h-style
>> macros. I don't see there's much duplication of effort to phase the work
>> like this.
> 
> No word on the question regarding the below-4G restriction of the Xen heap?

Got rid of it. It was only needed for the pickle_domptr() thing, as you
suspected.

> Also, what's the purpose of those (_domain & 1) checks in unpickle_domptr()?

Introduced by shadow2 code, and I think it's never been needed. I'm going to
nuke it, but I will be checking some other things with the shadow code
maintainers so I'll double check that too.

 -- Keir

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-26 14:54         ` Keir Fraser
  2009-01-26 16:15           ` Jan Beulich
@ 2009-01-26 17:01           ` Keir Fraser
  2009-01-27 10:16             ` Jan Beulich
  1 sibling, 1 reply; 25+ messages in thread
From: Keir Fraser @ 2009-01-26 17:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 26/01/2009 14:54, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> The count_info one only, or are you also intending to change _domain?
>> With struct page_info pretty large already, I'd want to avoid growing it
>> needlessly.
> 
> I'm going to change both and do it the easy way, growing the structure from
> 40 to 48 bytes. You can shrink it again with the tricks you describe if
> you're keen. I'm not sure how ugly the list stuff would end up, which would
> be my main concern, but I suppose you can hide it behind list.h-style
> macros. I don't see there's much duplication of effort to phase the work
> like this.

By the way, unless you can see some really clever way to shrink page_info to
32 bytes then I think it is only worth doing compression tricks on the
list_head field, to save 8 bytes (struct becomes 40 bytes). Compressing the
domain field won't get you down to another multiple-of-eight size. It may be
a trick to keep in mind for future though...

And all my stuff is in, as of changeset 19093.

 -- Keir

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-26 16:30             ` Keir Fraser
@ 2009-01-27  9:34               ` Tim Deegan
  0 siblings, 0 replies; 25+ messages in thread
From: Tim Deegan @ 2009-01-27  9:34 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

At 16:30 +0000 on 26 Jan (1232987412), Keir Fraser wrote:
> Introduced by shadow2 code, and I think it's never been needed. 

Certainly not needed now; at one point it was used to recover more
usable bits in the shadow page_info struct.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-26 17:01           ` Keir Fraser
@ 2009-01-27 10:16             ` Jan Beulich
  2009-01-27 10:24               ` Keir Fraser
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2009-01-27 10:16 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

>>> Keir Fraser <keir.fraser@eu.citrix.com> 26.01.09 18:01 >>>
>On 26/01/2009 14:54, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>
>>> The count_info one only, or are you also intending to change _domain?
>>> With struct page_info pretty large already, I'd want to avoid growing it
>>> needlessly.
>> 
>> I'm going to change both and do it the easy way, growing the structure from
>> 40 to 48 bytes. You can shrink it again with the tricks you describe if
>> you're keen. I'm not sure how ugly the list stuff would end up, which would
>> be my main concern, but I suppose you can hide it behind list.h-style
>> macros. I don't see there's much duplication of effort to phase the work
>> like this.
>
>By the way, unless you can see some really clever way to shrink page_info to
>32 bytes then I think it is only worth doing compression tricks on the
>list_head field, to save 8 bytes (struct becomes 40 bytes). Compressing the
>domain field won't get you down to another multiple-of-eight size. It may be
>a trick to keep in mind for future though...

Yes, I too realized that on my way home yesterday. The only thing I see as
viable for consideration would be to remove the embedded spinlock again, and
use the same bit-lock as x86-32 does.

And of course I'd like to see the cpumask gone, but that's gonna be more
tricky (if possible at all).

>And all my stuff is in, as of changeset 19093.

Thanks! One thing though that puzzled me regarding c/s 19093: How can
the mbz field have changed from being an overlay of u.inuse._domain to
being one of count_info? And if this was indeed intended that way, then
the comment prior to shadow_check_page_struct_offsets() should be
updated accordingly.

And shouldn't shadow's count field also be widened to BITS_PER_LONG-6?

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-27 10:16             ` Jan Beulich
@ 2009-01-27 10:24               ` Keir Fraser
  2009-01-27 11:22                 ` Keir Fraser
  2009-01-29  8:34                 ` Jan Beulich
  0 siblings, 2 replies; 25+ messages in thread
From: Keir Fraser @ 2009-01-27 10:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tim Deegan, xen-devel

On 27/01/2009 10:16, "Jan Beulich" <jbeulich@novell.com> wrote:

> Yes, I too realized that on my way home yesterday. The only thing I see as
> viable for consideration would be to remove the embedded spinlock again, and
> use the same bit-lock as x86-32 does.

Makes page_unlock() more expensive on x64. Don't know how much that matters.

> And of course I'd like to see the cpumask gone, but that's gonna be more
> tricky (if possible at all).

Given that page allocations tend to be bursty, we could tlbflush-check all
CPUs? Or make it a groups-of-CPUs mask? I suppose until we really care about
NR_CPUS > 64 it's not an important thing to get rid of anyhow.

>> And all my stuff is in, as of changeset 19093.
> 
> Thanks! One thing though that puzzled me regarding c/s 19093: How can
> the mbz field have changed from being an overlay of u.inuse._domain to
> being one of count_info? And if this was indeed intended that way, then
> the comment prior to shadow_check_page_struct_offsets() should be
> updated accordingly.

It was deliberate because of how get_page() now works. Zero count_info is
properly how we determine 'ordinary' allocated pages from free pages and
shadow pages. I think keeping the owner field as mbz instead was a bit
fragile. Yes the comment needs updating.

> And shouldn't shadow's count field also be widened to BITS_PER_LONG-6?

Would be nice. Hopefully either Tim or Gianluca will see to that.

 -- Keir

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-27 10:24               ` Keir Fraser
@ 2009-01-27 11:22                 ` Keir Fraser
  2009-01-27 15:38                   ` Keir Fraser
  2009-01-29  8:34                 ` Jan Beulich
  1 sibling, 1 reply; 25+ messages in thread
From: Keir Fraser @ 2009-01-27 11:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 27/01/2009 10:24, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> On 27/01/2009 10:16, "Jan Beulich" <jbeulich@novell.com> wrote:
> 
>> Yes, I too realized that on my way home yesterday. The only thing I see as
>> viable for consideration would be to remove the embedded spinlock again, and
>> use the same bit-lock as x86-32 does.
> 
> Makes page_unlock() more expensive on x64. Don't know how much that matters.

I think we could perhaps merge it into type_info and take/release it at the
same time as get_page_type/put_page_type, to avoid extra atomic ops. I'll
look into that.

For the cpumask, I have some ideas there too...

 -- Keir

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-27 11:22                 ` Keir Fraser
@ 2009-01-27 15:38                   ` Keir Fraser
  2009-01-27 15:49                     ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Keir Fraser @ 2009-01-27 15:38 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xensource.com

On 27/01/2009 11:22, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> Makes page_unlock() more expensive on x64. Don't know how much that matters.
> 
> I think we could perhaps merge it into type_info and take/release it at the
> same time as get_page_type/put_page_type, to avoid extra atomic ops. I'll look
> into that.
> 
> For the cpumask, I have some ideas there too...

Jan,

I have a patch for the lock field. It doesn't work yet but I should have it
done quite soon. It cleans up various races at the same time. I will also
look into the cpumask (not so important right now, but I think it will be
quite easy to get rid of anyway).

So... For the rest I thought I would leave it in your court. Since you might
be looking into it already. :-) What I see needing still to be done:
 * Shrink list_head
 * Shrink u.inuse._domain
 * Also will need to shrink sh_page_info (currently 48 bytes). It contains a
few chain pointers which you could probably compress down like page_info's
list_head.

So, overall nothing very hard, but probably a bunch of code to have to
modify! But it should get us down to 32 bytes, I think.
 
 -- Keir

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-27 15:38                   ` Keir Fraser
@ 2009-01-27 15:49                     ` Jan Beulich
  2009-01-27 16:03                       ` Keir Fraser
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2009-01-27 15:49 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel@lists.xensource.com

>>> Keir Fraser <keir.fraser@eu.citrix.com> 27.01.09 16:38 >>>
>So... For the rest I thought I would leave it in your court. Since you might
>be looking into it already. :-) What I see needing still to be done:
> * Shrink list_head
> * Shrink u.inuse._domain
> * Also will need to shrink sh_page_info (currently 48 bytes). It contains a
>few chain pointers which you could probably compress down like page_info's
>list_head.

Yes, I think I already have about half of the needed changes done.

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-27 15:49                     ` Jan Beulich
@ 2009-01-27 16:03                       ` Keir Fraser
  0 siblings, 0 replies; 25+ messages in thread
From: Keir Fraser @ 2009-01-27 16:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xensource.com

On 27/01/2009 15:49, "Jan Beulich" <jbeulich@novell.com> wrote:

>>>> Keir Fraser <keir.fraser@eu.citrix.com> 27.01.09 16:38 >>>
>> So... For the rest I thought I would leave it in your court. Since you might
>> be looking into it already. :-) What I see needing still to be done:
>> * Shrink list_head
>> * Shrink u.inuse._domain
>> * Also will need to shrink sh_page_info (currently 48 bytes). It contains a
>> few chain pointers which you could probably compress down like page_info's
>> list_head.
> 
> Yes, I think I already have about half of the needed changes done.

Excellent. Mine is now in as c/s 19103. You'll need that to get down to 32
bytes.

 -- Keir

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-27 10:24               ` Keir Fraser
  2009-01-27 11:22                 ` Keir Fraser
@ 2009-01-29  8:34                 ` Jan Beulich
  2009-01-29  8:45                   ` Keir Fraser
  2009-01-29 10:12                   ` Tim Deegan
  1 sibling, 2 replies; 25+ messages in thread
From: Jan Beulich @ 2009-01-29  8:34 UTC (permalink / raw)
  To: Gianluca Guida, Tim Deegan; +Cc: xen-devel, Keir Fraser

>>> Keir Fraser <keir.fraser@eu.citrix.com> 27.01.09 11:24 >>>
>On 27/01/2009 10:16, "Jan Beulich" <jbeulich@novell.com> wrote:
>> And shouldn't shadow's count field also be widened to BITS_PER_LONG-6?
>
>Would be nice. Hopefully either Tim or Gianluca will see to that.

Actually, I'd like to go a step further: Is there any reason why struct
shadow_page_info must be separate from struct page_info (rather than
sharing the definition, requiring some re-ordering of its elements)?

Thanks, Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-29  8:34                 ` Jan Beulich
@ 2009-01-29  8:45                   ` Keir Fraser
  2009-01-29 10:54                     ` Jan Beulich
  2009-01-29 10:12                   ` Tim Deegan
  1 sibling, 1 reply; 25+ messages in thread
From: Keir Fraser @ 2009-01-29  8:45 UTC (permalink / raw)
  To: Jan Beulich, Gianluca Guida, Tim Deegan; +Cc: xen-devel

On 29/01/2009 08:34, "Jan Beulich" <jbeulich@novell.com> wrote:

>>>> Keir Fraser <keir.fraser@eu.citrix.com> 27.01.09 11:24 >>>
>> On 27/01/2009 10:16, "Jan Beulich" <jbeulich@novell.com> wrote:
>>> And shouldn't shadow's count field also be widened to BITS_PER_LONG-6?
>> 
>> Would be nice. Hopefully either Tim or Gianluca will see to that.
> 
> Actually, I'd like to go a step further: Is there any reason why struct
> shadow_page_info must be separate from struct page_info (rather than
> sharing the definition, requiring some re-ordering of its elements)?

Not really, apart from wanting to keep shadow stuff in one place in a
private header file, I suppose. Would it risk turning page_info's definition
into crazy union soup? If it could be done as something like:
 unsigned long count_info
 union {
  struct { page_info stuff }; // anonymous
  struct { sh_page_info stuff }; // anonymous
 } // anonymous
That would be nicer than what we currently have, I'd agree. And the more
stuff we can pull out of the anonymous union (e.g., perhaps a list_head?)
then the better, of course.

 -- Keir

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-29  8:34                 ` Jan Beulich
  2009-01-29  8:45                   ` Keir Fraser
@ 2009-01-29 10:12                   ` Tim Deegan
  2009-01-29 10:57                     ` Jan Beulich
  1 sibling, 1 reply; 25+ messages in thread
From: Tim Deegan @ 2009-01-29 10:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Gianluca Guida, xen-devel, Keir Fraser

At 08:34 +0000 on 29 Jan (1233218074), Jan Beulich wrote:
> >>> Keir Fraser <keir.fraser@eu.citrix.com> 27.01.09 11:24 >>>
> >On 27/01/2009 10:16, "Jan Beulich" <jbeulich@novell.com> wrote:
> >> And shouldn't shadow's count field also be widened to BITS_PER_LONG-6?
> >
> >Would be nice. Hopefully either Tim or Gianluca will see to that.
> 
> Actually, I'd like to go a step further: Is there any reason why struct
> shadow_page_info must be separate from struct page_info (rather than
> sharing the definition, requiring some re-ordering of its elements)?

Well, when it _did_ use struct page_info the code was full of ugly hacks
to wedge information into fields with misleading names. :) I also like
the type-safety of not having the two structs anonymous-unioned
together; it's already confusing which field names are valid at any
time.

I've no objection to having the fields merged if it can be done without
hoicking lots of internal shadow-code definitions back out into common
code (it took me ages to separate it all!), and if it gets some real
benefit (like sharing most of the existing fields).

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-29  8:45                   ` Keir Fraser
@ 2009-01-29 10:54                     ` Jan Beulich
  2009-01-29 11:07                       ` Tim Deegan
  2009-01-29 14:04                       ` Gianluca Guida
  0 siblings, 2 replies; 25+ messages in thread
From: Jan Beulich @ 2009-01-29 10:54 UTC (permalink / raw)
  To: Gianluca Guida, Keir Fraser, Tim Deegan; +Cc: xen-devel

>>> Keir Fraser <keir.fraser@eu.citrix.com> 29.01.09 09:45 >>>
>On 29/01/2009 08:34, "Jan Beulich" <jbeulich@novell.com> wrote:
>> Actually, I'd like to go a step further: Is there any reason why struct
>> shadow_page_info must be separate from struct page_info (rather than
>> sharing the definition, requiring some re-ordering of its elements)?
>
>Not really, apart from wanting to keep shadow stuff in one place in a
>private header file, I suppose. Would it risk turning page_info's definition
>into crazy union soup? If it could be done as something like:
> unsigned long count_info
> union {
>  struct { page_info stuff }; // anonymous
>  struct { sh_page_info stuff }; // anonymous
> } // anonymous
>That would be nicer than what we currently have, I'd agree. And the more
>stuff we can pull out of the anonymous union (e.g., perhaps a list_head?)
>then the better, of course.

Below is what I currently have it at. I'm afraid it won't get much simpler,
but I think it reasonably expresses the individual overlays. There are three
more transformations I plan to make:
- _domain -> unsigned int
- next_shadow -> __mfn_t
- split u into two unions (one having type_info, type/pinned/count, and
cpumask, the other having _domain, back, and order).

That last step is to avoid having to re-add __attribute__ ((__packed__)),
so that other (future) changes to the structure won't risk mis-aligning any
fields again.

Does this look acceptable?

Jan

/*
 * This definition is solely for the use in struct page_info (and
 * struct page_list_head), intended to allow easy adjustment once x86-64
 * wants to support more than 16Tb.
 * 'unsigned long' should be used for MFNs everywhere else.
 */
#define __mfn_t unsigned int

#ifndef __i386__
#undef page_list_entry
struct page_list_entry
{
    __mfn_t next, prev;
};
#endif

struct page_info
{
    union {
        /* Each frame can be threaded onto a doubly-linked list.
         *
         * For unused shadow pages, a list of pages of this order; for
         * pinnable shadows, if pinned, a list of other pinned shadows
         * (see sh_type_is_pinnable() below for the definition of
         * "pinnable" shadow types).
         */
        struct page_list_entry list;
        /* For non-pinnable shadows, a higher entry that points at us. */
        paddr_t up;
    };

    /* Reference count and various PGC_xxx flags and fields. */
    unsigned long count_info;

    /* Context-dependent fields follow... */
    union {

        /* Page is in use: ((count_info & PGC_count_mask) != 0). */
        struct {
            /* Owner of this page (NULL if page is anonymous). */
            unsigned long _domain; /* pickled format */
            /* Type reference count and various PGT_xxx flags and fields. */
            unsigned long type_info;
        } inuse;

        /* Page is in use by shadow code: count_info == 0. */
        struct {
            unsigned long type:5;   /* What kind of shadow is this? */
            unsigned long pinned:1; /* Is the shadow pinned? */
            unsigned long count:26; /* Reference count */
            union {
                /* When in use, GMFN of guest page we're a shadow of. */
                __mfn_t back;
                /* When free, order of the freelist we're on. */
                unsigned int order;
            };
        } sh;

        /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
        struct {
            /* Order-size of the free chunk this page is the head of. */
            u32 order;
            /* Mask of possibly-tainted TLBs. */
            cpumask_t cpumask;
        } free;

    } u;

    union {
        /*
         * Timestamp from 'TLB clock', used to avoid extra safety flushes.
         * Only valid for: a) free pages, and b) pages with zero type count
         * (except page table pages when the guest is in shadow mode).
         */
        u32 tlbflush_timestamp;

        /*
         * When PGT_partial is true then this field is valid and indicates
         * that PTEs in the range [0, @nr_validated_ptes) have been validated.
         * An extra page reference must be acquired (or not dropped) whenever
         * PGT_partial gets set, and it must be dropped when the flag gets
         * cleared. This is so that a get() leaving a page in partially
         * validated state (where the caller would drop the reference acquired
         * due to the getting of the type [apparently] failing [-EAGAIN])
         * would not accidentally result in a page left with zero general
         * reference count, but non-zero type reference count (possible when
         * the partial get() is followed immediately by domain destruction).
         * Likewise, the ownership of the single type reference for partially
         * (in-)validated pages is tied to this flag, i.e. the instance
         * setting the flag must not drop that reference, whereas the instance
         * clearing it will have to.
         *
         * If @partial_pte is positive then PTE at @nr_validated_ptes+1 has
         * been partially validated. This implies that the general reference
         * to the page (acquired from get_page_from_lNe()) would be dropped
         * (again due to the apparent failure) and hence must be re-acquired
         * when resuming the validation, but must not be dropped when picking
         * up the page for invalidation.
         *
         * If @partial_pte is negative then PTE at @nr_validated_ptes+1 has
         * been partially invalidated. This is basically the opposite case of
         * above, i.e. the general reference to the page was not dropped in
         * put_page_from_lNe() (due to the apparent failure), and hence it
         * must be dropped when the put operation is resumed (and completes),
         * but it must not be acquired if picking up the page for validation.
         */
        struct {
            u16 nr_validated_ptes;
            s8 partial_pte;
        };

        /*
         * Guest pages with a shadow.  This does not conflict with
         * tlbflush_timestamp since page table pages are explicitly not
         * tracked for TLB-flush avoidance when a guest runs in shadow mode.
         */
        u32 shadow_flags;

        /* When in use, next shadow in this hash chain */
        struct page_info *next_shadow;
    };
};

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-29 10:12                   ` Tim Deegan
@ 2009-01-29 10:57                     ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2009-01-29 10:57 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Gianluca Guida, xen-devel, Keir Fraser

>>> Tim Deegan <Tim.Deegan@citrix.com> 29.01.09 11:12 >>>
>Well, when it _did_ use struct page_info the code was full of ugly hacks
>to wedge information into fields with misleading names. :) I also like

It has mostly clear names now (I mostly kept their original shadow names),
but it avoids redundancy on fields that really have the same purpose in
and outside of shadow code.

>the type-safety of not having the two structs anonymous-unioned
>together; it's already confusing which field names are valid at any
>time.

If type-safety is a concern, then shadow_page_info could of course be
made a secondary definition. But I think with properly named fields that
may not be as much of a concern.

>I've no objection to having the fields merged if it can be done without
>hoicking lots of internal shadow-code definitions back out into common
>code (it took me ages to separate it all!), and if it gets some real
>benefit (like sharing most of the existing fields).

For this, see my other reply.

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-29 10:54                     ` Jan Beulich
@ 2009-01-29 11:07                       ` Tim Deegan
  2009-01-29 11:26                         ` Keir Fraser
  2009-01-29 14:04                       ` Gianluca Guida
  1 sibling, 1 reply; 25+ messages in thread
From: Tim Deegan @ 2009-01-29 11:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Gianluca Guida, xen-devel, Keir Fraser

At 10:54 +0000 on 29 Jan (1233226449), Jan Beulich wrote:
> Below is what I currently have it at. I'm afraid it won't get much simpler,
> but I think it reasonably expresses the individual overlays. There are three
> more transformations I plan to make:
> - _domain -> unsigned int
> - next_shadow -> __mfn_t
> - split u into two unions (one having type_info, type/pinned/count, and
> cpumask, the other having _domain, back, and order).
> 
> That last step is to avoid having to re-add __attribute__ ((__packed__)),
> so that other (future) changes to the structure won't risk mis-aligning any
> fields again.
> 
> Does this look acceptable?

Seems fine, which is to say I don't see much advantage but have no 
objection. :)  Please try to make it clear in the comments which fields
belong to a page which _has_ a shadow and which to a page that _is_ a
shadow.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-29 11:07                       ` Tim Deegan
@ 2009-01-29 11:26                         ` Keir Fraser
  0 siblings, 0 replies; 25+ messages in thread
From: Keir Fraser @ 2009-01-29 11:26 UTC (permalink / raw)
  To: Tim Deegan, Jan Beulich; +Cc: Gianluca Guida, xen-devel

On 29/01/2009 11:07, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:

>> Does this look acceptable?
> 
> Seems fine, which is to say I don't see much advantage but have no
> objection. :)  Please try to make it clear in the comments which fields
> belong to a page which _has_ a shadow and which to a page that _is_ a
> shadow.

I'd agree with Tim. I perhaps have a slight preference for keeping it all in
one struct (i.e., the new approach) but I'm happy either way.

 -- Keir

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: page ref/type count overflows
  2009-01-29 10:54                     ` Jan Beulich
  2009-01-29 11:07                       ` Tim Deegan
@ 2009-01-29 14:04                       ` Gianluca Guida
  1 sibling, 0 replies; 25+ messages in thread
From: Gianluca Guida @ 2009-01-29 14:04 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tim Deegan, xen-devel, Keir Fraser

Jan Beulich wrote:
> Below is what I currently have it at. I'm afraid it won't get much simpler,
> but I think it reasonably expresses the individual overlays. There are three
> more transformations I plan to make:
> - _domain -> unsigned int
> - next_shadow -> __mfn_t
> - split u into two unions (one having type_info, type/pinned/count, and
> cpumask, the other having _domain, back, and order).
> 
> That last step is to avoid having to re-add __attribute__ ((__packed__)),
> so that other (future) changes to the structure won't risk mis-aligning any
> fields again.
> 
> Does this look acceptable?

Personally, while I don't have (and I can't have) anything against this, 
I think that this kind of unified struct would make the shadow code more 
error-prone, by making shadow pages and guest pages being casted by the 
same type.
Of course, every bug is fixable, and good programmers shouldn't do this 
kind of errors. But I think that being able to differentiate between 
shadow's page_info and normal page_info by something more than just a 
variable name would help a lot the clarity of this already complex code.

Thanks,
Gianluca

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2009-01-29 14:04 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-01-26 13:10 page ref/type count overflows Jan Beulich
2009-01-26 13:33 ` Keir Fraser
2009-01-26 13:51   ` Keir Fraser
2009-01-26 14:10   ` Jan Beulich
2009-01-26 14:19     ` Keir Fraser
2009-01-26 14:38       ` Jan Beulich
2009-01-26 14:54         ` Keir Fraser
2009-01-26 16:15           ` Jan Beulich
2009-01-26 16:30             ` Keir Fraser
2009-01-27  9:34               ` Tim Deegan
2009-01-26 17:01           ` Keir Fraser
2009-01-27 10:16             ` Jan Beulich
2009-01-27 10:24               ` Keir Fraser
2009-01-27 11:22                 ` Keir Fraser
2009-01-27 15:38                   ` Keir Fraser
2009-01-27 15:49                     ` Jan Beulich
2009-01-27 16:03                       ` Keir Fraser
2009-01-29  8:34                 ` Jan Beulich
2009-01-29  8:45                   ` Keir Fraser
2009-01-29 10:54                     ` Jan Beulich
2009-01-29 11:07                       ` Tim Deegan
2009-01-29 11:26                         ` Keir Fraser
2009-01-29 14:04                       ` Gianluca Guida
2009-01-29 10:12                   ` Tim Deegan
2009-01-29 10:57                     ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.