* The mfn of the frame, that holds a mlock-ed PV domU usermode page, can change
@ 2010-04-12 18:54 Rafal Wojtczuk
  2010-04-12 20:01 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 9+ messages in thread
From: Rafal Wojtczuk @ 2010-04-12 18:54 UTC (permalink / raw)
  To: xen-devel; +Cc: joanna


Hello,
Would someone on the list enlighten me on the following issue, possibly related 
to mfn management in the PV guest.
Environment: xen-3.4.3, pvops 2.6.32.9 in dom0 and domU, all 64bit.
Usermode code
(if you are interested, at http://gitweb.qubes-os.org/gitweb/?p=mainstream/gui.git;a=blob;f=vchan/vchan/init.c;h=cb2fb851c3b97804b115dbf58fd47a30d6d0a8a3;hb=HEAD)
in PV domU does the following:
1) gets a page via
ring_ref_v=mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE |MAP_ANON,-1, 0);
2) mlock(ring_ref_v, 4096)
3) determines the mfn number of the frame holding ring_ref_v via a call to 
u2mfn driver
(http://gitweb.qubes-os.org/gitweb/?p=mainstream/gui.git;a=blob;f=vchan/u2mfn/u2mfn.c;h=6ff113c07c50ef078ab04d9e61d2faab338357e7;hb=HEAD)
the driver basically does get_user_pages+kmap+virt_to_mfn (and later
kunmap+put_page for cleanup).
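
For reference, steps 1) and 2) amount to something like the following
minimal sketch. The helper names alloc_locked_page() and free_locked_page()
are made up for illustration and are not taken from the vchan code; step 3)
needs the u2mfn driver and is left out.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map one anonymous private page and lock it in RAM.
 * Returns the mapping, or NULL if mmap() or mlock() fails. */
void *alloc_locked_page(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    if (mlock(p, len) != 0) {
        munmap(p, len);
        return NULL;
    }
    /* Touch the page so that a frame is actually assigned to it. */
    memset(p, 0, len);
    return p;
}

/* Hypothetical helper: undo alloc_locked_page(). */
void free_locked_page(void *p)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);

    munlock(p, len);
    munmap(p, len);
}
```

Note that mlock() only promises residency; nothing in this sketch keeps the
kernel from changing the frame behind the page.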

Then, the
usermode code in dom0 does xc_map_foreign_range on the returned mfn, and can
communicate with the code in domU over this shared page...
...but sometimes, apparently the page that backs ring_ref_v changes: if the 
domU application calls u2mfn ioctl again with ring_ref_v argument, it 
returns a different mfn.
And naturally the code in dom0 reads garbage from the address returned by
its previous call to xc_map_foreign_range.
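
One way to watch for such a change from inside the guest, without the u2mfn
driver, might be /proc/self/pagemap. This is only a sketch under
assumptions: the helper names are hypothetical, the entry layout is the one
described in the kernel's pagemap documentation, newer kernels report a zero
frame number to unprivileged callers, and in a PV guest the value is
expected to be the guest's pseudo-physical pfn rather than the mfn, so it
can show that the kernel moved the page but not which machine frame now
backs it.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Pagemap entry layout per the kernel's pagemap documentation:
 * bit 63 = page present, bits 0-54 = page frame number when present. */
#define PM_PRESENT  (UINT64_C(1) << 63)
#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1)

/* Hypothetical helpers that decode one 64-bit pagemap entry. */
int pagemap_present(uint64_t entry)
{
    return (entry & PM_PRESENT) != 0;
}

uint64_t pagemap_pfn(uint64_t entry)
{
    return entry & PM_PFN_MASK;
}

/* Read the pagemap entry covering vaddr in the calling process.
 * Returns 0 on success, -1 on any error. */
int pagemap_read(const void *vaddr, uint64_t *entry)
{
    uintptr_t psize = (uintptr_t)sysconf(_SC_PAGESIZE);
    off_t off = (off_t)((uintptr_t)vaddr / psize) * (off_t)sizeof(*entry);
    int fd = open("/proc/self/pagemap", O_RDONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = pread(fd, entry, sizeof(*entry), off);
    close(fd);
    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

Calling pagemap_read() on ring_ref_v before and after the garbage appears,
and comparing pagemap_pfn() of the two entries, would show whether the
backing frame moved.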

I find it puzzling. Is this behaviour normal/expected ?
Mlock man pages say "All pages that contain a part of the specified address 
range are guaranteed to be resident in RAM", not "be resident at the same
RAM location", but why would a frame backing a mlock-ed memory be changed ?
Is there some memory defragmentation going on ? Or maybe only frame->frame
number function changes (but why would it) ?

Anyway, this behaviour causes problems, as you can see in
http://www.qubes-os.org/trac/ticket/16#comment:4
It would be nice if the mfn of a frame that holds a given mlock-ed usermode 
page could be made constant. 

If you can offer some insight, that would be helpful, particularly:
1) Why does this not happen to pages allocated in kernel mode (if it did, it
would break the split drivers model) ?
2) Can this frame-changing behaviour be switched off at Xen/kernel level?
3) Would using grant tables (instead of brutal xc_map_foreign_range) change 
the situation ?
BTW, for Qubes it is necessary to map PV domU usermode pages in dom0; 
particularly, map X server composition buffers.

Regards,
Rafal Wojtczuk
The Qubes OS Project
http://qubes-os.org

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* Re: The mfn of the frame, that holds a mlock-ed PV domU usermode page, can change
  2010-04-12 18:54 The mfn of the frame, that holds a mlock-ed PV domU usermode page, can change Rafal Wojtczuk
@ 2010-04-12 20:01 ` Jeremy Fitzhardinge
  2010-04-12 20:21   ` Joanna Rutkowska
  0 siblings, 1 reply; 9+ messages in thread
From: Jeremy Fitzhardinge @ 2010-04-12 20:01 UTC (permalink / raw)
  To: Rafal Wojtczuk; +Cc: xen-devel, joanna

On 04/12/2010 11:54 AM, Rafal Wojtczuk wrote:
> Hello,
> Would someone on the list enlighten me on the following issue, possibly related 
> to mfn management in the PV guest.
> Environment: xen-3.4.3, pvops 2.6.32.9 in dom0 and domU, all 64bit.
> Usermode code
> (if you are interested, at http://gitweb.qubes-os.org/gitweb/?p=mainstream/gui.git;a=blob;f=vchan/vchan/init.c;h=cb2fb851c3b97804b115dbf58fd47a30d6d0a8a3;hb=HEAD)
> in PV domU does the following:
> 1) gets a page via
> ring_ref_v=mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE |MAP_ANON,-1, 0);
> 2) mlock(ring_ref_v, 4096)
> 3) determines the mfn number of the frame holding ring_ref_v via a call to 
> u2mfn driver
> (http://gitweb.qubes-os.org/gitweb/?p=mainstream/gui.git;a=blob;f=vchan/u2mfn/u2mfn.c;h=6ff113c07c50ef078ab04d9e61d2faab338357e7;hb=HEAD)
> the driver basically does get_user_pages+kmap+virt_to_mfn (and later
> kunmap+put_page for cleanup).
>   

Yeah, this looks fundamentally suspect.  Using mlock in this way is
going to be fragile.

> Then, the
> usermode code in dom0 does xc_map_foreign_range on the returned mfn, and can
> communicate with the code in domU over this shared page...
> ...but sometimes, apparently the page that backs ring_ref_v changes: if the 
> domU application calls u2mfn ioctl again with ring_ref_v argument, it 
> returns a different mfn.
> And naturally the code in dom0 reads garbage from the address returned by
> its previous call to xc_map_foreign_range.
>
> I find it puzzling. Is this behaviour normal/expected ?
>   

Yes.

> Mlock man pages say "All pages that contain a part of the specified address 
> range are guaranteed to be resident in RAM", not "be resident at the same
> RAM location", but why would a frame backing a mlock-ed memory be changed ?
> Is there some memory defragmentation going on ?

Yes, the kernel can move usermode memory around to defrag memory.  This
is done to allow higher-order memory allocations to keep working even on
a long-running system which would otherwise fragment physical memory. 
Ideally it allows 2M page allocations to succeed.

>  Or maybe only frame->frame
> number function changes (but why would it) ?
>
> Anyway, this behaviour causes problems, as you can see in
> http://www.qubes-os.org/trac/ticket/16#comment:4
> It would be nice if the mfn of a frame that holds a given mlock-ed usermode 
> page could be made constant. 
>
> If you can offer some insight, that would be helpful, particularly:
> 1) Why does this not happen to pages allocated in kernel mode (if it did, it
> would break the split drivers model) ?
>   

No, kernel allocations are not movable by default.

> 2) Can this frame-changing behaviour be switched off at Xen/kernel level?
>   

Not that I know of, and it wouldn't be desirable if it could be.

> 3) Would using grant tables (instead of brutal xc_map_foreign_range) change 
> the situation ?
> BTW, for Qubes it is necessary to map PV domU usermode pages in dom0; 
> particularly, map X server composition buffers.
>   

Why is it necessary to map usermode pages?  It just seems like asking
for trouble.  Why not make it so that the domU X server gets the memory
from the kernel (via some kind of driver), and then map that through to
dom0?

    J


* Re: The mfn of the frame, that holds a mlock-ed PV domU usermode page, can change
  2010-04-12 20:01 ` Jeremy Fitzhardinge
@ 2010-04-12 20:21   ` Joanna Rutkowska
  2010-04-12 20:39     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 9+ messages in thread
From: Joanna Rutkowska @ 2010-04-12 20:21 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel, Rafal Wojtczuk, qubes-devel


On 04/12/2010 10:01 PM, Jeremy Fitzhardinge wrote:
> 
> Why is it necessary to map usermode pages?  It just seems like asking
> for trouble.  Why not make it so that the domU X server gets the memory
> from the kernel (via some kind of driver), and then map that through to
> dom0?

Because we want to avoid modifying Xorg sources -- it normally allocates
its composition buffers using malloc, and if we wanted to make it using
some kernel allocated memory (by our custom driver) we would need to
patch the Xorg, which we obviously wanted to avoid...

joanna.

ps. Copied this to qubes-devel as well.



* Re: The mfn of the frame, that holds a mlock-ed PV domU usermode page, can change
  2010-04-12 20:21   ` Joanna Rutkowska
@ 2010-04-12 20:39     ` Jeremy Fitzhardinge
  2010-04-12 21:19       ` Joanna Rutkowska
  2010-04-19 11:25       ` Rafal Wojtczuk
  0 siblings, 2 replies; 9+ messages in thread
From: Jeremy Fitzhardinge @ 2010-04-12 20:39 UTC (permalink / raw)
  To: Joanna Rutkowska; +Cc: xen-devel, Rafal Wojtczuk, qubes-devel

On 04/12/2010 01:21 PM, Joanna Rutkowska wrote:
> On 04/12/2010 10:01 PM, Jeremy Fitzhardinge wrote:
>   
>> Why is it necessary to map usermode pages?  It just seems like asking
>> for trouble.  Why not make it so that the domU X server gets the memory
>> from the kernel (via some kind of driver), and then map that through to
>> dom0?
>>     
> Because we want to avoid modifying Xorg sources -- it normally allocates
> its composition buffers using malloc, and if we wanted to make it using
> some kernel allocated memory (by our custom driver) we would need to
> patch the Xorg, which we obviously wanted to avoid...
>   

The referenced code doesn't do that; it allocates some memory with
mmap, mlocks it, uses /proc/u2mfn to get the mfn, then pokes it into xenbus.

But I assume you have other code which wants to grant through the
Xorg-allocated framebuffer.  That complicates things a bit, but you could
still add a device (no /proc files, please) with an ioctl which:

   1. takes a range of usermode addresses
   2. increments the page refcount for those pages
   3. returns the mfns for those pages

That will prevent the pages from being migrated while you're referring
to their mfns.  You need to add something to explicitly decrement the
refcount to prevent a memory leak, presumably at the time you tear down
the mapping in dom0.  Ideally you'd arrange to do that triggered off
unmap of the memory range (by isolating the pages in their own new vma)
so that it all gets cleaned up on process exit.

    J


* Re: The mfn of the frame, that holds a mlock-ed PV domU usermode page, can change
  2010-04-12 20:39     ` Jeremy Fitzhardinge
@ 2010-04-12 21:19       ` Joanna Rutkowska
  2010-04-12 21:26         ` Jeremy Fitzhardinge
  2010-04-19 11:25       ` Rafal Wojtczuk
  1 sibling, 1 reply; 9+ messages in thread
From: Joanna Rutkowska @ 2010-04-12 21:19 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel, Rafal Wojtczuk, qubes-devel


On 04/12/2010 10:39 PM, Jeremy Fitzhardinge wrote:
> On 04/12/2010 01:21 PM, Joanna Rutkowska wrote:
>> On 04/12/2010 10:01 PM, Jeremy Fitzhardinge wrote:
>>   
>>> Why is it necessary to map usermode pages?  It just seems like asking
>>> for trouble.  Why not make it so that the domU X server gets the memory
>>> from the kernel (via some kind of driver), and then map that through to
>>> dom0?
>>>     
>> Because we want to avoid modifying Xorg sources -- it normally allocates
>> its composition buffers using malloc, and if we wanted to make it using
>> some kernel allocated memory (by our custom driver) we would need to
>> patch the Xorg, which we obviously wanted to avoid...
>>   
> 
> The referenced code doesn't do that; it allocates some memory with
> mmap, mlocks it, uses /proc/u2mfn to get the mfn, then pokes it into xenbus.
> 

Right, that's for the "ring" page, which we use to implement a ring
buffer, and we then pass mfns of the actual Xorg's composition buffers
over this ring buffer to Dom0.

Interestingly, I have never seen garbage in any of the composition
buffers (which are directly displayed by our appviewers, so it would be
immediately visible), as if only the mfn for the "ring" page
could be modified, but the composition buffers' mfns were somehow pinned...

This might suggest that the memory used by the composition buffers
(which are in usermode) is somehow locked?

Thanks,
j.



* Re: The mfn of the frame, that holds a mlock-ed PV domU usermode page, can change
  2010-04-12 21:19       ` Joanna Rutkowska
@ 2010-04-12 21:26         ` Jeremy Fitzhardinge
  2010-04-12 21:36           ` Joanna Rutkowska
  0 siblings, 1 reply; 9+ messages in thread
From: Jeremy Fitzhardinge @ 2010-04-12 21:26 UTC (permalink / raw)
  To: Joanna Rutkowska; +Cc: xen-devel, Rafal Wojtczuk, qubes-devel

On 04/12/2010 02:19 PM, Joanna Rutkowska wrote:
> Right, that's for the "ring" page, which we use to implement a ring
> buffer, and we then pass mfns of the actual Xorg's composition buffers
> over this ring buffer to Dom0.
>
> Interestingly, I have never seen garbage in any of the composition
> buffers (which are directly displayed by our appviewers, so it would be
> immediately visible), as if only the mfn for the "ring" page
> could be modified, but the composition buffers' mfns were somehow pinned...
>
> This might suggest that the memory used by the composition buffers
> (which are in usermode) is somehow locked?
>   

Worth looking into.

I'm not at all familiar with how X manages composition buffers, but it
seems to me that in normal use, one would want to be able to either
allocate that buffer in texture memory (so it can be used as a texture
source), or at least copy updates into texture memory.  Couldn't you
hook into that transfer to the composition hardware (ie, dom0)?

    J


* Re: The mfn of the frame, that holds a mlock-ed PV domU usermode page, can change
  2010-04-12 21:26         ` Jeremy Fitzhardinge
@ 2010-04-12 21:36           ` Joanna Rutkowska
  0 siblings, 0 replies; 9+ messages in thread
From: Joanna Rutkowska @ 2010-04-12 21:36 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel, Rafal Wojtczuk, qubes-devel


On 04/12/2010 11:26 PM, Jeremy Fitzhardinge wrote:
> On 04/12/2010 02:19 PM, Joanna Rutkowska wrote:
>> Right, that's for the "ring" page, which we use to implement a ring
>> buffer, and we then pass mfns of the actual Xorg's composition buffers
>> over this ring buffer to Dom0.
>>
>> Interestingly, I have never seen garbage in any of the composition
>> buffers (which are directly displayed by our appviewers, so it would be
>> immediately visible), as if only the mfn for the "ring" page
>> could be modified, but the composition buffers' mfns were somehow pinned...
>>
>> This might suggest that the memory used by the composition buffers
>> (which are in usermode) is somehow locked?
>>   
> 
> Worth looking into.
> 
> I'm not at all familiar with how X manages composition buffers, but it
> seems to me that in normal use, one would want to be able to either
> allocate that buffer in texture memory (so it can be used as a texture
> source), or at least copy updates into texture memory.  Couldn't you
> hook into that transfer to the composition hardware (ie, dom0)?
> 

We will definitely look into this. Thanks a lot for your help, Jeremy!

joanna.



* Re: The mfn of the frame, that holds a mlock-ed PV domU usermode page, can change
  2010-04-12 20:39     ` Jeremy Fitzhardinge
  2010-04-12 21:19       ` Joanna Rutkowska
@ 2010-04-19 11:25       ` Rafal Wojtczuk
  2010-04-19 16:48         ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 9+ messages in thread
From: Rafal Wojtczuk @ 2010-04-19 11:25 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel, qubes-devel

On Mon, Apr 12, 2010 at 01:39:43PM -0700, Jeremy Fitzhardinge wrote:
> But I assume you have other code which wants to grant through the
> Xorg-allocated framebuffer.  That complicates things a bit, but you could
> still add a device (no /proc files, please) with an ioctl which:
> 
>    1. takes a range of usermode addresses
>    2. increments the page refcount for those pages
Or, do not decrement the count, by not calling put_page() ?

>    3. returns the mfns for those pages
> 
> That will prevent the pages from being migrated while you're referring
> to their mfns.
After removing the call to put_page() in u2mfn_ioctl(), see once again
http://gitweb.qubes-os.org/gitweb/?p=mainstream/gui.git;a=blob;f=vchan/u2mfn/u2mfn.c;h=6ff113c07c50ef078ab04d9e61d2faab338357e7;hb=HEAD#l35
the page's mfn changed again.
Even commenting out the kunmap() call in this function did not help.
Am I missing something ?

The only working way (for the ring buffer case) is to acquire memory via 
kmalloc and pass it to userspace via remap_pfn_range. But this is unsuitable 
for the case of X composition buffers, because we don't want to alter the
way X allocates memory (it calls plain malloc). We could hijack X's malloc()
via LD_PRELOAD, but then we cannot distinguish which calls are made because
of composition buffer allocation.

>  You need to add something to explicitly decrement the
> refcount to prevent a memory leak, presumably at the time you tear down
> the mapping in dom0.  Ideally you'd arrange to do that triggered off
> unmap of the memory range (by isolating the pages in their own new vma)
> so that it all gets cleaned up on process exit.
By "triggered off unmap" do you mean setting the vm_ops field in struct 
vm_area_struct to a custom struct vm_operations_struct (particularly, with a 
custom close() method), or is there something simpler ?

> I'm not at all familiar with how X manages composition buffers, but it
> seems to me that in normal use, one would want to be able to either
> allocate that buffer in texture memory (so it can be used as a texture
> source), or at least copy updates into texture memory.  Couldn't you
> hook into that transfer to the composition hardware (ie, dom0)?
We are talking about X running in domU; there is no related hardware.
We can determine where the composition buffer is only after it has
been allocated. 

> No, kernel allocations are not movable by default.
Could you give a few more details on the related migration mechanism ?
E.g. which PG_ flag (set by kmalloc) makes a page unmovable ? Preferably, 
with pointers to relevant code ? 
I guess it is in linux/mm/migrate.c, but I am getting lost
trying to figure out which parts are NUMA specific and which are not; and
particularly, what triggers the migration.

Interestingly, Xorg guys claim X server does nothing special with the memory
acquired by malloc() for the composition buffer. Yet, so far no corruption
of the displayed images has been observed. Maybe a single-page vma (that
stores the ring buffer) is particularly attractive for the
migration/defragmentation algorithm, and that is why it is easy to trigger
its relocation (but not so with the composition buffer case) ? 

Once again thanks a lot for your explanations.

Regards,
Rafal Wojtczuk
The Qubes OS Project
http://qubes-os.org


* Re: The mfn of the frame, that holds a mlock-ed PV domU usermode page, can change
  2010-04-19 11:25       ` Rafal Wojtczuk
@ 2010-04-19 16:48         ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 9+ messages in thread
From: Jeremy Fitzhardinge @ 2010-04-19 16:48 UTC (permalink / raw)
  To: Rafal Wojtczuk; +Cc: xen-devel, qubes-devel

On 04/19/2010 04:25 AM, Rafal Wojtczuk wrote:
> On Mon, Apr 12, 2010 at 01:39:43PM -0700, Jeremy Fitzhardinge wrote:
>   
>> But I assume you have other code which wants to grant through the
>> Xorg-allocated framebuffer.  That complicates things a bit, but you could
>> still add a device (no /proc files, please) with an ioctl which:
>>
>>    1. takes a range of usermode addresses
>>    2. increments the page refcount for those pages
>>     
> Or, do not decrement the count, by not calling put_page() ?
>
>   
>>    3. returns the mfns for those pages
>>
>> That will prevent the pages from being migrated while you're referring
>> to their mfns.
>>     
> After removing the call to put_page() in u2mfn_ioctl(), see once again
> http://gitweb.qubes-os.org/gitweb/?p=mainstream/gui.git;a=blob;f=vchan/u2mfn/u2mfn.c;h=6ff113c07c50ef078ab04d9e61d2faab338357e7;hb=HEAD#l35
> the page's mfn changed again.
> Even commenting out the kunmap() call in this function did not help.
> Am I missing something ?
>   

It definitely shouldn't be possible to move a page with a non-zero
refcount.  So it looks like something else is going on there.  Even if
the process exits, those pages should remain in unusable limbo rather
than being freed and reallocated.

> The only working way (for the ring buffer case) is to acquire memory via 
> kmalloc and pass it to userspace via remap_pfn_range. But this is unsuitable 
> for the case of X composition buffers, because we don't want to alter the
> way X allocates memory (it calls plain malloc). We could hijack X's malloc()
> via LD_PRELOAD, but then we cannot distinguish which calls are made because
> of composition buffer allocation.
>   

Yes.  Unfortunately that has its own set of problems.  For example, if
the X server wants to fork for some reason then you become subject to
the whims of COW as to what page is being used in which process.
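
The COW effect can be reproduced in isolation. The sketch below is an
illustration under assumptions, not X server code: cow_parent_view() is a
hypothetical helper that maps a private anonymous page, forks, lets the
child write to the page, and reports what the parent still sees.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the byte the parent reads after the child has written to the
 * shared-until-written page: 1 if the parent's view is untouched,
 * 2 if the child's write were visible, -1 on error. */
int cow_parent_view(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    pid_t pid;
    int status, seen;

    if (p == MAP_FAILED)
        return -1;
    p[0] = 1;                 /* parent populates the page */

    pid = fork();
    if (pid < 0) {
        munmap(p, len);
        return -1;
    }
    if (pid == 0) {
        p[0] = 2;             /* write fault: the child gets its own copy */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0) {
        munmap(p, len);
        return -1;
    }
    seen = p[0];              /* parent still reads its own value */
    munmap(p, len);
    return seen;
}
```

The parent keeps seeing its own value because the child's write lands in a
fresh copy. Had the parent written first, it would likely be the parent that
ends up on a new frame, which is the case that would invalidate an mfn
already handed to dom0.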

But it seems to me you're operating at the wrong architectural level
here.  I fully understand your short-term goal is "get it working", but
I think you're going to want to revise this for v2.0.  Your architecture
is not very different from a standard CPU+GPU compositing setup, except
your "GPU" is actually dom0 (which of course may be really using the
GPU).  X should already have all the interfaces you need to efficiently
pass an application's compositing buffer to the "GPU" for rendering.

(Maybe you need to do a "Xen DRI" driver to implement this?)

>>  You need to add something to explicitly decrement the
>> refcount to prevent a memory leak, presumably at the time you tear down
>> the mapping in dom0.  Ideally you'd arrange to do that triggered off
>> unmap of the memory range (by isolating the pages in their own new vma)
>> so that it all gets cleaned up on process exit.
>>     
> By "triggered off unmap" do you mean setting the vm_ops field in struct 
> vm_area_struct to a custom struct vm_operations_struct (particularly, with a 
> custom close() method), or is there something simpler ?
>   

Yes, that's what I had in mind.  You'd need to chop the VMA up to
isolate the virtual address range you want to apply the close to.  But
that assumes your range doesn't already have a close method of course;
it gets awkward if it does.

>> I'm not at all familiar with how X manages composition buffers, but it
>> seems to me that in normal use, one would want to be able to either
>> allocate that buffer in texture memory (so it can be used as a texture
>> source), or at least copy updates into texture memory.  Couldn't you
>> hook into that transfer to the composition hardware (ie, dom0)?
>>     
> We are talking about X running in domU; there is no related hardware.
> We can determine where the composition buffer is only after it has
> been allocated. 
>   

(See above.)

>> No, kernel allocations are not movable by default.
>>     
> Could you give a few more details on the related migration mechanism ?
> E.g. which PG_ flag (set by kmalloc) makes a page unmovable ? Preferably, 
> with pointers to relevant code ? 
>   

__GFP_MOVABLE is the key thing to look at.  It causes page allocation to
allocate the page in a movable zone.  All user memory is allocated with
GFP_HIGHUSER_MOVABLE (in do_wp_page(), for example), which means that
the memory needn't be directly addressable by the kernel (HIGHUSER), and
can be moved or reclaimed when necessary (MOVABLE).

> I guess it is in linux/mm/migrate.c, but I am getting lost
> trying to figure out which parts are NUMA specific and which are not; and
> particularly, what triggers the migration.
>   

TBH I've never really looked into the mechanisms of how it works.  But I
think mm/migrate.c is actually something else, relating to moving pages
around between NUMA nodes.

I had a quick look at it just now, and migration definitely seems to
happen on demand in the buddy allocator (mm/page_alloc.c), if it can't
satisfy a memory request.  I don't know whether it tries to actively
move pages around to decrease fragmentation.

> Interestingly, Xorg guys claim X server does nothing special with the memory
> acquired by malloc() for the composition buffer. Yet, so far no corruption
> of the displayed images has been observed. Maybe a single-page vma (that
> stores the ring buffer) is particularly attractive for the
> migration/defragmentation algorithm, and that is why it is easy to trigger
> its relocation (but not so with the composition buffer case) ? 
>   

Hm, that doesn't ring true.  AFAIK all migration happens at the page
level with no reference to VMAs (though it's possible that being mapped
into a process address space makes a page temporarily unmigratable, and
it needs to wait for something to shoot down/age out the ptes before
migrating the page).  Again, I'm not well versed in the details.

It's quite possible that the problem you're seeing has nothing to do with
page migration at all, and this is a wild goose chase.

    J
