public inbox for linux-kernel@vger.kernel.org
* Q: behaviour of mlockall(MCL_FUTURE) and VM_GROWSDOWN segments
@ 2002-01-11 19:26 Manfred Spraul
  2002-01-11 20:49 ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Manfred Spraul @ 2002-01-11 19:26 UTC (permalink / raw)
  To: linux-kernel

If an app has a VM_GROWS{DOWN,UP} stack and calls
mlockall(MCL_FUTURE|MCL_CURRENT), which pages should the kernel lock?

* Grow the vma to its maximum size and lock it all.
* Lock only according to the current size.

What should happen if the segment is extended by more than one page
at once? (e.g. a function with 100 kB of local variables)

* Allocate just the page needed to handle the fault.
* Always fill holes immediately.

Right now segments are not grown during the mlockall syscall. Some
codepaths fill holes (find_extend_vma()), most don't (the page fault
handlers).

What's the right thing (tm) to do?
I don't care which implementation is chosen, but IMHO all
implementations should be identical.

--
	Manfred


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Q: behaviour of mlockall(MCL_FUTURE) and VM_GROWSDOWN segments
  2002-01-11 19:26 Manfred Spraul
@ 2002-01-11 20:49 ` Andrew Morton
  2002-01-11 23:45   ` Richard Gooch
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2002-01-11 20:49 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: linux-kernel

Manfred Spraul wrote:
> 
> If an app has an VM_GROWS{DOWN,UP} stack and calls
> mlockall(MCL_FUTURE|MCL_CURRENT), which pages should the kernel lock?
> 
> * grow the vma to the maximum size and lock all.
> * just according to the current size.
> 
> What should happen if the segment is extended by more than one page
> at once? (i.e. a function with 100 kB local variables)
> 
> * Just allocate the page that is needed to handle the page faults
> * always fill holes immediately.
> 
> Right now segments are not grown during the mlockall syscall. Some
> codepaths fill holes (find_extend_vma()), most don't (page fault
> handlers)
> 
> What's the right thing (tm) to do?
> I don't care which implementation is chosen, but IMHO all
> implementations should be identical

This was a problem encountered when taking a libpthread-based
application from 2.4.7 to 2.4.15.   It ran fine with mlockall
under 2.4.7, but under 2.4.15 everything wedged up.   This was, I assume,
because under 2.4.15, the many pthread stacks were fully faulted in and
locked at mlockall() time.    We ended up just not using mlockall
at all.

Really the 2.4.15 behaviour is correct, but undesirable.  It requires
each thread to know a priori what its maximum stack use will be.
(I'm assuming that there's a way of setting a thread's stack size
in libpthread).

So in this case, the behaviour I would prefer is MCL_FUTURE for
all vma's *except* the stack.   Stack pages should be locked
only when they are faulted in.   Hard call.

-

* Re: Q: behaviour of mlockall(MCL_FUTURE) and VM_GROWSDOWN segments
  2002-01-11 20:49 ` Andrew Morton
@ 2002-01-11 23:45   ` Richard Gooch
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Gooch @ 2002-01-11 23:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Manfred Spraul, linux-kernel

Andrew Morton writes:
> Manfred Spraul wrote:
> > 
> > If an app has an VM_GROWS{DOWN,UP} stack and calls
> > mlockall(MCL_FUTURE|MCL_CURRENT), which pages should the kernel lock?
> > 
> > * grow the vma to the maximum size and lock all.
> > * just according to the current size.
> > 
> > What should happen if the segment is extended by more than one page
> > at once? (i.e. a function with 100 kB local variables)
> > 
> > * Just allocate the page that is needed to handle the page faults
> > * always fill holes immediately.
> > 
> > Right now segments are not grown during the mlockall syscall. Some
> > codepaths fill holes (find_extend_vma()), most don't (page fault
> > handlers)
> > 
> > What's the right thing (tm) to do?
> > I don't care which implementation is chosen, but IMHO all
> > implementations should be identical
> 
> This was a problem encountered when taking a libpthread-based
> application from 2.4.7 to 2.4.15.   It ran fine with mlockall
> under 2.4.7, but under 2.4.15 everything wedged up.   This was, I assume,
> because under 2.4.15, the many pthread stacks were fully faulted in and
> locked at mlockall() time.    We ended up just not using mlockall
> at all.
> 
> Really the 2.4.15 behaviour is correct, but undesirable.  It requires
> each thread to know a priori what its maximum stack use will be.
> (I'm assuming that there's a way of setting a thread's stack size
> in libpthread).
> 
> So in this case, the behaviour I would prefer is MCL_FUTURE for
> all vma's *except* the stack.   Stack pages should be locked
> only when they are faulted in.   Hard call.

How about controlling this with a MCL_STACK flag or some such? If
there's no One True Path[tm], leave the decision to the application.
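A hypothetical sketch of what such a flag could look like (MCL_STACK
does not exist in any kernel; the flag value and the wrapper below are
invented purely for illustration):

```c
#include <sys/mman.h>

/* Hypothetical flag: MCL_CURRENT is 1 and MCL_FUTURE is 2 on i386,
 * so an invented MCL_STACK is given the next free bit. */
#ifndef MCL_STACK
#define MCL_STACK 4
#endif

/* Lock all current and future mappings eagerly, but (hypothetically)
 * let stack pages be locked lazily, as they are faulted in. */
static int mlockall_lazy_stack(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE | MCL_STACK);
}
```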

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

* Re: Q: behaviour of mlockall(MCL_FUTURE) and VM_GROWSDOWN segments
       [not found] ` <3C3F4FC6.97A6A66D@zip.com.au.suse.lists.linux.kernel>
@ 2002-01-12  0:33   ` Andi Kleen
  2002-01-12  1:04     ` Andrew Morton
  2002-01-12 15:33     ` Andrea Arcangeli
  0 siblings, 2 replies; 10+ messages in thread
From: Andi Kleen @ 2002-01-12  0:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton <akpm@zip.com.au> writes:

> So in this case, the behaviour I would prefer is MCL_FUTURE for
> all vma's *except* the stack.   Stack pages should be locked
> only when they are faulted in.   Hard call.

There is just one problem: linuxthread stacks are ordinary mappings
and in no way special to the kernel; they aren't VM_GROWSDOWN.
You would first need to add a way for the kernel to tag the linuxthread
stacks so that mlockall can recognize them, and then do that
from linuxthreads.

I think for the normal stack - real VM_GROWSDOWN segments - mlockall
already does the right thing.

-Andi

* Re: Q: behaviour of mlockall(MCL_FUTURE) and VM_GROWSDOWN segments
  2002-01-12  0:33   ` Q: behaviour of mlockall(MCL_FUTURE) and VM_GROWSDOWN segments Andi Kleen
@ 2002-01-12  1:04     ` Andrew Morton
  2002-01-12 15:33     ` Andrea Arcangeli
  1 sibling, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2002-01-12  1:04 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

Andi Kleen wrote:
> 
> Andrew Morton <akpm@zip.com.au> writes:
> 
> > So in this case, the behaviour I would prefer is MCL_FUTURE for
> > all vma's *except* the stack.   Stack pages should be locked
> > only when they are faulted in.   Hard call.
> 
> There is just one problem: linuxthread stacks are just ordinary mappings
> and they are in no way special to the kernel; they aren't VM_GROWSDOWN.
> You would need to add a way to the kernel first to tag the linux thread
> stacks in a way that is recognizable to mlockall and then do that
> from linuxthreads.
> 
> I think for the normal stack - real VM_GROWSDOWN segments - mlockall
> already does the right thing.

hmm.. So I wonder what changed between 2.4.7 and 2.4.15 which unbroke
MCL_FUTURE.

I suspect we can fix the problem by running mlockall(MCL_FUTURE)
and then an explicit munlock() of the stack area.

-

* Re: Q: behaviour of mlockall(MCL_FUTURE) and VM_GROWSDOWN segments
  2002-01-12  0:33   ` Q: behaviour of mlockall(MCL_FUTURE) and VM_GROWSDOWN segments Andi Kleen
  2002-01-12  1:04     ` Andrew Morton
@ 2002-01-12 15:33     ` Andrea Arcangeli
  2002-01-12 15:54       ` Andi Kleen
  1 sibling, 1 reply; 10+ messages in thread
From: Andrea Arcangeli @ 2002-01-12 15:33 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Morton, linux-kernel, Manfred Spraul

On Sat, Jan 12, 2002 at 01:33:24AM +0100, Andi Kleen wrote:
> Andrew Morton <akpm@zip.com.au> writes:
> 
> > So in this case, the behaviour I would prefer is MCL_FUTURE for
> > all vma's *except* the stack.   Stack pages should be locked
> > only when they are faulted in.   Hard call.
> 
> There is just one problem: linuxthread stacks are just ordinary mappings
> and they are in no way special to the kernel; they aren't VM_GROWSDOWN. 
> You would need to add a way to the kernel first to tag the linux thread 
> stacks in a way that is recognizable to mlockall and then do that 
> from linuxthreads. 
> 
> I think for the normal stack - real VM_GROWSDOWN segments - mlockall
> already does the right thing.

it doesn't (of course it depends on "what's the right thing"), and
that's why Manfred is asking, after I asked him whether he was really
sure the API was the right one. But as I said to him, something is
wrong in the kernel too: either we remove mark_page_present from
find_extend_vma, or we add it to the page fault handler as well.

What the current kernel does on a page fault is fault in only the
touched pages, not the pages in between as well. This isn't a security
concern, because the faulted-in pages won't be swapped out, but it may
matter for some RT apps. OTOH, RT apps had better memset the whole
stack they need before assuming they won't get page faults, first of
all because of all the other kernels out there (this is what I mean by
a matter of API).

I guess it is cleaner if a VM_LOCKED vma has all its pages allocated
between vm_start and vm_end, so I guess adding the mark_page_present in
do_page_fault as suggested by Manfred is ok.

Andrea

* Re: Q: behaviour of mlockall(MCL_FUTURE) and VM_GROWSDOWN segments
  2002-01-12 15:33     ` Andrea Arcangeli
@ 2002-01-12 15:54       ` Andi Kleen
  2002-01-12 16:07         ` Manfred Spraul
  2002-01-12 16:14         ` Andrea Arcangeli
  0 siblings, 2 replies; 10+ messages in thread
From: Andi Kleen @ 2002-01-12 15:54 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andi Kleen, Andrew Morton, linux-kernel, Manfred Spraul

On Sat, Jan 12, 2002 at 04:33:32PM +0100, Andrea Arcangeli wrote:
> it doesn't (of course depends "what's the right thing"), and that's why

I think it does. Allocating all pages that could be allocated in the
future is simply not possible for VM_GROWSDOWN, because the stack
really has no suitable limit (other than rlimits, which are far too
big to mlock).

BTW, expand_stack seems to have a small bug: it adds to mm->locked_vm
the complete offset from the last vm_start; if that covers more than
one page, the locked_vm value will be too large.

> What the current kernel is doing with page faults, is to fault in only
> the touched pages, not the pages in between as well, this isn't a
> security concern because the faulted in pages won't be swapped out, but
> it may matter for some RT app, OTOH the RT apps would better memset the
> whole stack they need before assuming they won't get page faults, first
> of all because of all other kernels out there (this is what I mean with
> a matter of API).

For the stack they can get minor faults anyway when they allocate new
stack space below ESP. There is no good way to fix that from the kernel;
the application has to preallocate its memory on the stack. I think it's
reasonable if it does the same for holes on the stack.

-Andi

* Re: Q: behaviour of mlockall(MCL_FUTURE) and VM_GROWSDOWN segments
  2002-01-12 15:54       ` Andi Kleen
@ 2002-01-12 16:07         ` Manfred Spraul
  2002-01-12 16:17           ` Andrea Arcangeli
  2002-01-12 16:14         ` Andrea Arcangeli
  1 sibling, 1 reply; 10+ messages in thread
From: Manfred Spraul @ 2002-01-12 16:07 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrea Arcangeli, Andrew Morton, linux-kernel, Manfred Spraul

Andi Kleen wrote:
> 
> For the stack they can get minor faults anyways when they allocate new
> stack space below ESP. There is no good way to fix that from the kernel; the
> application has to preallocate its memory on stack. I think it's reasonable
> if it does the same for holes on the stack.
>
Ok, everyone agrees that mlockall() should not grow VM_GROWSDOWN
segments to their maximum size.
Should the page fault handler fill the hole created by code like this?

void * grow_stack(void)
{
	char data[100000];
	data[0] = '0';
	return data;
}

The principle of least surprise would mean filling holes, but OTOH sane
apps would use memset(data, 0, sizeof(data)).
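The memset idiom above generalizes to the usual prefault trick for
locked stacks (a sketch; RESERVE is an assumed worst case, not derived
from any real limit):

```c
#include <string.h>

enum { RESERVE = 128 * 1024 };  /* assumed worst-case stack use */

/* Fault in RESERVE bytes of stack by writing every byte up front, so
 * later calls never take a stack fault whatever hole-filling policy
 * the kernel implements. Returns the bytes touched; the volatile read
 * keeps the dead store from being optimized away. */
static size_t prefault_stack(void)
{
    char pad[RESERVE];
    memset(pad, 0, sizeof(pad));
    return sizeof(pad) + (size_t)*(volatile char *)pad;  /* *pad == 0 */
}
```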

--
	Manfred

* Re: Q: behaviour of mlockall(MCL_FUTURE) and VM_GROWSDOWN segments
  2002-01-12 15:54       ` Andi Kleen
  2002-01-12 16:07         ` Manfred Spraul
@ 2002-01-12 16:14         ` Andrea Arcangeli
  1 sibling, 0 replies; 10+ messages in thread
From: Andrea Arcangeli @ 2002-01-12 16:14 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Morton, linux-kernel, Manfred Spraul

On Sat, Jan 12, 2002 at 04:54:43PM +0100, Andi Kleen wrote:
> On Sat, Jan 12, 2002 at 04:33:32PM +0100, Andrea Arcangeli wrote:
> > it doesn't (of course depends "what's the right thing"), and that's why
> 
> I think it does. Allocating all possible in future allocated pages
> is just not possible for VM_GROWSDOWN, because the stack has really 
> no suitable limit (other than rlimits, which are far too big to
> mlock them) 

the user asked for a VM_LOCKED vma; if he asks for that and then
faults too low on the stack, with big holes in between, that's his
mistake. And anyway, ptrace already faults in all the intermediate
pages, so it is definitely possible (at the moment we provide a
different userspace API from ptrace/map_user_kiobuf than from the
page fault path).

> 
> BTW expand_stack seems to have a small bug: it adds to mm->locked_vm
> the complete offset from last vm_start; if it covers more than one page
> the locked_vm value will be too large. 
> 
> > What the current kernel is doing with page faults, is to fault in only
> > the touched pages, not the pages in between as well, this isn't a
> > security concern because the faulted in pages won't be swapped out, but
> > it may matter for some RT app, OTOH the RT apps would better memset the
> > whole stack they need before assuming they won't get page faults, first
> > of all because of all other kernels out there (this is what I mean with
> > a matter of API).
> 
> For the stack they can get minor faults anyways when they allocate new
> stack space below ESP. There is no good way to fix that from the kernel; the 

the only case here is when the app knows how much stack it will need.
Without the kernel faulting in the holes, it will have to memset the
whole region of stack that it wants to be atomic. If instead the kernel
also faults in the holes (as map_user_kiobuf/ptrace/get_user_pages
already do in 2.4), the app will only need to touch the lowest virtual
address of the stack region it needs to be atomic.

I don't see any real problem either way; it simply must be a
well-defined API.

Then the user will know whether he can touch one byte and the kernel
fills the holes automatically, or whether he has to do the whole memset.

> application has to preallocate its memory on stack. I think it's reasonable
> if it does the same for holes on the stack. 
> 
> -Andi

Andrea

* Re: Q: behaviour of mlockall(MCL_FUTURE) and VM_GROWSDOWN segments
  2002-01-12 16:07         ` Manfred Spraul
@ 2002-01-12 16:17           ` Andrea Arcangeli
  0 siblings, 0 replies; 10+ messages in thread
From: Andrea Arcangeli @ 2002-01-12 16:17 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Andi Kleen, Andrew Morton, linux-kernel, Manfred Spraul

On Sat, Jan 12, 2002 at 05:07:42PM +0100, Manfred Spraul wrote:
> Andi Kleen wrote:
> > 
> > For the stack they can get minor faults anyways when they allocate new
> > stack space below ESP. There is no good way to fix that from the kernel; the
> > application has to preallocate its memory on stack. I think it's reasonable
> > if it does the same for holes on the stack.
> >
> Ok, everyone agrees that mlockall() should not grow VM_GROWSDOWN
> segments to their maximum size.

Ah, definitely. I must have misunderstood something in the discussion,
sorry; I thought we were just discussing the issue below, and I
completely missed the "maximum size" one.

All I was trying to find out here, was about the intermediate pages
between vm_start and vm_end of a VM_GROWSDOWN|VM_LOCKED vma, exactly
your example below.

> Should the page fault handler fill the hole created by
> 
> void * grow_stack(void)
> {
> 	char data[100000];
> 	data[0] = '0';
> 	return data;
> }
> 
> The principle of least surprise would mean filling holes, but OTOH sane
> apps would use memset(data,0,sizeof(data)).

yep.

Andrea
