* Re: 2.6.33-rc8 breaks UML with Restrict initial stack space expansion to rlimit
2010-02-15 6:59 ` KOSAKI Motohiro
@ 2010-02-15 7:17 ` Jouni Malinen
2010-02-15 8:57 ` [PATCH] exec/fs: fix initial stack reservation Michael Neuling
2010-02-15 8:57 ` 2.6.33-rc8 breaks UML with Restrict initial stack space expansion to rlimit Américo Wang
2 siblings, 0 replies; 15+ messages in thread
From: Jouni Malinen @ 2010-02-15 7:17 UTC
To: KOSAKI Motohiro; +Cc: Michael Neuling, linux-kernel, Andrew Morton, anton
On Mon, Feb 15, 2010 at 03:59:26PM +0900, KOSAKI Motohiro wrote:
> - rlim_stack = min(rlim_stack, stack_size);
> + /* Expand only to rlimit, making sure not to shrink it */
> + rlim_stack = max(rlim_stack, stack_size);
>
> is better fix?
Assuming I understood correctly that that was to replace the patch from
Michael completely and not just a part of it (i.e., just this one-liner
on top of linux-2.6.git), I can confirm that this, too, resolves the
issue I was seeing with UML.
--
Jouni Malinen PGP id EFC895FA
* [PATCH] exec/fs: fix initial stack reservation
2010-02-15 6:59 ` KOSAKI Motohiro
2010-02-15 7:17 ` Jouni Malinen
@ 2010-02-15 8:57 ` Michael Neuling
2010-02-15 9:04 ` KOSAKI Motohiro
2010-02-15 9:08 ` Américo Wang
2010-02-15 8:57 ` 2.6.33-rc8 breaks UML with Restrict initial stack space expansion to rlimit Américo Wang
2 siblings, 2 replies; 15+ messages in thread
From: Michael Neuling @ 2010-02-15 8:57 UTC
To: KOSAKI Motohiro; +Cc: Jouni Malinen, linux-kernel, Andrew Morton, anton
In message <20100215155821.7298.A69D9226@jp.fujitsu.com> you wrote:
> >
> >
> > In message <20100214164023.GA2726@jm.kir.nu> you wrote:
> > > It looks like the commit 803bf5ec259941936262d10ecc84511b76a20921
> > > (fs/exec.c: restrict initial stack space expansion to rlimit) broke my
> > > user mode Linux setup by somehow preventing system setup from running
> > > properly (or killing some processes that try to mount things, etc.).
> > > This commit turned up as the reason based on git bisect and reverting it
> > > fixes my UML test setup (Ubuntu 9.10 on both host and in UML and AMD64
> > > arch for both). I have no idea what exactly would be the main cause for
> > > this issue, but this looks like a somewhat unfortunately timed
> > > regression in 2.6.33-rc8.
> > >
> > > The failed run shows like this (with current linux-2.6.git):
> > >
> > > ...
> > > EXT3-fs (ubda): mounted filesystem with writeback data mode
> > > VFS: Mounted root (ext3 filesystem) readonly on device 98:0.
> > > IRQ 3/console-write: IRQF_DISABLED is not guaranteed on shared IRQs
> > > IRQ 2/console: IRQF_DISABLED is not guaranteed on shared IRQs
> > > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
> > > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
> > > mountall: mount /sys/kernel/debug [218] killed by KILL signal
> > > mountall: Filesystem could not be mounted: /sys/kernel/debug
> > > mountall: mount /dev [219] killed by KILL signal
> > > mountall: Filesystem could not be mounted: /dev
> > > mountall: mount /tmp [220] killed by KILL signal
> > > mountall: Filesystem could not be mounted: /tmp
> > > mountall: mount /var/lock [222] killed by KILL signal
> > > mountall: Filesystem could not be mounted: /var/lock
> > > ...
> > >
> > >
> > > With 803bf5ec reverted, UML comes up and the output looks like this:
> > >
> > > ...
> > > EXT3-fs (ubda): mounted filesystem with writeback data mode
> > > VFS: Mounted root (ext3 filesystem) readonly on device 98:0.
> > > IRQ 3/console-write: IRQF_DISABLED is not guaranteed on shared IRQs
> > > IRQ 2/console: IRQF_DISABLED is not guaranteed on shared IRQs
> > > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
> > > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
> > > init: procps main process (226) terminated with status 255
> > > fsck from util-linux-ng 2.16
> > > ...
> >
> > Jouni,
> >
> > I can reproduce this now.
> >
> > We got the logic wrong in one of the cleanups and hence we aren't
> > actually changing the stack reservation ever, when we intended on
> > allocating up to 20 new pages.
> >
> > The:
> > rlim_stack = min(rlim_stack, stack_size);
> > always chooses stack_size hence we end up not changing the stack at all.
> > This seems to cause fatal problems on UML, but is obviously not what was
> > intended for archs as well.
> >
> > The following works for me on PPC64 64k and 4k pages and UML on x86_64.
> >
> > Let me know if it fixes it for you also.
> >
> > Mikey
> >
> >
> > exec/fs: fix initial stack reservation
> >
> > 803bf5ec259941936262d10ecc84511b76a20921 (fs/exec.c: restrict initial
> > stack space expansion to rlimit) attempts to limit the initial stack to
> > 20*PAGE_SIZE. Unfortunately, in also attempting ensure the stack is not
> > reduced in size, we ended up not changing the stack at all.
> >
> > This caused a regression in UML resulting in most guest processes to be
> > killed.
> >
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > cc: <stable@kernel.org>
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index e95c692..e0e7b3c 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -637,15 +637,16 @@ int setup_arg_pages(struct linux_binprm *bprm,
> > * will align it up.
> > */
> > rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
> > - rlim_stack = min(rlim_stack, stack_size);
> > #ifdef CONFIG_STACK_GROWSUP
> > if (stack_size + stack_expand > rlim_stack)
> > - stack_base = vma->vm_start + rlim_stack;
> > + /* Expand only to rlimit, making sure not to shrink it */
> > + stack_base = vma->vm_start + max(rlim_stack,stack_size);
> > else
> > stack_base = vma->vm_end + stack_expand;
> > #else
> > if (stack_size + stack_expand > rlim_stack)
> > - stack_base = vma->vm_end - rlim_stack;
> > + /* Expand only to rlimit, making sure not to shrink it */
> > + stack_base = vma->vm_end - max(rlim_stack,stack_size);
> > else
> > stack_base = vma->vm_start - stack_expand;
> > #endif
>
> - rlim_stack = min(rlim_stack, stack_size);
> + /* Expand only to rlimit, making sure not to shrink it */
> + rlim_stack = max(rlim_stack, stack_size);
>
> is better fix?
Actually, I think we can just get rid of the min() line altogether.
expand_stack checks to make sure the stack is getting bigger, otherwise
it does nothing. We don't need to bother with this check.
The below works for me on UML x86_64 and ppc64 64k and 4k pages.
Mikey
exec/fs: fix initial stack reservation
803bf5ec259941936262d10ecc84511b76a20921 (fs/exec.c: restrict initial
stack space expansion to rlimit) attempts to limit the initial stack to
20*PAGE_SIZE. Unfortunately, in attempting to ensure the stack is not
reduced in size, we ended up not changing the stack at all.
This size reduction check is not necessary as the expand_stack call does
this already.
This caused a regression in UML resulting in most guest processes being
killed.
Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: <stable@kernel.org>
---
fs/exec.c | 1 -
1 file changed, 1 deletion(-)
Index: linux-2.6-ozlabs/fs/exec.c
===================================================================
--- linux-2.6-ozlabs.orig/fs/exec.c
+++ linux-2.6-ozlabs/fs/exec.c
@@ -637,7 +637,6 @@ int setup_arg_pages(struct linux_binprm
* will align it up.
*/
rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
- rlim_stack = min(rlim_stack, stack_size);
#ifdef CONFIG_STACK_GROWSUP
if (stack_size + stack_expand > rlim_stack)
stack_base = vma->vm_start + rlim_stack;
* Re: [PATCH] exec/fs: fix initial stack reservation
2010-02-15 8:57 ` [PATCH] exec/fs: fix initial stack reservation Michael Neuling
@ 2010-02-15 9:04 ` KOSAKI Motohiro
2010-02-15 9:08 ` Américo Wang
1 sibling, 0 replies; 15+ messages in thread
From: KOSAKI Motohiro @ 2010-02-15 9:04 UTC
To: Michael Neuling
Cc: kosaki.motohiro, Jouni Malinen, linux-kernel, Andrew Morton,
anton
> In message <20100215155821.7298.A69D9226@jp.fujitsu.com> you wrote:
> > >
> > >
> > > In message <20100214164023.GA2726@jm.kir.nu> you wrote:
> > > > It looks like the commit 803bf5ec259941936262d10ecc84511b76a20921
> > > > (fs/exec.c: restrict initial stack space expansion to rlimit) broke my
> > > > user mode Linux setup by somehow preventing system setup from running
> > > > properly (or killing some processes that try to mount things, etc.).
> > > > This commit turned up as the reason based on git bisect and reverting it
> > > > fixes my UML test setup (Ubuntu 9.10 on both host and in UML and AMD64
> > > > arch for both). I have no idea what exactly would be the main cause for
> > > > this issue, but this looks like a somewhat unfortunately timed
> > > > regression in 2.6.33-rc8.
> > > >
> > > > The failed run shows like this (with current linux-2.6.git):
> > > >
> > > > ...
> > > > EXT3-fs (ubda): mounted filesystem with writeback data mode
> > > > VFS: Mounted root (ext3 filesystem) readonly on device 98:0.
> > > > IRQ 3/console-write: IRQF_DISABLED is not guaranteed on shared IRQs
> > > > IRQ 2/console: IRQF_DISABLED is not guaranteed on shared IRQs
> > > > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
> > > > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
> > > > mountall: mount /sys/kernel/debug [218] killed by KILL signal
> > > > mountall: Filesystem could not be mounted: /sys/kernel/debug
> > > > mountall: mount /dev [219] killed by KILL signal
> > > > mountall: Filesystem could not be mounted: /dev
> > > > mountall: mount /tmp [220] killed by KILL signal
> > > > mountall: Filesystem could not be mounted: /tmp
> > > > mountall: mount /var/lock [222] killed by KILL signal
> > > > mountall: Filesystem could not be mounted: /var/lock
> > > > ...
> > > >
> > > >
> > > > With 803bf5ec reverted, UML comes up and the output looks like this:
> > > >
> > > > ...
> > > > EXT3-fs (ubda): mounted filesystem with writeback data mode
> > > > VFS: Mounted root (ext3 filesystem) readonly on device 98:0.
> > > > IRQ 3/console-write: IRQF_DISABLED is not guaranteed on shared IRQs
> > > > IRQ 2/console: IRQF_DISABLED is not guaranteed on shared IRQs
> > > > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
> > > > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
> > > > init: procps main process (226) terminated with status 255
> > > > fsck from util-linux-ng 2.16
> > > > ...
> > >
> > > Jouni,
> > >
> > > I can reproduce this now.
> > >
> > > We got the logic wrong in one of the cleanups and hence we aren't
> > > actually changing the stack reservation ever, when we intended on
> > > allocating up to 20 new pages.
> > >
> > > The:
> > > rlim_stack = min(rlim_stack, stack_size);
> > > always chooses stack_size hence we end up not changing the stack at all.
> > > This seems to cause fatal problems on UML, but is obviously not what was
> > > intended for archs as well.
> > >
> > > The following works for me on PPC64 64k and 4k pages and UML on x86_64.
> > >
> > > Let me know if it fixes it for you also.
> > >
> > > Mikey
> > >
> > >
> > > exec/fs: fix initial stack reservation
> > >
> > > 803bf5ec259941936262d10ecc84511b76a20921 (fs/exec.c: restrict initial
> > > stack space expansion to rlimit) attempts to limit the initial stack to
> > > 20*PAGE_SIZE. Unfortunately, in also attempting ensure the stack is not
> > > reduced in size, we ended up not changing the stack at all.
> > >
> > > This caused a regression in UML resulting in most guest processes to be
> > > killed.
> > >
> > > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > > cc: <stable@kernel.org>
> > >
> > > diff --git a/fs/exec.c b/fs/exec.c
> > > index e95c692..e0e7b3c 100644
> > > --- a/fs/exec.c
> > > +++ b/fs/exec.c
> > > @@ -637,15 +637,16 @@ int setup_arg_pages(struct linux_binprm *bprm,
> > > * will align it up.
> > > */
> > > rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
> > > - rlim_stack = min(rlim_stack, stack_size);
> > > #ifdef CONFIG_STACK_GROWSUP
> > > if (stack_size + stack_expand > rlim_stack)
> > > - stack_base = vma->vm_start + rlim_stack;
> > > + /* Expand only to rlimit, making sure not to shrink it */
> > > + stack_base = vma->vm_start + max(rlim_stack,stack_size);
> > > else
> > > stack_base = vma->vm_end + stack_expand;
> > > #else
> > > if (stack_size + stack_expand > rlim_stack)
> > > - stack_base = vma->vm_end - rlim_stack;
> > > + /* Expand only to rlimit, making sure not to shrink it */
> > > + stack_base = vma->vm_end - max(rlim_stack,stack_size);
> > > else
> > > stack_base = vma->vm_start - stack_expand;
> > > #endif
> >
> > - rlim_stack = min(rlim_stack, stack_size);
> > + /* Expand only to rlimit, making sure not to shrink it */
> > + rlim_stack = max(rlim_stack, stack_size);
> >
> > is better fix?
>
> Actually, I think we can just get rid of min() line altogether.
> expand_stack checks to make sure the stack is getting bigger, otherwise
> it does nothing. We don't need to bother with this check.
>
> The below works for me on UML x86_64 and ppc64 64k and 4k pages.
OK, right you are.
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>
> Mikey
>
> exec/fs: fix initial stack reservation
>
> 803bf5ec259941936262d10ecc84511b76a20921 (fs/exec.c: restrict initial
> stack space expansion to rlimit) attempts to limit the initial stack to
> 20*PAGE_SIZE. Unfortunately, in attempting ensure the stack is not
> reduced in size, we ended up not changing the stack at all.
>
> This size reduction check is not necessary as the expand_stack call does
> this already.
>
> This caused a regression in UML resulting in most guest processes being
> killed.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> cc: <stable@kernel.org>
> ---
> fs/exec.c | 1 -
> 1 file changed, 1 deletion(-)
>
> Index: linux-2.6-ozlabs/fs/exec.c
> ===================================================================
> --- linux-2.6-ozlabs.orig/fs/exec.c
> +++ linux-2.6-ozlabs/fs/exec.c
> @@ -637,7 +637,6 @@ int setup_arg_pages(struct linux_binprm
> * will align it up.
> */
> rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
> - rlim_stack = min(rlim_stack, stack_size);
> #ifdef CONFIG_STACK_GROWSUP
> if (stack_size + stack_expand > rlim_stack)
> stack_base = vma->vm_start + rlim_stack;
* Re: [PATCH] exec/fs: fix initial stack reservation
2010-02-15 8:57 ` [PATCH] exec/fs: fix initial stack reservation Michael Neuling
2010-02-15 9:04 ` KOSAKI Motohiro
@ 2010-02-15 9:08 ` Américo Wang
1 sibling, 0 replies; 15+ messages in thread
From: Américo Wang @ 2010-02-15 9:08 UTC
To: Michael Neuling
Cc: KOSAKI Motohiro, Jouni Malinen, linux-kernel, Andrew Morton,
anton
On Mon, Feb 15, 2010 at 07:57:11PM +1100, Michael Neuling wrote:
>In message <20100215155821.7298.A69D9226@jp.fujitsu.com> you wrote:
>> >
>> >
>> > In message <20100214164023.GA2726@jm.kir.nu> you wrote:
>> > > It looks like the commit 803bf5ec259941936262d10ecc84511b76a20921
>> > > (fs/exec.c: restrict initial stack space expansion to rlimit) broke my
>> > > user mode Linux setup by somehow preventing system setup from running
>> > > properly (or killing some processes that try to mount things, etc.).
>> > > This commit turned up as the reason based on git bisect and reverting it
>> > > fixes my UML test setup (Ubuntu 9.10 on both host and in UML and AMD64
>> > > arch for both). I have no idea what exactly would be the main cause for
>> > > this issue, but this looks like a somewhat unfortunately timed
>> > > regression in 2.6.33-rc8.
>> > >
>> > > The failed run shows like this (with current linux-2.6.git):
>> > >
>> > > ...
>> > > EXT3-fs (ubda): mounted filesystem with writeback data mode
>> > > VFS: Mounted root (ext3 filesystem) readonly on device 98:0.
>> > > IRQ 3/console-write: IRQF_DISABLED is not guaranteed on shared IRQs
>> > > IRQ 2/console: IRQF_DISABLED is not guaranteed on shared IRQs
>> > > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
>> > > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
>> > > mountall: mount /sys/kernel/debug [218] killed by KILL signal
>> > > mountall: Filesystem could not be mounted: /sys/kernel/debug
>> > > mountall: mount /dev [219] killed by KILL signal
>> > > mountall: Filesystem could not be mounted: /dev
>> > > mountall: mount /tmp [220] killed by KILL signal
>> > > mountall: Filesystem could not be mounted: /tmp
>> > > mountall: mount /var/lock [222] killed by KILL signal
>> > > mountall: Filesystem could not be mounted: /var/lock
>> > > ...
>> > >
>> > >
>> > > With 803bf5ec reverted, UML comes up and the output looks like this:
>> > >
>> > > ...
>> > > EXT3-fs (ubda): mounted filesystem with writeback data mode
>> > > VFS: Mounted root (ext3 filesystem) readonly on device 98:0.
>> > > IRQ 3/console-write: IRQF_DISABLED is not guaranteed on shared IRQs
>> > > IRQ 2/console: IRQF_DISABLED is not guaranteed on shared IRQs
>> > > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
>> > > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
>> > > init: procps main process (226) terminated with status 255
>> > > fsck from util-linux-ng 2.16
>> > > ...
>> >
>> > Jouni,
>> >
>> > I can reproduce this now.
>> >
>> > We got the logic wrong in one of the cleanups and hence we aren't
>> > actually changing the stack reservation ever, when we intended on
>> > allocating up to 20 new pages.
>> >
>> > The:
>> > rlim_stack = min(rlim_stack, stack_size);
>> > always chooses stack_size hence we end up not changing the stack at all.
>> > This seems to cause fatal problems on UML, but is obviously not what was
>> > intended for archs as well.
>> >
>> > The following works for me on PPC64 64k and 4k pages and UML on x86_64.
>> >
>> > Let me know if it fixes it for you also.
>> >
>> > Mikey
>> >
>> >
>> > exec/fs: fix initial stack reservation
>> >
>> > 803bf5ec259941936262d10ecc84511b76a20921 (fs/exec.c: restrict initial
>> > stack space expansion to rlimit) attempts to limit the initial stack to
>> > 20*PAGE_SIZE. Unfortunately, in also attempting ensure the stack is not
>> > reduced in size, we ended up not changing the stack at all.
>> >
>> > This caused a regression in UML resulting in most guest processes to be
>> > killed.
>> >
>> > Signed-off-by: Michael Neuling <mikey@neuling.org>
>> > cc: <stable@kernel.org>
>> >
>> > diff --git a/fs/exec.c b/fs/exec.c
>> > index e95c692..e0e7b3c 100644
>> > --- a/fs/exec.c
>> > +++ b/fs/exec.c
>> > @@ -637,15 +637,16 @@ int setup_arg_pages(struct linux_binprm *bprm,
>> > * will align it up.
>> > */
>> > rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
>> > - rlim_stack = min(rlim_stack, stack_size);
>> > #ifdef CONFIG_STACK_GROWSUP
>> > if (stack_size + stack_expand > rlim_stack)
>> > - stack_base = vma->vm_start + rlim_stack;
>> > + /* Expand only to rlimit, making sure not to shrink it */
>> > + stack_base = vma->vm_start + max(rlim_stack,stack_size);
>> > else
>> > stack_base = vma->vm_end + stack_expand;
>> > #else
>> > if (stack_size + stack_expand > rlim_stack)
>> > - stack_base = vma->vm_end - rlim_stack;
>> > + /* Expand only to rlimit, making sure not to shrink it */
>> > + stack_base = vma->vm_end - max(rlim_stack,stack_size);
>> > else
>> > stack_base = vma->vm_start - stack_expand;
>> > #endif
>>
>> - rlim_stack = min(rlim_stack, stack_size);
>> + /* Expand only to rlimit, making sure not to shrink it */
>> + rlim_stack = max(rlim_stack, stack_size);
>>
>> is better fix?
>
>Actually, I think we can just get rid of min() line altogether.
>expand_stack checks to make sure the stack is getting bigger, otherwise
>it does nothing. We don't need to bother with this check.
>
Right...
The change above confused me. :-( But now everything is clear.
>The below works for me on UML x86_64 and ppc64 64k and 4k pages.
>
>Mikey
>
>exec/fs: fix initial stack reservation
>
>803bf5ec259941936262d10ecc84511b76a20921 (fs/exec.c: restrict initial
>stack space expansion to rlimit) attempts to limit the initial stack to
>20*PAGE_SIZE. Unfortunately, in attempting ensure the stack is not
>reduced in size, we ended up not changing the stack at all.
>
>This size reduction check is not necessary as the expand_stack call does
>this already.
>
>This caused a regression in UML resulting in most guest processes being
>killed.
>
>Signed-off-by: Michael Neuling <mikey@neuling.org>
>cc: <stable@kernel.org>
This one is definitely better.
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
>---
> fs/exec.c | 1 -
> 1 file changed, 1 deletion(-)
>
>Index: linux-2.6-ozlabs/fs/exec.c
>===================================================================
>--- linux-2.6-ozlabs.orig/fs/exec.c
>+++ linux-2.6-ozlabs/fs/exec.c
>@@ -637,7 +637,6 @@ int setup_arg_pages(struct linux_binprm
> * will align it up.
> */
> rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
>- rlim_stack = min(rlim_stack, stack_size);
> #ifdef CONFIG_STACK_GROWSUP
> if (stack_size + stack_expand > rlim_stack)
> stack_base = vma->vm_start + rlim_stack;
--
Live like a child, think like the god.
* Re: 2.6.33-rc8 breaks UML with Restrict initial stack space expansion to rlimit
2010-02-15 6:59 ` KOSAKI Motohiro
2010-02-15 7:17 ` Jouni Malinen
2010-02-15 8:57 ` [PATCH] exec/fs: fix initial stack reservation Michael Neuling
@ 2010-02-15 8:57 ` Américo Wang
2010-02-15 9:03 ` KOSAKI Motohiro
2 siblings, 1 reply; 15+ messages in thread
From: Américo Wang @ 2010-02-15 8:57 UTC
To: KOSAKI Motohiro
Cc: Michael Neuling, Jouni Malinen, linux-kernel, Andrew Morton,
anton
On Mon, Feb 15, 2010 at 03:59:26PM +0900, KOSAKI Motohiro wrote:
>>
>>
>> In message <20100214164023.GA2726@jm.kir.nu> you wrote:
>> > It looks like the commit 803bf5ec259941936262d10ecc84511b76a20921
>> > (fs/exec.c: restrict initial stack space expansion to rlimit) broke my
>> > user mode Linux setup by somehow preventing system setup from running
>> > properly (or killing some processes that try to mount things, etc.).
>> > This commit turned up as the reason based on git bisect and reverting it
>> > fixes my UML test setup (Ubuntu 9.10 on both host and in UML and AMD64
>> > arch for both). I have no idea what exactly would be the main cause for
>> > this issue, but this looks like a somewhat unfortunately timed
>> > regression in 2.6.33-rc8.
>> >
>> > The failed run shows like this (with current linux-2.6.git):
>> >
>> > ...
>> > EXT3-fs (ubda): mounted filesystem with writeback data mode
>> > VFS: Mounted root (ext3 filesystem) readonly on device 98:0.
>> > IRQ 3/console-write: IRQF_DISABLED is not guaranteed on shared IRQs
>> > IRQ 2/console: IRQF_DISABLED is not guaranteed on shared IRQs
>> > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
>> > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
>> > mountall: mount /sys/kernel/debug [218] killed by KILL signal
>> > mountall: Filesystem could not be mounted: /sys/kernel/debug
>> > mountall: mount /dev [219] killed by KILL signal
>> > mountall: Filesystem could not be mounted: /dev
>> > mountall: mount /tmp [220] killed by KILL signal
>> > mountall: Filesystem could not be mounted: /tmp
>> > mountall: mount /var/lock [222] killed by KILL signal
>> > mountall: Filesystem could not be mounted: /var/lock
>> > ...
>> >
>> >
>> > With 803bf5ec reverted, UML comes up and the output looks like this:
>> >
>> > ...
>> > EXT3-fs (ubda): mounted filesystem with writeback data mode
>> > VFS: Mounted root (ext3 filesystem) readonly on device 98:0.
>> > IRQ 3/console-write: IRQF_DISABLED is not guaranteed on shared IRQs
>> > IRQ 2/console: IRQF_DISABLED is not guaranteed on shared IRQs
>> > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
>> > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
>> > init: procps main process (226) terminated with status 255
>> > fsck from util-linux-ng 2.16
>> > ...
>>
>> Jouni,
>>
>> I can reproduce this now.
>>
>> We got the logic wrong in one of the cleanups and hence we aren't
>> actually changing the stack reservation ever, when we intended on
>> allocating up to 20 new pages.
>>
>> The:
>> rlim_stack = min(rlim_stack, stack_size);
>> always chooses stack_size hence we end up not changing the stack at all.
>> This seems to cause fatal problems on UML, but is obviously not what was
>> intended for archs as well.
>>
>> The following works for me on PPC64 64k and 4k pages and UML on x86_64.
>>
>> Let me know if it fixes it for you also.
>>
>> Mikey
>>
>>
>> exec/fs: fix initial stack reservation
>>
>> 803bf5ec259941936262d10ecc84511b76a20921 (fs/exec.c: restrict initial
>> stack space expansion to rlimit) attempts to limit the initial stack to
>> 20*PAGE_SIZE. Unfortunately, in also attempting ensure the stack is not
>> reduced in size, we ended up not changing the stack at all.
>>
>> This caused a regression in UML resulting in most guest processes to be
>> killed.
>>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> cc: <stable@kernel.org>
>>
>> diff --git a/fs/exec.c b/fs/exec.c
>> index e95c692..e0e7b3c 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -637,15 +637,16 @@ int setup_arg_pages(struct linux_binprm *bprm,
>> * will align it up.
>> */
>> rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
>> - rlim_stack = min(rlim_stack, stack_size);
>> #ifdef CONFIG_STACK_GROWSUP
>> if (stack_size + stack_expand > rlim_stack)
>> - stack_base = vma->vm_start + rlim_stack;
>> + /* Expand only to rlimit, making sure not to shrink it */
>> + stack_base = vma->vm_start + max(rlim_stack,stack_size);
>> else
>> stack_base = vma->vm_end + stack_expand;
>> #else
>> if (stack_size + stack_expand > rlim_stack)
>> - stack_base = vma->vm_end - rlim_stack;
>> + /* Expand only to rlimit, making sure not to shrink it */
>> + stack_base = vma->vm_end - max(rlim_stack,stack_size);
>> else
>> stack_base = vma->vm_start - stack_expand;
>> #endif
>
>- rlim_stack = min(rlim_stack, stack_size);
>+ /* Expand only to rlimit, making sure not to shrink it */
>+ rlim_stack = max(rlim_stack, stack_size);
>
>is better fix?
>
Odd. If this is the right fix, 'stack_size' will be able to exceed the
stack rlimit, and Michael's previous rlimit patch will be useless.
Am I missing something?
* Re: 2.6.33-rc8 breaks UML with Restrict initial stack space expansion to rlimit
2010-02-15 8:57 ` 2.6.33-rc8 breaks UML with Restrict initial stack space expansion to rlimit Américo Wang
@ 2010-02-15 9:03 ` KOSAKI Motohiro
0 siblings, 0 replies; 15+ messages in thread
From: KOSAKI Motohiro @ 2010-02-15 9:03 UTC
To: Americo Wang
Cc: kosaki.motohiro, Michael Neuling, Jouni Malinen, linux-kernel,
Andrew Morton, anton
> On Mon, Feb 15, 2010 at 03:59:26PM +0900, KOSAKI Motohiro wrote:
> >>
> >>
> >> In message <20100214164023.GA2726@jm.kir.nu> you wrote:
> >> > It looks like the commit 803bf5ec259941936262d10ecc84511b76a20921
> >> > (fs/exec.c: restrict initial stack space expansion to rlimit) broke my
> >> > user mode Linux setup by somehow preventing system setup from running
> >> > properly (or killing some processes that try to mount things, etc.).
> >> > This commit turned up as the reason based on git bisect and reverting it
> >> > fixes my UML test setup (Ubuntu 9.10 on both host and in UML and AMD64
> >> > arch for both). I have no idea what exactly would be the main cause for
> >> > this issue, but this looks like a somewhat unfortunately timed
> >> > regression in 2.6.33-rc8.
> >> >
> >> > The failed run shows like this (with current linux-2.6.git):
> >> >
> >> > ...
> >> > EXT3-fs (ubda): mounted filesystem with writeback data mode
> >> > VFS: Mounted root (ext3 filesystem) readonly on device 98:0.
> >> > IRQ 3/console-write: IRQF_DISABLED is not guaranteed on shared IRQs
> >> > IRQ 2/console: IRQF_DISABLED is not guaranteed on shared IRQs
> >> > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
> >> > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
> >> > mountall: mount /sys/kernel/debug [218] killed by KILL signal
> >> > mountall: Filesystem could not be mounted: /sys/kernel/debug
> >> > mountall: mount /dev [219] killed by KILL signal
> >> > mountall: Filesystem could not be mounted: /dev
> >> > mountall: mount /tmp [220] killed by KILL signal
> >> > mountall: Filesystem could not be mounted: /tmp
> >> > mountall: mount /var/lock [222] killed by KILL signal
> >> > mountall: Filesystem could not be mounted: /var/lock
> >> > ...
> >> >
> >> >
> >> > With 803bf5ec reverted, UML comes up and the output looks like this:
> >> >
> >> > ...
> >> > EXT3-fs (ubda): mounted filesystem with writeback data mode
> >> > VFS: Mounted root (ext3 filesystem) readonly on device 98:0.
> >> > IRQ 3/console-write: IRQF_DISABLED is not guaranteed on shared IRQs
> >> > IRQ 2/console: IRQF_DISABLED is not guaranteed on shared IRQs
> >> > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
> >> > IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
> >> > init: procps main process (226) terminated with status 255
> >> > fsck from util-linux-ng 2.16
> >> > ...
> >>
> >> Jouni,
> >>
> >> I can reproduce this now.
> >>
> >> We got the logic wrong in one of the cleanups and hence we aren't
> >> actually changing the stack reservation ever, when we intended on
> >> allocating up to 20 new pages.
> >>
> >> The:
> >> rlim_stack = min(rlim_stack, stack_size);
> >> always chooses stack_size hence we end up not changing the stack at all.
> >> This seems to cause fatal problems on UML, but is obviously not what was
> >> intended for archs as well.
> >>
> >> The following works for me on PPC64 64k and 4k pages and UML on x86_64.
> >>
> >> Let me know if it fixes it for you also.
> >>
> >> Mikey
> >>
> >>
> >> exec/fs: fix initial stack reservation
> >>
> >> 803bf5ec259941936262d10ecc84511b76a20921 (fs/exec.c: restrict initial
> >> stack space expansion to rlimit) attempts to limit the initial stack to
> >> 20*PAGE_SIZE. Unfortunately, in also attempting ensure the stack is not
> >> reduced in size, we ended up not changing the stack at all.
> >>
> >> This caused a regression in UML resulting in most guest processes to be
> >> killed.
> >>
> >> Signed-off-by: Michael Neuling <mikey@neuling.org>
> >> cc: <stable@kernel.org>
> >>
> >> diff --git a/fs/exec.c b/fs/exec.c
> >> index e95c692..e0e7b3c 100644
> >> --- a/fs/exec.c
> >> +++ b/fs/exec.c
> >> @@ -637,15 +637,16 @@ int setup_arg_pages(struct linux_binprm *bprm,
> >> * will align it up.
> >> */
> >> rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
> >> - rlim_stack = min(rlim_stack, stack_size);
> >> #ifdef CONFIG_STACK_GROWSUP
> >> if (stack_size + stack_expand > rlim_stack)
> >> - stack_base = vma->vm_start + rlim_stack;
> >> + /* Expand only to rlimit, making sure not to shrink it */
> >> + stack_base = vma->vm_start + max(rlim_stack,stack_size);
> >> else
> >> stack_base = vma->vm_end + stack_expand;
> >> #else
> >> if (stack_size + stack_expand > rlim_stack)
> >> - stack_base = vma->vm_end - rlim_stack;
> >> + /* Expand only to rlimit, making sure not to shrink it */
> >> + stack_base = vma->vm_end - max(rlim_stack,stack_size);
> >> else
> >> stack_base = vma->vm_start - stack_expand;
> >> #endif
> >
> >- rlim_stack = min(rlim_stack, stack_size);
> >+ /* Expand only to rlimit, making sure not to shrink it */
> >+ rlim_stack = max(rlim_stack, stack_size);
> >
> >is better fix?
> >
>
> Odd. If this is the right fix, 'stack_size" will be able to exceed
> stack rlimit, then Michael's previous rlimit patch will be useless.
> Am I missing something?
>
This function runs during exec processing; in other words, the user process
hasn't started yet, and stack_size is always PAGE_SIZE.
No problem.
This expression only means that we treat "ulimit -s 1" as "ulimit -s 4"
(rounded up to one page).