public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* recent change to exit_mmap
@ 2003-01-18  6:05 Anton Blanchard
  2003-01-18  6:44 ` Andrew Morton
  2003-01-18  7:00 ` Andrew Morton
  0 siblings, 2 replies; 10+ messages in thread
From: Anton Blanchard @ 2003-01-18  6:05 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel


Hi,

On ppc64 a 32bit task has 4GB and a 64bit task has 2TB of address space.

We use a bit in the thread struct to decide which limit to apply against
TASK_SIZE:

#define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
                TASK_SIZE_USER32 : TASK_SIZE_USER64)

The TIF_32BIT flag gets set in the arch specific SET_PERSONALITY hook
in load_elf_binary.

After the recent changes in mm/mmap.c, the following sequence of events
happens:

1. a 64bit task tries to exec a 32bit one
2. we reach load_elf_binary
3. call SET_PERSONALITY which sets TIF_32BIT to true
4. call flush_old_exec->exec_mmap->mmput->exit_mmap
5. call unmap_vmas(,,,,TASK_SIZE,) which only flushes mappings below 4GB
6. BUG_ON in exit_mmap

It seems with the TIF_32BIT scheme we have a window between
SET_PERSONALITY until exec returns where TASK_SIZE might be considered
incorrect (although at which exact point to you decide that, yes we are
now in the new context).

Any ideas on how to fix this? Maybe we can move SET_PERSONALITY down
after flush_old_exec.

Anton

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: recent change to exit_mmap
  2003-01-18  6:05 recent change to exit_mmap Anton Blanchard
@ 2003-01-18  6:44 ` Andrew Morton
  2003-01-18  7:42   ` David Mosberger
  2003-01-18  7:00 ` Andrew Morton
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2003-01-18  6:44 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: akpm, linux-kernel

Anton Blanchard <anton@samba.org> wrote:
>
> 
> Hi,
> 
> On ppc64 a 32bit task has 4GB and a 64bit task has 2TB of address space.
> 
> We use a bit in the thread struct to decide which limit to apply against
> TASK_SIZE:
> 
> #define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
>                 TASK_SIZE_USER32 : TASK_SIZE_USER64)
> 
> The TIF_32BIT flag gets set in the arch specific SET_PERSONALITY hook
> in load_elf_binary.
> 
> After the recent changes in mm/mmap.c, the following sequence of events
> happens:
> 
> 1. a 64bit task tries to exec a 32bit one
> 2. we reach load_elf_binary
> 3. call SET_PERSONALITY which sets TIF_32BIT to true
> 4. call flush_old_exec->exec_mmap->mmput->exit_mmap
> 5. call unmap_vmas(,,,,TASK_SIZE,) which only flushes mappings below 4GB
> 6. BUG_ON in exit_mmap
> 
> It seems with the TIF_32BIT scheme we have a window between
> SET_PERSONALITY until exec returns where TASK_SIZE might be considered
> incorrect (although at which exact point to you decide that, yes we are
> now in the new context).
> 
> Any ideas on how to fix this? Maybe we can move SET_PERSONALITY down
> after flush_old_exec.

(Does this only apply to elf?)


If the old process has 64-bit virtual address space and the new process has
32-bit, you want to unmap the 64-bit space, yes?

And if the old process is 32-bit and the new process is 64-bit (possible?)
you want to unmap the 32-bit space.  But unmapping 64-bit space here is
harmless, yes?

Yes, it would be more logical to still be running with the personality of the
old process when we try to expunge all its mappings.  However I suspect we
need all those mappings to grab the exec arguments.

How about we make exit_mmap() be independent of the personality again by
doing something like this?

Looks like ia64 needs work, too...



 asm-ppc64/processor.h |    1 +
 mmap.c                |   18 +++++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff -puN mm/mmap.c~exit_mmap-fix mm/mmap.c
--- 25/mm/mmap.c~exit_mmap-fix	2003-01-17 22:35:13.000000000 -0800
+++ 25-akpm/mm/mmap.c	2003-01-17 22:41:10.000000000 -0800
@@ -1379,6 +1379,22 @@ void build_mmap_rb(struct mm_struct * mm
 	}
 }
 
+/*
+ * During exit_mmap, TASK_SIZE is not a reliable indication of the virtual
+ * size of the mm which is being torn down.  Because on the exec() path, this
+ * process may have switched its personality from 64-bit to 32-bit prior to
+ * calling exit_mmap().  So TASK_SIZE returns a value suitable for a 32-bit
+ * process, and not the 64-bit process whose mm we need to invalidate.
+ *
+ * So what we do is to always unmap the largest virtual address space which
+ * the architecture supports.  unmap_vmas() will then unmap every VMA in the
+ * mm, which is what we want to happen here.
+ */
+
+#ifndef TASK_SIZE_MAX
+#define TASK_SIZE_MAX TASK_SIZE
+#endif
+
 /* Release all mmaps. */
 void exit_mmap(struct mm_struct *mm)
 {
@@ -1395,7 +1411,7 @@ void exit_mmap(struct mm_struct *mm)
 	tlb = tlb_gather_mmu(mm, 1);
 	flush_cache_mm(mm);
 	mm->map_count -= unmap_vmas(&tlb, mm, mm->mmap, 0,
-					TASK_SIZE, &nr_accounted);
+					TASK_SIZE_MAX, &nr_accounted);
 	vm_unacct_memory(nr_accounted);
 	BUG_ON(mm->map_count);	/* This is just debugging */
 	clear_page_tables(tlb, FIRST_USER_PGD_NR, USER_PTRS_PER_PGD);
diff -puN include/asm-ppc64/processor.h~exit_mmap-fix include/asm-ppc64/processor.h
--- 25/include/asm-ppc64/processor.h~exit_mmap-fix	2003-01-17 22:41:32.000000000 -0800
+++ 25-akpm/include/asm-ppc64/processor.h	2003-01-17 22:42:03.000000000 -0800
@@ -630,6 +630,7 @@ extern struct task_struct *last_task_use
 
 #define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
 		TASK_SIZE_USER32 : TASK_SIZE_USER64)
+#define TASK_SIZE_MAX	TASK_SIZE_USER64
 #endif /* __KERNEL__ */
 
 

_


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: recent change to exit_mmap
  2003-01-18  6:05 recent change to exit_mmap Anton Blanchard
  2003-01-18  6:44 ` Andrew Morton
@ 2003-01-18  7:00 ` Andrew Morton
  2003-01-18  7:23   ` Anton Blanchard
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2003-01-18  7:00 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: linux-kernel

Anton Blanchard <anton@samba.org> wrote:
>
> It seems with the TIF_32BIT scheme we have a window between
> SET_PERSONALITY until exec returns where TASK_SIZE might be considered
> incorrect (although at which exact point to you decide that, yes we are
> now in the new context).
> 
> Any ideas on how to fix this? Maybe we can move SET_PERSONALITY down
> after flush_old_exec.
> 

On seconds thoughts...

Are the first two SET_PERSONALITY() calls in there actually doing anything? 
Perhaps you're right, and we only need the one at line 615, whcih is in the
right place?


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: recent change to exit_mmap
  2003-01-18  7:00 ` Andrew Morton
@ 2003-01-18  7:23   ` Anton Blanchard
  2003-01-18 10:44     ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Anton Blanchard @ 2003-01-18  7:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel


> On seconds thoughts...
> 
> Are the first two SET_PERSONALITY() calls in there actually doing anything? 
> Perhaps you're right, and we only need the one at line 615, whcih is in the
> right place?

Sounds good. The following patch tests out OK.

Anton

===== fs/binfmt_elf.c 1.34 vs edited =====
--- 1.34/fs/binfmt_elf.c	Mon Nov 25 19:59:02 2002
+++ edited/fs/binfmt_elf.c	Sat Jan 18 18:16:52 2003
@@ -536,8 +536,6 @@
 			    strcmp(elf_interpreter,"/usr/lib/ld.so.1") == 0)
 				ibcs2_interpreter = 1;
 
-			SET_PERSONALITY(elf_ex, ibcs2_interpreter);
-
 			interpreter = open_exec(elf_interpreter);
 			retval = PTR_ERR(interpreter);
 			if (IS_ERR(interpreter))
@@ -578,9 +576,6 @@
 			// printk(KERN_WARNING "ELF: Ambiguous type, using ELF\n");
 			interpreter_type = INTERPRETER_ELF;
 		}
-	} else {
-		/* Executables without an interpreter also need a personality  */
-		SET_PERSONALITY(elf_ex, ibcs2_interpreter);
 	}
 
 	/* OK, we are done with that, now set up the arg stuff,

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: recent change to exit_mmap
  2003-01-18  6:44 ` Andrew Morton
@ 2003-01-18  7:42   ` David Mosberger
  2003-01-18  7:53     ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: David Mosberger @ 2003-01-18  7:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Anton Blanchard, akpm, linux-kernel

>>>>> On Fri, 17 Jan 2003 22:44:44 -0800, Andrew Morton <akpm@digeo.com> said:

  Andrew> Looks like ia64 needs work, too...

Yes, should be the same problem there.  The fix looks fine to me.
(Let's just hope I remember it when Linus puts it in his tree...).

Thanks,

	--david

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: recent change to exit_mmap
  2003-01-18  7:42   ` David Mosberger
@ 2003-01-18  7:53     ` Andrew Morton
  2003-01-18  7:58       ` David Mosberger
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2003-01-18  7:53 UTC (permalink / raw)
  To: davidm; +Cc: davidm, anton, linux-kernel

David Mosberger <davidm@napali.hpl.hp.com> wrote:
>
> >>>>> On Fri, 17 Jan 2003 22:44:44 -0800, Andrew Morton <akpm@digeo.com> said:
> 
>   Andrew> Looks like ia64 needs work, too...
> 
> Yes, should be the same problem there.  The fix looks fine to me.
> (Let's just hope I remember it when Linus puts it in his tree...).
> 

I've updated that patch to cover ia64, but I think we'll run with the other
approach - just remove those calls to SET_PERSONALITY().

They just seem illogical anyway - why are we switching into the new image's
personality prior to unmapping the old image's memory?


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: recent change to exit_mmap
  2003-01-18  7:53     ` Andrew Morton
@ 2003-01-18  7:58       ` David Mosberger
  2003-01-18  8:15         ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: David Mosberger @ 2003-01-18  7:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: davidm, davidm, anton, linux-kernel

>>>>> On Fri, 17 Jan 2003 23:53:17 -0800, Andrew Morton <akpm@digeo.com> said:

  Andrew> David Mosberger <davidm@napali.hpl.hp.com> wrote:
  >>  >>>>> On Fri, 17 Jan 2003 22:44:44 -0800, Andrew Morton
  >> <akpm@digeo.com> said:

  Andrew> Looks like ia64 needs work, too...
  >>  Yes, should be the same problem there.  The fix looks fine to
  >> me.  (Let's just hope I remember it when Linus puts it in his
  >> tree...).

  Andrew> I've updated that patch to cover ia64, but I think we'll run
  Andrew> with the other approach - just remove those calls to
  Andrew> SET_PERSONALITY().

  Andrew> They just seem illogical anyway - why are we switching into
  Andrew> the new image's personality prior to unmapping the old
  Andrew> image's memory?

I don't know why SET_PERSONALITY() came to be where it is now, but it
does make some sense to me.  One thing that comes to mind: on ia64, we
normally don't map data segments with execute permission but for
backwards-compatibility, we need to do that for x86 binaries.  I think
there might be a problem with that if SET_PERSONALITY() was done too
late.  Certainly something that could be fixed, but I suspect a
similar ordering issue (perhaps on SPARC?) might have triggered the
current placement of SET_PERSONALITY().

	--david

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: recent change to exit_mmap
  2003-01-18  7:58       ` David Mosberger
@ 2003-01-18  8:15         ` Andrew Morton
  2003-01-18  8:15           ` David Mosberger
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2003-01-18  8:15 UTC (permalink / raw)
  To: davidm; +Cc: davidm, anton, linux-kernel

David Mosberger <davidm@napali.hpl.hp.com> wrote:
>
> I don't know why SET_PERSONALITY() came to be where it is now, but it
> does make some sense to me.  One thing that comes to mind: on ia64, we
> normally don't map data segments with execute permission but for
> backwards-compatibility, we need to do that for x86 binaries.  I think
> there might be a problem with that if SET_PERSONALITY() was done too
> late.  Certainly something that could be fixed, but I suspect a
> similar ordering issue (perhaps on SPARC?) might have triggered the
> current placement of SET_PERSONALITY().
> 

hmm.  Seems that all the activities between the two first SET_PERSONALITY()
calls and the flush_old_exec() are pretty innocuous.  And no mappings could
be set up there, because flush_old_exec() would remove them again.

I'll ask Dave about it.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: recent change to exit_mmap
  2003-01-18  8:15         ` Andrew Morton
@ 2003-01-18  8:15           ` David Mosberger
  0 siblings, 0 replies; 10+ messages in thread
From: David Mosberger @ 2003-01-18  8:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: davidm, davidm, anton, linux-kernel

>>>>> On Sat, 18 Jan 2003 00:15:46 -0800, Andrew Morton <akpm@digeo.com> said:

  Andrew> hmm.  Seems that all the activities between the two first
  Andrew> SET_PERSONALITY() calls and the flush_old_exec() are pretty
  Andrew> innocuous.  And no mappings could be set up there, because
  Andrew> flush_old_exec() would remove them again.

That's true.  Perhaps it is just cruft.

	--david

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: recent change to exit_mmap
  2003-01-18  7:23   ` Anton Blanchard
@ 2003-01-18 10:44     ` Andrew Morton
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2003-01-18 10:44 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: linux-kernel, David Mosberger

Anton Blanchard <anton@samba.org> wrote:
>
> 
> > On seconds thoughts...
> > 
> > Are the first two SET_PERSONALITY() calls in there actually doing anything? 
> > Perhaps you're right, and we only need the one at line 615, whcih is in the
> > right place?
> 
> Sounds good. The following patch tests out OK.
> 
> Anton
> 
> ===== fs/binfmt_elf.c 1.34 vs edited =====
> --- 1.34/fs/binfmt_elf.c	Mon Nov 25 19:59:02 2002
> +++ edited/fs/binfmt_elf.c	Sat Jan 18 18:16:52 2003
> @@ -536,8 +536,6 @@
>  			    strcmp(elf_interpreter,"/usr/lib/ld.so.1") == 0)
>  				ibcs2_interpreter = 1;
>  
> -			SET_PERSONALITY(elf_ex, ibcs2_interpreter);
> -
>  			interpreter = open_exec(elf_interpreter);
>  			retval = PTR_ERR(interpreter);
>  			if (IS_ERR(interpreter))
> @@ -578,9 +576,6 @@
>  			// printk(KERN_WARNING "ELF: Ambiguous type, using ELF\n");
>  			interpreter_type = INTERPRETER_ELF;
>  		}
> -	} else {
> -		/* Executables without an interpreter also need a personality  */
> -		SET_PERSONALITY(elf_ex, ibcs2_interpreter);
>  	}

Unfortunately it isn't right.  See, we set the peronality prior to looking up
the elf interpreter because different types of executables can have different
filesystem namespaces.  open_exec() needs it.  The IBCS stuff needs this, as
well as (thanks davem):

 "When we have a personality like "SunOS" or whatever, we allow the usage
  of an alternate '/' for file lookups, it trumps whatever would be found in
  the normal /, so we could put the SunOS ld.so under /emul/SunOS/lib for
  example using an emulation alternative-root of /emul/SunOS"

And we can't dump the currect exec image until we've determined that the
interpreter is available.

So it's all a bit of a stew.  We're back to the original patch.  Could you
please test it?  (This includes the ia64 bit)


diff -puN mm/mmap.c~exit_mmap-fix mm/mmap.c
--- 25/mm/mmap.c~exit_mmap-fix	2003-01-17 22:35:13.000000000 -0800
+++ 25-akpm/mm/mmap.c	2003-01-17 22:41:10.000000000 -0800
@@ -1379,6 +1379,22 @@ void build_mmap_rb(struct mm_struct * mm
 	}
 }
 
+/*
+ * During exit_mmap, TASK_SIZE is not a reliable indication of the virtual
+ * size of the mm which is being torn down.  Because on the exec() path, this
+ * process may have switched its personality from 64-bit to 32-bit prior to
+ * calling exit_mmap().  So TASK_SIZE returns a value suitable for a 32-bit
+ * process, and not the 64-bit process whose mm we need to empty.
+ *
+ * So what we do is to always unmap the largest virtual address space which
+ * the architecture supports.  unmap_vmas() will then unmap every VMA in the
+ * mm, which is what we want to happen here.
+ */
+
+#ifndef TASK_SIZE_MAX
+#define TASK_SIZE_MAX TASK_SIZE
+#endif
+
 /* Release all mmaps. */
 void exit_mmap(struct mm_struct *mm)
 {
@@ -1395,7 +1411,7 @@ void exit_mmap(struct mm_struct *mm)
 	tlb = tlb_gather_mmu(mm, 1);
 	flush_cache_mm(mm);
 	mm->map_count -= unmap_vmas(&tlb, mm, mm->mmap, 0,
-					TASK_SIZE, &nr_accounted);
+					TASK_SIZE_MAX, &nr_accounted);
 	vm_unacct_memory(nr_accounted);
 	BUG_ON(mm->map_count);	/* This is just debugging */
 	clear_page_tables(tlb, FIRST_USER_PGD_NR, USER_PTRS_PER_PGD);
diff -puN include/asm-ppc64/processor.h~exit_mmap-fix include/asm-ppc64/processor.h
--- 25/include/asm-ppc64/processor.h~exit_mmap-fix	2003-01-17 22:41:32.000000000 -0800
+++ 25-akpm/include/asm-ppc64/processor.h	2003-01-17 22:42:03.000000000 -0800
@@ -630,6 +630,7 @@ extern struct task_struct *last_task_use
 
 #define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
 		TASK_SIZE_USER32 : TASK_SIZE_USER64)
+#define TASK_SIZE_MAX	TASK_SIZE_USER64
 #endif /* __KERNEL__ */
 
 
diff -puN include/asm-ia64/processor.h~exit_mmap-fix include/asm-ia64/processor.h
--- 25/include/asm-ia64/processor.h~exit_mmap-fix	2003-01-17 22:46:08.000000000 -0800
+++ 25-akpm/include/asm-ia64/processor.h	2003-01-17 22:47:19.000000000 -0800
@@ -39,6 +39,12 @@
 #define TASK_SIZE		(current->thread.task_size)
 
 /*
+ * This determines the virtual address space which must be torn down
+ * during exec.
+ *
+#define TASK_SIZE_MAX		DEFAULT_TASK_SIZE
+
+/*
  * This decides where the kernel will search for a free chunk of vm
  * space during mmap's.
  */

_


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2003-01-18 10:34 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-01-18  6:05 recent change to exit_mmap Anton Blanchard
2003-01-18  6:44 ` Andrew Morton
2003-01-18  7:42   ` David Mosberger
2003-01-18  7:53     ` Andrew Morton
2003-01-18  7:58       ` David Mosberger
2003-01-18  8:15         ` Andrew Morton
2003-01-18  8:15           ` David Mosberger
2003-01-18  7:00 ` Andrew Morton
2003-01-18  7:23   ` Anton Blanchard
2003-01-18 10:44     ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox