* RE: [discuss] [Patch] X86_64 TASK_SIZE cleanup - more comments
@ 2005-04-21 16:09 Zou, Nanhai
2005-04-23 9:10 ` Alexander Nyberg
0 siblings, 1 reply; 4+ messages in thread
From: Zou, Nanhai @ 2005-04-21 16:09 UTC (permalink / raw)
To: Andi Kleen; +Cc: discuss, linux-kernel, Siddha, Suresh B
Hi Andi,
PPC64 IA64 and S390 use variable size TASK_SIZE for 32 bit and 64 bit
program.
I feel it is hard to maintain if we try to audit TASK_SIZE use
everywhere, because most of them are in generic code.
And maintaining those audit code in separate place is also a problem.
E.g. in current 32 bit emulation code
TASK_SIZE is defined as 0xfffffff in elf loading, but defined as
0xffffe000 in mmaping.
How did that earlier patch break applications?
Thanks
Zou Nan hai
> -----Original Message-----
> From: Andi Kleen [mailto:ak@suse.de]
> Sent: Thursday, April 21, 2005 7:54 PM
> To: Zou, Nanhai
> Cc: Andi Kleen; discuss@x86-64.org; linux-kernel@vger.kernel.org;
Siddha,
> Suresh B
> Subject: Re: [discuss] [Patch] X86_64 TASK_SIZE cleanup - more
comments
>
>
> Another comment:
>
> In general I am not too happy about the variable size TASK_SIZE.
> There was a patch for this earlier, but it broke 32bit emulation
> completely. And I think it needs auditing of all uses of TASK_SIZE,
> because I suspect there are more bugs lurking in it.
>
> The way hugetlb etc. mmap were supposed to be handled was to
> let the mmap succeed and then check in the mmap wrapper
> if any address is > 4GB and free it. Probably that code
> has some problems or got broken (I think it worked at least
> in 2.4, but there might have been regressions later)
>
> -Andi
^ permalink raw reply [flat|nested] 4+ messages in thread
* RE: [discuss] [Patch] X86_64 TASK_SIZE cleanup - more comments
@ 2005-04-27 16:20 Zou, Nanhai
0 siblings, 0 replies; 4+ messages in thread
From: Zou, Nanhai @ 2005-04-27 16:20 UTC (permalink / raw)
To: Alexander Nyberg; +Cc: Andi Kleen, discuss, linux-kernel, Siddha, Suresh B
[-- Attachment #1: Type: text/plain, Size: 1878 bytes --]
Hi,
That patch is wrong for earlier kernel. There will be a memory leak on
page tables.
The reason is syscall page was not backup by VMA, clear_page_tables will
only clear up to TASK_SIZE, so there will be a leak on syscall page.
But in 2.6.11 kernel, clear_page_range will clear more page tables than
that.
I have tested my patch on 2.6.11 kernel.
Do a chroot bash to 32 bit system, then a 32 bit ltp test. No issue was
found.
However I found the patch is not compatible with recent 2.6.12-rc3
kernel.
In rc3 kernel, syscall page is backup with VMA, thus the special case
code to deal with syscall page was removed from arch/x86_64/mm/fault.c.
So I changed the patch to rebase it on 2.6.12-rc3,
This patch has been tested on 2.6.12-rc3
by chroot to a 32 bit system,
then follow an 32 bit ltp test.
Signed-off-by: Suresh Siddha <suresh.b.siddha@xxxxxxxxx>
Signed-off-by: Zou Nan hai <Nanhai.zou@xxxxxxxxx>
> -----Original Message-----
> From: Alexander Nyberg [mailto:alexn@dsv.su.se]
> Sent: Saturday, April 23, 2005 5:10 PM
> To: Zou, Nanhai
> Cc: Andi Kleen; discuss@x86-64.org; linux-kernel@vger.kernel.org;
Siddha,
> Suresh B
> Subject: RE: [discuss] [Patch] X86_64 TASK_SIZE cleanup - more
comments
>
> > PPC64 IA64 and S390 use variable size TASK_SIZE for 32 bit and 64
bit
> > program.
> > I feel it is hard to maintain if we try to audit TASK_SIZE use
> > everywhere, because most of them are in generic code.
> >
> > And maintaining those audit code in separate place is also a
problem.
> > E.g. in current 32 bit emulation code
> > TASK_SIZE is defined as 0xfffffff in elf loading, but defined as
> > 0xffffe000 in mmaping.
> >
> > How did that earlier patch break applications?
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0408.2/1605.html
>
> I never investigated specifically what broke down
[-- Attachment #2: x86_64-compat-tasksize-fix.patch --]
[-- Type: application/octet-stream, Size: 4359 bytes --]
diff -Nraup a/arch/x86_64/ia32/ia32_binfmt.c b/arch/x86_64/ia32/ia32_binfmt.c
--- a/arch/x86_64/ia32/ia32_binfmt.c 2005-04-27 07:04:33.000000000 +0800
+++ b/arch/x86_64/ia32/ia32_binfmt.c 2005-04-27 07:09:33.000000000 +0800
@@ -46,7 +46,7 @@ struct elf_phdr;
#define IA32_EMULATOR 1
-#define ELF_ET_DYN_BASE (TASK_UNMAPPED_32 + 0x1000000)
+#define ELF_ET_DYN_BASE (TASK_UNMAPPED_BASE + 0x1000000)
#undef ELF_ARCH
#define ELF_ARCH EM_386
@@ -307,9 +307,6 @@ MODULE_AUTHOR("Eric Youngdale, Andi Klee
#define elf_addr_t __u32
-#undef TASK_SIZE
-#define TASK_SIZE 0xffffffff
-
static void elf32_init(struct pt_regs *);
#define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1
diff -Nraup a/arch/x86_64/kernel/sys_x86_64.c b/arch/x86_64/kernel/sys_x86_64.c
--- a/arch/x86_64/kernel/sys_x86_64.c 2005-04-27 07:04:33.000000000 +0800
+++ b/arch/x86_64/kernel/sys_x86_64.c 2005-04-27 07:09:58.000000000 +0800
@@ -68,13 +68,7 @@ out:
static void find_start_end(unsigned long flags, unsigned long *begin,
unsigned long *end)
{
-#ifdef CONFIG_IA32_EMULATION
- if (test_thread_flag(TIF_IA32)) {
- *begin = TASK_UNMAPPED_32;
- *end = IA32_PAGE_OFFSET;
- } else
-#endif
- if (flags & MAP_32BIT) {
+ if (!test_thread_flag(TIF_IA32) && (flags & MAP_32BIT)) {
/* This is usually used needed to map code in small
model, so it needs to be in the first 31bit. Limit
it to that. This means we need to move the
@@ -84,10 +78,10 @@ static void find_start_end(unsigned long
of playground for now. -AK */
*begin = 0x40000000;
*end = 0x80000000;
- } else {
- *begin = TASK_UNMAPPED_64;
+ } else {
+ *begin = TASK_UNMAPPED_BASE;
*end = TASK_SIZE;
- }
+ }
}
unsigned long
diff -Nraup a/arch/x86_64/mm/fault.c b/arch/x86_64/mm/fault.c
--- a/arch/x86_64/mm/fault.c 2005-04-27 07:04:33.000000000 +0800
+++ b/arch/x86_64/mm/fault.c 2005-04-27 07:08:39.000000000 +0800
@@ -74,7 +74,7 @@ static noinline int is_prefetch(struct p
instr = (unsigned char *)convert_rip_to_linear(current, regs);
max_instr = instr + 15;
- if ((regs->cs & 3) != 0 && instr >= (unsigned char *)TASK_SIZE)
+ if ((regs->cs & 3) != 0 && instr >= (unsigned char *)TASK_SIZE64)
return 0;
while (scan_more && instr < max_instr) {
@@ -345,7 +345,7 @@ asmlinkage void do_page_fault(struct pt_
* (error_code & 4) == 0, and that the fault was not a
* protection error (error_code & 1) == 0.
*/
- if (unlikely(address >= TASK_SIZE)) {
+ if (unlikely(address >= TASK_SIZE64)) {
if (!(error_code & 5)) {
if (vmalloc_fault(address) < 0)
goto bad_area_nosemaphore;
@@ -481,7 +481,7 @@ bad_area_nosemaphore:
tsk->thread.cr2 = address;
/* Kernel addresses are always protection faults */
- tsk->thread.error_code = error_code | (address >= TASK_SIZE);
+ tsk->thread.error_code = error_code | (address >= TASK_SIZE64);
tsk->thread.trap_no = 14;
info.si_signo = SIGSEGV;
info.si_errno = 0;
diff -Nraup a/include/asm-x86_64/a.out.h b/include/asm-x86_64/a.out.h
--- a/include/asm-x86_64/a.out.h 2005-04-27 07:04:57.000000000 +0800
+++ b/include/asm-x86_64/a.out.h 2005-04-27 07:10:31.000000000 +0800
@@ -21,7 +21,7 @@ struct exec
#ifdef __KERNEL__
#include <linux/thread_info.h>
-#define STACK_TOP (test_thread_flag(TIF_IA32) ? IA32_PAGE_OFFSET : TASK_SIZE)
+#define STACK_TOP TASK_SIZE
#endif
#endif /* __A_OUT_GNU_H__ */
diff -Nraup a/include/asm-x86_64/processor.h b/include/asm-x86_64/processor.h
--- a/include/asm-x86_64/processor.h 2005-04-27 07:04:57.000000000 +0800
+++ b/include/asm-x86_64/processor.h 2005-04-27 07:11:12.000000000 +0800
@@ -161,16 +161,15 @@ static inline void clear_in_cr4 (unsigne
/*
* User space process size. 47bits.
*/
-#define TASK_SIZE (0x800000000000UL)
+
+#define TASK_SIZE64 (0x800000000000UL)
+#define TASK_SIZE (test_thread_flag(TIF_IA32) ? IA32_PAGE_OFFSET : TASK_SIZE64)
/* This decides where the kernel will search for a free chunk of vm
* space during mmap's.
*/
#define IA32_PAGE_OFFSET ((current->personality & ADDR_LIMIT_3GB) ? 0xc0000000 : 0xFFFFe000)
-#define TASK_UNMAPPED_32 PAGE_ALIGN(IA32_PAGE_OFFSET/3)
-#define TASK_UNMAPPED_64 PAGE_ALIGN(TASK_SIZE/3)
-#define TASK_UNMAPPED_BASE \
- (test_thread_flag(TIF_IA32) ? TASK_UNMAPPED_32 : TASK_UNMAPPED_64)
+#define TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE/3)
/*
* Size of io_bitmap.
^ permalink raw reply [flat|nested] 4+ messages in thread* RE: [discuss] [Patch] X86_64 TASK_SIZE cleanup
@ 2005-04-20 17:17 Zou, Nanhai
2005-04-21 11:54 ` [discuss] [Patch] X86_64 TASK_SIZE cleanup - more comments Andi Kleen
0 siblings, 1 reply; 4+ messages in thread
From: Zou, Nanhai @ 2005-04-20 17:17 UTC (permalink / raw)
To: Andi Kleen; +Cc: discuss, linux-kernel, Siddha, Suresh B
Hi Andi,
What is your comment on this patch?
Here is another example bug this patch will fix.
The following piece of code will get a success mmap even if compiled
with -m32.
int *p;
p = mmap((void *)(0xFFFFE000UL), 0x10000UL, PROT_READ|PROT_WRITE,
MAP_FIXED|MAP_PRIVATE|MAP_ANON, 0, 0);
I believe there are other kind of corner case bugs around mm and fs.
e.g in mremap and munmap.
Those bugs will be fixed by this patch.
Zou Nan hai
> -----Original Message-----
> From: Zou, Nanhai
> Sent: Tuesday, April 19, 2005 12:37 AM
> To: 'Andi Kleen'
> Cc: discuss@x86-64.org; linux-kernel@vger.kernel.org; Siddha, Suresh B
> Subject: RE: [discuss] [Patch] X86_64 TASK_SIZE cleanup
>
>
> When a 32bit program is mapping a lot of hugepage vm_areas,
> hugetlb_get_unmapped_area may search beyond 4G, then the program will
get a
> SIGFAULT instead of an errno of ENOMEM.
> This patch will fix that.
> I believe there are other inconsistent cases in generic code like mm
and fs.
>
> Zou Nan hai
>
> > -----Original Message-----
> > From: Andi Kleen [mailto:ak@suse.de]
> > Sent: Monday, April 18, 2005 5:06 PM
> > To: Zou, Nanhai
> > Cc: discuss@x86-64.org; Andi Kleen; linux-kernel@vger.kernel.org;
Siddha,
> > Suresh B
> > Subject: Re: [discuss] [Patch] X86_64 TASK_SIZE cleanup
> >
> > On Sat, Apr 16, 2005 at 09:34:25AM +0800, Zou, Nanhai wrote:
> > >
> > > Hi,
> > > This patch will clean up the X86_64 compatibility mode
TASK_SIZE
> > > define thus fix some bugs found in X86_64 compatibility mode
program.
> >
> > Fix what bugs exactly? Please a detailed description.
> >
> > -Andi
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [discuss] [Patch] X86_64 TASK_SIZE cleanup - more comments
2005-04-20 17:17 [discuss] [Patch] X86_64 TASK_SIZE cleanup Zou, Nanhai
@ 2005-04-21 11:54 ` Andi Kleen
0 siblings, 0 replies; 4+ messages in thread
From: Andi Kleen @ 2005-04-21 11:54 UTC (permalink / raw)
To: Zou, Nanhai; +Cc: Andi Kleen, discuss, linux-kernel, Siddha, Suresh B
Another comment:
In general I am not too happy about the variable size TASK_SIZE.
There was a patch for this earlier, but it broke 32bit emulation
completely. And I think it needs auditing of all uses of TASK_SIZE,
because I suspect there are more bugs lurking in it.
The way hugetlb etc. mmap were supposed to be handled was to
let the mmap succeed and then check in the mmap wrapper
if any address is > 4GB and free it. Probably that code
has some problems or got broken (I think it worked at least
in 2.4, but there might have been regressions later)
-Andi
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2005-04-27 16:22 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-04-21 16:09 [discuss] [Patch] X86_64 TASK_SIZE cleanup - more comments Zou, Nanhai
2005-04-23 9:10 ` Alexander Nyberg
-- strict thread matches above, loose matches on Subject: below --
2005-04-27 16:20 Zou, Nanhai
2005-04-20 17:17 [discuss] [Patch] X86_64 TASK_SIZE cleanup Zou, Nanhai
2005-04-21 11:54 ` [discuss] [Patch] X86_64 TASK_SIZE cleanup - more comments Andi Kleen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox