* [PATCH] i386: Add a temporary to make put_user more type safe. @ 2006-01-29 6:26 Eric W. Biederman 2006-01-29 6:39 ` Andrew Morton 0 siblings, 1 reply; 20+ messages in thread From: Eric W. Biederman @ 2006-01-29 6:26 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel In some code I am developing I had occasion to change the type of a variable. This made the value put_user was putting to user space wrong. But the code continued to build cleanly without errors. Introducing a temporary fixes this problem and at least with gcc-3.3.5 does not cause gcc any problems with optimizing out the temporary. gcc-4.x using SSA internally ought to be even better at optimizing out temporaries, so I don't expect a temporary to become a problem. Especially because in all correct cases the types on both sides of the assignment to the temporary are the same. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- include/asm-i386/uaccess.h | 11 ++++++----- 1 files changed, 6 insertions(+), 5 deletions(-) dd1215e541bfe2ed94a902f7fb263ca44b7e8afa diff --git a/include/asm-i386/uaccess.h b/include/asm-i386/uaccess.h index 3f1337c..1641613 100644 --- a/include/asm-i386/uaccess.h +++ b/include/asm-i386/uaccess.h @@ -198,12 +198,13 @@ extern void __put_user_8(void); #define put_user(x,ptr) \ ({ int __ret_pu; \ __chk_user_ptr(ptr); \ + __typeof__(*(ptr)) __pu_val = x; \ switch(sizeof(*(ptr))) { \ - case 1: __put_user_1(x, ptr); break; \ - case 2: __put_user_2(x, ptr); break; \ - case 4: __put_user_4(x, ptr); break; \ - case 8: __put_user_8(x, ptr); break; \ - default:__put_user_X(x, ptr); break; \ + case 1: __put_user_1(__pu_val, ptr); break; \ + case 2: __put_user_2(__pu_val, ptr); break; \ + case 4: __put_user_4(__pu_val, ptr); break; \ + case 8: __put_user_8(__pu_val, ptr); break; \ + default:__put_user_X(__pu_val, ptr); break; \ } \ __ret_pu; \ }) -- 1.1.5.g3480 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] i386: Add a temporary to make put_user more type safe. 2006-01-29 6:26 [PATCH] i386: Add a temporary to make put_user more type safe Eric W. Biederman @ 2006-01-29 6:39 ` Andrew Morton 2006-01-29 6:49 ` Eric W. Biederman 0 siblings, 1 reply; 20+ messages in thread From: Andrew Morton @ 2006-01-29 6:39 UTC (permalink / raw) To: Eric W. Biederman; +Cc: linux-kernel ebiederm@xmission.com (Eric W. Biederman) wrote: > > > In some code I am developing I had occasion to change the type of a > variable. This made the value put_user was putting to user space > wrong. But the code continued to build cleanly without errors. What do you mean by "wrong"? What combinations of types do you think should be disallowed here? put_user(char, int*), for example, or what? We had one instance recently where code was doing put_user() of an eight-byte struct and that sailed through x86 testing but made the sparc64 compiler ICE. Certainly we should stamp out tricks like that at compile time, but how far should we go? There's probably a good case for ensuring that typeof(x)==typeof(*ptr), but I suspect that'd create a lot of noise. > Introducing a temporary fixes this problem and at least with gcc-3.3.5 > does not cause gcc any problems with optimizing out the temporary. > gcc-4.x using SSA internally ought to be even better at optimizing out > temporaries, so I don't expect a temporary to become a problem. > Especially because in all correct cases the types on both sides of the > assignment to the temporary are the same. > Sounds sane. We could make it warn if typeof(x)!=typeof(*ptr) by adding another temporary for the pointer, give it type typeof(x)*, but I haven't tried it. > > > --- > > include/asm-i386/uaccess.h | 11 ++++++----- > 1 files changed, 6 insertions(+), 5 deletions(-) > > dd1215e541bfe2ed94a902f7fb263ca44b7e8afa > diff --git a/include/asm-i386/uaccess.h b/include/asm-i386/uaccess.h > index 3f1337c..1641613 100644 > --- a/include/asm-i386/uaccess.h > +++ b/include/asm-i386/uaccess.h > @@ -198,12 +198,13 @@ extern void __put_user_8(void); > #define put_user(x,ptr) \ > ({ int __ret_pu; \ > __chk_user_ptr(ptr); \ > + __typeof__(*(ptr)) __pu_val = x; \ > switch(sizeof(*(ptr))) { \ > - case 1: __put_user_1(x, ptr); break; \ > - case 2: __put_user_2(x, ptr); break; \ > - case 4: __put_user_4(x, ptr); break; \ > - case 8: __put_user_8(x, ptr); break; \ > - default:__put_user_X(x, ptr); break; \ > + case 1: __put_user_1(__pu_val, ptr); break; \ > + case 2: __put_user_2(__pu_val, ptr); break; \ > + case 4: __put_user_4(__pu_val, ptr); break; \ > + case 8: __put_user_8(__pu_val, ptr); break; \ > + default:__put_user_X(__pu_val, ptr); break; \ > } \ > __ret_pu; \ > }) > -- > 1.1.5.g3480 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] i386: Add a temporary to make put_user more type safe. 2006-01-29 6:39 ` Andrew Morton @ 2006-01-29 6:49 ` Eric W. Biederman 2006-01-29 7:51 ` Andrew Morton 0 siblings, 1 reply; 20+ messages in thread From: Eric W. Biederman @ 2006-01-29 6:49 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel Andrew Morton <akpm@osdl.org> writes: > ebiederm@xmission.com (Eric W. Biederman) wrote: >> >> >> In some code I am developing I had occasion to change the type of a >> variable. This made the value put_user was putting to user space >> wrong. But the code continued to build cleanly without errors. > > What do you mean by "wrong"? What combinations of types do you think > should be disallowed here? put_user(char, int*), for example, or what? > > We had one instance recently where code was doing put_user() of an > eight-byte struct and that sailed through x86 testing but made the sparc64 > compiler ICE. Certainly we should stamp out tricks like that at compile > time, but how far should we go? > > There's probably a good case for ensuring that typeof(x)==typeof(*ptr), but > I suspect that'd create a lot of noise. All I do is ensure that typeof(x) is assignment compatible with typeof(*ptr). That is what the assignment to the temporary does. We actually do this already on x86 if you compile for an i386 which people very rarely do anymore. I have performed sever kernel compiles with it applied against the stable tree and saw no errors other than when I tried to assign a pointer to an integer using put_user. So your eight-byte struct case would probably be caught but the character case would get type promoted and be fine. >> Introducing a temporary fixes this problem and at least with gcc-3.3.5 >> does not cause gcc any problems with optimizing out the temporary. >> gcc-4.x using SSA internally ought to be even better at optimizing out >> temporaries, so I don't expect a temporary to become a problem. >> Especially because in all correct cases the types on both sides of the >> assignment to the temporary are the same. >> > > Sounds sane. We could make it warn if typeof(x)!=typeof(*ptr) by adding > another temporary for the pointer, give it type typeof(x)*, but I haven't > tried it. I guess we could do that. However if we don't use the value we will probably get an unused variable warning. Mostly I am just after the normal C assignment warnings/errors. Although if we can do better that would be great. Eric ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] i386: Add a temporary to make put_user more type safe. 2006-01-29 6:49 ` Eric W. Biederman @ 2006-01-29 7:51 ` Andrew Morton [not found] ` <200601291620.28291.ioe-lkml@rameria.de> 0 siblings, 1 reply; 20+ messages in thread From: Andrew Morton @ 2006-01-29 7:51 UTC (permalink / raw) To: Eric W. Biederman; +Cc: linux-kernel ebiederm@xmission.com (Eric W. Biederman) wrote: > > > Sounds sane. We could make it warn if typeof(x)!=typeof(*ptr) by adding > > another temporary for the pointer, give it type typeof(x)*, but I haven't > > tried it. > > I guess we could do that. However if we don't use the value we will probably > get an unused variable warning. No, that's OK, we can use the temporary. Something like this: --- devel/include/asm-i386/uaccess.h~x86-tighten-uaccess-type-checking 2006-01-28 23:45:16.000000000 -0800 +++ devel-akpm/include/asm-i386/uaccess.h 2006-01-28 23:48:31.000000000 -0800 @@ -149,14 +149,16 @@ extern void __get_user_4(void); #define get_user(x,ptr) \ ({ int __ret_gu; \ unsigned long __val_gu; \ + __typeof__(x)*_p_; \ + _p_ = ptr; \ __chk_user_ptr(ptr); \ - switch(sizeof (*(ptr))) { \ - case 1: __get_user_x(1,__ret_gu,__val_gu,ptr); break; \ - case 2: __get_user_x(2,__ret_gu,__val_gu,ptr); break; \ - case 4: __get_user_x(4,__ret_gu,__val_gu,ptr); break; \ - default: __get_user_x(X,__ret_gu,__val_gu,ptr); break; \ + switch(sizeof (*(_p_))) { \ + case 1: __get_user_x(1, __ret_gu, __val_gu, _p_); break; \ + case 2: __get_user_x(2, __ret_gu, __val_gu, _p_); break; \ + case 4: __get_user_x(4, __ret_gu, __val_gu, _p_); break; \ + default: __get_user_x(X, __ret_gu, __val_gu, _p_); break; \ } \ - (x) = (__typeof__(*(ptr)))__val_gu; \ + (x) = __val_gu; \ __ret_gu; \ }) @@ -198,13 +200,17 @@ extern void __put_user_8(void); #define put_user(x,ptr) \ ({ int __ret_pu; \ __chk_user_ptr(ptr); \ - __typeof__(*(ptr)) __pu_val = x; \ + __typeof__(x)*_p_; \ + __typeof__(x)__pu_val; \ + \ + _p_ = ptr; \ + __pu_val = x; \ switch(sizeof(*(ptr))) { \ - case 1: __put_user_1(__pu_val, ptr); break; \ - case 2: __put_user_2(__pu_val, ptr); break; \ - case 4: __put_user_4(__pu_val, ptr); break; \ - case 8: __put_user_8(__pu_val, ptr); break; \ - default:__put_user_X(__pu_val, ptr); break; \ + case 1: __put_user_1(__pu_val, _p_); break; \ + case 2: __put_user_2(__pu_val, _p_); break; \ + case 4: __put_user_4(__pu_val, _p_); break; \ + case 8: __put_user_8(__pu_val, _p_); break; \ + default:__put_user_X(__pu_val, _p_); break; \ } \ __ret_pu; \ }) _ It gives ~100 warnings on my usual test config. It's a bit awkward because get_user/put_user/etc aren't allowed to evaluate their args multiple times - code likes to do get_user(v, p++); I guess fixing those 100-odd warnings might find some warts - compiler-caused truncation or sign-extension during uaccess copies is a bit of a worry. But I have enough things to be going on with :( ^ permalink raw reply [flat|nested] 20+ messages in thread
[parent not found: <200601291620.28291.ioe-lkml@rameria.de>]
* Re: [PATCH] i386: Add a temporary to make put_user more type safe. [not found] ` <200601291620.28291.ioe-lkml@rameria.de> @ 2006-01-29 19:33 ` Andrew Morton 2006-01-29 20:04 ` [PATCH] i386: instead of poisoning .init zone, change protection bits to force a fault Eric Dumazet 0 siblings, 1 reply; 20+ messages in thread From: Andrew Morton @ 2006-01-29 19:33 UTC (permalink / raw) To: Ingo Oeser; +Cc: linux-kernel, ebiederm Ingo Oeser <ioe-lkml@rameria.de> wrote: > > > #define put_user(x,ptr) \ > > ({ int __ret_pu; \ > > __chk_user_ptr(ptr); \ > > - __typeof__(*(ptr)) __pu_val = x; \ > > + __typeof__(x)*_p_; \ > > + __typeof__(x)__pu_val; \ > > + \ > > + _p_ = ptr; \ > > + __pu_val = x; \ > > switch(sizeof(*(ptr))) { \ > > - switch(sizeof(*(ptr))) { \ > + switch(sizeof(*(_p_))) { \ > > > - case 1: __put_user_1(__pu_val, ptr); break; \ > > - case 2: __put_user_2(__pu_val, ptr); break; \ > > - case 4: __put_user_4(__pu_val, ptr); break; \ > > - case 8: __put_user_8(__pu_val, ptr); break; \ > > - default:__put_user_X(__pu_val, ptr); break; \ > > + case 1: __put_user_1(__pu_val, _p_); break; \ > > + case 2: __put_user_2(__pu_val, _p_); break; \ > > + case 4: __put_user_4(__pu_val, _p_); break; \ > > + case 8: __put_user_8(__pu_val, _p_); break; \ > > + default:__put_user_X(__pu_val, _p_); break; \ > > } \ > > __ret_pu; \ > > }) > > Does this now give less warnings? No, it won't do. All the warnings were legitimate anyway. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] i386: instead of poisoning .init zone, change protection bits to force a fault 2006-01-29 19:33 ` Andrew Morton @ 2006-01-29 20:04 ` Eric Dumazet 2006-01-29 20:05 ` Benjamin LaHaise 0 siblings, 1 reply; 20+ messages in thread From: Eric Dumazet @ 2006-01-29 20:04 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 466 bytes --] Chasing some invalid accesses to .init zone, I found that free_init_pages() was properly freeing the pages but virtual was still usable. A poisoning (memset(page, 0xcc, PAGE_SIZE)) was done but this is not reliable. Applying this patch at least in mm is a good thing... (After that we could map non possible cpu percpu data to the initial percpudata that is included in .init and discarded in free_initmem()) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> [-- Attachment #2: i386_mm_init.patch --] [-- Type: text/plain, Size: 609 bytes --] --- linux-2.6.16-rc1-mm3/arch/i386/mm/init.c 2006-01-25 10:17:24.000000000 +0100 +++ linux-2.6.16-rc1-mm3-ed/arch/i386/mm/init.c 2006-01-29 21:46:39.000000000 +0100 @@ -750,11 +750,12 @@ for (addr = begin; addr < end; addr += PAGE_SIZE) { ClearPageReserved(virt_to_page(addr)); set_page_count(virt_to_page(addr), 1); - memset((void *)addr, 0xcc, PAGE_SIZE); + change_page_attr(virt_to_page(addr), 1, __pgprot(0)); free_page(addr); totalram_pages++; } printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10); + global_flush_tlb(); } void free_initmem(void) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] i386: instead of poisoning .init zone, change protection bits to force a fault 2006-01-29 20:04 ` [PATCH] i386: instead of poisoning .init zone, change protection bits to force a fault Eric Dumazet @ 2006-01-29 20:05 ` Benjamin LaHaise 2006-01-29 20:28 ` Eric Dumazet 2006-01-29 20:56 ` [PATCH, V2] " Eric Dumazet 0 siblings, 2 replies; 20+ messages in thread From: Benjamin LaHaise @ 2006-01-29 20:05 UTC (permalink / raw) To: Eric Dumazet; +Cc: Andrew Morton, linux-kernel On Sun, Jan 29, 2006 at 09:04:44PM +0100, Eric Dumazet wrote: > Chasing some invalid accesses to .init zone, I found that free_init_pages() > was properly freeing the pages but virtual was still usable. This change will break the large table entries up, resulting in more TLB pressure and reducing performance, and so should only be activated as a debug option. -ben -- "Ladies and gentlemen, I'm sorry to interrupt, but the police are here and they've asked us to stop the party." Don't Email: <dont@kvack.org>. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] i386: instead of poisoning .init zone, change protection bits to force a fault 2006-01-29 20:05 ` Benjamin LaHaise @ 2006-01-29 20:28 ` Eric Dumazet 2006-01-29 20:56 ` [PATCH, V2] " Eric Dumazet 1 sibling, 0 replies; 20+ messages in thread From: Eric Dumazet @ 2006-01-29 20:28 UTC (permalink / raw) To: Benjamin LaHaise; +Cc: Andrew Morton, linux-kernel Benjamin LaHaise a écrit : > On Sun, Jan 29, 2006 at 09:04:44PM +0100, Eric Dumazet wrote: >> Chasing some invalid accesses to .init zone, I found that free_init_pages() >> was properly freeing the pages but virtual was still usable. > > This change will break the large table entries up, resulting in more TLB > pressure and reducing performance, and so should only be activated as a > debug option. Hum... yet another CONFIG option ? Could we 'just' move rodata (because of CONFIG_DEBUG_RODATA) and .init section in their own 2MB/4MB page, playing with vmlinux.lds.S ? It would be possible to have a full hugepage readonly for rodata, and a full NOPROT for .init ? Eric ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault 2006-01-29 20:05 ` Benjamin LaHaise 2006-01-29 20:28 ` Eric Dumazet @ 2006-01-29 20:56 ` Eric Dumazet 2006-01-30 9:03 ` Questions about alloc_large_system_hash() and TLB entries Eric Dumazet 2006-02-04 22:41 ` [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault Andrew Morton 1 sibling, 2 replies; 20+ messages in thread From: Eric Dumazet @ 2006-01-29 20:56 UTC (permalink / raw) To: Benjamin LaHaise, Andrew Morton; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 702 bytes --] Chasing some invalid accesses to .init zone, I found that free_init_pages() was properly freeing the pages but virtual was still usable. A poisoning (memset(page, 0xcc, PAGE_SIZE)) was done but this is not reliable. A new config option DEBUG_INITDATA is introduced to mark this initdata as not present at all so that buggy code can trigger a fault. This option is not meant for production machines because it may split one or two huge page (2MB or 4MB) into small pages and thus slow down kernel a bit. (After that we could map non possible cpu percpu data to the initial percpudata that is included in .init and discarded in free_initmem()) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> [-- Attachment #2: i386_mm_init.patch --] [-- Type: text/plain, Size: 1511 bytes --] --- a/arch/i386/Kconfig.debug 2006-01-29 22:30:10.000000000 +0100 +++ b/arch/i386/Kconfig.debug 2006-01-29 22:35:54.000000000 +0100 @@ -61,6 +61,18 @@ portion of the kernel code won't be covered by a 2MB TLB anymore. If in doubt, say "N". +config DEBUG_INITDATA + bool "Read/Write protect kernel init data structures" + depends on DEBUG_KERNEL + help + The init data is normally freed when kernel has booted. + Some code may still try to read or write to data in this area. + If you say Y here, the kernel will mark this zone as not readable + or writeable at all. Buggy code will then fault. + This option may have a slight performance impact because a + portion of the kernel code won't be covered by a 2MB TLB anymore. + If in doubt, say "N". + config 4KSTACKS bool "Use 4Kb + 4Kb for kernel stacks instead of 8Kb" if DEBUG_KERNEL default y --- a/arch/i386/mm/init.c 2006-01-25 10:17:24.000000000 +0100 +++ b/arch/i386/mm/init.c 2006-01-29 22:38:53.000000000 +0100 @@ -750,11 +750,18 @@ for (addr = begin; addr < end; addr += PAGE_SIZE) { ClearPageReserved(virt_to_page(addr)); set_page_count(virt_to_page(addr), 1); +#ifdef CONFIG_DEBUG_INITDATA + change_page_attr(virt_to_page(addr), 1, __pgprot(0)); +#else memset((void *)addr, 0xcc, PAGE_SIZE); +#endif free_page(addr); totalram_pages++; } printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10); +#ifdef CONFIG_DEBUG_INITDATA + global_flush_tlb(); +#endif } void free_initmem(void) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Questions about alloc_large_system_hash() and TLB entries 2006-01-29 20:56 ` [PATCH, V2] " Eric Dumazet @ 2006-01-30 9:03 ` Eric Dumazet 2006-01-30 9:22 ` David S. Miller 2006-02-04 22:41 ` [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault Andrew Morton 1 sibling, 1 reply; 20+ messages in thread From: Eric Dumazet @ 2006-01-30 9:03 UTC (permalink / raw) To: Benjamin LaHaise, Andrew Morton; +Cc: linux-kernel alloc_large_system_hash() is used to allocate large hash tables at boot time. Example on a 2 nodes NUMA machine : Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes) IP route cache hash table entries: 2097152 (order: 12, 16777216 bytes) TCP established hash table entries: 2097152 (order: 12, 16777216 bytes) Memory is taken from : bootmem if (flags & HASH_EARLY) __vmalloc() if (hashdist is set) (NUMA knob) __get_free_pages(GFP_ATOMIC, order); What would be the needed changes in the code to get both : - Allocate ram equally from all the nodes of the machine - Use large pages (2MB) to lower TLB stress Thank you Eric ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Questions about alloc_large_system_hash() and TLB entries 2006-01-30 9:03 ` Questions about alloc_large_system_hash() and TLB entries Eric Dumazet @ 2006-01-30 9:22 ` David S. Miller 2006-01-30 10:22 ` Eric Dumazet 0 siblings, 1 reply; 20+ messages in thread From: David S. Miller @ 2006-01-30 9:22 UTC (permalink / raw) To: dada1; +Cc: bcrl, akpm, linux-kernel From: Eric Dumazet <dada1@cosmosbay.com> Date: Mon, 30 Jan 2006 10:03:40 +0100 > What would be the needed changes in the code to get both : > > - Allocate ram equally from all the nodes of the machine > > - Use large pages (2MB) to lower TLB stress These two desires are mutually exclusive, I think. If you want an 8MB hash table, for example, with 2MB mappings you could use memory from a maximum of 4 nodes since the 2MB chunks have to be physically 2MB aligned and 2MB contiguous. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Questions about alloc_large_system_hash() and TLB entries 2006-01-30 9:22 ` David S. Miller @ 2006-01-30 10:22 ` Eric Dumazet 0 siblings, 0 replies; 20+ messages in thread From: Eric Dumazet @ 2006-01-30 10:22 UTC (permalink / raw) To: David S. Miller; +Cc: bcrl, akpm, linux-kernel David S. Miller a écrit : > From: Eric Dumazet <dada1@cosmosbay.com> > Date: Mon, 30 Jan 2006 10:03:40 +0100 > >> What would be the needed changes in the code to get both : >> >> - Allocate ram equally from all the nodes of the machine >> >> - Use large pages (2MB) to lower TLB stress > > These two desires are mutually exclusive, I think. > > If you want an 8MB hash table, for example, with 2MB mappings > you could use memory from a maximum of 4 nodes since the > 2MB chunks have to be physically 2MB aligned and 2MB contiguous. Yes of course, but as those hash tables are very big (their size is bigger than 2MB * number_of_nodes if you have at least 4GB per node), this could be done ? Eric ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault 2006-01-29 20:56 ` [PATCH, V2] " Eric Dumazet 2006-01-30 9:03 ` Questions about alloc_large_system_hash() and TLB entries Eric Dumazet @ 2006-02-04 22:41 ` Andrew Morton 2006-02-05 17:03 ` Eric Dumazet 2006-02-06 9:02 ` Eric Dumazet 1 sibling, 2 replies; 20+ messages in thread From: Andrew Morton @ 2006-02-04 22:41 UTC (permalink / raw) To: Eric Dumazet; +Cc: bcrl, linux-kernel Eric Dumazet <dada1@cosmosbay.com> wrote: > > > Chasing some invalid accesses to .init zone, I found that free_init_pages() > was properly freeing the pages but virtual was still usable. > > A poisoning (memset(page, 0xcc, PAGE_SIZE)) was done but this is not reliable. > > A new config option DEBUG_INITDATA is introduced to mark this initdata as not > present at all so that buggy code can trigger a fault. > > This option is not meant for production machines because it may split one or > two huge page (2MB or 4MB) into small pages and thus slow down kernel a bit. > > (After that we could map non possible cpu percpu data to the initial > percpudata that is included in .init and discarded in free_initmem()) > > ... > > --- a/arch/i386/mm/init.c 2006-01-25 10:17:24.000000000 +0100 > +++ b/arch/i386/mm/init.c 2006-01-29 22:38:53.000000000 +0100 > @@ -750,11 +750,18 @@ > for (addr = begin; addr < end; addr += PAGE_SIZE) { > ClearPageReserved(virt_to_page(addr)); > set_page_count(virt_to_page(addr), 1); > +#ifdef CONFIG_DEBUG_INITDATA > + change_page_attr(virt_to_page(addr), 1, __pgprot(0)); > +#else > memset((void *)addr, 0xcc, PAGE_SIZE); > +#endif > free_page(addr); > totalram_pages++; > } > printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10); > +#ifdef CONFIG_DEBUG_INITDATA > + global_flush_tlb(); > +#endif > } > This doesn't seem very pointful. We unmap the page, then return it to the page allocator. Then someone reallocates the page, tries to use it and goes oops. If CONFIG_DEBUG_PAGEALLOC is also set, the kernel will remap the page when it's allocated and everything works OK. So this patch requires CONFIG_DEBUG_PAGEALLOC. But if CONFIG_DEBUG_PAGEALLOC is set, we'll have unmapped that page in free_page() _anyway_, so why bother using this patch? The only enhancement I can think of here is to not free the page, so it's permanently leaked and permanently unmapped. --- devel/arch/i386/mm/init.c~i386-instead-of-poisoning-init-zone-change-protection-fix 2006-02-04 14:33:33.000000000 -0800 +++ devel-akpm/arch/i386/mm/init.c 2006-02-04 14:34:07.000000000 -0800 @@ -751,11 +751,15 @@ void free_init_pages(char *what, unsigne ClearPageReserved(virt_to_page(addr)); set_page_count(virt_to_page(addr), 1); #ifdef CONFIG_DEBUG_INITDATA + /* + * Unmap the page, and leak it. So any further accesses will + * oops. + */ change_page_attr(virt_to_page(addr), 1, __pgprot(0)); #else memset((void *)addr, 0xcc, PAGE_SIZE); -#endif free_page(addr); +#endif totalram_pages++; } printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10); _ But is there much point in doing this? Does it offer much more than CONFIG_DEBUG_PAGEALLOC? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault 2006-02-04 22:41 ` [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault Andrew Morton @ 2006-02-05 17:03 ` Eric Dumazet 2006-02-05 19:42 ` Andrew Morton 2006-02-06 9:02 ` Eric Dumazet 1 sibling, 1 reply; 20+ messages in thread From: Eric Dumazet @ 2006-02-05 17:03 UTC (permalink / raw) To: Andrew Morton; +Cc: bcrl, linux-kernel Andrew Morton a écrit : > Eric Dumazet <dada1@cosmosbay.com> wrote: >> >> Chasing some invalid accesses to .init zone, I found that free_init_pages() >> was properly freeing the pages but virtual was still usable. >> >> A poisoning (memset(page, 0xcc, PAGE_SIZE)) was done but this is not reliable. >> >> A new config option DEBUG_INITDATA is introduced to mark this initdata as not >> present at all so that buggy code can trigger a fault. >> >> This option is not meant for production machines because it may split one or >> two huge page (2MB or 4MB) into small pages and thus slow down kernel a bit. >> >> (After that we could map non possible cpu percpu data to the initial >> percpudata that is included in .init and discarded in free_initmem()) >> >> ... >> >> --- a/arch/i386/mm/init.c 2006-01-25 10:17:24.000000000 +0100 >> +++ b/arch/i386/mm/init.c 2006-01-29 22:38:53.000000000 +0100 >> @@ -750,11 +750,18 @@ >> for (addr = begin; addr < end; addr += PAGE_SIZE) { >> ClearPageReserved(virt_to_page(addr)); >> set_page_count(virt_to_page(addr), 1); >> +#ifdef CONFIG_DEBUG_INITDATA >> + change_page_attr(virt_to_page(addr), 1, __pgprot(0)); >> +#else >> memset((void *)addr, 0xcc, PAGE_SIZE); >> +#endif >> free_page(addr); >> totalram_pages++; >> } >> printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10); >> +#ifdef CONFIG_DEBUG_INITDATA >> + global_flush_tlb(); >> +#endif >> } >> > > This doesn't seem very pointful. > > We unmap the page, then return it to the page allocator. Then someone > reallocates the page, tries to use it and goes oops. > > If CONFIG_DEBUG_PAGEALLOC is also set, the kernel will remap the page when > it's allocated and everything works OK. So this patch requires > CONFIG_DEBUG_PAGEALLOC. > > But if CONFIG_DEBUG_PAGEALLOC is set, we'll have unmapped that page in > free_page() _anyway_, so why bother using this patch? > > The only enhancement I can think of here is to not free the page, so it's > permanently leaked and permanently unmapped. > > --- devel/arch/i386/mm/init.c~i386-instead-of-poisoning-init-zone-change-protection-fix 2006-02-04 14:33:33.000000000 -0800 > +++ devel-akpm/arch/i386/mm/init.c 2006-02-04 14:34:07.000000000 -0800 > @@ -751,11 +751,15 @@ void free_init_pages(char *what, unsigne > ClearPageReserved(virt_to_page(addr)); > set_page_count(virt_to_page(addr), 1); > #ifdef CONFIG_DEBUG_INITDATA > + /* > + * Unmap the page, and leak it. So any further accesses will > + * oops. > + */ > change_page_attr(virt_to_page(addr), 1, __pgprot(0)); > #else > memset((void *)addr, 0xcc, PAGE_SIZE); > -#endif > free_page(addr); > +#endif > totalram_pages++; > } > printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10); > _ > > But is there much point in doing this? Does it offer much more than > CONFIG_DEBUG_PAGEALLOC? > > 1) CONFIG_DEBUG_PAGEALLOC is very generic (and expensive), while CONFIG_DEBUG_INITDATA only makes sure init data becomes unreadable. 2) If CONFIG_DEBUG_INITDATA is on, the redzone is in action in the virtual memory that hold the initdata, while the physical pages that contained the initdata where freed and might be reused for other needs. I think we have two different things here : Virtual mem redzoning (my patch), and physical ram poisoning (your patch). CONFIG_DEBUG_INITDATA uses only a virtual mem redzoning (no underlying memory cost, apart of the page tables), while your solution doesnt free the pages, and the poisoining wont catch further accesses, just make some results funny or false. The only bad effect of my patch is about the TLB cost, because of the hugepage(s) that should revert to 512 normal 4K pages. Eric ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault 2006-02-05 17:03 ` Eric Dumazet @ 2006-02-05 19:42 ` Andrew Morton 2006-02-06 8:53 ` Eric Dumazet 0 siblings, 1 reply; 20+ messages in thread From: Andrew Morton @ 2006-02-05 19:42 UTC (permalink / raw) To: Eric Dumazet; +Cc: bcrl, linux-kernel Eric Dumazet <dada1@cosmosbay.com> wrote: > > > --- devel/arch/i386/mm/init.c~i386-instead-of-poisoning-init-zone-change-protection-fix 2006-02-04 14:33:33.000000000 -0800 > > +++ devel-akpm/arch/i386/mm/init.c 2006-02-04 14:34:07.000000000 -0800 > > @@ -751,11 +751,15 @@ void free_init_pages(char *what, unsigne > > ClearPageReserved(virt_to_page(addr)); > > set_page_count(virt_to_page(addr), 1); > > #ifdef CONFIG_DEBUG_INITDATA > > + /* > > + * Unmap the page, and leak it. So any further accesses will > > + * oops. > > + */ > > change_page_attr(virt_to_page(addr), 1, __pgprot(0)); > > #else > > memset((void *)addr, 0xcc, PAGE_SIZE); > > -#endif > > free_page(addr); > > +#endif > > totalram_pages++; > > } > > printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10); > > _ > > > > But is there much point in doing this? Does it offer much more than > > CONFIG_DEBUG_PAGEALLOC? > > > > > > 1) CONFIG_DEBUG_PAGEALLOC is very generic (and expensive), while > CONFIG_DEBUG_INITDATA only makes sure init data becomes unreadable. > > 2) If CONFIG_DEBUG_INITDATA is on, the redzone is in action in the virtual > memory that hold the initdata, while the physical pages that contained the > initdata where freed and might be reused for other needs. > > I think we have two different things here : Virtual mem redzoning (my patch), > and physical ram poisoning (your patch). > > CONFIG_DEBUG_INITDATA uses only a virtual mem redzoning (no underlying memory > cost, apart of the page tables), while your solution doesnt free the pages, > and the poisoining wont catch further accesses, just make some results funny > or false. > > The only bad effect of my patch is about the TLB cost, because of the > hugepage(s) that should revert to 512 normal 4K pages. Your patch made my kernel oops! The oops was prevented by either enabling CONFIG_DEBUG_PAGEALLOC or by the above patch. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault 2006-02-05 19:42 ` Andrew Morton @ 2006-02-06 8:53 ` Eric Dumazet 0 siblings, 0 replies; 20+ messages in thread From: Eric Dumazet @ 2006-02-06 8:53 UTC (permalink / raw) To: Andrew Morton; +Cc: bcrl, linux-kernel Andrew Morton a écrit : > Eric Dumazet <dada1@cosmosbay.com> wrote: >>> --- devel/arch/i386/mm/init.c~i386-instead-of-poisoning-init-zone-change-protection-fix 2006-02-04 14:33:33.000000000 -0800 >> > +++ devel-akpm/arch/i386/mm/init.c 2006-02-04 14:34:07.000000000 -0800 >> > @@ -751,11 +751,15 @@ void free_init_pages(char *what, unsigne >> > ClearPageReserved(virt_to_page(addr)); >> > set_page_count(virt_to_page(addr), 1); >> > #ifdef CONFIG_DEBUG_INITDATA >> > + /* >> > + * Unmap the page, and leak it. So any further accesses will >> > + * oops. >> > + */ >> > change_page_attr(virt_to_page(addr), 1, __pgprot(0)); >> > #else >> > memset((void *)addr, 0xcc, PAGE_SIZE); >> > -#endif >> > free_page(addr); >> > +#endif >> > totalram_pages++; >> > } >> > printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10); >> > _ >> > >> > But is there much point in doing this? Does it offer much more than >> > CONFIG_DEBUG_PAGEALLOC? >> > >> > >> >> 1) CONFIG_DEBUG_PAGEALLOC is very generic (and expensive), while >> CONFIG_DEBUG_INITDATA only makes sure init data becomes unreadable. >> >> 2) If CONFIG_DEBUG_INITDATA is on, the redzone is in action in the virtual >> memory that hold the initdata, while the physical pages that contained the >> initdata where freed and might be reused for other needs. >> >> I think we have two different things here : Virtual mem redzoning (my patch), >> and physical ram poisoning (your patch). >> >> CONFIG_DEBUG_INITDATA uses only a virtual mem redzoning (no underlying memory >> cost, apart of the page tables), while your solution doesnt free the pages, >> and the poisoining wont catch further accesses, just make some results funny >> or false. >> >> The only bad effect of my patch is about the TLB cost, because of the >> hugepage(s) that should revert to 512 normal 4K pages. > > Your patch made my kernel oops! The oops was prevented by either enabling > CONFIG_DEBUG_PAGEALLOC or by the above patch. Oh I got it now ! Sorry for being stupid :( If pages are freed, they indeed can be reused by kmalloc() and we get page faults... I wrongly assumed these pages would be mapped to different virtual addresses. So your patch is necessary, unless we can free the pages and let them be used only for User context for example. Thank you Eric ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault 2006-02-04 22:41 ` [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault Andrew Morton 2006-02-05 17:03 ` Eric Dumazet @ 2006-02-06 9:02 ` Eric Dumazet 2006-02-06 9:28 ` Andrew Morton 1 sibling, 1 reply; 20+ messages in thread From: Eric Dumazet @ 2006-02-06 9:02 UTC (permalink / raw) To: Andrew Morton; +Cc: bcrl, linux-kernel Andrew Morton a écrit : > Eric Dumazet <dada1@cosmosbay.com> wrote: >> >> Chasing some invalid accesses to .init zone, I found that free_init_pages() >> was properly freeing the pages but virtual was still usable. >> >> A poisoning (memset(page, 0xcc, PAGE_SIZE)) was done but this is not reliable. >> >> A new config option DEBUG_INITDATA is introduced to mark this initdata as not >> present at all so that buggy code can trigger a fault. >> >> This option is not meant for production machines because it may split one or >> two huge page (2MB or 4MB) into small pages and thus slow down kernel a bit. >> >> (After that we could map non possible cpu percpu data to the initial >> percpudata that is included in .init and discarded in free_initmem()) >> >> ... >> >> --- a/arch/i386/mm/init.c 2006-01-25 10:17:24.000000000 +0100 >> +++ b/arch/i386/mm/init.c 2006-01-29 22:38:53.000000000 +0100 >> @@ -750,11 +750,18 @@ >> for (addr = begin; addr < end; addr += PAGE_SIZE) { >> ClearPageReserved(virt_to_page(addr)); >> set_page_count(virt_to_page(addr), 1); >> +#ifdef CONFIG_DEBUG_INITDATA >> + change_page_attr(virt_to_page(addr), 1, __pgprot(0)); >> +#else >> memset((void *)addr, 0xcc, PAGE_SIZE); >> +#endif >> free_page(addr); >> totalram_pages++; >> } >> printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10); >> +#ifdef CONFIG_DEBUG_INITDATA >> + global_flush_tlb(); >> +#endif >> } >> > > This doesn't seem very pointful. > > We unmap the page, then return it to the page allocator. Then someone > reallocates the page, tries to use it and goes oops. > > If CONFIG_DEBUG_PAGEALLOC is also set, the kernel will remap the page when > it's allocated and everything works OK. So this patch requires > CONFIG_DEBUG_PAGEALLOC. > > But if CONFIG_DEBUG_PAGEALLOC is set, we'll have unmapped that page in > free_page() _anyway_, so why bother using this patch? > > The only enhancement I can think of here is to not free the page, so it's > permanently leaked and permanently unmapped. > > --- devel/arch/i386/mm/init.c~i386-instead-of-poisoning-init-zone-change-protection-fix 2006-02-04 14:33:33.000000000 -0800 > +++ devel-akpm/arch/i386/mm/init.c 2006-02-04 14:34:07.000000000 -0800 > @@ -751,11 +751,15 @@ void free_init_pages(char *what, unsigne > ClearPageReserved(virt_to_page(addr)); > set_page_count(virt_to_page(addr), 1); > #ifdef CONFIG_DEBUG_INITDATA > + /* > + * Unmap the page, and leak it. So any further accesses will > + * oops. > + */ > change_page_attr(virt_to_page(addr), 1, __pgprot(0)); > #else > memset((void *)addr, 0xcc, PAGE_SIZE); > -#endif > free_page(addr); > +#endif > totalram_pages++; > } > printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10); I wonder if you dont have to move the 'totalram_pages++;' next to the free_page(addr) call (ie inside the #else/#endif block) Eric ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault 2006-02-06 9:02 ` Eric Dumazet @ 2006-02-06 9:28 ` Andrew Morton 2006-02-06 10:07 ` Eric Dumazet 0 siblings, 1 reply; 20+ messages in thread From: Andrew Morton @ 2006-02-06 9:28 UTC (permalink / raw) To: Eric Dumazet; +Cc: bcrl, linux-kernel Eric Dumazet <dada1@cosmosbay.com> wrote: > > > #ifdef CONFIG_DEBUG_INITDATA > > + /* > > + * Unmap the page, and leak it. So any further accesses will > > + * oops. > > + */ > > change_page_attr(virt_to_page(addr), 1, __pgprot(0)); > > #else > > memset((void *)addr, 0xcc, PAGE_SIZE); > > -#endif > > free_page(addr); > > +#endif > > totalram_pages++; > > } > > printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10); > > I wonder if you dont have to move the 'totalram_pages++;' next to the > free_page(addr) call (ie inside the #else/#endif block) > yup, thanks. But I'm inclined to drop the whole patch - I don't see how it can detect any bugs which CONFIG_DEBUG_PAGEALLOC won't find. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault 2006-02-06 9:28 ` Andrew Morton @ 2006-02-06 10:07 ` Eric Dumazet 2006-02-06 10:16 ` Andrew Morton 0 siblings, 1 reply; 20+ messages in thread From: Eric Dumazet @ 2006-02-06 10:07 UTC (permalink / raw) To: Andrew Morton; +Cc: bcrl, linux-kernel Andrew Morton a écrit : > But I'm inclined to drop the whole patch - I don't see how it can detect > any bugs which CONFIG_DEBUG_PAGEALLOC won't find. > If CONFIG_DEBUG_PAGEALLOC is selected, does a page freed by free_page(addr); guaranted not being reused later ? Anyway, the CONFIG_DEBUG_INITDATA was a temporary patch in order to track the accesses to non possible cpus percpu data. So if you feel all such accesses were cleaned, we can drop the patch... Eric ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault 2006-02-06 10:07 ` Eric Dumazet @ 2006-02-06 10:16 ` Andrew Morton 0 siblings, 0 replies; 20+ messages in thread From: Andrew Morton @ 2006-02-06 10:16 UTC (permalink / raw) To: Eric Dumazet; +Cc: bcrl, linux-kernel Eric Dumazet <dada1@cosmosbay.com> wrote: > > Andrew Morton a écrit : > > > But I'm inclined to drop the whole patch - I don't see how it can detect > > any bugs which CONFIG_DEBUG_PAGEALLOC won't find. > > > > If CONFIG_DEBUG_PAGEALLOC is selected, does a page freed by free_page(addr); > guaranted not being reused later ? No. So with your patch, any access to the freed page will oops. With CONFIG_DEBUG_PAGEALLOC it'll only oops if that page is presently free. So if enough people run with CONFIG_DEBUG_PAGEALLOC for enough time, we'll find the same bugs. > Anyway, the CONFIG_DEBUG_INITDATA was a temporary patch in order to track the > accesses to non possible cpus percpu data. So if you feel all such accesses > were cleaned, we can drop the patch... I think so.. ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2006-02-06 10:17 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-01-29 6:26 [PATCH] i386: Add a temporary to make put_user more type safe Eric W. Biederman
2006-01-29 6:39 ` Andrew Morton
2006-01-29 6:49 ` Eric W. Biederman
2006-01-29 7:51 ` Andrew Morton
[not found] ` <200601291620.28291.ioe-lkml@rameria.de>
2006-01-29 19:33 ` Andrew Morton
2006-01-29 20:04 ` [PATCH] i386: instead of poisoning .init zone, change protection bits to force a fault Eric Dumazet
2006-01-29 20:05 ` Benjamin LaHaise
2006-01-29 20:28 ` Eric Dumazet
2006-01-29 20:56 ` [PATCH, V2] " Eric Dumazet
2006-01-30 9:03 ` Questions about alloc_large_system_hash() and TLB entries Eric Dumazet
2006-01-30 9:22 ` David S. Miller
2006-01-30 10:22 ` Eric Dumazet
2006-02-04 22:41 ` [PATCH, V2] i386: instead of poisoning .init zone, change protection bits to force a fault Andrew Morton
2006-02-05 17:03 ` Eric Dumazet
2006-02-05 19:42 ` Andrew Morton
2006-02-06 8:53 ` Eric Dumazet
2006-02-06 9:02 ` Eric Dumazet
2006-02-06 9:28 ` Andrew Morton
2006-02-06 10:07 ` Eric Dumazet
2006-02-06 10:16 ` Andrew Morton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox