* [PATCH] kmemcheck: don't track pages allocated with interrupts disabled
@ 2008-06-07 14:32 Vegard Nossum
2008-06-07 17:12 ` Andi Kleen
0 siblings, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2008-06-07 14:32 UTC (permalink / raw)
To: Pekka Enberg, Ingo Molnar; +Cc: Andi Kleen, linux-kernel
Hi,
It seems that I was wrong regarding the set_memory_4k() approach. It seems
to only have worked because we were lucky. We may later reinstate the code
to disable PSE, but this time not unconditionally, perhaps with a config
option or boot-time option.
That a new page needs to be allocated with interrupts disabled is probably
rare enough that we can get away with this without losing too many pages
that could otherwise have been tracked.
Again, this is overall nicer than unconditionally disabling PSE, since
that incurs a much larger overhead in the case where kmemcheck is enabled
in the config but disabled by default. (Ingo, you previously indicated that
it would be nice if kernels could be configured in this way by default.)
Vegard
From: Vegard Nossum <vegardno@ben.ifi.uio.no>
Date: Sat, 7 Jun 2008 16:18:29 +0200
Subject: [PATCH] kmemcheck: don't track pages allocated with interrupts disabled
Because large physical pages need to be split to 4k-sized pages after
the slab page has been allocated, and this needs to flush the page
tables, we cannot track any pages that were allocated while interrupts
were disabled.
This fixes a boot crash that seems to depend on timing to appear.
Signed-off-by: Vegard Nossum <vegardno@ben.ifi.uio.no>
---
mm/kmemcheck.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/mm/kmemcheck.c b/mm/kmemcheck.c
index 4efdf1e..b337374 100644
--- a/mm/kmemcheck.c
+++ b/mm/kmemcheck.c
@@ -10,6 +10,14 @@ void kmemcheck_alloc_shadow(struct kmem_cache *s, gfp_t flags, int node,
int pages;
int i;
+ /*
+ * It's sad, but it's true. If interrupts are enabled while making
+ * this allocation, it means that we can't split the large page
+ * because it would require flushing the page tables.
+ */
+ if (irqs_disabled())
+ return;
+
pages = 1 << order;
/*
--
1.5.3.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] kmemcheck: don't track pages allocated with interrupts disabled
2008-06-07 14:32 [PATCH] kmemcheck: don't track pages allocated with interrupts disabled Vegard Nossum
@ 2008-06-07 17:12 ` Andi Kleen
2008-06-07 18:15 ` Vegard Nossum
2008-06-07 18:18 ` Vegard Nossum
0 siblings, 2 replies; 6+ messages in thread
From: Andi Kleen @ 2008-06-07 17:12 UTC (permalink / raw)
To: Vegard Nossum; +Cc: Pekka Enberg, Ingo Molnar, linux-kernel
Vegard Nossum wrote:
> It seems that I was wrong regarding the set_memory_4k() approach. It seems
> to only have worked because we were lucky.
You just put it in the wrong place I think.
> We may later reinstate the code
> to disable PSE, but this time not unconditionally, perhaps with a config
> option or boot-time option.
It would be kind of guaranteed (not sure 100%) together with
CONFOG_DEBUG_PAGEALLOC and Thomas' page table pool (the pool is ifdefed
on that)
But a far better solution is to just split all the pages in the system
with set_memory_4k() at boot time (and memory hot add time) when you
know you're not in interrupt context.
I believe that would be the right solution for DEBUG_PAGEALLOC too.
-Andi
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] kmemcheck: don't track pages allocated with interrupts disabled
2008-06-07 17:12 ` Andi Kleen
@ 2008-06-07 18:15 ` Vegard Nossum
2008-06-07 19:47 ` Andi Kleen
2008-06-07 18:18 ` Vegard Nossum
1 sibling, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2008-06-07 18:15 UTC (permalink / raw)
To: Andi Kleen; +Cc: Pekka Enberg, Ingo Molnar, linux-kernel
On Sat, Jun 7, 2008 at 7:12 PM, Andi Kleen <andi@firstfloor.org> wrote:
> Vegard Nossum wrote:
>> It seems that I was wrong regarding the set_memory_4k() approach. It seems
>> to only have worked because we were lucky.
>
> You just put it in the wrong place I think.
>
It is called when the slab allocator has allocated a new slab page.
But slab allocations can be made from irq_disabled() context...
I don't know where we may otherwise place it. It would be really nice
to be able to ask the page allocator for 4k-physical (already split)
pages. This means that we could request (only) those pages for new
slabs, and give out the large pages to others that don't need this.
>> We may later reinstate the code
>> to disable PSE, but this time not unconditionally, perhaps with a config
>> option or boot-time option.
>
> It would be kind of guaranteed (not sure 100%) together with
> CONFOG_DEBUG_PAGEALLOC and Thomas' page table pool (the pool is ifdefed
> on that)
>
> But a far better solution is to just split all the pages in the system
> with set_memory_4k() at boot time (and memory hot add time) when you
> know you're not in interrupt context.
>
> I believe that would be the right solution for DEBUG_PAGEALLOC too.
Yes, but this is kind of the same approach that we had before by
disabling PSE entirely. The drawback here is that a kernel configured
with KMEMCHECK=y and booted without kmemcheck=1 will still call
set_memory_4k(), even though it isn't necessary (kmemcheck may never
be used on the system and the system is capable of using large pages).
Maybe we can do the splitting when kmemcheck is enabled for the first
time, either with the proc handler or at boot if kmemcheck=1 is passed
on the command line. Both of these contexts should/can be
!irqs_disabled(), I think.
I may try to implement this idea. Thanks!
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] kmemcheck: don't track pages allocated with interrupts disabled
2008-06-07 18:15 ` Vegard Nossum
@ 2008-06-07 19:47 ` Andi Kleen
2008-06-07 20:19 ` Vegard Nossum
0 siblings, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2008-06-07 19:47 UTC (permalink / raw)
To: Vegard Nossum; +Cc: Pekka Enberg, Ingo Molnar, linux-kernel
> Maybe we can do the splitting when kmemcheck is enabled for the first
> time, either with the proc handler or at boot if kmemcheck=1 is passed
> on the command line. Both of these contexts should/can be
> !irqs_disabled(), I think.
You could always split at boot when the CONFIG is enabled. I assume the
CONFIG alone has already quite a lot of overhead and it will be only
on in debug kernels and adding some more TLB misses shouldn't be really
a problem. Using 2MB pages in kernel is merely an optimization.
-Andi
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] kmemcheck: don't track pages allocated with interrupts disabled
2008-06-07 19:47 ` Andi Kleen
@ 2008-06-07 20:19 ` Vegard Nossum
0 siblings, 0 replies; 6+ messages in thread
From: Vegard Nossum @ 2008-06-07 20:19 UTC (permalink / raw)
To: Andi Kleen; +Cc: Pekka Enberg, Ingo Molnar, linux-kernel
On Sat, Jun 7, 2008 at 9:47 PM, Andi Kleen <andi@firstfloor.org> wrote:
>
>> Maybe we can do the splitting when kmemcheck is enabled for the first
>> time, either with the proc handler or at boot if kmemcheck=1 is passed
>> on the command line. Both of these contexts should/can be
>> !irqs_disabled(), I think.
>
> You could always split at boot when the CONFIG is enabled. I assume the
> CONFIG alone has already quite a lot of overhead and it will be only
> on in debug kernels and adding some more TLB misses shouldn't be really
> a problem. Using 2MB pages in kernel is merely an optimization.
Actually, no, the CONFIG itself is now very low overhead. The only
thing we do is a very simple check in the slab allocator each time you
allocate or free an object (it's a couple of if()s, I doubt it would
show up in anything but microbenchmarks). These is also maybe two or
three calls for each page fault, but they return very quickly.
I didn't test, but this should be the only penalty of enabling the
CONFIG. Of course, once you pass kmemcheck=1 at boot it will be a
disaster :-)
But you are probably right nevertheless; the overhead of 4K pages
instead 2M is probably negligible in most cases.
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] kmemcheck: don't track pages allocated with interrupts disabled
2008-06-07 17:12 ` Andi Kleen
2008-06-07 18:15 ` Vegard Nossum
@ 2008-06-07 18:18 ` Vegard Nossum
1 sibling, 0 replies; 6+ messages in thread
From: Vegard Nossum @ 2008-06-07 18:18 UTC (permalink / raw)
To: Andi Kleen; +Cc: Pekka Enberg, Ingo Molnar, linux-kernel
On Sat, Jun 7, 2008 at 7:12 PM, Andi Kleen <andi@firstfloor.org> wrote:
> But a far better solution is to just split all the pages in the system
> with set_memory_4k() at boot time (and memory hot add time) when you
> know you're not in interrupt context.
>
> I believe that would be the right solution for DEBUG_PAGEALLOC too.
Ah, this is very true for more than one reason. We only ever touch
kernel pages with kmemcheck. So this means that userspace can still
use large pages and save memory with that. This wouldn't work when we
disable PSE entirely.
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2008-06-07 20:19 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-07 14:32 [PATCH] kmemcheck: don't track pages allocated with interrupts disabled Vegard Nossum
2008-06-07 17:12 ` Andi Kleen
2008-06-07 18:15 ` Vegard Nossum
2008-06-07 19:47 ` Andi Kleen
2008-06-07 20:19 ` Vegard Nossum
2008-06-07 18:18 ` Vegard Nossum
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.