* [PATCH 2/2] slab,slub: ignore __GFP_WAIT if we're booting or suspending @ 2009-06-12 8:13 Pekka J Enberg 2009-06-12 9:03 ` [PATCH v2] " Pekka J Enberg 0 siblings, 1 reply; 44+ messages in thread From: Pekka J Enberg @ 2009-06-12 8:13 UTC (permalink / raw) To: linux-mm, linux-kernel, mingo, npiggin, benh From: Pekka Enberg <penberg@cs.helsinki.fi> As explained by Benjamin Herrenschmidt: Oh and btw, your patch alone doesn't fix powerpc, because it's missing a whole bunch of GFP_KERNEL's in the arch code... You would have to grep the entire kernel for things that check slab_is_available() and even then you'll be missing some. For example, slab_is_available() didn't always exist, and so in the early days on powerpc, we used a mem_init_done global that is set form mem_init() (not perfect but works in practice). And we still have code using that to do the test. Therefore, ignore __GFP_WAIT in the slab allocators if we're booting or suspending. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> --- mm/slab.c | 7 +++++++ mm/slub.c | 7 +++++++ 2 files changed, 14 insertions(+), 0 deletions(-) diff --git a/mm/slab.c b/mm/slab.c index f46b65d..4b932e0 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -2812,6 +2812,13 @@ static int cache_grow(struct kmem_cache *cachep, offset *= cachep->colour_off; + /* + * Lets not wait if we're booting up or suspending even if the user + * asks for it. + */ + if (system_state != SYSTEM_RUNNING) + local_flags &= ~__GFP_WAIT; + if (local_flags & __GFP_WAIT) local_irq_enable(); diff --git a/mm/slub.c b/mm/slub.c index 3964d3c..053ea3e 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1548,6 +1548,13 @@ new_slab: goto load_freelist; } + /* + * Lets not wait if we're booting up or suspending even if the user + * asks for it. + */ + if (system_state != SYSTEM_RUNNING) + gfpflags &= ~__GFP_WAIT; + if (gfpflags & __GFP_WAIT) local_irq_enable(); -- 1.6.0.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 8:13 [PATCH 2/2] slab,slub: ignore __GFP_WAIT if we're booting or suspending Pekka J Enberg @ 2009-06-12 9:03 ` Pekka J Enberg 2009-06-12 9:10 ` Ingo Molnar ` (2 more replies) 0 siblings, 3 replies; 44+ messages in thread From: Pekka J Enberg @ 2009-06-12 9:03 UTC (permalink / raw) To: linux-mm, linux-kernel, mingo, npiggin, benh, akpm, cl, torvalds From: Pekka Enberg <penberg@cs.helsinki.fi> As explained by Benjamin Herrenschmidt: Oh and btw, your patch alone doesn't fix powerpc, because it's missing a whole bunch of GFP_KERNEL's in the arch code... You would have to grep the entire kernel for things that check slab_is_available() and even then you'll be missing some. For example, slab_is_available() didn't always exist, and so in the early days on powerpc, we used a mem_init_done global that is set form mem_init() (not perfect but works in practice). And we still have code using that to do the test. Therefore, ignore __GFP_WAIT in the slab allocators if we're booting or suspending. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> --- v1 -> v2: fix up some missing cases pointed out by BenH mm/slab.c | 19 ++++++++++++++++++- mm/slub.c | 24 ++++++++++++++++++++++-- 2 files changed, 40 insertions(+), 3 deletions(-) diff --git a/mm/slab.c b/mm/slab.c index f46b65d..5119c22 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -2812,6 +2812,15 @@ static int cache_grow(struct kmem_cache *cachep, offset *= cachep->colour_off; + /* + * Lets not wait if we're booting up or suspending even if the user + * asks for it. + */ + if (system_state != SYSTEM_RUNNING) + local_flags &= ~__GFP_WAIT; + + might_sleep_if(local_flags & __GFP_WAIT); + if (local_flags & __GFP_WAIT) local_irq_enable(); @@ -3073,7 +3082,6 @@ alloc_done: static inline void cache_alloc_debugcheck_before(struct kmem_cache *cachep, gfp_t flags) { - might_sleep_if(flags & __GFP_WAIT); #if DEBUG kmem_flagcheck(cachep, flags); #endif @@ -3238,6 +3246,15 @@ retry: if (!obj) { /* + * Lets not wait if we're booting up or suspending even if the user + * asks for it. + */ + if (system_state != SYSTEM_RUNNING) + local_flags &= ~__GFP_WAIT; + + might_sleep_if(local_flags & __GFP_WAIT); + + /* * This allocation will be performed within the constraints * of the current cpuset / memory policy requirements. * We may trigger various forms of reclaim on the allowed diff --git a/mm/slub.c b/mm/slub.c index 3964d3c..6387c19 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1548,6 +1548,20 @@ new_slab: goto load_freelist; } + /* + * Lets not wait if we're booting up or suspending even if the user + * asks for it. + */ + if (system_state != SYSTEM_RUNNING) + gfpflags &= ~__GFP_WAIT; + + /* + * Now that we really know whether or not we're going to sleep or not, + * lets do our debugging checks. + */ + lockdep_trace_alloc(gfpflags); + might_sleep_if(gfpflags & __GFP_WAIT); + if (gfpflags & __GFP_WAIT) local_irq_enable(); @@ -1595,8 +1609,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s, unsigned long flags; unsigned int objsize; - lockdep_trace_alloc(gfpflags); - might_sleep_if(gfpflags & __GFP_WAIT); + lockdep_trace_alloc(gfpflags & ~__GFP_WAIT); if (should_failslab(s->objsize, gfpflags)) return NULL; @@ -2607,6 +2620,13 @@ static noinline struct kmem_cache *dma_kmalloc_cache(int index, gfp_t flags) if (s) return s; + /* + * Lets not wait if we're booting up or suspending even if the user + * asks for it. + */ + if (system_state != SYSTEM_RUNNING) + flags &= ~__GFP_WAIT; + /* Dynamically create dma cache */ if (flags & __GFP_WAIT) down_write(&slub_lock); -- 1.6.0.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 9:03 ` [PATCH v2] " Pekka J Enberg @ 2009-06-12 9:10 ` Ingo Molnar 2009-06-12 9:21 ` Benjamin Herrenschmidt 2009-06-12 9:49 ` Pekka Enberg 2009-06-12 15:04 ` Linus Torvalds 2009-06-19 14:59 ` Pavel Machek 2 siblings, 2 replies; 44+ messages in thread From: Ingo Molnar @ 2009-06-12 9:10 UTC (permalink / raw) To: Pekka J Enberg; +Cc: linux-mm, linux-kernel, npiggin, benh, akpm, cl, torvalds * Pekka J Enberg <penberg@cs.helsinki.fi> wrote: > index 3964d3c..6387c19 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -1548,6 +1548,20 @@ new_slab: > goto load_freelist; > } > > + /* > + * Lets not wait if we're booting up or suspending even if the user > + * asks for it. > + */ > + if (system_state != SYSTEM_RUNNING) > + gfpflags &= ~__GFP_WAIT; Hiding that bug like that is not particularly clean IMO. We should not let system_state hacks spread like that. We emit a debug warning but dont crash, so all should be fine and the culprits can then be fixed, right? Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 9:10 ` Ingo Molnar @ 2009-06-12 9:21 ` Benjamin Herrenschmidt 2009-06-12 9:24 ` Pekka Enberg 2009-06-12 9:49 ` Pekka Enberg 1 sibling, 1 reply; 44+ messages in thread From: Benjamin Herrenschmidt @ 2009-06-12 9:21 UTC (permalink / raw) To: Ingo Molnar Cc: Pekka J Enberg, linux-mm, linux-kernel, npiggin, akpm, cl, torvalds > We emit a debug warning but dont crash, so all should be fine and > the culprits can then be fixed, right? ... rewind ... :-) Ok so, no, the culprit cannot be all fixed in a satifactory way. The main reason is that I believe it's not "right" to have every caller of slab around know whether GFP_KERNEL is good to go or it should get into GFP_NOWAIT. This depends on many factors (among others us moving things around more), and is not actually a good solution for thing that can be called both at boot and later, such as get_vm_area(). I really think we are looking for trouble (and a lot of hidden bugs) by trying to "fix" all callers, in addition to making some code like vmalloc() more failure prone because it's unconditionally changed from GFP_KERNEL to GFP_NOWAIT. It seems a lot more reasonably to me to have sl*b naturally degrade to NOWAIT when it's too early to enable interrupts. In addition, my proposal of having bits to mask off gfp will also be useful in fixing similar issues with suspend/resume vs. GFP_NOIO which should really become implicit when devices start becoming suspended. Cheers, Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 9:21 ` Benjamin Herrenschmidt @ 2009-06-12 9:24 ` Pekka Enberg 2009-06-12 9:36 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 44+ messages in thread From: Pekka Enberg @ 2009-06-12 9:24 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Ingo Molnar, linux-mm, linux-kernel, npiggin, akpm, cl, torvalds Hi Ben, On Fri, Jun 12, 2009 at 12:21 PM, Benjamin Herrenschmidt<benh@kernel.crashing.org> wrote: > I really think we are looking for trouble (and a lot of hidden bugs) by > trying to "fix" all callers, in addition to making some code like > vmalloc() more failure prone because it's unconditionally changed from > GFP_KERNEL to GFP_NOWAIT. It's a new API function vmalloc_node_boot() that uses GFP_NOWAIT so I don't share your concern that it's error prone. Pekka -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 9:24 ` Pekka Enberg @ 2009-06-12 9:36 ` Benjamin Herrenschmidt 2009-06-12 9:45 ` Pekka J Enberg 0 siblings, 1 reply; 44+ messages in thread From: Benjamin Herrenschmidt @ 2009-06-12 9:36 UTC (permalink / raw) To: Pekka Enberg Cc: Ingo Molnar, linux-mm, linux-kernel, npiggin, akpm, cl, torvalds On Fri, 2009-06-12 at 12:24 +0300, Pekka Enberg wrote: > Hi Ben, > > On Fri, Jun 12, 2009 at 12:21 PM, Benjamin > Herrenschmidt<benh@kernel.crashing.org> wrote: > > I really think we are looking for trouble (and a lot of hidden bugs) by > > trying to "fix" all callers, in addition to making some code like > > vmalloc() more failure prone because it's unconditionally changed from > > GFP_KERNEL to GFP_NOWAIT. > > It's a new API function vmalloc_node_boot() that uses GFP_NOWAIT so I > don't share your concern that it's error prone. But you didn't fix __get_vm_area_caller() which means my ioremap is still broken... Take a break, take a step back, and look at the big picture. Do you really want to find all the needles in the haystack or just make sure you wear gloves when handling the hay ? :-) Cheers, Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 9:36 ` Benjamin Herrenschmidt @ 2009-06-12 9:45 ` Pekka J Enberg 2009-06-12 9:58 ` Benjamin Herrenschmidt 2009-06-12 15:22 ` Andrew Morton 0 siblings, 2 replies; 44+ messages in thread From: Pekka J Enberg @ 2009-06-12 9:45 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Ingo Molnar, linux-mm, linux-kernel, npiggin, akpm, cl, torvalds On Fri, 12 Jun 2009, Benjamin Herrenschmidt wrote: > Take a break, take a step back, and look at the big picture. Do you > really want to find all the needles in the haystack or just make sure > you wear gloves when handling the hay ? :-) Well, I would like to find the needles but I think we should do it with gloves on. If everyone is happy with this version of Ben's patch, I'm going to just apply it and push it to Linus. Pekka ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 9:45 ` Pekka J Enberg @ 2009-06-12 9:58 ` Benjamin Herrenschmidt 2009-06-12 10:00 ` Pekka Enberg 2009-06-12 15:22 ` Andrew Morton 1 sibling, 1 reply; 44+ messages in thread From: Benjamin Herrenschmidt @ 2009-06-12 9:58 UTC (permalink / raw) To: Pekka J Enberg Cc: Ingo Molnar, linux-mm, linux-kernel, npiggin, akpm, cl, torvalds On Fri, 2009-06-12 at 12:45 +0300, Pekka J Enberg wrote: > On Fri, 12 Jun 2009, Benjamin Herrenschmidt wrote: > > Take a break, take a step back, and look at the big picture. Do you > > really want to find all the needles in the haystack or just make sure > > you wear gloves when handling the hay ? :-) > > Well, I would like to find the needles but I think we should do it with > gloves on. > > If everyone is happy with this version of Ben's patch, I'm going to just > apply it and push it to Linus. Thanks :-) Looks right at first glance. I'll test tomorrow. Cheers, Ben. > Pekka > > >From 7760451b006b165bce052622af7316b54bd5a122 Mon Sep 17 00:00:00 2001 > From: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Date: Fri, 12 Jun 2009 12:39:58 +0300 > Subject: [PATCH] Sanitize "gfp" flags during boot > > With the recent shuffle of initialization order to move memory related > inits earlier, various subtle breakage was introduced in archs like > powerpc due to code somewhat assuming that GFP_KERNEL can be used as > soon as the allocators are up. This is not true because any __GFP_WAIT > allocation will cause interrupts to be enabled, which can be fatal if > it happens too early. > > This isn't trivial to fix on every call site. For example, powerpc's > ioremap implementation needs to be called early. For that, it uses two > different mechanisms to carve out virtual space. Before memory init, > by moving down VMALLOC_END, and then, by calling get_vm_area(). > Unfortunately, the later does GFK_KERNEL allocations. But we can't do > anything else because once vmalloc's been initialized, we can no longer > safely move VMALLOC_END to carve out space. > > There are other examples, wehere can can be called either very early > or later on when devices are hot-plugged. It would be a major pain for > such code to have to "know" whether it's in a context where it should > use GFP_KERNEL or GFP_NOWAIT. > > Finally, by having the ability to silently removed __GFP_WAIT from > allocations, we pave the way for suspend-to-RAM to use that feature > to also remove __GFP_IO from allocations done after suspending devices > has started. This is important because such allocations may hang if > devices on the swap-out path have been suspended, but not-yet suspended > drivers don't know about it, and may deadlock themselves by being hung > into a kmalloc somewhere while holding a mutex for example. > > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> > --- > include/linux/gfp.h | 9 +++++++++ > init/main.c | 1 + > mm/page_alloc.c | 18 ++++++++++++++++++ > mm/slab.c | 9 +++++++++ > mm/slub.c | 3 +++ > 5 files changed, 40 insertions(+), 0 deletions(-) > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index 0bbc15f..8d2ea79 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -99,6 +99,13 @@ struct vm_area_struct; > /* 4GB DMA on some platforms */ > #define GFP_DMA32 __GFP_DMA32 > > +extern gfp_t gfp_allowed_bits; > + > +static inline gfp_t gfp_sanitize(gfp_t gfp_flags) > +{ > + return gfp_flags & gfp_allowed_bits; > +} > + > /* Convert GFP flags to their corresponding migrate type */ > static inline int allocflags_to_migratetype(gfp_t gfp_flags) > { > @@ -245,4 +252,6 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp); > void drain_all_pages(void); > void drain_local_pages(void *dummy); > > +void mm_late_init(void); > + > #endif /* __LINUX_GFP_H */ > diff --git a/init/main.c b/init/main.c > index b3e8f14..34c6e7e 100644 > --- a/init/main.c > +++ b/init/main.c > @@ -640,6 +640,7 @@ asmlinkage void __init start_kernel(void) > "enabled early\n"); > early_boot_irqs_on(); > local_irq_enable(); > + mm_late_init(); > > /* > * HACK ALERT! This is early. We're enabling the console before > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 7760ef9..a42e4c7 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -77,6 +77,13 @@ int percpu_pagelist_fraction; > int pageblock_order __read_mostly; > #endif > > +/* > + * We set up the page allocator and the slab allocator early on with interrupts > + * disabled. Therefore, make sure that we sanitize GFP flags accordingly before > + * everything is up and running. > + */ > +gfp_t gfp_allowed_bits = ~(__GFP_WAIT|__GFP_FS | __GFP_IO); > + > static void __free_pages_ok(struct page *page, unsigned int order); > > /* > @@ -1473,6 +1480,9 @@ __alloc_pages_internal(gfp_t gfp_mask, unsigned int order, > unsigned long did_some_progress; > unsigned long pages_reclaimed = 0; > > + /* Sanitize flags so we don't enable irqs too early during boot */ > + gfp_mask = gfp_sanitize(gfp_mask); > + > lockdep_trace_alloc(gfp_mask); > > might_sleep_if(wait); > @@ -4728,3 +4738,11 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) > spin_unlock_irqrestore(&zone->lock, flags); > } > #endif > + > +void mm_late_init(void) > +{ > + /* > + * Interrupts are enabled now so all GFP allocations are safe. > + */ > + gfp_allowed_bits = __GFP_BITS_MASK; > +} > diff --git a/mm/slab.c b/mm/slab.c > index f46b65d..87b166e 100644 > --- a/mm/slab.c > +++ b/mm/slab.c > @@ -2791,6 +2791,9 @@ static int cache_grow(struct kmem_cache *cachep, > gfp_t local_flags; > struct kmem_list3 *l3; > > + /* Sanitize flags so we don't enable irqs too early during boot */ > + gfp_mask = gfp_sanitize(gfp_mask); > + > /* > * Be lazy and only check for valid flags here, keeping it out of the > * critical path in kmem_cache_alloc(). > @@ -3212,6 +3215,9 @@ static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags) > void *obj = NULL; > int nid; > > + /* Sanitize flags so we don't enable irqs too early during boot */ > + gfp_mask = gfp_sanitize(gfp_mask); > + > if (flags & __GFP_THISNODE) > return NULL; > > @@ -3434,6 +3440,9 @@ __cache_alloc(struct kmem_cache *cachep, gfp_t flags, void *caller) > unsigned long save_flags; > void *objp; > > + /* Sanitize flags so we don't enable irqs too early during boot */ > + gfp_mask = gfp_sanitize(gfp_mask); > + > lockdep_trace_alloc(flags); > > if (slab_should_failslab(cachep, flags)) > diff --git a/mm/slub.c b/mm/slub.c > index 3964d3c..5c646f7 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -1512,6 +1512,9 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, > /* We handle __GFP_ZERO in the caller */ > gfpflags &= ~__GFP_ZERO; > > + /* Sanitize flags so we don't enable irqs too early during boot */ > + gfpflags = gfp_sanitize(gfpflags); > + > if (!c->page) > goto new_slab; > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 9:58 ` Benjamin Herrenschmidt @ 2009-06-12 10:00 ` Pekka Enberg 0 siblings, 0 replies; 44+ messages in thread From: Pekka Enberg @ 2009-06-12 10:00 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Ingo Molnar, linux-mm, linux-kernel, npiggin, akpm, cl, torvalds On Fri, 12 Jun 2009, Benjamin Herrenschmidt wrote: > > > Take a break, take a step back, and look at the big picture. Do you > > > really want to find all the needles in the haystack or just make sure > > > you wear gloves when handling the hay ? :-) On Fri, 2009-06-12 at 12:45 +0300, Pekka J Enberg wrote: > > Well, I would like to find the needles but I think we should do it with > > gloves on. > > > > If everyone is happy with this version of Ben's patch, I'm going to just > > apply it and push it to Linus. On Fri, 2009-06-12 at 19:58 +1000, Benjamin Herrenschmidt wrote: > Thanks :-) Looks right at first glance. I'll test tomorrow. Nick? I do think this is the best short-term solution. We can get rid of it later on if we decide to fix up the callers instead. Pekka -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 9:45 ` Pekka J Enberg 2009-06-12 9:58 ` Benjamin Herrenschmidt @ 2009-06-12 15:22 ` Andrew Morton 1 sibling, 0 replies; 44+ messages in thread From: Andrew Morton @ 2009-06-12 15:22 UTC (permalink / raw) To: Pekka J Enberg Cc: Benjamin Herrenschmidt, Ingo Molnar, linux-mm, linux-kernel, npiggin, cl, torvalds On Fri, 12 Jun 2009 12:45:21 +0300 (EEST) Pekka J Enberg <penberg@cs.helsinki.fi> wrote: > From: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Date: Fri, 12 Jun 2009 12:39:58 +0300 > Subject: [PATCH] Sanitize "gfp" flags during boot > > With the recent shuffle of initialization order to move memory related > inits earlier, various subtle breakage was introduced in archs like > powerpc due to code somewhat assuming that GFP_KERNEL can be used as > soon as the allocators are up. This is not true because any __GFP_WAIT > allocation will cause interrupts to be enabled, which can be fatal if > it happens too early. > > This isn't trivial to fix on every call site. For example, powerpc's > ioremap implementation needs to be called early. For that, it uses two > different mechanisms to carve out virtual space. Before memory init, > by moving down VMALLOC_END, and then, by calling get_vm_area(). > Unfortunately, the later does GFK_KERNEL allocations. But we can't do > anything else because once vmalloc's been initialized, we can no longer > safely move VMALLOC_END to carve out space. > > There are other examples, wehere can can be called either very early > or later on when devices are hot-plugged. It would be a major pain for > such code to have to "know" whether it's in a context where it should > use GFP_KERNEL or GFP_NOWAIT. > > Finally, by having the ability to silently removed __GFP_WAIT from > allocations, we pave the way for suspend-to-RAM to use that feature > to also remove __GFP_IO from allocations done after suspending devices > has started. This is important because such allocations may hang if > devices on the swap-out path have been suspended, but not-yet suspended > drivers don't know about it, and may deadlock themselves by being hung > into a kmalloc somewhere while holding a mutex for example. > > ... > > +/* > + * We set up the page allocator and the slab allocator early on with interrupts > + * disabled. Therefore, make sure that we sanitize GFP flags accordingly before > + * everything is up and running. > + */ > +gfp_t gfp_allowed_bits = ~(__GFP_WAIT|__GFP_FS | __GFP_IO); __read_mostly > +void mm_late_init(void) > +{ > + /* > + * Interrupts are enabled now so all GFP allocations are safe. > + */ > + gfp_allowed_bits = __GFP_BITS_MASK; > +} Using plain old -1 here would be a more obviously-correct change. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 9:10 ` Ingo Molnar 2009-06-12 9:21 ` Benjamin Herrenschmidt @ 2009-06-12 9:49 ` Pekka Enberg 2009-06-12 9:52 ` Nick Piggin 2009-06-12 10:07 ` Ingo Molnar 1 sibling, 2 replies; 44+ messages in thread From: Pekka Enberg @ 2009-06-12 9:49 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-mm, linux-kernel, npiggin, benh, akpm, cl, torvalds On Fri, Jun 12, 2009 at 12:10 PM, Ingo Molnar<mingo@elte.hu> wrote: >> @@ -1548,6 +1548,20 @@ new_slab: >> goto load_freelist; >> } >> >> + /* >> + * Lets not wait if we're booting up or suspending even if the user >> + * asks for it. >> + */ >> + if (system_state != SYSTEM_RUNNING) >> + gfpflags &= ~__GFP_WAIT; > > Hiding that bug like that is not particularly clean IMO. We should > not let system_state hacks spread like that. > > We emit a debug warning but dont crash, so all should be fine and > the culprits can then be fixed, right? OK, lets not use system_state then and go with Ben's approach then. Again, neither of the patches are about "hiding buggy callers" but changing allocation policy wrt. gfp flags during boot (and later on during suspend). Pekka -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 9:49 ` Pekka Enberg @ 2009-06-12 9:52 ` Nick Piggin 2009-06-12 9:54 ` Pekka Enberg 2009-06-12 9:59 ` Benjamin Herrenschmidt 2009-06-12 10:07 ` Ingo Molnar 1 sibling, 2 replies; 44+ messages in thread From: Nick Piggin @ 2009-06-12 9:52 UTC (permalink / raw) To: Pekka Enberg Cc: Ingo Molnar, linux-mm, linux-kernel, benh, akpm, cl, torvalds On Fri, Jun 12, 2009 at 12:49:17PM +0300, Pekka Enberg wrote: > On Fri, Jun 12, 2009 at 12:10 PM, Ingo Molnar<mingo@elte.hu> wrote: > >> @@ -1548,6 +1548,20 @@ new_slab: > >> goto load_freelist; > >> } > >> > >> + /* > >> + * Lets not wait if we're booting up or suspending even if the user > >> + * asks for it. > >> + */ > >> + if (system_state != SYSTEM_RUNNING) > >> + gfpflags &= ~__GFP_WAIT; > > > > Hiding that bug like that is not particularly clean IMO. We should > > not let system_state hacks spread like that. > > > > We emit a debug warning but dont crash, so all should be fine and > > the culprits can then be fixed, right? > > OK, lets not use system_state then and go with Ben's approach then. > Again, neither of the patches are about "hiding buggy callers" but > changing allocation policy wrt. gfp flags during boot (and later on > during suspend). Maybe if we just not make it a general "tweak gfpflag" bit (at least not until a bit more discussion), but a specific workaround for the local_irq_enable in early boot problem. Seems like it would not be hard to track things down if we add a warning if we have GFP_WAIT and interrupts are not enabled... -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 9:52 ` Nick Piggin @ 2009-06-12 9:54 ` Pekka Enberg 2009-06-12 9:59 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 44+ messages in thread From: Pekka Enberg @ 2009-06-12 9:54 UTC (permalink / raw) To: Nick Piggin; +Cc: Ingo Molnar, linux-mm, linux-kernel, benh, akpm, cl, torvalds On Fri, 2009-06-12 at 11:52 +0200, Nick Piggin wrote: > Maybe if we just not make it a general "tweak gfpflag" bit (at > least not until a bit more discussion), but a specific workaround > for the local_irq_enable in early boot problem. > > Seems like it would not be hard to track things down if we add > a warning if we have GFP_WAIT and interrupts are not enabled... AFAICT, the point is that Ben thinks that we shouldn't go and try to fix up all the callers. But yes, we could certainly do that too. Pekka -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 9:52 ` Nick Piggin 2009-06-12 9:54 ` Pekka Enberg @ 2009-06-12 9:59 ` Benjamin Herrenschmidt 2009-06-25 4:38 ` Nick Piggin 1 sibling, 1 reply; 44+ messages in thread From: Benjamin Herrenschmidt @ 2009-06-12 9:59 UTC (permalink / raw) To: Nick Piggin Cc: Pekka Enberg, Ingo Molnar, linux-mm, linux-kernel, akpm, cl, torvalds > Maybe if we just not make it a general "tweak gfpflag" bit (at > least not until a bit more discussion), but a specific workaround > for the local_irq_enable in early boot problem. > > Seems like it would not be hard to track things down if we add > a warning if we have GFP_WAIT and interrupts are not enabled... But tweaking local_irq_enable() will have a lot more performance & bloat impact overall on the normal case. Cheers, Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 9:59 ` Benjamin Herrenschmidt @ 2009-06-25 4:38 ` Nick Piggin 0 siblings, 0 replies; 44+ messages in thread From: Nick Piggin @ 2009-06-25 4:38 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Pekka Enberg, Ingo Molnar, linux-mm, linux-kernel, akpm, cl, torvalds On Fri, Jun 12, 2009 at 07:59:34PM +1000, Benjamin Herrenschmidt wrote: > > > Maybe if we just not make it a general "tweak gfpflag" bit (at > > least not until a bit more discussion), but a specific workaround > > for the local_irq_enable in early boot problem. > > > > Seems like it would not be hard to track things down if we add > > a warning if we have GFP_WAIT and interrupts are not enabled... > > But tweaking local_irq_enable() will have a lot more performance & bloat > impact overall on the normal case. (sorry for the late replies. I've been sick and missed a few things over the past week or two... not that this is a really urgent issue ;)) I was not proposing to put a branch in local_irq_enable ;) but to use local_irq_save/restore in the slab allocators rather than unconditional. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 9:49 ` Pekka Enberg 2009-06-12 9:52 ` Nick Piggin @ 2009-06-12 10:07 ` Ingo Molnar 2009-06-12 10:11 ` Pekka Enberg 2009-06-12 11:09 ` Benjamin Herrenschmidt 1 sibling, 2 replies; 44+ messages in thread From: Ingo Molnar @ 2009-06-12 10:07 UTC (permalink / raw) To: Pekka Enberg; +Cc: linux-mm, linux-kernel, npiggin, benh, akpm, cl, torvalds * Pekka Enberg <penberg@cs.helsinki.fi> wrote: > On Fri, Jun 12, 2009 at 12:10 PM, Ingo Molnar<mingo@elte.hu> wrote: > >> @@ -1548,6 +1548,20 @@ new_slab: > >> goto load_freelist; > >> } > >> > >> + /* > >> + * Lets not wait if we're booting up or suspending even if the user > >> + * asks for it. > >> + */ > >> + if (system_state != SYSTEM_RUNNING) > >> + gfpflags &= ~__GFP_WAIT; > > > > Hiding that bug like that is not particularly clean IMO. We should > > not let system_state hacks spread like that. > > > > We emit a debug warning but dont crash, so all should be fine and > > the culprits can then be fixed, right? > > OK, lets not use system_state then and go with Ben's approach > then. Again, neither of the patches are about "hiding buggy > callers" but changing allocation policy wrt. gfp flags during boot > (and later on during suspend). IMHO such invisible side-channels modifying the semantics of GFP flags is a bit dubious. We could do GFP_INIT or GFP_BOOT. These can imply other useful modifiers as well: panic-on-failure for example. (this would clean up a fair amount of init code that currently checks for an panics on allocation failure.) Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 10:07 ` Ingo Molnar @ 2009-06-12 10:11 ` Pekka Enberg 2009-06-12 10:15 ` Nick Piggin 2009-06-12 11:11 ` Benjamin Herrenschmidt 2009-06-12 11:09 ` Benjamin Herrenschmidt 1 sibling, 2 replies; 44+ messages in thread From: Pekka Enberg @ 2009-06-12 10:11 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-mm, linux-kernel, npiggin, benh, akpm, cl, torvalds Hi Ingo, On Fri, Jun 12, 2009 at 1:07 PM, Ingo Molnar<mingo@elte.hu> wrote: > IMHO such invisible side-channels modifying the semantics of GFP > flags is a bit dubious. > > We could do GFP_INIT or GFP_BOOT. These can imply other useful > modifiers as well: panic-on-failure for example. (this would clean > up a fair amount of init code that currently checks for an panics on > allocation failure.) OK, but that means we need to fix up every single caller. I'm fine with that but Ben is not. As I am unable to test powerpc here, I am inclined to just merge Ben's patch as "obviously correct". That does not mean we can't introduce GFP_BOOT later on if we want to. Hmm? Pekka -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 10:11 ` Pekka Enberg @ 2009-06-12 10:15 ` Nick Piggin 2009-06-12 10:30 ` Pekka J Enberg 2009-06-12 11:13 ` Benjamin Herrenschmidt 2009-06-12 11:11 ` Benjamin Herrenschmidt 1 sibling, 2 replies; 44+ messages in thread From: Nick Piggin @ 2009-06-12 10:15 UTC (permalink / raw) To: Pekka Enberg Cc: Ingo Molnar, linux-mm, linux-kernel, benh, akpm, cl, torvalds On Fri, Jun 12, 2009 at 01:11:52PM +0300, Pekka Enberg wrote: > Hi Ingo, > > On Fri, Jun 12, 2009 at 1:07 PM, Ingo Molnar<mingo@elte.hu> wrote: > > IMHO such invisible side-channels modifying the semantics of GFP > > flags is a bit dubious. > > > > We could do GFP_INIT or GFP_BOOT. These can imply other useful > > modifiers as well: panic-on-failure for example. (this would clean > > up a fair amount of init code that currently checks for an panics on > > allocation failure.) > > OK, but that means we need to fix up every single caller. I'm fine > with that but Ben is not. As I am unable to test powerpc here, I am > inclined to just merge Ben's patch as "obviously correct". I agree with Ingo though that exposing it as a gfp modifier is not so good. I just like the implementation to mask off GFP_WAIT better, and also prefer not to test system state, but have someone just call into slab to tell it not to unconditionally enable interrupts. > That does not mean we can't introduce GFP_BOOT later on if we want to. Hmm? Yes, with sufficient warnings in place, I don't think it should be too error prone to clean up remaining code over the course of a few releases. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 10:15 ` Nick Piggin @ 2009-06-12 10:30 ` Pekka J Enberg 2009-06-12 10:32 ` Pekka Enberg 2009-06-12 15:16 ` Linus Torvalds 2009-06-12 11:13 ` Benjamin Herrenschmidt 1 sibling, 2 replies; 44+ messages in thread From: Pekka J Enberg @ 2009-06-12 10:30 UTC (permalink / raw) To: Nick Piggin; +Cc: Ingo Molnar, linux-mm, linux-kernel, benh, akpm, cl, torvalds On Fri, 12 Jun 2009, Nick Piggin wrote: > > Hi Ingo, > > > > On Fri, Jun 12, 2009 at 1:07 PM, Ingo Molnar<mingo@elte.hu> wrote: > > > IMHO such invisible side-channels modifying the semantics of GFP > > > flags is a bit dubious. > > > > > > We could do GFP_INIT or GFP_BOOT. These can imply other useful > > > modifiers as well: panic-on-failure for example. (this would clean > > > up a fair amount of init code that currently checks for an panics on > > > allocation failure.) > > > > OK, but that means we need to fix up every single caller. I'm fine > > with that but Ben is not. As I am unable to test powerpc here, I am > > inclined to just merge Ben's patch as "obviously correct". > > I agree with Ingo though that exposing it as a gfp modifier is > not so good. I just like the implementation to mask off GFP_WAIT > better, and also prefer not to test system state, but have someone > just call into slab to tell it not to unconditionally enable > interrupts. > > > That does not mean we can't introduce GFP_BOOT later on if we want to. Hmm? > > Yes, with sufficient warnings in place, I don't think it should be > too error prone to clean up remaining code over the course of > a few releases. Hmm. This is turning into one epic patch discussion for sure! But here's a patch to do what you suggested. With the amount of patches I am generating, I'm bound to hit the right one sooner or later, no?-) Pekka diff --git a/include/linux/slab.h b/include/linux/slab.h index 4880306..219b8fb 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -319,4 +319,6 @@ static inline void *kzalloc_node(size_t size, gfp_t flags, int node) return kmalloc_node(size, flags | __GFP_ZERO, node); } +void __init kmem_cache_init_late(void); + #endif /* _LINUX_SLAB_H */ diff --git a/include/linux/slob_def.h b/include/linux/slob_def.h index 0ec00b3..bb5368d 100644 --- a/include/linux/slob_def.h +++ b/include/linux/slob_def.h @@ -34,4 +34,9 @@ static __always_inline void *__kmalloc(size_t size, gfp_t flags) return kmalloc(size, flags); } +static inline void kmem_cache_init_late(void) +{ + /* Nothing to do */ +} + #endif /* __LINUX_SLOB_DEF_H */ diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index be5d40c..4dcbc2c 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -302,4 +302,6 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node) } #endif +void __init kmem_cache_init_late(void); + #endif /* _LINUX_SLUB_DEF_H */ diff --git a/init/main.c b/init/main.c index b3e8f14..f6204f7 100644 --- a/init/main.c +++ b/init/main.c @@ -640,6 +640,7 @@ asmlinkage void __init start_kernel(void) "enabled early\n"); early_boot_irqs_on(); local_irq_enable(); + kmem_cache_init_late(); /* * HACK ALERT! This is early. We're enabling the console before diff --git a/mm/slab.c b/mm/slab.c index f46b65d..1fac378 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -304,6 +304,12 @@ struct kmem_list3 { }; /* + * The slab allocator is initialized with interrupts disabled. Therefore, make + * sure early boot allocations don't accidentally enable interrupts. + */ +static gfp_t slab_gfp_mask __read_mostly = __GFP_BITS_MASK & ~__GFP_WAIT; + +/* * Need this for bootstrapping a per node allocator. */ #define NUM_INIT_LISTS (3 * MAX_NUMNODES) @@ -1654,6 +1660,14 @@ void __init kmem_cache_init(void) */ } +void __init kmem_cache_init_late(void) +{ + /* + * Interrupts are enabled now so all GFP allocations are safe. + */ + slab_gfp_mask = __GFP_BITS_MASK; +} + static int __init cpucache_init(void) { int cpu; @@ -3237,6 +3251,8 @@ retry: } if (!obj) { + local_flags &= slab_gfp_mask; + /* * This allocation will be performed within the constraints * of the current cpuset / memory policy requirements. @@ -3354,12 +3370,14 @@ __cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, unsigned long save_flags; void *ptr; + flags &= slab_gfp_mask; + lockdep_trace_alloc(flags); if (slab_should_failslab(cachep, flags)) return NULL; - cache_alloc_debugcheck_before(cachep, flags); + cache_alloc_debugcheck_before(cachep, flags & slab_gfp_flags); local_irq_save(save_flags); if (unlikely(nodeid == -1)) @@ -3434,6 +3452,8 @@ __cache_alloc(struct kmem_cache *cachep, gfp_t flags, void *caller) unsigned long save_flags; void *objp; + flags &= slab_gfp_flags; + lockdep_trace_alloc(flags); if (slab_should_failslab(cachep, flags)) diff --git a/mm/slub.c b/mm/slub.c index 3964d3c..c09cb98 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -178,6 +178,12 @@ static enum { SYSFS /* Sysfs up */ } slab_state = DOWN; +/* + * The slab allocator is initialized with interrupts disabled. Therefore, make + * sure early boot allocations don't accidentally enable interrupts. + */ +static gfp_t slab_gfp_mask __read_mostly = __GFP_BITS_MASK & ~__GFP_WAIT; + /* A list of all slab caches on the system */ static DECLARE_RWSEM(slub_lock); static LIST_HEAD(slab_caches); @@ -1595,6 +1601,8 @@ static __always_inline void *slab_alloc(struct kmem_cache *s, unsigned long flags; unsigned int objsize; + gfpflags &= slab_gfp_mask; + lockdep_trace_alloc(gfpflags); might_sleep_if(gfpflags & __GFP_WAIT); @@ -3104,6 +3112,14 @@ void __init kmem_cache_init(void) nr_cpu_ids, nr_node_ids); } +void __init kmem_cache_init_late(void) +{ + /* + * Interrupts are enabled now so all GFP allocations are safe. + */ + slab_gfp_mask = __GFP_BITS_MASK; +} + /* * Find a mergeable slab cache */ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 10:30 ` Pekka J Enberg @ 2009-06-12 10:32 ` Pekka Enberg 2009-06-12 15:16 ` Linus Torvalds 1 sibling, 0 replies; 44+ messages in thread From: Pekka Enberg @ 2009-06-12 10:32 UTC (permalink / raw) To: Nick Piggin; +Cc: Ingo Molnar, linux-mm, linux-kernel, benh, akpm, cl, torvalds On Fri, Jun 12, 2009 at 1:30 PM, Pekka J Enberg<penberg@cs.helsinki.fi> wrote: > Hmm. This is turning into one epic patch discussion for sure! But here's a > patch to do what you suggested. With the amount of patches I am > generating, I'm bound to hit the right one sooner or later, no?-) [ And yes, I do see SLAB parts are not even compiling. But you get the idea. ] -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 10:30 ` Pekka J Enberg 2009-06-12 10:32 ` Pekka Enberg @ 2009-06-12 15:16 ` Linus Torvalds 2009-06-12 15:16 ` Pekka Enberg 1 sibling, 1 reply; 44+ messages in thread From: Linus Torvalds @ 2009-06-12 15:16 UTC (permalink / raw) To: Pekka J Enberg Cc: Nick Piggin, Ingo Molnar, linux-mm, linux-kernel, benh, akpm, cl On Fri, 12 Jun 2009, Pekka J Enberg wrote: > > Hmm. This is turning into one epic patch discussion for sure! But here's a > patch to do what you suggested. With the amount of patches I am > generating, I'm bound to hit the right one sooner or later, no?-) Ok, this one looks pretty good. I like the statics, and I like how it lets each allocator decide what to do. Small nit: your mm/slab.c patch does an obviously unnecessary mask in: cache_alloc_debugcheck_before(cachep, flags & slab_gfp_flags); but that's stupid, because the bits were already masked earlier. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 15:16 ` Linus Torvalds @ 2009-06-12 15:16 ` Pekka Enberg 0 siblings, 0 replies; 44+ messages in thread From: Pekka Enberg @ 2009-06-12 15:16 UTC (permalink / raw) To: Linus Torvalds Cc: Nick Piggin, Ingo Molnar, linux-mm, linux-kernel, benh, akpm, cl Linus Torvalds wrote: > > On Fri, 12 Jun 2009, Pekka J Enberg wrote: >> Hmm. This is turning into one epic patch discussion for sure! But here's a >> patch to do what you suggested. With the amount of patches I am >> generating, I'm bound to hit the right one sooner or later, no?-) > > Ok, this one looks pretty good. I like the statics, and I like how it lets > each allocator decide what to do. > > Small nit: your mm/slab.c patch does an obviously unnecessary mask in: > > cache_alloc_debugcheck_before(cachep, flags & slab_gfp_flags); > > but that's stupid, because the bits were already masked earlier. Yeah, the SLAB parts were completely untested. I have this in my tree now (that I sent a pull request for): http://git.kernel.org/?p=linux/kernel/git/penberg/slab-2.6.git;a=commitdiff;h=f6b726dae91cc74fb3a00f192932ec4fe0949875 Do you want me to drop it? I can also do an incremental patch to do the unmasking as in this patch. Pekka -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 10:15 ` Nick Piggin 2009-06-12 10:30 ` Pekka J Enberg @ 2009-06-12 11:13 ` Benjamin Herrenschmidt 2009-06-12 11:24 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 44+ messages in thread From: Benjamin Herrenschmidt @ 2009-06-12 11:13 UTC (permalink / raw) To: Nick Piggin Cc: Pekka Enberg, Ingo Molnar, linux-mm, linux-kernel, akpm, cl, torvalds > I agree with Ingo though that exposing it as a gfp modifier is > not so good. I just like the implementation to mask off GFP_WAIT > better, and also prefer not to test system state, but have someone > just call into slab to tell it not to unconditionally enable > interrupts. But interrupts is just one example. GFP_NOIO is another one vs. suspend and resume. What we have here is the allocator needs to be clamped down based on the system state. I think it will not work to try to butcher every caller, especially since they don't always know themselves in what state they are called. Moving the "fix" into the couple of nexuses where all the code path go through really seem like a better, simpler, more maintainable and more fool proof solution to me. > Yes, with sufficient warnings in place, I don't think it should be > too error prone to clean up remaining code over the course of > a few releases. But that will no fix all the cases. That will not fix __get_vm_area() being called from both boot and non-boot (and ioremap, etc..) and every similar thing we can have all over the place (I have some in the interrupt handling on powerpc, I'm sure we can find much more). I don't see what the problem is in providing simple allocator semantics and have the allocator itself adapt to the system state, especially when the code is as small as having a bit mask applied in 2 or 3 places. Cheers, Ben. > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 11:13 ` Benjamin Herrenschmidt @ 2009-06-12 11:24 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 44+ messages in thread From: Benjamin Herrenschmidt @ 2009-06-12 11:24 UTC (permalink / raw) To: Nick Piggin Cc: Pekka Enberg, Ingo Molnar, linux-mm, linux-kernel, akpm, cl, torvalds On Fri, 2009-06-12 at 21:13 +1000, Benjamin Herrenschmidt wrote: > > I agree with Ingo though that exposing it as a gfp modifier is > > not so good. I just like the implementation to mask off GFP_WAIT > > better, and also prefer not to test system state, but have someone > > just call into slab to tell it not to unconditionally enable > > interrupts. > > But interrupts is just one example. GFP_NOIO is another one vs. suspend > and resume. > > What we have here is the allocator needs to be clamped down based on the > system state. I think it will not work to try to butcher every caller, > especially since they don't always know themselves in what state they > are called. Let me put it another way.... If you have to teach every call site whether to use one flag or the other, there is -no- difference with teaching them to call one routine (alloc_bootmem) vs another (kmalloc). The way I see thing is that the -whole- point of the exercise is to remove the need for the callers to have to know in what environment they are calling kmalloc(). Yes, we do still want that for atomic calls, just because it's a good way to get people to think twice before allocating things in atomic context, but that logic pretty much ends there. If we're going to require any boot time caller of kmalloc() to pass a different set of flags than any non-boot time caller, then the whole idea of moving the initialization earlier so a single allocator can be used is moot. Cheers, Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 10:11 ` Pekka Enberg 2009-06-12 10:15 ` Nick Piggin @ 2009-06-12 11:11 ` Benjamin Herrenschmidt 2009-06-12 11:34 ` Pekka Enberg 1 sibling, 1 reply; 44+ messages in thread From: Benjamin Herrenschmidt @ 2009-06-12 11:11 UTC (permalink / raw) To: Pekka Enberg Cc: Ingo Molnar, linux-mm, linux-kernel, npiggin, akpm, cl, torvalds > OK, but that means we need to fix up every single caller. I'm fine > with that but Ben is not. As I am unable to test powerpc here, I am > inclined to just merge Ben's patch as "obviously correct". > > That does not mean we can't introduce GFP_BOOT later on if we want to. Hmm? Again, you are missing part of the picture. Yes we -can- fix all the -direct- callers that are obviously only be run at boot time. But what about all the indirect ones (or even direct ones) that can be called either at boot time or later. vmalloc() is the perfect example (or more precisely __get_vm_area() which brings in ioremap etc...) but there are many more. Cheers, Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 11:11 ` Benjamin Herrenschmidt @ 2009-06-12 11:34 ` Pekka Enberg 2009-06-12 11:41 ` Benjamin Herrenschmidt 2009-06-12 15:30 ` Andrew Morton 0 siblings, 2 replies; 44+ messages in thread From: Pekka Enberg @ 2009-06-12 11:34 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Ingo Molnar, linux-mm, linux-kernel, npiggin, akpm, cl, torvalds Hi Ben, On Fri, 2009-06-12 at 21:11 +1000, Benjamin Herrenschmidt wrote: > > OK, but that means we need to fix up every single caller. I'm fine > > with that but Ben is not. As I am unable to test powerpc here, I am > > inclined to just merge Ben's patch as "obviously correct". > > > > That does not mean we can't introduce GFP_BOOT later on if we want to. Hmm? > > Again, you are missing part of the picture. Yes we -can- fix all the > -direct- callers that are obviously only be run at boot time. But what > about all the indirect ones (or even direct ones) that can be called > either at boot time or later. vmalloc() is the perfect example (or more > precisely __get_vm_area() which brings in ioremap etc...) but there are > many more. No, I don't think I am. We can fix up the indirect callers too by making sure we pass the proper GFP flag and propagate that all the way down. Yes, this is potentially quite a bit of code churn which is why I do see your patch being the easy way out. That said, Nick and Ingo seem to think special-casing is questionable and I haven't had green light for any of the patches yet. The gfp sanitization patch adds some overhead to kmalloc() and page allocator paths which is obviously a concern. So while we continue to discuss this, I'd really like to proceed with the patch below. At least it should allow people to boot their kernels (although it will produce warnings). I really don't want to keep other people waiting for us to reach a resolution on this. Are you OK with that? Pekka >From f6b726dae91cc74fb3a00f192932ec4fe0949875 Mon Sep 17 00:00:00 2001 From: Pekka Enberg <penberg@cs.helsinki.fi> Date: Fri, 12 Jun 2009 14:03:06 +0300 Subject: [PATCH] slab: don't enable interrupts during early boot As explained by Benjamin Herrenschmidt: Oh and btw, your patch alone doesn't fix powerpc, because it's missing a whole bunch of GFP_KERNEL's in the arch code... You would have to grep the entire kernel for things that check slab_is_available() and even then you'll be missing some. For example, slab_is_available() didn't always exist, and so in the early days on powerpc, we used a mem_init_done global that is set form mem_init() (not perfect but works in practice). And we still have code using that to do the test. Therefore, mask out __GFP_WAIT in the slab allocators in early boot code to avoid enabling interrupts. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> --- include/linux/slab.h | 2 ++ include/linux/slob_def.h | 5 +++++ include/linux/slub_def.h | 2 ++ init/main.c | 1 + mm/slab.c | 22 ++++++++++++++++++++++ mm/slub.c | 18 ++++++++++++++++++ 6 files changed, 50 insertions(+), 0 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index 4880306..219b8fb 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -319,4 +319,6 @@ static inline void *kzalloc_node(size_t size, gfp_t flags, int node) return kmalloc_node(size, flags | __GFP_ZERO, node); } +void __init kmem_cache_init_late(void); + #endif /* _LINUX_SLAB_H */ diff --git a/include/linux/slob_def.h b/include/linux/slob_def.h index 0ec00b3..bb5368d 100644 --- a/include/linux/slob_def.h +++ b/include/linux/slob_def.h @@ -34,4 +34,9 @@ static __always_inline void *__kmalloc(size_t size, gfp_t flags) return kmalloc(size, flags); } +static inline void kmem_cache_init_late(void) +{ + /* Nothing to do */ +} + #endif /* __LINUX_SLOB_DEF_H */ diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index be5d40c..4dcbc2c 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -302,4 +302,6 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node) } #endif +void __init kmem_cache_init_late(void); + #endif /* _LINUX_SLUB_DEF_H */ diff --git a/init/main.c b/init/main.c index b3e8f14..f6204f7 100644 --- a/init/main.c +++ b/init/main.c @@ -640,6 +640,7 @@ asmlinkage void __init start_kernel(void) "enabled early\n"); early_boot_irqs_on(); local_irq_enable(); + kmem_cache_init_late(); /* * HACK ALERT! This is early. We're enabling the console before diff --git a/mm/slab.c b/mm/slab.c index f46b65d..a785808 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -304,6 +304,12 @@ struct kmem_list3 { }; /* + * The slab allocator is initialized with interrupts disabled. Therefore, make + * sure early boot allocations don't accidentally enable interrupts. + */ +static gfp_t slab_gfp_mask __read_mostly = __GFP_BITS_MASK & ~__GFP_WAIT; + +/* * Need this for bootstrapping a per node allocator. */ #define NUM_INIT_LISTS (3 * MAX_NUMNODES) @@ -1654,6 +1660,14 @@ void __init kmem_cache_init(void) */ } +void __init kmem_cache_init_late(void) +{ + /* + * Interrupts are enabled now so all GFP allocations are safe. + */ + slab_gfp_mask = __GFP_BITS_MASK; +} + static int __init cpucache_init(void) { int cpu; @@ -2812,6 +2826,10 @@ static int cache_grow(struct kmem_cache *cachep, offset *= cachep->colour_off; + /* Lets avoid crashing in early boot code. */ + if (WARN_ON_ONCE((local_flags & ~slab_gfp_mask) != 0)) + local_flags &= slab_gfp_mask; + if (local_flags & __GFP_WAIT) local_irq_enable(); @@ -3237,6 +3255,10 @@ retry: } if (!obj) { + /* Lets avoid crashing in early boot code. */ + if (WARN_ON_ONCE((local_flags & ~slab_gfp_mask) != 0)) + local_flags &= slab_gfp_mask; + /* * This allocation will be performed within the constraints * of the current cpuset / memory policy requirements. diff --git a/mm/slub.c b/mm/slub.c index 3964d3c..651bb34 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -178,6 +178,12 @@ static enum { SYSFS /* Sysfs up */ } slab_state = DOWN; +/* + * The slab allocator is initialized with interrupts disabled. Therefore, make + * sure early boot allocations don't accidentally enable interrupts. + */ +static gfp_t slab_gfp_mask __read_mostly = __GFP_BITS_MASK & ~__GFP_WAIT; + /* A list of all slab caches on the system */ static DECLARE_RWSEM(slub_lock); static LIST_HEAD(slab_caches); @@ -1548,6 +1554,10 @@ new_slab: goto load_freelist; } + /* Lets avoid crashing in early boot code. */ + if (WARN_ON_ONCE((gfpflags & ~slab_gfp_mask) != 0)) + gfpflags &= slab_gfp_mask; + if (gfpflags & __GFP_WAIT) local_irq_enable(); @@ -3104,6 +3114,14 @@ void __init kmem_cache_init(void) nr_cpu_ids, nr_node_ids); } +void __init kmem_cache_init_late(void) +{ + /* + * Interrupts are enabled now so all GFP allocations are safe. + */ + slab_gfp_mask = __GFP_BITS_MASK; +} + /* * Find a mergeable slab cache */ -- 1.6.0.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 11:34 ` Pekka Enberg @ 2009-06-12 11:41 ` Benjamin Herrenschmidt 2009-06-12 11:43 ` Pekka Enberg 2009-06-12 15:30 ` Andrew Morton 1 sibling, 1 reply; 44+ messages in thread From: Benjamin Herrenschmidt @ 2009-06-12 11:41 UTC (permalink / raw) To: Pekka Enberg Cc: Ingo Molnar, linux-mm, linux-kernel, npiggin, akpm, cl, torvalds > No, I don't think I am. We can fix up the indirect callers too by making > sure we pass the proper GFP flag and propagate that all the way down. > Yes, this is potentially quite a bit of code churn which is why I do see > your patch being the easy way out. s/churn/bloat ... and I really don't see the point. We can duplicate all of the __get_vm_area() interface variants with some _boot() versions, that's going to grow the kernel text for no good reason. Again, why would every call site have to know whether it's called during the boot process or not... more than that, whether interrupts have been turned on yet or not, especially since we may decide to move the point where we turn them on in the future. I really don't follow your logic here. It's fragile, adds complexity and bloat, in an area where we are trying to remove some. > That said, Nick and Ingo seem to think special-casing is questionable > and I haven't had green light for any of the patches yet. The gfp > sanitization patch adds some overhead to kmalloc() and page allocator > paths which is obviously a concern. Let's wait and see what Linus thinks... > So while we continue to discuss this, I'd really like to proceed with > the patch below. At least it should allow people to boot their kernels > (although it will produce warnings). I really don't want to keep other > people waiting for us to reach a resolution on this. Are you OK with > that? I don't care -how- we achieve the result I want as long as we achieve it, which is to remove the need for callers to care. My approach was one way to do it, I'm sure there's a better one. That's not the point. I'm too tried now to properly review your patch and I'll need to test it tomorrow morning, but it looks ok except for the WARN_ON maybe. Again, I don't think we need to -fix- things. I think most code should be able to call kmalloc(GFP_KERNEL) without having to bother at which precise stage of the boot sequence it is running. Cheers, Ben. > Pekka > > >From f6b726dae91cc74fb3a00f192932ec4fe0949875 Mon Sep 17 00:00:00 2001 > From: Pekka Enberg <penberg@cs.helsinki.fi> > Date: Fri, 12 Jun 2009 14:03:06 +0300 > Subject: [PATCH] slab: don't enable interrupts during early boot > > As explained by Benjamin Herrenschmidt: > > Oh and btw, your patch alone doesn't fix powerpc, because it's missing > a whole bunch of GFP_KERNEL's in the arch code... You would have to > grep the entire kernel for things that check slab_is_available() and > even then you'll be missing some. > > For example, slab_is_available() didn't always exist, and so in the > early days on powerpc, we used a mem_init_done global that is set form > mem_init() (not perfect but works in practice). And we still have code > using that to do the test. > > Therefore, mask out __GFP_WAIT in the slab allocators in early boot code to > avoid enabling interrupts. > > Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> > --- > include/linux/slab.h | 2 ++ > include/linux/slob_def.h | 5 +++++ > include/linux/slub_def.h | 2 ++ > init/main.c | 1 + > mm/slab.c | 22 ++++++++++++++++++++++ > mm/slub.c | 18 ++++++++++++++++++ > 6 files changed, 50 insertions(+), 0 deletions(-) > > diff --git a/include/linux/slab.h b/include/linux/slab.h > index 4880306..219b8fb 100644 > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -319,4 +319,6 @@ static inline void *kzalloc_node(size_t size, gfp_t flags, int node) > return kmalloc_node(size, flags | __GFP_ZERO, node); > } > > +void __init kmem_cache_init_late(void); > + > #endif /* _LINUX_SLAB_H */ > diff --git a/include/linux/slob_def.h b/include/linux/slob_def.h > index 0ec00b3..bb5368d 100644 > --- a/include/linux/slob_def.h > +++ b/include/linux/slob_def.h > @@ -34,4 +34,9 @@ static __always_inline void *__kmalloc(size_t size, gfp_t flags) > return kmalloc(size, flags); > } > > +static inline void kmem_cache_init_late(void) > +{ > + /* Nothing to do */ > +} > + > #endif /* __LINUX_SLOB_DEF_H */ > diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h > index be5d40c..4dcbc2c 100644 > --- a/include/linux/slub_def.h > +++ b/include/linux/slub_def.h > @@ -302,4 +302,6 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node) > } > #endif > > +void __init kmem_cache_init_late(void); > + > #endif /* _LINUX_SLUB_DEF_H */ > diff --git a/init/main.c b/init/main.c > index b3e8f14..f6204f7 100644 > --- a/init/main.c > +++ b/init/main.c > @@ -640,6 +640,7 @@ asmlinkage void __init start_kernel(void) > "enabled early\n"); > early_boot_irqs_on(); > local_irq_enable(); > + kmem_cache_init_late(); > > /* > * HACK ALERT! This is early. We're enabling the console before > diff --git a/mm/slab.c b/mm/slab.c > index f46b65d..a785808 100644 > --- a/mm/slab.c > +++ b/mm/slab.c > @@ -304,6 +304,12 @@ struct kmem_list3 { > }; > > /* > + * The slab allocator is initialized with interrupts disabled. Therefore, make > + * sure early boot allocations don't accidentally enable interrupts. > + */ > +static gfp_t slab_gfp_mask __read_mostly = __GFP_BITS_MASK & ~__GFP_WAIT; > + > +/* > * Need this for bootstrapping a per node allocator. > */ > #define NUM_INIT_LISTS (3 * MAX_NUMNODES) > @@ -1654,6 +1660,14 @@ void __init kmem_cache_init(void) > */ > } > > +void __init kmem_cache_init_late(void) > +{ > + /* > + * Interrupts are enabled now so all GFP allocations are safe. > + */ > + slab_gfp_mask = __GFP_BITS_MASK; > +} > + > static int __init cpucache_init(void) > { > int cpu; > @@ -2812,6 +2826,10 @@ static int cache_grow(struct kmem_cache *cachep, > > offset *= cachep->colour_off; > > + /* Lets avoid crashing in early boot code. */ > + if (WARN_ON_ONCE((local_flags & ~slab_gfp_mask) != 0)) > + local_flags &= slab_gfp_mask; > + > if (local_flags & __GFP_WAIT) > local_irq_enable(); > > @@ -3237,6 +3255,10 @@ retry: > } > > if (!obj) { > + /* Lets avoid crashing in early boot code. */ > + if (WARN_ON_ONCE((local_flags & ~slab_gfp_mask) != 0)) > + local_flags &= slab_gfp_mask; > + > /* > * This allocation will be performed within the constraints > * of the current cpuset / memory policy requirements. > diff --git a/mm/slub.c b/mm/slub.c > index 3964d3c..651bb34 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -178,6 +178,12 @@ static enum { > SYSFS /* Sysfs up */ > } slab_state = DOWN; > > +/* > + * The slab allocator is initialized with interrupts disabled. Therefore, make > + * sure early boot allocations don't accidentally enable interrupts. > + */ > +static gfp_t slab_gfp_mask __read_mostly = __GFP_BITS_MASK & ~__GFP_WAIT; > + > /* A list of all slab caches on the system */ > static DECLARE_RWSEM(slub_lock); > static LIST_HEAD(slab_caches); > @@ -1548,6 +1554,10 @@ new_slab: > goto load_freelist; > } > > + /* Lets avoid crashing in early boot code. */ > + if (WARN_ON_ONCE((gfpflags & ~slab_gfp_mask) != 0)) > + gfpflags &= slab_gfp_mask; > + > if (gfpflags & __GFP_WAIT) > local_irq_enable(); > > @@ -3104,6 +3114,14 @@ void __init kmem_cache_init(void) > nr_cpu_ids, nr_node_ids); > } > > +void __init kmem_cache_init_late(void) > +{ > + /* > + * Interrupts are enabled now so all GFP allocations are safe. > + */ > + slab_gfp_mask = __GFP_BITS_MASK; > +} > + > /* > * Find a mergeable slab cache > */ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 11:41 ` Benjamin Herrenschmidt @ 2009-06-12 11:43 ` Pekka Enberg 0 siblings, 0 replies; 44+ messages in thread From: Pekka Enberg @ 2009-06-12 11:43 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Ingo Molnar, linux-mm, linux-kernel, npiggin, akpm, cl, torvalds Hi Ben, On Fri, Jun 12, 2009 at 2:41 PM, Benjamin Herrenschmidt<benh@kernel.crashing.org> wrote: >> That said, Nick and Ingo seem to think special-casing is questionable >> and I haven't had green light for any of the patches yet. The gfp >> sanitization patch adds some overhead to kmalloc() and page allocator >> paths which is obviously a concern. > > Let's wait and see what Linus thinks... Yup, lets do that. On Fri, Jun 12, 2009 at 2:41 PM, Benjamin Herrenschmidt<benh@kernel.crashing.org> wrote: >> So while we continue to discuss this, I'd really like to proceed with >> the patch below. At least it should allow people to boot their kernels >> (although it will produce warnings). I really don't want to keep other >> people waiting for us to reach a resolution on this. Are you OK with >> that? > > I don't care -how- we achieve the result I want as long as we achieve > it, which is to remove the need for callers to care. My approach was one > way to do it, I'm sure there's a better one. That's not the point. I'm > too tried now to properly review your patch and I'll need to test it > tomorrow morning, but it looks ok except for the WARN_ON maybe. OK, the WARN_ON is there because you will get warnings for might_sleep() et al as well. Pekka -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 11:34 ` Pekka Enberg 2009-06-12 11:41 ` Benjamin Herrenschmidt @ 2009-06-12 15:30 ` Andrew Morton 2009-06-12 21:42 ` Benjamin Herrenschmidt 2009-06-25 4:41 ` Nick Piggin 1 sibling, 2 replies; 44+ messages in thread From: Andrew Morton @ 2009-06-12 15:30 UTC (permalink / raw) To: Pekka Enberg Cc: Benjamin Herrenschmidt, Ingo Molnar, linux-mm, linux-kernel, npiggin, cl, torvalds On Fri, 12 Jun 2009 14:34:00 +0300 Pekka Enberg <penberg@cs.helsinki.fi> wrote: > +static gfp_t slab_gfp_mask __read_mostly = __GFP_BITS_MASK & ~__GFP_WAIT; It'd be safer and saner to disable __GFP_FS and __GFP_IO as well. Having either of those flags set without __GFP_WAIT is a somewhat self-contradictory thing and there might be code under reclaim which assumes that __GFP_FS|__GFP_IO implies __GFP_WAIT. <wonders why mempool_alloc() didn't clear __GFP_FS> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 15:30 ` Andrew Morton @ 2009-06-12 21:42 ` Benjamin Herrenschmidt 2009-06-25 4:41 ` Nick Piggin 1 sibling, 0 replies; 44+ messages in thread From: Benjamin Herrenschmidt @ 2009-06-12 21:42 UTC (permalink / raw) To: Andrew Morton Cc: Pekka Enberg, Ingo Molnar, linux-mm, linux-kernel, npiggin, cl, torvalds On Fri, 2009-06-12 at 08:30 -0700, Andrew Morton wrote: > On Fri, 12 Jun 2009 14:34:00 +0300 Pekka Enberg <penberg@cs.helsinki.fi> wrote: > > > +static gfp_t slab_gfp_mask __read_mostly = __GFP_BITS_MASK & ~__GFP_WAIT; > > It'd be safer and saner to disable __GFP_FS and __GFP_IO as well. Right. That's what my original patch does in fact. I also re-enabled them all together but in that case, it might be better to re-enable FS and IO later, I'll let experts decide. > Having either of those flags set without __GFP_WAIT is a somewhat > self-contradictory thing and there might be code under reclaim which > assumes that __GFP_FS|__GFP_IO implies __GFP_WAIT. > > <wonders why mempool_alloc() didn't clear __GFP_FS> Cheers, Ben. > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 15:30 ` Andrew Morton 2009-06-12 21:42 ` Benjamin Herrenschmidt @ 2009-06-25 4:41 ` Nick Piggin 1 sibling, 0 replies; 44+ messages in thread From: Nick Piggin @ 2009-06-25 4:41 UTC (permalink / raw) To: Andrew Morton Cc: Pekka Enberg, Benjamin Herrenschmidt, Ingo Molnar, linux-mm, linux-kernel, cl, torvalds On Fri, Jun 12, 2009 at 08:30:05AM -0700, Andrew Morton wrote: > On Fri, 12 Jun 2009 14:34:00 +0300 Pekka Enberg <penberg@cs.helsinki.fi> wrote: > > > +static gfp_t slab_gfp_mask __read_mostly = __GFP_BITS_MASK & ~__GFP_WAIT; > > It'd be safer and saner to disable __GFP_FS and __GFP_IO as well. > Having either of those flags set without __GFP_WAIT is a somewhat > self-contradictory thing and there might be code under reclaim which > assumes that __GFP_FS|__GFP_IO implies __GFP_WAIT. > > <wonders why mempool_alloc() didn't clear __GFP_FS> Maybe we never get there if __GFP_WAIT is clear? It would be neater if it did clear __GFP_FS, though... -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 10:07 ` Ingo Molnar 2009-06-12 10:11 ` Pekka Enberg @ 2009-06-12 11:09 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 44+ messages in thread From: Benjamin Herrenschmidt @ 2009-06-12 11:09 UTC (permalink / raw) To: Ingo Molnar Cc: Pekka Enberg, linux-mm, linux-kernel, npiggin, akpm, cl, torvalds On Fri, 2009-06-12 at 12:07 +0200, Ingo Molnar wrote: > > IMHO such invisible side-channels modifying the semantics of GFP > flags is a bit dubious. > > We could do GFP_INIT or GFP_BOOT. These can imply other useful > modifiers as well: panic-on-failure for example. (this would clean > up a fair amount of init code that currently checks for an panics on > allocation failure.) I disagree. I believe most code shouldn't have to care whether it's in boot, suspend or similar to get the right flags to kmalloc(). This is especially true for when the allocator is called indirectly by something that can itself be called from either boot or non-boot. I believe the best example here is __get_vm_area() will use GFP_KERNEL. I don't think it should be "fixed" to do anything else. The normal case of GFP_KERNEL is correct and it shouldn't be changed to do GFP_NOWAIT just because it happens that we use it earlier during init time. This is also true of a lot of code used on "hotplug" path that is commonly used at init time but can be used later on. To some extent, the subtle distinction of whether interrupts are enabled or not is something that shouldn't be something those callers have to bother with. Yes, it is obvious for some strictly init code, but it's far from being always that simple, and it's not unlikely that we'll decide to move around in the init sequence the point at which we decide to enable interrupts. We shouldn't have to fix half of the init code when we do that. In fact, we could push the logic further (but please read it all before reacting :-) The fact that we -do- specific GFP_ATOMIC for atomic context is -almost- a side effect of history. To some extent we could get rid of it since we can almost always know when we are in such a context. In that case, though, I believe we should keep it that way, at least because it does discourage people from allocating in those contexts which is a good thing. Back to the general idea, I think we shouldn't burden arch, driver, subsystem etc... code with the need to understand the system state, in our present case, init vs. non init, but the same issue applies with suspend/resume vs. GFP_NOIO as I explained in a separate email. This typically a case where I believe the best way to ensure we do the right thing is to put the check in the few common code path where everybody funnels through, which is the allocator itself. Cheers, Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 9:03 ` [PATCH v2] " Pekka J Enberg 2009-06-12 9:10 ` Ingo Molnar @ 2009-06-12 15:04 ` Linus Torvalds 2009-06-12 15:05 ` Pekka Enberg 2009-06-19 14:59 ` Pavel Machek 2 siblings, 1 reply; 44+ messages in thread From: Linus Torvalds @ 2009-06-12 15:04 UTC (permalink / raw) To: Pekka J Enberg; +Cc: linux-mm, linux-kernel, mingo, npiggin, benh, akpm, cl On Fri, 12 Jun 2009, Pekka J Enberg wrote: > > + if (system_state != SYSTEM_RUNNING) > + local_flags &= ~__GFP_WAIT; > + > + might_sleep_if(local_flags & __GFP_WAIT); This is pointless. You're doing the "might_sleep_if()" way too late. At that point, you've already lost 99% of all coverage, since now none of the cases of just finding a free slab entry on the list will ever trigger that "might_sleep()" case. So you need to do this _early_, at the entry-point, not late, at cache re-fill time. So rather than removing the might_sleep_if() at the early point, and then moving it to this late stage (because you only do the local_flags fixups late), you need to move the local-flags fixup early instead, and do the might_sleep_it() there. The whole point of "might_sleep()" is that it triggers every time if something is called in the wrong context - not just for the cases where it actually _does_ sleep. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 15:04 ` Linus Torvalds @ 2009-06-12 15:05 ` Pekka Enberg 0 siblings, 0 replies; 44+ messages in thread From: Pekka Enberg @ 2009-06-12 15:05 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-mm, linux-kernel, mingo, npiggin, benh, akpm, cl Hi Linus, Linus Torvalds wrote: > > On Fri, 12 Jun 2009, Pekka J Enberg wrote: >> >> + if (system_state != SYSTEM_RUNNING) >> + local_flags &= ~__GFP_WAIT; >> + >> + might_sleep_if(local_flags & __GFP_WAIT); > > This is pointless. > > You're doing the "might_sleep_if()" way too late. At that point, you've > already lost 99% of all coverage, since now none of the cases of just > finding a free slab entry on the list will ever trigger that > "might_sleep()" case. > > So you need to do this _early_, at the entry-point, not late, at cache > re-fill time. > > So rather than removing the might_sleep_if() at the early point, and then > moving it to this late stage (because you only do the local_flags fixups > late), you need to move the local-flags fixup early instead, and do the > might_sleep_it() there. > > The whole point of "might_sleep()" is that it triggers every time if > something is called in the wrong context - not just for the cases where it > actually _does_ sleep. OK, makes sense. So what do you think of this patch then: http://patchwork.kernel.org/patch/29733/ It's what Ben has been proposing all along in a slightly edited form. Pekka -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-12 9:03 ` [PATCH v2] " Pekka J Enberg 2009-06-12 9:10 ` Ingo Molnar 2009-06-12 15:04 ` Linus Torvalds @ 2009-06-19 14:59 ` Pavel Machek 2009-06-19 22:27 ` Benjamin Herrenschmidt 2 siblings, 1 reply; 44+ messages in thread From: Pavel Machek @ 2009-06-19 14:59 UTC (permalink / raw) To: Pekka J Enberg Cc: linux-mm, linux-kernel, mingo, npiggin, benh, akpm, cl, torvalds Hi! > > As explained by Benjamin Herrenschmidt: > > Oh and btw, your patch alone doesn't fix powerpc, because it's missing > a whole bunch of GFP_KERNEL's in the arch code... You would have to > grep the entire kernel for things that check slab_is_available() and > even then you'll be missing some. > > For example, slab_is_available() didn't always exist, and so in the > early days on powerpc, we used a mem_init_done global that is set form > mem_init() (not perfect but works in practice). And we still have code > using that to do the test. > > Therefore, ignore __GFP_WAIT in the slab allocators if we're booting or > suspending. Ok... GFP_KERNEL allocations normally don't fail; now they will. Should we at least force access to atomic reserves in such case? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-19 14:59 ` Pavel Machek @ 2009-06-19 22:27 ` Benjamin Herrenschmidt 2009-06-19 23:23 ` Pavel Machek 0 siblings, 1 reply; 44+ messages in thread From: Benjamin Herrenschmidt @ 2009-06-19 22:27 UTC (permalink / raw) To: Pavel Machek Cc: Pekka J Enberg, linux-mm, linux-kernel, mingo, npiggin, akpm, cl, torvalds On Fri, 2009-06-19 at 16:59 +0200, Pavel Machek wrote: > > Ok... GFP_KERNEL allocations normally don't fail; now they > will. Should we at least force access to atomic reserves in such case? No. First, code that assumes GFP_KERNEL don't fail is stupid. Any allocation should always be assumed to potentially fail. Then, if you start failing allocations at boot time, then you aren't going anywhere are you ? Cheers, Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-19 22:27 ` Benjamin Herrenschmidt @ 2009-06-19 23:23 ` Pavel Machek 2009-06-19 23:50 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 44+ messages in thread From: Pavel Machek @ 2009-06-19 23:23 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Pekka J Enberg, linux-mm, linux-kernel, mingo, npiggin, akpm, cl, torvalds On Sat 2009-06-20 08:27:29, Benjamin Herrenschmidt wrote: > On Fri, 2009-06-19 at 16:59 +0200, Pavel Machek wrote: > > > > Ok... GFP_KERNEL allocations normally don't fail; now they > > will. Should we at least force access to atomic reserves in such case? > > No. First, code that assumes GFP_KERNEL don't fail is stupid. Any > allocation should always be assumed to potentially fail. Stupid, yes. Uncommon? Not sure. > Then, if you start failing allocations at boot time, then you aren't > going anywhere are you ? Exactly. So boot code should have access to all the memory, right? Setting some aside for GFP_ATOMIC does not make sense in that context. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-19 23:23 ` Pavel Machek @ 2009-06-19 23:50 ` Benjamin Herrenschmidt 2009-06-20 0:28 ` Pavel Machek 0 siblings, 1 reply; 44+ messages in thread From: Benjamin Herrenschmidt @ 2009-06-19 23:50 UTC (permalink / raw) To: Pavel Machek Cc: Pekka J Enberg, linux-mm, linux-kernel, mingo, npiggin, akpm, cl, torvalds On Sat, 2009-06-20 at 01:23 +0200, Pavel Machek wrote: > > No. First, code that assumes GFP_KERNEL don't fail is stupid. Any > > allocation should always be assumed to potentially fail. > > Stupid, yes. Uncommon? Not sure. A lot less than it used to be, we've been fixing those by the truckload over the past few years. But again, if allocations start failing that early at boot, you are likely to be doomed anyway. Still, better to do proper error handling, and I think we -mostly- do (ok, not -always-). > > Then, if you start failing allocations at boot time, then you aren't > > going anywhere are you ? > > Exactly. So boot code should have access to all the memory, right? > Setting some aside for GFP_ATOMIC does not make sense in that context. I'm not certain what you mean here. If you're going to hit the atomic reserve that early, you aren't going anywhere neither :-) Is there any real problem you are trying to solve here or is it all just academic ? Cheers, Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-19 23:50 ` Benjamin Herrenschmidt @ 2009-06-20 0:28 ` Pavel Machek 2009-06-20 2:10 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 44+ messages in thread From: Pavel Machek @ 2009-06-20 0:28 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Pekka J Enberg, linux-mm, linux-kernel, mingo, npiggin, akpm, cl, torvalds On Sat 2009-06-20 09:50:09, Benjamin Herrenschmidt wrote: > On Sat, 2009-06-20 at 01:23 +0200, Pavel Machek wrote: > > > No. First, code that assumes GFP_KERNEL don't fail is stupid. Any > > > allocation should always be assumed to potentially fail. > > > > Stupid, yes. Uncommon? Not sure. > > A lot less than it used to be, we've been fixing those by the truckload > over the past few years. But again, if allocations start failing that > early at boot, you are likely to be doomed anyway. Still, better to do > proper error handling, and I think we -mostly- do (ok, not -always-). > > > > Then, if you start failing allocations at boot time, then you aren't > > > going anywhere are you ? > > > > Exactly. So boot code should have access to all the memory, right? > > Setting some aside for GFP_ATOMIC does not make sense in that context. > > I'm not certain what you mean here. If you're going to hit the atomic > reserve that early, you aren't going anywhere neither :-) > > Is there any real problem you are trying to solve here or is it all > just academic ? Academic for boot, probably real for suspend/resume. There the atomic reserves could matter because the memory can be pretty full when you start suspend. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-20 0:28 ` Pavel Machek @ 2009-06-20 2:10 ` Benjamin Herrenschmidt 2009-06-21 6:18 ` Pavel Machek 0 siblings, 1 reply; 44+ messages in thread From: Benjamin Herrenschmidt @ 2009-06-20 2:10 UTC (permalink / raw) To: Pavel Machek Cc: Pekka J Enberg, linux-mm, linux-kernel, mingo, npiggin, akpm, cl, torvalds On Sat, 2009-06-20 at 02:28 +0200, Pavel Machek wrote: > > Academic for boot, probably real for suspend/resume. There the atomic > reserves could matter because the memory can be pretty full when you > start suspend. Right, that might be something to look into, though we haven't yet applied the technique for suspend & resume. My main issue with it at the moment is how do I synchronize with allocations that are already sleeping when changing the gfp flag mask without bloating the normal path. I haven't had time to look into it, it's mostly a problem local to the page allocator and reclaim, not much to do with SL*Bs though, which is fortunate. I also suspect that we might want to try to make -some- amount of free space before starting suspend, though of course not nearly as aggressively as with std. Cheers, Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-20 2:10 ` Benjamin Herrenschmidt @ 2009-06-21 6:18 ` Pavel Machek 2009-06-21 9:31 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 44+ messages in thread From: Pavel Machek @ 2009-06-21 6:18 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Pekka J Enberg, linux-mm, linux-kernel, mingo, npiggin, akpm, cl, torvalds, Rafael J. Wysocki Hi! > > Academic for boot, probably real for suspend/resume. There the atomic > > reserves could matter because the memory can be pretty full when you > > start suspend. > > Right, that might be something to look into, though we haven't yet > applied the technique for suspend & resume. My main issue with it at the > moment is how do I synchronize with allocations that are already > sleeping when changing the gfp flag mask without bloating the normal Well, but the problem already exists, no? If someone is already sleeping due to __GFP_WAIT, he'll probably sleep till the resume. ...well, if he's sleeping in the disk driver, I suspect driver will finish outstanding requests as part of .suspend(). > I also suspect that we might want to try to make -some- amount of free > space before starting suspend, though of course not nearly as > aggressively as with std. We free 4MB in 2.6.30, but Rafael is removing that for 2.6.31 :-(. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-21 6:18 ` Pavel Machek @ 2009-06-21 9:31 ` Benjamin Herrenschmidt 2009-06-25 4:34 ` Nick Piggin 0 siblings, 1 reply; 44+ messages in thread From: Benjamin Herrenschmidt @ 2009-06-21 9:31 UTC (permalink / raw) To: Pavel Machek Cc: Pekka J Enberg, linux-mm, linux-kernel, mingo, npiggin, akpm, cl, torvalds, Rafael J. Wysocki > > Right, that might be something to look into, though we haven't yet > > applied the technique for suspend & resume. My main issue with it at the > > moment is how do I synchronize with allocations that are already > > sleeping when changing the gfp flag mask without bloating the normal > > Well, but the problem already exists, no? If someone is already > sleeping due to __GFP_WAIT, he'll probably sleep till the resume. Yes. In fact, without the masking, a driver that hasn't been suspended yet could well start sleeping in GFP_KERNEL after the disk driver has suspended. It may do so while holding a mutex or similar, which might deadlock its own suspend() callback. It's not something that drivers can trivially address by having a pre-suspend hook, and avoid allocations, since allocations may be done by subsystems on behalf of the driver or such. It's a can of worms, which is why I believe the only sane approach is to stop allocators from doing IOs once we start suspend. So yes, just applying the mask would help, but wouldn't completely fix it unless we also find a way to synchronize. > ...well, if he's sleeping in the disk driver, I suspect driver will > finish outstanding requests as part of .suspend(). > > > I also suspect that we might want to try to make -some- amount of free > > space before starting suspend, though of course not nearly as > > aggressively as with std. > > We free 4MB in 2.6.30, but Rafael is removing that for 2.6.31 :-(. Well... we are taking a chance of making the above scenario more likely to hit then. Cheers, Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-21 9:31 ` Benjamin Herrenschmidt @ 2009-06-25 4:34 ` Nick Piggin 2009-06-25 9:56 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 44+ messages in thread From: Nick Piggin @ 2009-06-25 4:34 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Pavel Machek, Pekka J Enberg, linux-mm, linux-kernel, mingo, akpm, cl, torvalds, Rafael J. Wysocki On Sun, Jun 21, 2009 at 07:31:05PM +1000, Benjamin Herrenschmidt wrote: > > > > Right, that might be something to look into, though we haven't yet > > > applied the technique for suspend & resume. My main issue with it at the > > > moment is how do I synchronize with allocations that are already > > > sleeping when changing the gfp flag mask without bloating the normal > > > > Well, but the problem already exists, no? If someone is already > > sleeping due to __GFP_WAIT, he'll probably sleep till the resume. > > Yes. In fact, without the masking, a driver that hasn't been suspended > yet could well start sleeping in GFP_KERNEL after the disk driver has > suspended. It may do so while holding a mutex or similar, which might > deadlock its own suspend() callback. It's not something that drivers can > trivially address by having a pre-suspend hook, and avoid allocations, > since allocations may be done by subsystems on behalf of the driver or > such. It's a can of worms, which is why I believe the only sane approach > is to stop allocators from doing IOs once we start suspend. Maybe so. Masking off __GFP_WAIT up in slab and page allocator isn't really needed though (or necessarily a good idea to throw out that information far from where it is used). Checking for suspend active and avoiding writeout from reclaim for example might be a better idea. > So yes, just applying the mask would help, but wouldn't completely fix > it unless we also find a way to synchronize. You could potentially use srcu or something like that in page reclaim in order to have a way to be able to kick everyone out. page reclaim entry/exit from the page allocator isn't such a fastpath though, so even a simple mutex or something may be possible. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending 2009-06-25 4:34 ` Nick Piggin @ 2009-06-25 9:56 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 44+ messages in thread From: Benjamin Herrenschmidt @ 2009-06-25 9:56 UTC (permalink / raw) To: Nick Piggin Cc: Pavel Machek, Pekka J Enberg, linux-mm, linux-kernel, mingo, akpm, cl, torvalds, Rafael J. Wysocki > Maybe so. Masking off __GFP_WAIT up in slab and page allocator > isn't really needed though (or necessarily a good idea to throw > out that information far from where it is used). > > Checking for suspend active and avoiding writeout from reclaim > for example might be a better idea. Ah ok. Yes, I agree. I'm not familiar with those code path and so masking gfp here sounded like the easier solution but you may well be right here :-) > > So yes, just applying the mask would help, but wouldn't completely fix > > it unless we also find a way to synchronize. > > You could potentially use srcu or something like that in page > reclaim in order to have a way to be able to kick everyone > out. page reclaim entry/exit from the page allocator isn't such > a fastpath though, so even a simple mutex or something may be > possible. Ok. Well, I'll leave that to the suspend/resume folks for now, as I'm way too busy at the moment to give that a serious look, but thanks for the pointer. Cheers, Ben. > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2009-06-25 9:55 UTC | newest] Thread overview: 44+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-06-12 8:13 [PATCH 2/2] slab,slub: ignore __GFP_WAIT if we're booting or suspending Pekka J Enberg 2009-06-12 9:03 ` [PATCH v2] " Pekka J Enberg 2009-06-12 9:10 ` Ingo Molnar 2009-06-12 9:21 ` Benjamin Herrenschmidt 2009-06-12 9:24 ` Pekka Enberg 2009-06-12 9:36 ` Benjamin Herrenschmidt 2009-06-12 9:45 ` Pekka J Enberg 2009-06-12 9:58 ` Benjamin Herrenschmidt 2009-06-12 10:00 ` Pekka Enberg 2009-06-12 15:22 ` Andrew Morton 2009-06-12 9:49 ` Pekka Enberg 2009-06-12 9:52 ` Nick Piggin 2009-06-12 9:54 ` Pekka Enberg 2009-06-12 9:59 ` Benjamin Herrenschmidt 2009-06-25 4:38 ` Nick Piggin 2009-06-12 10:07 ` Ingo Molnar 2009-06-12 10:11 ` Pekka Enberg 2009-06-12 10:15 ` Nick Piggin 2009-06-12 10:30 ` Pekka J Enberg 2009-06-12 10:32 ` Pekka Enberg 2009-06-12 15:16 ` Linus Torvalds 2009-06-12 15:16 ` Pekka Enberg 2009-06-12 11:13 ` Benjamin Herrenschmidt 2009-06-12 11:24 ` Benjamin Herrenschmidt 2009-06-12 11:11 ` Benjamin Herrenschmidt 2009-06-12 11:34 ` Pekka Enberg 2009-06-12 11:41 ` Benjamin Herrenschmidt 2009-06-12 11:43 ` Pekka Enberg 2009-06-12 15:30 ` Andrew Morton 2009-06-12 21:42 ` Benjamin Herrenschmidt 2009-06-25 4:41 ` Nick Piggin 2009-06-12 11:09 ` Benjamin Herrenschmidt 2009-06-12 15:04 ` Linus Torvalds 2009-06-12 15:05 ` Pekka Enberg 2009-06-19 14:59 ` Pavel Machek 2009-06-19 22:27 ` Benjamin Herrenschmidt 2009-06-19 23:23 ` Pavel Machek 2009-06-19 23:50 ` Benjamin Herrenschmidt 2009-06-20 0:28 ` Pavel Machek 2009-06-20 2:10 ` Benjamin Herrenschmidt 2009-06-21 6:18 ` Pavel Machek 2009-06-21 9:31 ` Benjamin Herrenschmidt 2009-06-25 4:34 ` Nick Piggin 2009-06-25 9:56 ` Benjamin Herrenschmidt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).