Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Pekka J Enberg <penberg@cs.helsinki.fi>
Cc: Ingo Molnar <mingo@elte.hu>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	npiggin@suse.de, akpm@linux-foundation.org,
	cl@linux-foundation.org, torvalds@linux-foundation.org
Subject: Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or  suspending
Date: Fri, 12 Jun 2009 19:58:15 +1000	[thread overview]
Message-ID: <1244800695.7172.115.camel@pasglop> (raw)
In-Reply-To: <Pine.LNX.4.64.0906121244020.30911@melkki.cs.Helsinki.FI>

On Fri, 2009-06-12 at 12:45 +0300, Pekka J Enberg wrote:
> On Fri, 12 Jun 2009, Benjamin Herrenschmidt wrote:
> > Take a break, take a step back, and look at the big picture. Do you
> > really want to find all the needles in the haystack or just make sure
> > you wear gloves when handling the hay ? :-)
> 
> Well, I would like to find the needles but I think we should do it with 
> gloves on.
> 
> If everyone is happy with this version of Ben's patch, I'm going to just 
> apply it and push it to Linus.

Thanks :-) Looks right at first glance. I'll test tomorrow.

Cheers,
Ben.

> 			Pekka
> 
> >From 7760451b006b165bce052622af7316b54bd5a122 Mon Sep 17 00:00:00 2001
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Fri, 12 Jun 2009 12:39:58 +0300
> Subject: [PATCH] Sanitize "gfp" flags during boot
> 
> With the recent shuffle of initialization order to move memory related
> inits earlier, various subtle breakage was introduced in archs like
> powerpc due to code somewhat assuming that GFP_KERNEL can be used as
> soon as the allocators are up. This is not true because any __GFP_WAIT
> allocation will cause interrupts to be enabled, which can be fatal if
> it happens too early.
> 
> This isn't trivial to fix on every call site. For example, powerpc's
> ioremap implementation needs to be called early. For that, it uses two
> different mechanisms to carve out virtual space. Before memory init,
> by moving down VMALLOC_END, and then, by calling get_vm_area().
> Unfortunately, the later does GFK_KERNEL allocations. But we can't do
> anything else because once vmalloc's been initialized, we can no longer
> safely move VMALLOC_END to carve out space.
> 
> There are other examples, wehere can can be called either very early
> or later on when devices are hot-plugged. It would be a major pain for
> such code to have to "know" whether it's in a context where it should
> use GFP_KERNEL or GFP_NOWAIT.
> 
> Finally, by having the ability to silently removed __GFP_WAIT from
> allocations, we pave the way for suspend-to-RAM to use that feature
> to also remove __GFP_IO from allocations done after suspending devices
> has started. This is important because such allocations may hang if
> devices on the swap-out path have been suspended, but not-yet suspended
> drivers don't know about it, and may deadlock themselves by being hung
> into a kmalloc somewhere while holding a mutex for example.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
> ---
>  include/linux/gfp.h |    9 +++++++++
>  init/main.c         |    1 +
>  mm/page_alloc.c     |   18 ++++++++++++++++++
>  mm/slab.c           |    9 +++++++++
>  mm/slub.c           |    3 +++
>  5 files changed, 40 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 0bbc15f..8d2ea79 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -99,6 +99,13 @@ struct vm_area_struct;
>  /* 4GB DMA on some platforms */
>  #define GFP_DMA32	__GFP_DMA32
>  
> +extern gfp_t gfp_allowed_bits;
> +
> +static inline gfp_t gfp_sanitize(gfp_t gfp_flags)
> +{
> +	return gfp_flags & gfp_allowed_bits;
> +}
> +
>  /* Convert GFP flags to their corresponding migrate type */
>  static inline int allocflags_to_migratetype(gfp_t gfp_flags)
>  {
> @@ -245,4 +252,6 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
>  void drain_all_pages(void);
>  void drain_local_pages(void *dummy);
>  
> +void mm_late_init(void);
> +
>  #endif /* __LINUX_GFP_H */
> diff --git a/init/main.c b/init/main.c
> index b3e8f14..34c6e7e 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -640,6 +640,7 @@ asmlinkage void __init start_kernel(void)
>  				 "enabled early\n");
>  	early_boot_irqs_on();
>  	local_irq_enable();
> +	mm_late_init();
>  
>  	/*
>  	 * HACK ALERT! This is early. We're enabling the console before
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7760ef9..a42e4c7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -77,6 +77,13 @@ int percpu_pagelist_fraction;
>  int pageblock_order __read_mostly;
>  #endif
>  
> +/*
> + * We set up the page allocator and the slab allocator early on with interrupts
> + * disabled. Therefore, make sure that we sanitize GFP flags accordingly before
> + * everything is up and running.
> + */
> +gfp_t gfp_allowed_bits = ~(__GFP_WAIT|__GFP_FS | __GFP_IO);
> +
>  static void __free_pages_ok(struct page *page, unsigned int order);
>  
>  /*
> @@ -1473,6 +1480,9 @@ __alloc_pages_internal(gfp_t gfp_mask, unsigned int order,
>  	unsigned long did_some_progress;
>  	unsigned long pages_reclaimed = 0;
>  
> +	/* Sanitize flags so we don't enable irqs too early during boot */
> +	gfp_mask = gfp_sanitize(gfp_mask);
> +
>  	lockdep_trace_alloc(gfp_mask);
>  
>  	might_sleep_if(wait);
> @@ -4728,3 +4738,11 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  }
>  #endif
> +
> +void mm_late_init(void)
> +{
> +	/*
> +	 * Interrupts are enabled now so all GFP allocations are safe.
> +	 */
> +	gfp_allowed_bits = __GFP_BITS_MASK;
> +}
> diff --git a/mm/slab.c b/mm/slab.c
> index f46b65d..87b166e 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -2791,6 +2791,9 @@ static int cache_grow(struct kmem_cache *cachep,
>  	gfp_t local_flags;
>  	struct kmem_list3 *l3;
>  
> +	/* Sanitize flags so we don't enable irqs too early during boot */
> +	gfp_mask = gfp_sanitize(gfp_mask);
> +
>  	/*
>  	 * Be lazy and only check for valid flags here,  keeping it out of the
>  	 * critical path in kmem_cache_alloc().
> @@ -3212,6 +3215,9 @@ static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags)
>  	void *obj = NULL;
>  	int nid;
>  
> +	/* Sanitize flags so we don't enable irqs too early during boot */
> +	gfp_mask = gfp_sanitize(gfp_mask);
> +
>  	if (flags & __GFP_THISNODE)
>  		return NULL;
>  
> @@ -3434,6 +3440,9 @@ __cache_alloc(struct kmem_cache *cachep, gfp_t flags, void *caller)
>  	unsigned long save_flags;
>  	void *objp;
>  
> +	/* Sanitize flags so we don't enable irqs too early during boot */
> +	gfp_mask = gfp_sanitize(gfp_mask);
> +
>  	lockdep_trace_alloc(flags);
>  
>  	if (slab_should_failslab(cachep, flags))
> diff --git a/mm/slub.c b/mm/slub.c
> index 3964d3c..5c646f7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1512,6 +1512,9 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  	/* We handle __GFP_ZERO in the caller */
>  	gfpflags &= ~__GFP_ZERO;
>  
> +	/* Sanitize flags so we don't enable irqs too early during boot */
> +	gfpflags = gfp_sanitize(gfpflags);
> +
>  	if (!c->page)
>  		goto new_slab;
>

WARNING: multiple messages have this Message-ID (diff)

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Pekka J Enberg <penberg@cs.helsinki.fi>
Cc: Ingo Molnar <mingo@elte.hu>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	npiggin@suse.de, akpm@linux-foundation.org,
	cl@linux-foundation.org, torvalds@linux-foundation.org
Subject: Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending
Date: Fri, 12 Jun 2009 19:58:15 +1000	[thread overview]
Message-ID: <1244800695.7172.115.camel@pasglop> (raw)
In-Reply-To: <Pine.LNX.4.64.0906121244020.30911@melkki.cs.Helsinki.FI>

On Fri, 2009-06-12 at 12:45 +0300, Pekka J Enberg wrote:
> On Fri, 12 Jun 2009, Benjamin Herrenschmidt wrote:
> > Take a break, take a step back, and look at the big picture. Do you
> > really want to find all the needles in the haystack or just make sure
> > you wear gloves when handling the hay ? :-)
> 
> Well, I would like to find the needles but I think we should do it with 
> gloves on.
> 
> If everyone is happy with this version of Ben's patch, I'm going to just 
> apply it and push it to Linus.

Thanks :-) Looks right at first glance. I'll test tomorrow.

Cheers,
Ben.

> 			Pekka
> 
> >From 7760451b006b165bce052622af7316b54bd5a122 Mon Sep 17 00:00:00 2001
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Fri, 12 Jun 2009 12:39:58 +0300
> Subject: [PATCH] Sanitize "gfp" flags during boot
> 
> With the recent shuffle of initialization order to move memory related
> inits earlier, various subtle breakage was introduced in archs like
> powerpc due to code somewhat assuming that GFP_KERNEL can be used as
> soon as the allocators are up. This is not true because any __GFP_WAIT
> allocation will cause interrupts to be enabled, which can be fatal if
> it happens too early.
> 
> This isn't trivial to fix on every call site. For example, powerpc's
> ioremap implementation needs to be called early. For that, it uses two
> different mechanisms to carve out virtual space. Before memory init,
> by moving down VMALLOC_END, and then, by calling get_vm_area().
> Unfortunately, the later does GFK_KERNEL allocations. But we can't do
> anything else because once vmalloc's been initialized, we can no longer
> safely move VMALLOC_END to carve out space.
> 
> There are other examples, wehere can can be called either very early
> or later on when devices are hot-plugged. It would be a major pain for
> such code to have to "know" whether it's in a context where it should
> use GFP_KERNEL or GFP_NOWAIT.
> 
> Finally, by having the ability to silently removed __GFP_WAIT from
> allocations, we pave the way for suspend-to-RAM to use that feature
> to also remove __GFP_IO from allocations done after suspending devices
> has started. This is important because such allocations may hang if
> devices on the swap-out path have been suspended, but not-yet suspended
> drivers don't know about it, and may deadlock themselves by being hung
> into a kmalloc somewhere while holding a mutex for example.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
> ---
>  include/linux/gfp.h |    9 +++++++++
>  init/main.c         |    1 +
>  mm/page_alloc.c     |   18 ++++++++++++++++++
>  mm/slab.c           |    9 +++++++++
>  mm/slub.c           |    3 +++
>  5 files changed, 40 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 0bbc15f..8d2ea79 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -99,6 +99,13 @@ struct vm_area_struct;
>  /* 4GB DMA on some platforms */
>  #define GFP_DMA32	__GFP_DMA32
>  
> +extern gfp_t gfp_allowed_bits;
> +
> +static inline gfp_t gfp_sanitize(gfp_t gfp_flags)
> +{
> +	return gfp_flags & gfp_allowed_bits;
> +}
> +
>  /* Convert GFP flags to their corresponding migrate type */
>  static inline int allocflags_to_migratetype(gfp_t gfp_flags)
>  {
> @@ -245,4 +252,6 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
>  void drain_all_pages(void);
>  void drain_local_pages(void *dummy);
>  
> +void mm_late_init(void);
> +
>  #endif /* __LINUX_GFP_H */
> diff --git a/init/main.c b/init/main.c
> index b3e8f14..34c6e7e 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -640,6 +640,7 @@ asmlinkage void __init start_kernel(void)
>  				 "enabled early\n");
>  	early_boot_irqs_on();
>  	local_irq_enable();
> +	mm_late_init();
>  
>  	/*
>  	 * HACK ALERT! This is early. We're enabling the console before
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7760ef9..a42e4c7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -77,6 +77,13 @@ int percpu_pagelist_fraction;
>  int pageblock_order __read_mostly;
>  #endif
>  
> +/*
> + * We set up the page allocator and the slab allocator early on with interrupts
> + * disabled. Therefore, make sure that we sanitize GFP flags accordingly before
> + * everything is up and running.
> + */
> +gfp_t gfp_allowed_bits = ~(__GFP_WAIT|__GFP_FS | __GFP_IO);
> +
>  static void __free_pages_ok(struct page *page, unsigned int order);
>  
>  /*
> @@ -1473,6 +1480,9 @@ __alloc_pages_internal(gfp_t gfp_mask, unsigned int order,
>  	unsigned long did_some_progress;
>  	unsigned long pages_reclaimed = 0;
>  
> +	/* Sanitize flags so we don't enable irqs too early during boot */
> +	gfp_mask = gfp_sanitize(gfp_mask);
> +
>  	lockdep_trace_alloc(gfp_mask);
>  
>  	might_sleep_if(wait);
> @@ -4728,3 +4738,11 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  }
>  #endif
> +
> +void mm_late_init(void)
> +{
> +	/*
> +	 * Interrupts are enabled now so all GFP allocations are safe.
> +	 */
> +	gfp_allowed_bits = __GFP_BITS_MASK;
> +}
> diff --git a/mm/slab.c b/mm/slab.c
> index f46b65d..87b166e 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -2791,6 +2791,9 @@ static int cache_grow(struct kmem_cache *cachep,
>  	gfp_t local_flags;
>  	struct kmem_list3 *l3;
>  
> +	/* Sanitize flags so we don't enable irqs too early during boot */
> +	gfp_mask = gfp_sanitize(gfp_mask);
> +
>  	/*
>  	 * Be lazy and only check for valid flags here,  keeping it out of the
>  	 * critical path in kmem_cache_alloc().
> @@ -3212,6 +3215,9 @@ static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags)
>  	void *obj = NULL;
>  	int nid;
>  
> +	/* Sanitize flags so we don't enable irqs too early during boot */
> +	gfp_mask = gfp_sanitize(gfp_mask);
> +
>  	if (flags & __GFP_THISNODE)
>  		return NULL;
>  
> @@ -3434,6 +3440,9 @@ __cache_alloc(struct kmem_cache *cachep, gfp_t flags, void *caller)
>  	unsigned long save_flags;
>  	void *objp;
>  
> +	/* Sanitize flags so we don't enable irqs too early during boot */
> +	gfp_mask = gfp_sanitize(gfp_mask);
> +
>  	lockdep_trace_alloc(flags);
>  
>  	if (slab_should_failslab(cachep, flags))
> diff --git a/mm/slub.c b/mm/slub.c
> index 3964d3c..5c646f7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1512,6 +1512,9 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  	/* We handle __GFP_ZERO in the caller */
>  	gfpflags &= ~__GFP_ZERO;
>  
> +	/* Sanitize flags so we don't enable irqs too early during boot */
> +	gfpflags = gfp_sanitize(gfpflags);
> +
>  	if (!c->page)
>  		goto new_slab;
>  

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2009-06-12  9:59 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-12  8:13 [PATCH 2/2] slab,slub: ignore __GFP_WAIT if we're booting or suspending Pekka J Enberg
2009-06-12  8:13 ` Pekka J Enberg
2009-06-12  9:03 ` [PATCH v2] " Pekka J Enberg
2009-06-12  9:03   ` Pekka J Enberg
2009-06-12  9:10   ` Ingo Molnar
2009-06-12  9:10     ` Ingo Molnar
2009-06-12  9:21     ` Benjamin Herrenschmidt
2009-06-12  9:21       ` Benjamin Herrenschmidt
2009-06-12  9:24       ` Pekka Enberg
2009-06-12  9:24         ` Pekka Enberg
2009-06-12  9:36         ` Benjamin Herrenschmidt
2009-06-12  9:36           ` Benjamin Herrenschmidt
2009-06-12  9:45           ` Pekka J Enberg
2009-06-12  9:45             ` Pekka J Enberg
2009-06-12  9:58             ` Benjamin Herrenschmidt [this message]
2009-06-12  9:58               ` Benjamin Herrenschmidt
2009-06-12 10:00               ` Pekka Enberg
2009-06-12 10:00                 ` Pekka Enberg
2009-06-12 15:22             ` Andrew Morton
2009-06-12 15:22               ` Andrew Morton
2009-06-12  9:49     ` Pekka Enberg
2009-06-12  9:49       ` Pekka Enberg
2009-06-12  9:52       ` Nick Piggin
2009-06-12  9:52         ` Nick Piggin
2009-06-12  9:54         ` Pekka Enberg
2009-06-12  9:54           ` Pekka Enberg
2009-06-12  9:59         ` Benjamin Herrenschmidt
2009-06-12  9:59           ` Benjamin Herrenschmidt
2009-06-25  4:38           ` Nick Piggin
2009-06-25  4:38             ` Nick Piggin
2009-06-12 10:07       ` Ingo Molnar
2009-06-12 10:07         ` Ingo Molnar
2009-06-12 10:11         ` Pekka Enberg
2009-06-12 10:11           ` Pekka Enberg
2009-06-12 10:15           ` Nick Piggin
2009-06-12 10:15             ` Nick Piggin
2009-06-12 10:30             ` Pekka J Enberg
2009-06-12 10:30               ` Pekka J Enberg
2009-06-12 10:32               ` Pekka Enberg
2009-06-12 10:32                 ` Pekka Enberg
2009-06-12 15:16               ` Linus Torvalds
2009-06-12 15:16                 ` Linus Torvalds
2009-06-12 15:16                 ` Pekka Enberg
2009-06-12 15:16                   ` Pekka Enberg
2009-06-12 11:13             ` Benjamin Herrenschmidt
2009-06-12 11:13               ` Benjamin Herrenschmidt
2009-06-12 11:24               ` Benjamin Herrenschmidt
2009-06-12 11:24                 ` Benjamin Herrenschmidt
2009-06-12 11:11           ` Benjamin Herrenschmidt
2009-06-12 11:11             ` Benjamin Herrenschmidt
2009-06-12 11:34             ` Pekka Enberg
2009-06-12 11:34               ` Pekka Enberg
2009-06-12 11:41               ` Benjamin Herrenschmidt
2009-06-12 11:41                 ` Benjamin Herrenschmidt
2009-06-12 11:43                 ` Pekka Enberg
2009-06-12 11:43                   ` Pekka Enberg
2009-06-12 15:30               ` Andrew Morton
2009-06-12 15:30                 ` Andrew Morton
2009-06-12 21:42                 ` Benjamin Herrenschmidt
2009-06-12 21:42                   ` Benjamin Herrenschmidt
2009-06-25  4:41                 ` Nick Piggin
2009-06-25  4:41                   ` Nick Piggin
2009-06-12 11:09         ` Benjamin Herrenschmidt
2009-06-12 11:09           ` Benjamin Herrenschmidt
2009-06-12 15:04   ` Linus Torvalds
2009-06-12 15:04     ` Linus Torvalds
2009-06-12 15:05     ` Pekka Enberg
2009-06-12 15:05       ` Pekka Enberg
2009-06-19 14:59   ` Pavel Machek
2009-06-19 14:59     ` Pavel Machek
2009-06-19 22:27     ` Benjamin Herrenschmidt
2009-06-19 22:27       ` Benjamin Herrenschmidt
2009-06-19 23:23       ` Pavel Machek
2009-06-19 23:23         ` Pavel Machek
2009-06-19 23:50         ` Benjamin Herrenschmidt
2009-06-19 23:50           ` Benjamin Herrenschmidt
2009-06-20  0:28           ` Pavel Machek
2009-06-20  0:28             ` Pavel Machek
2009-06-20  2:10             ` Benjamin Herrenschmidt
2009-06-20  2:10               ` Benjamin Herrenschmidt
2009-06-21  6:18               ` Pavel Machek
2009-06-21  6:18                 ` Pavel Machek
2009-06-21  9:31                 ` Benjamin Herrenschmidt
2009-06-21  9:31                   ` Benjamin Herrenschmidt
2009-06-25  4:34                   ` Nick Piggin
2009-06-25  4:34                     ` Nick Piggin
2009-06-25  9:56                     ` Benjamin Herrenschmidt
2009-06-25  9:56                       ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1244800695.7172.115.camel@pasglop \
    --to=benh@kernel.crashing.org \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@elte.hu \
    --cc=npiggin@suse.de \
    --cc=penberg@cs.helsinki.fi \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.