From: Andrew Morton <akpm@linux-foundation.org>
To: "Ricardo M. Correia" <ricardo.correia@oracle.com>
Cc: linux-mm@kvack.org, Brian Behlendorf <behlendorf1@llnl.gov>,
	Andreas Dilger <andreas.dilger@oracle.com>
Subject: Re: Propagating GFP_NOFS inside __vmalloc()
Date: Thu, 11 Nov 2010 12:06:43 -0800
Message-ID: <20101111120643.22dcda5b.akpm@linux-foundation.org>
In-Reply-To: <1289421759.11149.59.camel@oralap>

On Wed, 10 Nov 2010 21:42:39 +0100
"Ricardo M. Correia" <ricardo.correia@oracle.com> wrote:

> Hi,
> 
> As part of Lustre filesystem development, we are running into a
> situation where we (sporadically) need to call into __vmalloc() from a
> thread that processes I/Os to disk (it's a long story).
> 
> In general, this would be fine as long as we pass GFP_NOFS to
> __vmalloc(), but the problem is that even if we pass this flag, vmalloc
> itself sometimes allocates memory with GFP_KERNEL.
> 
> This is not OK for us because the GFP_KERNEL allocations may go into the
> synchronous reclaim path and try to write out data to disk (in order to
> free memory for the allocation), which leads to a deadlock because those
> reclaims may themselves depend on the thread that is doing the
> allocation to make forward progress (which it can't, because it's
> blocked trying to allocate the memory).
> 
> Andreas suggested that this may be a bug in __vmalloc(), in the sense
> that it's not propagating the gfp_mask that the caller requested to all
> allocations that happen inside it.
> 
> On the latest torvalds git tree, for x86-64, the path for these
> GFP_KERNEL allocations goes something like this:
> 
> __vmalloc()
>   __vmalloc_node()
>     __vmalloc_area_node()
>       map_vm_area()
>         vmap_page_range()
>           vmap_pud_range()
>             vmap_pmd_range()
>               pmd_alloc()
>                 __pmd_alloc()
>                   pmd_alloc_one()
>                     get_zeroed_page() <-- GFP_KERNEL
>               vmap_pte_range()
>                 pte_alloc_kernel()
>                   __pte_alloc_kernel()
>                     pte_alloc_one_kernel()
>                       get_free_page() <-- GFP_KERNEL
> 
> We've actually observed these deadlocks during testing (although in an
> older kernel).

Bug.

> Andreas suggested that we should fix __vmalloc() to propagate the
> caller-passed gfp_mask all the way to those allocating functions. This
> may require fixing these interfaces for all architectures.
> 
> I also suggested that it would be nice to have a per-task
> gfp_allowed_mask, similar to the existing gfp_allowed_mask /
> set_gfp_allowed_mask() interface that exists in the kernel, but instead
> of being global to the entire system, it would be stored in the thread's
> task_struct and only apply in the context of the current thread.

Possibly we should have done pass-via-task_struct for the gfp mode
everywhere.  Fifteen years ago...  Sites which modify the mask should
do a save/restore on the stack, so there would be no stack savings, but
I suspect there would be some nice text size savings from all that
pass-it-on-to-the-next-guy stuff we do.  Note that this approach could
perhaps be used to move PF_MEMALLOC, PF_KSWAPD and maybe a few other
things into task_struct.gfp_flags.
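
A minimal userspace sketch of the save/restore-on-the-stack idea, for
illustration only.  Everything in it is an assumption rather than
existing kernel code: the flag bit values are made up, "struct task"
stands in for task_struct, and task_gfp_restrict(), task_gfp_restore()
and effective_gfp() are hypothetical helpers.  It shows a deep callee
that hard-codes GFP_KERNEL still ending up with the caller's NOFS
restriction, because the allocator model masks every request with the
per-task value.

```c
#include <assert.h>

/* Illustrative flag bits only; the real kernel values differ. */
typedef unsigned int gfp_t;
#define GFP_BIT_WAIT 0x1u
#define GFP_BIT_IO   0x2u
#define GFP_BIT_FS   0x4u
#define GFP_KERNEL   (GFP_BIT_WAIT | GFP_BIT_IO | GFP_BIT_FS)
#define GFP_NOFS     (GFP_BIT_WAIT | GFP_BIT_IO)

/* Stand-in for task_struct; gfp_allowed is the hypothetical per-task mask. */
struct task {
	gfp_t gfp_allowed;
};

static struct task current_task = { .gfp_allowed = GFP_KERNEL };

/* Narrow the current task's mask; return the old value so the caller
 * can keep it on its own stack and restore it later. */
static gfp_t task_gfp_restrict(gfp_t allowed)
{
	gfp_t old = current_task.gfp_allowed;

	current_task.gfp_allowed = old & allowed;
	return old;
}

static void task_gfp_restore(gfp_t old)
{
	current_task.gfp_allowed = old;
}

/* Model of the allocator entry point: whatever the call site asked
 * for is filtered through the per-task mask. */
static gfp_t effective_gfp(gfp_t requested)
{
	return requested & current_task.gfp_allowed;
}

/* Deep callee that hard-codes GFP_KERNEL, as the page-table
 * allocation paths under __vmalloc() do. */
static gfp_t deep_page_table_alloc(void)
{
	return effective_gfp(GFP_KERNEL);
}

/* I/O-path caller: restrict, call down, restore.  Returns the flags
 * the deep allocation actually ran with. */
static gfp_t io_thread_alloc(void)
{
	gfp_t saved = task_gfp_restrict(GFP_NOFS);
	gfp_t used = deep_page_table_alloc();

	task_gfp_restore(saved);
	assert(current_task.gfp_allowed == saved);
	return used;
}
```

In this model io_thread_alloc() returns GFP_NOFS even though no
function between it and the allocator takes a gfp argument, and a
later deep_page_table_alloc() outside the restricted region gets
GFP_KERNEL again.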

But that's history.  Before embarking on that path (and introducing a
mixture of both forms of argument-passing) we should take a look at how
big and ugly it is to fix this bug via the normal passing convention,
so we can make a better-informed decision.  Is that something which
you've looked into in any detail?
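
For comparison, a rough userspace model of the normal passing
convention.  Again, all names here are hypothetical and the flag values
are illustrative; the real change would have to touch the per-arch
page-table allocators named in the call chain above.  The "hardcoded"
variants mimic the current behaviour, where the mask is dropped below
the top level; the "propagating" variants thread it through every
level.

```c
/* Illustrative flag bits only; the real kernel values differ. */
typedef unsigned int gfp_t;
#define GFP_BIT_WAIT 0x1u
#define GFP_BIT_IO   0x2u
#define GFP_BIT_FS   0x4u
#define GFP_KERNEL   (GFP_BIT_WAIT | GFP_BIT_IO | GFP_BIT_FS)
#define GFP_NOFS     (GFP_BIT_WAIT | GFP_BIT_IO)

/* Flags seen by the most recent page allocation in this model. */
static gfp_t last_page_alloc_gfp;

static void page_alloc_model(gfp_t gfp)
{
	last_page_alloc_gfp = gfp;
}

/* Current shape: the page-table level ignores what the caller wanted. */
static void pte_alloc_hardcoded(void)
{
	page_alloc_model(GFP_KERNEL);
}

static void map_area_hardcoded(void)
{
	pte_alloc_hardcoded();
}

/* Returns the flags used for the page-table allocation. */
static gfp_t vmalloc_hardcoded(gfp_t gfp_mask)
{
	page_alloc_model(gfp_mask);	/* data pages honour the mask */
	map_area_hardcoded();		/* page tables do not */
	return last_page_alloc_gfp;
}

/* Fixed shape: every level grows a gfp_mask argument and forwards it. */
static void pte_alloc_propagating(gfp_t gfp_mask)
{
	page_alloc_model(gfp_mask);
}

static void map_area_propagating(gfp_t gfp_mask)
{
	pte_alloc_propagating(gfp_mask);
}

/* Returns the flags used for the page-table allocation. */
static gfp_t vmalloc_propagating(gfp_t gfp_mask)
{
	page_alloc_model(gfp_mask);
	map_area_propagating(gfp_mask);
	return last_page_alloc_gfp;
}
```

Here vmalloc_hardcoded(GFP_NOFS) reports GFP_KERNEL for the page-table
allocation, while vmalloc_propagating(GFP_NOFS) reports GFP_NOFS.  The
cost is one more argument on every function in the chain, which is the
size and ugliness in question.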

