Propagating GFP_NOFS inside __vmalloc()


From: "Ricardo M. Correia" <ricardo.correia@oracle.com>
To: linux-mm@kvack.org
Cc: Brian Behlendorf <behlendorf1@llnl.gov>,
	Andreas Dilger <andreas.dilger@oracle.com>
Subject: Propagating GFP_NOFS inside __vmalloc()
Date: Wed, 10 Nov 2010 21:42:39 +0100
Message-ID: <1289421759.11149.59.camel@oralap>

Hi,

As part of Lustre filesystem development, we are running into a
situation where we (sporadically) need to call into __vmalloc() from a
thread that processes I/Os to disk (it's a long story).

In general, this would be fine as long as we pass GFP_NOFS to
__vmalloc(), but the problem is that even if we pass this flag, vmalloc
itself sometimes allocates memory with GFP_KERNEL.

This is not OK for us because the GFP_KERNEL allocations may go into the
synchronous reclaim path and try to write out data to disk (in order to
free memory for the allocation), which leads to a deadlock because those
reclaims may themselves depend on the thread that is doing the
allocation to make forward progress (which it can't, because it's
blocked trying to allocate the memory).

Andreas suggested that this may be a bug in __vmalloc(), in the sense
that it's not propagating the gfp_mask that the caller requested to all
allocations that happen inside it.

On the latest torvalds git tree, for x86-64, the path for these
GFP_KERNEL allocations goes something like this:

__vmalloc()
  __vmalloc_node()
    __vmalloc_area_node()
      map_vm_area()
        vmap_page_range()
          vmap_pud_range()
            vmap_pmd_range()
              pmd_alloc()
                __pmd_alloc()
                  pmd_alloc_one()
                    get_zeroed_page() <-- GFP_KERNEL
              vmap_pte_range()
                pte_alloc_kernel()
                  __pte_alloc_kernel()
                    pte_alloc_one_kernel()
                      __get_free_page() <-- GFP_KERNEL

We've actually observed these deadlocks during testing (although in an
older kernel).

Andreas suggested that we should fix __vmalloc() to propagate the
caller-passed gfp_mask all the way to those allocating functions. This
may require fixing these interfaces for all architectures.
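
To illustrate the difference, here is a rough userspace model of the
call chain above. It is not kernel code: the MODEL_* flag values and all
model_* function names are made up for illustration, and the real
page-table allocators have different signatures. It only shows that the
innermost allocation sees the caller's mask when the mask is passed
down, and a hard-coded GFP_KERNEL-like mask when it is not.

```c
/* Illustrative bit values only; the real flags live in linux/gfp.h. */
#define MODEL_GFP_WAIT   0x1u
#define MODEL_GFP_IO     0x2u
#define MODEL_GFP_FS     0x4u
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_WAIT | MODEL_GFP_IO)

/* Mask that the innermost (page-table page) allocation was made with. */
static unsigned int last_leaf_mask;

/* Stand-in for the page allocator: just records the mask it was given. */
static void model_page_alloc(unsigned int gfp_mask)
{
	last_leaf_mask = gfp_mask;
}

/* Models today's behaviour: the caller's mask is dropped on the way down. */
static void model_pte_alloc_hardcoded(unsigned int gfp_mask)
{
	(void)gfp_mask;
	model_page_alloc(MODEL_GFP_KERNEL);
}

/* Models the proposed fix: the caller's mask reaches the leaf allocation. */
static void model_pte_alloc_propagated(unsigned int gfp_mask)
{
	model_page_alloc(gfp_mask);
}

/* Hypothetical stand-in for __vmalloc(); returns the mask seen at the leaf. */
static unsigned int model_vmalloc(unsigned int gfp_mask, int propagate)
{
	if (propagate)
		model_pte_alloc_propagated(gfp_mask);
	else
		model_pte_alloc_hardcoded(gfp_mask);
	return last_leaf_mask;
}
```

In this model, model_vmalloc(MODEL_GFP_NOFS, 0) returns a mask that
still has MODEL_GFP_FS set, while model_vmalloc(MODEL_GFP_NOFS, 1)
returns MODEL_GFP_NOFS unchanged.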

I also suggested that it would be nice to have a per-task
gfp_allowed_mask, similar to the kernel's existing gfp_allowed_mask /
set_gfp_allowed_mask() interface, except that instead of being global
to the entire system, it would be stored in the thread's task_struct
and apply only in the context of the current thread.

This would allow us to call a function when our I/O threads are created,
say set_thread_gfp_allowed_mask(~__GFP_IO), to make sure that any kernel
allocations that happen in the context of those threads would have
__GFP_IO masked out.
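
A minimal userspace sketch of that idea follows. It is only a model
under stated assumptions: set_thread_gfp_allowed_mask() is the proposed
interface and does not exist in the kernel, the thread-local variable
stands in for a new task_struct field, and the MODEL_* flag values and
model_effective_gfp() are invented for illustration.

```c
/* Illustrative bit values only; the real flags live in linux/gfp.h. */
#define MODEL_GFP_WAIT   0x1u
#define MODEL_GFP_IO     0x2u
#define MODEL_GFP_FS     0x4u
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)

/*
 * Stand-in for a per-task field in task_struct. By default every flag
 * is allowed, so threads that never set a mask are unaffected.
 */
static _Thread_local unsigned int thread_gfp_allowed_mask = ~0u;

/* Proposed (hypothetical) interface: restrict the calling thread only. */
static void set_thread_gfp_allowed_mask(unsigned int mask)
{
	thread_gfp_allowed_mask = mask;
}

/*
 * What the allocator would do on entry in this model: strip any flags
 * that the current thread is not allowed to use.
 */
static unsigned int model_effective_gfp(unsigned int gfp_mask)
{
	return gfp_mask & thread_gfp_allowed_mask;
}
```

After an I/O thread calls set_thread_gfp_allowed_mask(~MODEL_GFP_IO),
model_effective_gfp(MODEL_GFP_KERNEL) no longer includes MODEL_GFP_IO in
that thread, regardless of which code path requested the allocation.
Other threads keep the default mask.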

I am willing to write and send out either or both of those patches (the
vmalloc fix and the per-thread gfp mask), and I would like to know
whether you'd be willing to accept them into the upstream kernel, or
whether you have any other ideas for how to prevent all __GFP_IO
allocations made by the kernel itself in the context of threads that
perform I/O.

(Please reply-to-all as we are not subscribed to linux-mm).

Thanks,
Ricardo


