From: Andrew Morton <akpm@linux-foundation.org>
To: Nick Piggin <npiggin@suse.de>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, stable@kernel.org
Subject: Re: [patch] mm: pagecache gfp flags fix
Date: Mon, 15 Dec 2008 15:21:44 -0800 [thread overview]
Message-ID: <20081215152144.00a84c4f.akpm@linux-foundation.org> (raw)
In-Reply-To: <20081212044120.GD15804@wotan.suse.de>
On Fri, 12 Dec 2008 05:41:20 +0100
Nick Piggin <npiggin@suse.de> wrote:
> This patch doesn't actually fix a regression, but a longer standing bug.
> --
>
> Frustratingly, gfp_t is really divided into two classes of flags. One are the
> context dependent ones (can we sleep? can we enter filesystem? block subsystem?
> should we use some extra reserves, etc.). The other ones are the type of memory
> required and depend on how the algorithm is implemented rather than the point
> at which the memory is allocated (highmem? dma memory? etc).
>
> Some of functions which allocate a page and add it to page cache take a gfp_t,
> but sometimes those functions or their callers aren't really doing the right
> thing: when allocating pagecache page, the memory type should be
> mapping_gfp_mask(mapping). When allocating radix tree nodes, the memory type
> should be kernel mapped (not highmem) memory. The gfp_t argument should only
> really be needed for context dependent options.
>
> This patch doesn't really solve that tangle in a nice way, but it does attempt
> to fix a couple of bugs.
>
> - find_or_create_page changes its radix-tree allocation to only include the
> main context dependent flags in order so the pagecache page may be allocated
> from arbitrary types of memory without affecting the radix-tree. In practice,
> slab allocations don't come from highmem anyway, and radix-tree only uses
> slab allocations. So there isn't a practical change (unless some fs uses
> GFP_DMA for pages).
>
> - grab_cache_page_nowait() is changed to allocate radix-tree nodes with
> GFP_NOFS, because it is not supposed to reenter the filesystem. This bug
> could cause lock recursion if a filesystem is not expecting the function
> to reenter the fs (as-per documentation).
>
> Filesystems should be careful about exactly what semantics they want and what
> they get when fiddling with gfp_t masks to allocate pagecache. One should be
> as liberal as possible with the type of memory that can be used, and same
> for the the context specific flags.
ug. So at present page_symlink() can call write_begin() which will do
a GFP_KERNEL/GFP_USER allocation even though we hold fs locks?
In which calling context does this happen?
This is a pretty big ugly patch. I'm thinking that we merge into
2.6.29 and backport into 2.6.28.x.
WARNING: multiple messages have this Message-ID (diff)
From: Andrew Morton <akpm@linux-foundation.org>
To: Nick Piggin <npiggin@suse.de>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, stable@kernel.org
Subject: Re: [patch] mm: pagecache gfp flags fix
Date: Mon, 15 Dec 2008 15:21:44 -0800 [thread overview]
Message-ID: <20081215152144.00a84c4f.akpm@linux-foundation.org> (raw)
In-Reply-To: <20081212044120.GD15804@wotan.suse.de>
On Fri, 12 Dec 2008 05:41:20 +0100
Nick Piggin <npiggin@suse.de> wrote:
> This patch doesn't actually fix a regression, but a longer standing bug.
> --
>
> Frustratingly, gfp_t is really divided into two classes of flags. One are the
> context dependent ones (can we sleep? can we enter filesystem? block subsystem?
> should we use some extra reserves, etc.). The other ones are the type of memory
> required and depend on how the algorithm is implemented rather than the point
> at which the memory is allocated (highmem? dma memory? etc).
>
> Some of functions which allocate a page and add it to page cache take a gfp_t,
> but sometimes those functions or their callers aren't really doing the right
> thing: when allocating pagecache page, the memory type should be
> mapping_gfp_mask(mapping). When allocating radix tree nodes, the memory type
> should be kernel mapped (not highmem) memory. The gfp_t argument should only
> really be needed for context dependent options.
>
> This patch doesn't really solve that tangle in a nice way, but it does attempt
> to fix a couple of bugs.
>
> - find_or_create_page changes its radix-tree allocation to only include the
> main context dependent flags in order so the pagecache page may be allocated
> from arbitrary types of memory without affecting the radix-tree. In practice,
> slab allocations don't come from highmem anyway, and radix-tree only uses
> slab allocations. So there isn't a practical change (unless some fs uses
> GFP_DMA for pages).
>
> - grab_cache_page_nowait() is changed to allocate radix-tree nodes with
> GFP_NOFS, because it is not supposed to reenter the filesystem. This bug
> could cause lock recursion if a filesystem is not expecting the function
> to reenter the fs (as-per documentation).
>
> Filesystems should be careful about exactly what semantics they want and what
> they get when fiddling with gfp_t masks to allocate pagecache. One should be
> as liberal as possible with the type of memory that can be used, and same
> for the the context specific flags.
ug. So at present page_symlink() can call write_begin() which will do
a GFP_KERNEL/GFP_USER allocation even though we hold fs locks?
In which calling context does this happen?
This is a pretty big ugly patch. I'm thinking that we merge into
2.6.29 and backport into 2.6.28.x.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-12-15 23:21 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-12-12 4:41 [patch] mm: pagecache gfp flags fix Nick Piggin
2008-12-15 23:21 ` Andrew Morton [this message]
2008-12-15 23:21 ` Andrew Morton
2008-12-15 23:30 ` Nick Piggin
2008-12-15 23:30 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081215152144.00a84c4f.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=npiggin@suse.de \
--cc=stable@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.