From: Michal Hocko <mhocko@kernel.org>
To: Harsh Shandilya <harsh@prjkt.io>
Cc: stable@vger.kernel.org,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
Mel Gorman <mgorman@suse.de>, Dave Chinner <david@fromorbit.com>,
Mark Fasheh <mfasheh@suse.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 3.18.y 3/5] mm: allow GFP_{FS,IO} for page_cache_read page cache allocation
Date: Sun, 22 Apr 2018 16:46:36 -0600 [thread overview]
Message-ID: <20180422224636.GL17484@dhcp22.suse.cz> (raw)
In-Reply-To: <20180422200746.29118-4-harsh@prjkt.io>
I am not really sure it is a good idea to blindly apply this patch to
the older stable tree. Have you checked all filemap_fault handlers?
They might have been quite different in 3.18 than in the kernel I was
developing this against.
If the sole purpose of this backport is to make another
patch (abc1be13fd11 ("mm/filemap.c: fix NULL pointer in
page_cache_tree_insert()")) apply more easily, then I've already
suggested how to handle those rejects.
On Mon 23-04-18 01:37:44, Harsh Shandilya wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> Commit c20cd45eb01748f0fba77a504f956b000df4ea73 upstream.
>
> page_cache_read has been historically using page_cache_alloc_cold to
> allocate a new page. This means that mapping_gfp_mask is used as the
> base for the gfp_mask. Many filesystems are setting this mask to
> GFP_NOFS to prevent fs recursion issues. page_cache_read is called
> from the vm_operations_struct::fault() context during the page fault.
> This context doesn't need the reclaim protection normally.
>
> ceph and ocfs2 which call filemap_fault from their fault handlers seem
> to be OK because they are not taking any fs lock before invoking generic
> implementation. xfs which takes XFS_MMAPLOCK_SHARED is safe from the
> reclaim recursion POV because this lock serializes truncate and punch
> hole with the page faults and it doesn't get involved in the reclaim.
>
> There is simply no reason to deliberately use a weaker allocation
> context when a __GFP_FS | __GFP_IO can be used. The GFP_NOFS protection
> might be even harmful. There is a push to fail GFP_NOFS allocations
> rather than loop within the allocator indefinitely with a very limited
> reclaim ability. Once we start failing those requests the OOM killer
> might be triggered prematurely because the page cache allocation failure
> is propagated up the page fault path and ends up in
> pagefault_out_of_memory.
>
> We cannot play with mapping_gfp_mask directly because that would be racy
> wrt. parallel page faults and it might interfere with other users who
> really rely on NOFS semantics from the stored gfp_mask. The mask is also
> a property of the inode, so changing it would even be a layering violation. What we can do
> instead is to push the gfp_mask into struct vm_fault and allow fs layer
> to overwrite it should the callback need to be called with a different
> allocation context.
>
> Initialize the default to (mapping_gfp_mask | __GFP_FS | __GFP_IO)
> because this should be safe from the page fault path normally. Why do
> we care about mapping_gfp_mask at all then? Because it doesn't hold
> only reclaim protection flags; it might also contain zone and
> movability restrictions (GFP_DMA32, __GFP_MOVABLE and others), so we have
> to respect those.
>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Acked-by: Jan Kara <jack@suse.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Mark Fasheh <mfasheh@suse.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Harsh Shandilya <harsh@prjkt.io>
> ---
> include/linux/mm.h | 4 ++++
> mm/filemap.c | 8 ++++----
> mm/memory.c | 17 +++++++++++++++++
> 3 files changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 5adffb0a468f..9ac4697979e8 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -203,9 +203,13 @@ extern pgprot_t protection_map[16];
> *
> * pgoff should be used in favour of virtual_address, if possible. If pgoff
> * is used, one may implement ->remap_pages to get nonlinear mapping support.
> + *
> + * MM layer fills up gfp_mask for page allocations but fault handler might
> + * alter it if its implementation requires a different allocation context.
> */
> struct vm_fault {
> unsigned int flags; /* FAULT_FLAG_xxx flags */
> + gfp_t gfp_mask; /* gfp mask to be used for allocations */
> pgoff_t pgoff; /* Logical page offset based on vma */
> void __user *virtual_address; /* Faulting virtual address */
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 7e6ab98d4d3c..aafeeefcb00d 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1746,18 +1746,18 @@ EXPORT_SYMBOL(generic_file_read_iter);
> * This adds the requested page to the page cache if it isn't already there,
> * and schedules an I/O to read in its contents from disk.
> */
> -static int page_cache_read(struct file *file, pgoff_t offset)
> +static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
> {
> struct address_space *mapping = file->f_mapping;
> struct page *page;
> int ret;
>
> do {
> - page = page_cache_alloc_cold(mapping);
> + page = __page_cache_alloc(gfp_mask|__GFP_COLD);
> if (!page)
> return -ENOMEM;
>
> - ret = add_to_page_cache_lru(page, mapping, offset, GFP_KERNEL);
> + ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask & GFP_KERNEL);
> if (ret == 0)
> ret = mapping->a_ops->readpage(file, page);
> else if (ret == -EEXIST)
> @@ -1940,7 +1940,7 @@ no_cached_page:
> * We're only likely to ever get here if MADV_RANDOM is in
> * effect.
> */
> - error = page_cache_read(file, offset);
> + error = page_cache_read(file, offset, vmf->gfp_mask);
>
> /*
> * The page we want has now been added to the page cache.
> diff --git a/mm/memory.c b/mm/memory.c
> index 0c4f5e36b155..5a62c6a42143 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1973,6 +1973,20 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> copy_user_highpage(dst, src, va, vma);
> }
>
> +static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
> +{
> + struct file *vm_file = vma->vm_file;
> +
> + if (vm_file)
> + return mapping_gfp_mask(vm_file->f_mapping) | __GFP_FS | __GFP_IO;
> +
> + /*
> + * Special mappings (e.g. VDSO) do not have any file so fake
> + * a default GFP_KERNEL for them.
> + */
> + return GFP_KERNEL;
> +}
> +
> /*
> * Notify the address space that the page is about to become writable so that
> * it can prohibit this or wait for the page to get into an appropriate state.
> @@ -1988,6 +2002,7 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
> vmf.virtual_address = (void __user *)(address & PAGE_MASK);
> vmf.pgoff = page->index;
> vmf.flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE;
> + vmf.gfp_mask = __get_fault_gfp_mask(vma);
> vmf.page = page;
>
> ret = vma->vm_ops->page_mkwrite(vma, &vmf);
> @@ -2670,6 +2685,7 @@ static int __do_fault(struct vm_area_struct *vma, unsigned long address,
> vmf.pgoff = pgoff;
> vmf.flags = flags;
> vmf.page = NULL;
> + vmf.gfp_mask = __get_fault_gfp_mask(vma);
>
> ret = vma->vm_ops->fault(vma, &vmf);
> if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
> @@ -2834,6 +2850,7 @@ static void do_fault_around(struct vm_area_struct *vma, unsigned long address,
> vmf.pgoff = pgoff;
> vmf.max_pgoff = max_pgoff;
> vmf.flags = flags;
> + vmf.gfp_mask = __get_fault_gfp_mask(vma);
> vma->vm_ops->map_pages(vma, &vmf);
> }
>
> --
> 2.15.0.2308.g658a28aa74af
>
--
Michal Hocko
SUSE Labs
2018-04-22 20:07 [PATCH 3.18.y 0/5] Backports for 3.18.y Harsh Shandilya
2018-04-22 20:07 ` [PATCH 3.18.y 1/5] ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea() Harsh Shandilya
2018-04-22 20:07 ` [PATCH 3.18.y 2/5] ext4: bugfix for mmaped pages in mpage_release_unused_pages() Harsh Shandilya
2018-04-22 20:07 ` [PATCH 3.18.y 3/5] mm: allow GFP_{FS,IO} for page_cache_read page cache allocation Harsh Shandilya
2018-04-22 22:46 ` Michal Hocko [this message]
2018-04-23 3:16 ` Harsh Shandilya
2018-04-22 20:07 ` [PATCH 3.18.y 4/5] mm/filemap.c: fix NULL pointer in page_cache_tree_insert() Harsh Shandilya
2018-04-22 20:07 ` [PATCH 3.18.y 5/5] ext4: don't update checksum of new initialized bitmaps Harsh Shandilya
2018-04-24 12:30 ` [PATCH 3.18.y 0/5] Backports for 3.18.y Greg KH
2018-04-24 12:59 ` Harsh Shandilya
2018-04-24 13:11 ` Harsh Shandilya
2018-04-24 14:03 ` Greg KH
2018-04-24 14:11 ` Harsh Shandilya
2018-04-24 14:24 ` Greg KH