From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Michal Hocko <mhocko@suse.com>,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
Jan Kara <jack@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
Mel Gorman <mgorman@suse.de>, Dave Chinner <david@fromorbit.com>,
Mark Fasheh <mfasheh@suse.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.4 93/97] mm: allow GFP_{FS,IO} for page_cache_read page cache allocation
Date: Sun, 22 Apr 2018 15:54:11 +0200 [thread overview]
Message-ID: <20180422135310.256979609@linuxfoundation.org> (raw)
In-Reply-To: <20180422135304.577223025@linuxfoundation.org>
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Hocko <mhocko@suse.com>
commit c20cd45eb01748f0fba77a504f956b000df4ea73 upstream.
page_cache_read has been historically using page_cache_alloc_cold to
allocate a new page. This means that mapping_gfp_mask is used as the
base for the gfp_mask. Many filesystems are setting this mask to
GFP_NOFS to prevent from fs recursion issues. page_cache_read is called
from the vm_operations_struct::fault() context during the page fault.
This context doesn't need the reclaim protection normally.
ceph and ocfs2 which call filemap_fault from their fault handlers seem
to be OK because they are not taking any fs lock before invoking generic
implementation. xfs which takes XFS_MMAPLOCK_SHARED is safe from the
reclaim recursion POV because this lock serializes truncate and punch
hole with the page faults and it doesn't get involved in the reclaim.
There is simply no reason to deliberately use a weaker allocation
context when a __GFP_FS | __GFP_IO can be used. The GFP_NOFS protection
might be even harmful. There is a push to fail GFP_NOFS allocations
rather than loop within allocator indefinitely with a very limited
reclaim ability. Once we start failing those requests the OOM killer
might be triggered prematurely because the page cache allocation failure
is propagated up the page fault path and end up in
pagefault_out_of_memory.
We cannot play with mapping_gfp_mask directly because that would be racy
wrt. parallel page faults and it might interfere with other users who
really rely on NOFS semantic from the stored gfp_mask. The mask is also
inode proper so it would even be a layering violation. What we can do
instead is to push the gfp_mask into struct vm_fault and allow fs layer
to overwrite it should the callback need to be called with a different
allocation context.
Initialize the default to (mapping_gfp_mask | __GFP_FS | __GFP_IO)
because this should be safe from the page fault path normally. Why do
we care about mapping_gfp_mask at all then? Because this doesn't hold
only reclaim protection flags but it also might contain zone and
movability restrictions (GFP_DMA32, __GFP_MOVABLE and others) so we have
to respect those.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/mm.h | 4 ++++
mm/filemap.c | 9 ++++-----
mm/memory.c | 17 +++++++++++++++++
3 files changed, 25 insertions(+), 5 deletions(-)
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -225,10 +225,14 @@ extern pgprot_t protection_map[16];
* ->fault function. The vma's ->fault is responsible for returning a bitmask
* of VM_FAULT_xxx flags that give details about how the fault was handled.
*
+ * MM layer fills up gfp_mask for page allocations but fault handler might
+ * alter it if its implementation requires a different allocation context.
+ *
* pgoff should be used in favour of virtual_address, if possible.
*/
struct vm_fault {
unsigned int flags; /* FAULT_FLAG_xxx flags */
+ gfp_t gfp_mask; /* gfp mask to be used for allocations */
pgoff_t pgoff; /* Logical page offset based on vma */
void __user *virtual_address; /* Faulting virtual address */
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1827,19 +1827,18 @@ EXPORT_SYMBOL(generic_file_read_iter);
* This adds the requested page to the page cache if it isn't already there,
* and schedules an I/O to read in its contents from disk.
*/
-static int page_cache_read(struct file *file, pgoff_t offset)
+static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
{
struct address_space *mapping = file->f_mapping;
struct page *page;
int ret;
do {
- page = page_cache_alloc_cold(mapping);
+ page = __page_cache_alloc(gfp_mask|__GFP_COLD);
if (!page)
return -ENOMEM;
- ret = add_to_page_cache_lru(page, mapping, offset,
- mapping_gfp_constraint(mapping, GFP_KERNEL));
+ ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask & GFP_KERNEL);
if (ret == 0)
ret = mapping->a_ops->readpage(file, page);
else if (ret == -EEXIST)
@@ -2020,7 +2019,7 @@ no_cached_page:
* We're only likely to ever get here if MADV_RANDOM is in
* effect.
*/
- error = page_cache_read(file, offset);
+ error = page_cache_read(file, offset, vmf->gfp_mask);
/*
* The page we want has now been added to the page cache.
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1990,6 +1990,20 @@ static inline void cow_user_page(struct
copy_user_highpage(dst, src, va, vma);
}
+static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
+{
+ struct file *vm_file = vma->vm_file;
+
+ if (vm_file)
+ return mapping_gfp_mask(vm_file->f_mapping) | __GFP_FS | __GFP_IO;
+
+ /*
+ * Special mappings (e.g. VDSO) do not have any file so fake
+ * a default GFP_KERNEL for them.
+ */
+ return GFP_KERNEL;
+}
+
/*
* Notify the address space that the page is about to become writable so that
* it can prohibit this or wait for the page to get into an appropriate state.
@@ -2005,6 +2019,7 @@ static int do_page_mkwrite(struct vm_are
vmf.virtual_address = (void __user *)(address & PAGE_MASK);
vmf.pgoff = page->index;
vmf.flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE;
+ vmf.gfp_mask = __get_fault_gfp_mask(vma);
vmf.page = page;
vmf.cow_page = NULL;
@@ -2770,6 +2785,7 @@ static int __do_fault(struct vm_area_str
vmf.pgoff = pgoff;
vmf.flags = flags;
vmf.page = NULL;
+ vmf.gfp_mask = __get_fault_gfp_mask(vma);
vmf.cow_page = cow_page;
ret = vma->vm_ops->fault(vma, &vmf);
@@ -2936,6 +2952,7 @@ static void do_fault_around(struct vm_ar
vmf.pgoff = pgoff;
vmf.max_pgoff = max_pgoff;
vmf.flags = flags;
+ vmf.gfp_mask = __get_fault_gfp_mask(vma);
vma->vm_ops->map_pages(vma, &vmf);
}
next prev parent reply other threads:[~2018-04-22 14:19 UTC|newest]
Thread overview: 104+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-22 13:52 [PATCH 4.4 00/97] 4.4.129-stable review Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 01/97] media: v4l2-compat-ioctl32: dont oops on overlay Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 02/97] parisc: Fix out of array access in match_pci_device() Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 03/97] perf intel-pt: Fix overlap detection to identify consecutive buffers correctly Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 04/97] perf intel-pt: Fix sync_switch Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 05/97] perf intel-pt: Fix error recovery from missing TIP packet Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 06/97] perf intel-pt: Fix timestamp following overflow Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 08/97] Revert "perf tests: Decompress kernel module before objdump" Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 09/97] block/loop: fix deadlock after loop_set_status Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 10/97] s390/qdio: dont retry EQBS after CCQ 96 Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 11/97] s390/qdio: dont merge ERROR output buffers Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 12/97] s390/ipl: ensure loadparm valid flag is set Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 13/97] getname_kernel() needs to make sure that ->name != ->iname in long case Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 14/97] rtl8187: Fix NULL pointer dereference in priv->conf_mutex Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 15/97] hwmon: (ina2xx) Fix access to uninitialized mutex Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 16/97] cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 17/97] slip: Check if rstate is initialized before uncompressing Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 18/97] lan78xx: Correctly indicate invalid OTP Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 19/97] x86/hweight: Get rid of the special calling convention Greg Kroah-Hartman
2018-04-22 13:52 ` [PATCH 4.4 21/97] tty: make n_tty_read() always abort if hangup is in progress Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 22/97] ubifs: Check ubifs_wbuf_sync() return code Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 23/97] ubi: fastmap: Dont flush fastmap work on detach Greg Kroah-Hartman
2018-05-16 16:53 ` Ben Hutchings
2018-05-16 17:37 ` Richard Weinberger
2018-04-22 13:53 ` [PATCH 4.4 24/97] ubi: Fix error for write access Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 25/97] ubi: Reject MLC NAND Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 26/97] fs/reiserfs/journal.c: add missing resierfs_warning() arg Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 27/97] resource: fix integer overflow at reallocation Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 28/97] ipc/shm: fix use-after-free of shm file via remap_file_pages() Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 29/97] mm, slab: reschedule cache_reap() on the same CPU Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 30/97] usb: musb: gadget: misplaced out of bounds check Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 31/97] ARM: dts: at91: at91sam9g25: fix mux-mask pinctrl property Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 32/97] ARM: dts: at91: sama5d4: fix pinctrl compatible string Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 33/97] xen-netfront: Fix hang on device removal Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 34/97] regmap: Fix reversed bounds check in regmap_raw_write() Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 35/97] ACPI / video: Add quirk to force acpi-video backlight on Samsung 670Z5E Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 36/97] ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status() Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 38/97] usb: dwc3: pci: Properly cleanup resource Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 39/97] HID: i2c-hid: fix size check and type usage Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 40/97] powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write() Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 41/97] powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 42/97] powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 43/97] HID: Fix hid_report_len usage Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 44/97] HID: core: Fix size as type u32 Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 45/97] ASoC: ssm2602: Replace reg_default_raw with reg_default Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 46/97] thunderbolt: Resume control channel after hibernation image is created Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 47/97] random: use a tighter cap in credit_entropy_bits_safe() Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 48/97] jbd2: if the journal is aborted then dont allow update of the log tail Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 49/97] ext4: dont update checksum of new initialized bitmaps Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 50/97] ext4: add validity checks for bitmap block numbers Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 51/97] ext4: fail ext4_iget for root directory if unallocated Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 52/97] RDMA/ucma: Dont allow setting RDMA_OPTION_IB_PATH without an RDMA device Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 53/97] ALSA: pcm: Fix UAF at PCM release via PCM timer access Greg Kroah-Hartman
2018-05-16 20:51 ` Ben Hutchings
2018-05-16 22:09 ` Takashi Iwai
2018-05-17 8:54 ` Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 54/97] IB/srp: Fix srp_abort() Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 55/97] IB/srp: Fix completion vector assignment algorithm Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 56/97] dmaengine: at_xdmac: fix rare residue corruption Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 57/97] um: Use POSIX ucontext_t instead of struct ucontext Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 58/97] iommu/vt-d: Fix a potential memory leak Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 59/97] mmc: jz4740: Fix race condition in IRQ mask update Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 60/97] clk: mvebu: armada-38x: add support for 1866MHz variants Greg Kroah-Hartman
2018-05-16 23:32 ` Ben Hutchings
2018-05-16 23:34 ` Ben Hutchings
2018-04-22 13:53 ` [PATCH 4.4 61/97] clk: mvebu: armada-38x: add support for missing clocks Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 62/97] clk: bcm2835: De-assert/assert PLL reset signal when appropriate Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 63/97] thermal: imx: Fix race condition in imx_thermal_probe() Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 64/97] watchdog: f71808e_wdt: Fix WD_EN register read Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 65/97] ALSA: oss: consolidate kmalloc/memset 0 call to kzalloc Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 66/97] ALSA: pcm: Use ERESTARTSYS instead of EINTR in OSS emulation Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 67/97] ALSA: pcm: Avoid potential races between OSS ioctls and read/write Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 68/97] ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 69/97] ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 70/97] ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 71/97] vfio-pci: Virtualize PCIe & AF FLR Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 72/97] vfio/pci: Virtualize Maximum Payload Size Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 73/97] vfio/pci: Virtualize Maximum Read Request Size Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 74/97] ext4: dont allow r/w mounts if metadata blocks overlap the superblock Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 76/97] ext4: fix crashes in dioread_nolock mode Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 77/97] ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea() Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 79/97] ALSA: rawmidi: Fix missing input substream checks in compat ioctls Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 80/97] ALSA: hda - New VIA controller suppor no-snoop path Greg Kroah-Hartman
2018-04-22 13:53 ` [PATCH 4.4 81/97] HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device Greg Kroah-Hartman
2018-04-22 13:54 ` [PATCH 4.4 82/97] MIPS: uaccess: Add micromips clobbers to bzero invocation Greg Kroah-Hartman
2018-04-22 13:54 ` [PATCH 4.4 83/97] MIPS: memset.S: EVA & fault support for small_memset Greg Kroah-Hartman
2018-04-22 13:54 ` [PATCH 4.4 84/97] MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup Greg Kroah-Hartman
2018-04-22 13:54 ` [PATCH 4.4 85/97] MIPS: memset.S: Fix clobber of v1 in last_fixup Greg Kroah-Hartman
2018-04-22 13:54 ` [PATCH 4.4 86/97] powerpc/eeh: Fix enabling bridge MMIO windows Greg Kroah-Hartman
2018-04-22 13:54 ` [PATCH 4.4 87/97] powerpc/lib: Fix off-by-one in alternate feature patching Greg Kroah-Hartman
2018-04-22 13:54 ` [PATCH 4.4 88/97] jffs2_kill_sb(): deal with failed allocations Greg Kroah-Hartman
2018-04-22 13:54 ` [PATCH 4.4 89/97] hypfs_kill_super(): " Greg Kroah-Hartman
2018-04-22 13:54 ` [PATCH 4.4 90/97] rpc_pipefs: fix double-dput() Greg Kroah-Hartman
2018-04-22 13:54 ` [PATCH 4.4 91/97] Dont leak MNT_INTERNAL away from internal mounts Greg Kroah-Hartman
2018-04-22 13:54 ` [PATCH 4.4 92/97] autofs: mount point create should honour passed in mode Greg Kroah-Hartman
2018-04-22 13:54 ` Greg Kroah-Hartman [this message]
2018-04-22 13:54 ` [PATCH 4.4 94/97] mm/filemap.c: fix NULL pointer in page_cache_tree_insert() Greg Kroah-Hartman
2018-04-22 13:54 ` [PATCH 4.4 96/97] fanotify: fix logic of events on child Greg Kroah-Hartman
2018-04-22 13:54 ` [PATCH 4.4 97/97] writeback: safer lock nesting Greg Kroah-Hartman
2018-04-22 20:44 ` [PATCH 4.4 00/97] 4.4.129-stable review Nathan Chancellor
2018-04-23 6:57 ` Greg Kroah-Hartman
2018-04-23 7:38 ` Naresh Kamboju
2018-04-23 16:53 ` Guenter Roeck
2018-04-23 21:38 ` Shuah Khan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180422135310.256979609@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=david@fromorbit.com \
--cc=jack@suse.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mfasheh@suse.com \
--cc=mgorman@suse.de \
--cc=mhocko@suse.com \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).