From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@suse.com>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Tejun Heo <tj@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.4 07/69] mm: memcontrol: do not recurse in direct reclaim
Date: Wed, 9 Nov 2016 11:43:45 +0100 [thread overview]
Message-ID: <20161109102901.453726700@linuxfoundation.org> (raw)
In-Reply-To: <20161109102901.127641653@linuxfoundation.org>
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Weiner <hannes@cmpxchg.org>
commit 89a2848381b5fcd9c4d9c0cd97680e3b28730e31 upstream.
On 4.0, we saw a stack corruption from a page fault entering direct
memory cgroup reclaim, calling into btrfs_releasepage(), which then
tried to allocate an extent and recursed back into a kmem charge ad
nauseam:
[...]
btrfs_releasepage+0x2c/0x30
try_to_release_page+0x32/0x50
shrink_page_list+0x6da/0x7a0
shrink_inactive_list+0x1e5/0x510
shrink_lruvec+0x605/0x7f0
shrink_zone+0xee/0x320
do_try_to_free_pages+0x174/0x440
try_to_free_mem_cgroup_pages+0xa7/0x130
try_charge+0x17b/0x830
memcg_charge_kmem+0x40/0x80
new_slab+0x2d9/0x5a0
__slab_alloc+0x2fd/0x44f
kmem_cache_alloc+0x193/0x1e0
alloc_extent_state+0x21/0xc0
__clear_extent_bit+0x2b5/0x400
try_release_extent_mapping+0x1a3/0x220
__btrfs_releasepage+0x31/0x70
btrfs_releasepage+0x2c/0x30
try_to_release_page+0x32/0x50
shrink_page_list+0x6da/0x7a0
shrink_inactive_list+0x1e5/0x510
shrink_lruvec+0x605/0x7f0
shrink_zone+0xee/0x320
do_try_to_free_pages+0x174/0x440
try_to_free_mem_cgroup_pages+0xa7/0x130
try_charge+0x17b/0x830
mem_cgroup_try_charge+0x65/0x1c0
handle_mm_fault+0x117f/0x1510
__do_page_fault+0x177/0x420
do_page_fault+0xc/0x10
page_fault+0x22/0x30
On later kernels, kmem charging is opt-in rather than opt-out, and that
particular kmem allocation in btrfs_releasepage() is no longer being
charged and won't recurse and overrun the stack anymore.
But it's not impossible for an accounted allocation to happen from the
memcg direct reclaim context, and we needed to reproduce this crash many
times before we even got a useful stack trace out of it.
Like other direct reclaimers, mark tasks in memcg reclaim PF_MEMALLOC to
avoid recursing into any other form of direct reclaim. Then let
recursive charges from PF_MEMALLOC contexts bypass the cgroup limit.
Link: http://lkml.kernel.org/r/20161025141050.GA13019@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/memcontrol.c | 9 +++++++++
mm/vmscan.c | 2 ++
2 files changed, 11 insertions(+)
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2055,6 +2055,15 @@ retry:
current->flags & PF_EXITING))
goto force;
+ /*
+ * Prevent unbounded recursion when reclaim operations need to
+ * allocate memory. This might exceed the limits temporarily,
+ * but we prefer facilitating memory reclaim and getting back
+ * under the limit over triggering OOM kills in these cases.
+ */
+ if (unlikely(current->flags & PF_MEMALLOC))
+ goto force;
+
if (unlikely(task_in_memcg_oom(current)))
goto nomem;
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2910,7 +2910,9 @@ unsigned long try_to_free_mem_cgroup_pag
sc.may_writepage,
sc.gfp_mask);
+ current->flags |= PF_MEMALLOC;
nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
+ current->flags &= ~PF_MEMALLOC;
trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
next prev parent reply other threads:[~2016-11-09 10:45 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CGME20161109104447epcas2p2800d2cee304b181b04018da2ac18030c@epcas2p2.samsung.com>
2016-11-09 10:43 ` [PATCH 4.4 00/69] 4.4.31-stable review Greg Kroah-Hartman
2016-11-09 10:43 ` [PATCH 4.4 01/69] i2c: xgene: Avoid dma_buffer overrun Greg Kroah-Hartman
2016-11-09 10:43 ` [PATCH 4.4 02/69] i2c: core: fix NULL pointer dereference under race condition Greg Kroah-Hartman
2016-11-09 10:43 ` [PATCH 4.4 04/69] h8300: fix syscall restarting Greg Kroah-Hartman
2016-11-09 10:43 ` [PATCH 4.4 05/69] libxfs: clean up _calc_dquots_per_chunk Greg Kroah-Hartman
2016-11-09 10:43 ` [PATCH 4.4 06/69] mm/list_lru.c: avoid error-path NULL pointer deref Greg Kroah-Hartman
2016-11-09 10:43 ` Greg Kroah-Hartman [this message]
2016-11-09 10:43 ` [PATCH 4.4 08/69] KEYS: Fix short sprintf buffer in /proc/keys show function Greg Kroah-Hartman
2016-11-09 10:43 ` [PATCH 4.4 09/69] ALSA: usb-audio: Add quirk for Syntek STK1160 Greg Kroah-Hartman
2016-11-09 10:43 ` [PATCH 4.4 10/69] ALSA: hda - Merge RIRB_PRE_DELAY into CTX_WORKAROUND caps Greg Kroah-Hartman
2016-11-09 10:43 ` [PATCH 4.4 11/69] ALSA: hda - Raise AZX_DCAPS_RIRB_DELAY handling into top drivers Greg Kroah-Hartman
2016-11-09 10:43 ` [PATCH 4.4 12/69] ALSA: hda - allow 40 bit DMA mask for NVidia devices Greg Kroah-Hartman
2016-11-09 10:43 ` [PATCH 4.4 13/69] ALSA: hda - Adding a new group of pin cfg into ALC295 pin quirk table Greg Kroah-Hartman
2016-11-09 10:43 ` [PATCH 4.4 14/69] ALSA: hda - Fix headset mic detection problem for two Dell laptops Greg Kroah-Hartman
2016-11-09 10:43 ` [PATCH 4.4 17/69] btrfs: fix races on root_log_ctx lists Greg Kroah-Hartman
2016-11-09 10:43 ` [PATCH 4.4 18/69] ubifs: Abort readdir upon error Greg Kroah-Hartman
2016-11-09 10:43 ` [PATCH 4.4 19/69] ubifs: Fix regression in ubifs_readdir() Greg Kroah-Hartman
2016-11-09 10:43 ` [PATCH 4.4 20/69] mei: txe: dont clean an unprocessed interrupt cause Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 22/69] USB: serial: fix potential NULL-dereference at probe Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 23/69] USB: serial: ftdi_sio: add support for Infineon TriBoard TC2X7 Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 24/69] xhci: use default USB_RESUME_TIMEOUT when resuming ports Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 25/69] usb: increase ohci watchdog delay to 275 msec Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 26/69] GenWQE: Fix bad page access during abort of resource allocation Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 27/69] Fix potential infoleak in older kernels Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 28/69] vt: clear selection before resizing Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 29/69] hv: do not lose pending heartbeat vmbus packets Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 30/69] xhci: add restart quirk for Intel Wildcatpoint PCH Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 31/69] tty: limit terminal size to 4M chars Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 32/69] USB: serial: cp210x: fix tiocmget error handling Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 33/69] dm: free io_barrier after blk_cleanup_queue call Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 36/69] ovl: fsync after copy-up Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 37/69] parisc: Ensure consistent state when switching to kernel stack at syscall entry Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 38/69] virtio_ring: Make interrupt suppression spec compliant Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 39/69] virtio: console: Unlock vqs while freeing buffers Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 40/69] dm mirror: fix read error on recovery after default leg failure Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 41/69] Input: i8042 - add XMG C504 to keyboard reset table Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 42/69] firewire: net: guard against rx buffer overflows Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 43/69] firewire: net: fix fragmented datagram_size off-by-one Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 44/69] mac80211: discard multicast and 4-addr A-MSDUs Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 45/69] scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 46/69] scsi: scsi_debug: Fix memory leak if LBP enabled and module is unloaded Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 47/69] scsi: arcmsr: Send SYNCHRONIZE_CACHE command to firmware Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 48/69] mmc: dw_mmc-pltfm: fix the potential NULL pointer dereference Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 50/69] drm/radeon/si_dpm: Limit clocks on HD86xx part Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 51/69] drm/radeon/si_dpm: workaround for SI kickers Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 54/69] perf build: Fix traceevent plugins build race Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 55/69] x86/xen: fix upper bound of pmd loop in xen_cleanhighmap() Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 56/69] powerpc/ptrace: Fix out of bounds array access warning Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 57/69] ARM: 8584/1: floppy: avoid gcc-6 warning Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 58/69] mm/cma: silence warnings due to max() usage Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 59/69] drm/exynos: fix error handling in exynos_drm_subdrv_open Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 60/69] cgroup: avoid false positive gcc-6 warning Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 61/69] smc91x: avoid self-comparison warning Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 63/69] UBI: fastmap: scrub PEB when bitflips are detected in a free PEB EC header Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 64/69] pwm: Unexport children before chip removal Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 65/69] usb: dwc3: Fix size used in dma_free_coherent() Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 67/69] kvm: x86: Check memopp before dereference (CVE-2016-8630) Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 68/69] ubi: fastmap: Fix add_vol() return value test in ubi_attach_fastmap() Greg Kroah-Hartman
2016-11-09 10:44 ` [PATCH 4.4 69/69] HID: usbhid: add ATEN CS962 to list of quirky devices Greg Kroah-Hartman
2016-11-09 18:21 ` [PATCH 4.4 00/69] 4.4.31-stable review Shuah Khan
2016-11-09 19:34 ` Guenter Roeck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161109102901.453726700@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mhocko@suse.com \
--cc=stable@vger.kernel.org \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=vdavydov.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).