[PATCH 4.9 19/51] mm: dont warn about allocations which stall for too long

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	"yuwang.yuwang" <yuwang.yuwang@alibaba-inc.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@suse.de>, Dave Hansen <dave.hansen@intel.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Petr Mladek <pmladek@suse.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Amit Shah <amit@kernel.org>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.9 19/51] mm: dont warn about allocations which stall for too long
Date: Tue, 11 Dec 2018 16:41:27 +0100	[thread overview]
Message-ID: <20181211151614.632099430@linuxfoundation.org> (raw)
In-Reply-To: <20181211151612.328911565@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

commit 400e22499dd92613821374c8c6c88c7225359980 upstream.

Commit 63f53dea0c98 ("mm: warn about allocations which stall for too
long") was a great step for reducing possibility of silent hang up
problem caused by memory allocation stalls.  But this commit reverts it,
for it is possible to trigger OOM lockup and/or soft lockups when many
threads concurrently called warn_alloc() (in order to warn about memory
allocation stalls) due to current implementation of printk(), and it is
difficult to obtain useful information due to limitation of synchronous
warning approach.

Current printk() implementation flushes all pending logs using the
context of a thread which called console_unlock().  printk() should be
able to flush all pending logs eventually unless somebody continues
appending to printk() buffer.

Since warn_alloc() started appending to printk() buffer while waiting
for oom_kill_process() to make forward progress when oom_kill_process()
is processing pending logs, it became possible for warn_alloc() to force
oom_kill_process() loop inside printk().  As a result, warn_alloc()
significantly increased possibility of preventing oom_kill_process()
from making forward progress.

---------- Pseudo code start ----------
Before warn_alloc() was introduced:

  retry:
    if (mutex_trylock(&oom_lock)) {
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_lock)
    }
    goto retry;

After warn_alloc() was introduced:

  retry:
    if (mutex_trylock(&oom_lock)) {
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_lock)
    } else if (waited_for_10seconds()) {
      atomic_inc(&printk_pending_logs);
    }
    goto retry;
---------- Pseudo code end ----------

Although waited_for_10seconds() becomes true once per 10 seconds,
unbounded number of threads can call waited_for_10seconds() at the same
time.  Also, since threads doing waited_for_10seconds() keep doing
almost busy loop, the thread doing print_one_log() can use little CPU
resource.  Therefore, this situation can be simplified like

---------- Pseudo code start ----------
  retry:
    if (mutex_trylock(&oom_lock)) {
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_lock)
    } else {
      atomic_inc(&printk_pending_logs);
    }
    goto retry;
---------- Pseudo code end ----------

when printk() is called faster than print_one_log() can process a log.

One of possible mitigation would be to introduce a new lock in order to
make sure that no other series of printk() (either oom_kill_process() or
warn_alloc()) can append to printk() buffer when one series of printk()
(either oom_kill_process() or warn_alloc()) is already in progress.

Such serialization will also help obtaining kernel messages in readable
form.

---------- Pseudo code start ----------
  retry:
    if (mutex_trylock(&oom_lock)) {
      mutex_lock(&oom_printk_lock);
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_printk_lock);
      mutex_unlock(&oom_lock)
    } else {
      if (mutex_trylock(&oom_printk_lock)) {
        atomic_inc(&printk_pending_logs);
        mutex_unlock(&oom_printk_lock);
      }
    }
    goto retry;
---------- Pseudo code end ----------

But this commit does not go that direction, for we don't want to
introduce a new lock dependency, and we unlikely be able to obtain
useful information even if we serialized oom_kill_process() and
warn_alloc().

Synchronous approach is prone to unexpected results (e.g.  too late [1],
too frequent [2], overlooked [3]).  As far as I know, warn_alloc() never
helped with providing information other than "something is going wrong".
I want to consider asynchronous approach which can obtain information
during stalls with possibly relevant threads (e.g.  the owner of
oom_lock and kswapd-like threads) and serve as a trigger for actions
(e.g.  turn on/off tracepoints, ask libvirt daemon to take a memory dump
of stalling KVM guest for diagnostic purpose).

This commit temporarily loses ability to report e.g.  OOM lockup due to
unable to invoke the OOM killer due to !__GFP_FS allocation request.
But asynchronous approach will be able to detect such situation and emit
warning.  Thus, let's remove warn_alloc().

[1] https://bugzilla.kernel.org/show_bug.cgi?id=192981
[2] http://lkml.kernel.org/r/CAM_iQpWuPVGc2ky8M-9yukECtS+zKjiDasNymX7rMcBjBFyM_A@mail.gmail.com
[3] commit db73ee0d46379922 ("mm, vmscan: do not loop on too_many_isolated for ever"))

Link: http://lkml.kernel.org/r/1509017339-4802-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Reported-by: yuwang.yuwang <yuwang.yuwang@alibaba-inc.com>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[Resolved backport conflict due to missing 8225196, a8e9925, 9e80c71 and
 9a67f64 in 4.9 -- all of which modified this hunk being removed.]
Signed-off-by: Amit Shah <amit@kernel.org>

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/page_alloc.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 28240ce475d6..3af727d95c17 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3530,8 +3530,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	enum compact_result compact_result;
 	int compaction_retries;
 	int no_progress_loops;
-	unsigned long alloc_start = jiffies;
-	unsigned int stall_timeout = 10 * HZ;
 	unsigned int cpuset_mems_cookie;

 	/*
@@ -3704,14 +3702,6 @@ retry:
 	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
 		goto nopage;

-	/* Make sure we know about allocations which stall for too long */
-	if (time_after(jiffies, alloc_start + stall_timeout)) {
-		warn_alloc(gfp_mask,
-			"page allocation stalls for %ums, order:%u",
-			jiffies_to_msecs(jiffies-alloc_start), order);
-		stall_timeout += 10 * HZ;
-	}
-
 	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
 				 did_some_progress > 0, &no_progress_loops))
 		goto retry;
-- 
2.19.1

next prev parent reply	other threads:[~2018-12-11 16:14 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-11 15:41 [PATCH 4.9 00/51] 4.9.145-stable review Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 01/51] media: omap3isp: Unregister media device as first Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 02/51] iommu/vt-d: Fix NULL pointer dereference in prq_event_thread() Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 03/51] brcmutil: really fix decoding channel info for 160 MHz bandwidth Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 04/51] iommu/ipmmu-vmsa: Fix crash on early domain free Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 05/51] can: rcar_can: Fix erroneous registration Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 06/51] HID: input: Ignore battery reported by Symbol DS4308 Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 07/51] batman-adv: Expand merged fragment buffer for full packet Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 08/51] bnx2x: Assign unique DMAE channel number for FW DMAE transactions Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 09/51] qed: Fix PTT leak in qed_drain() Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 10/51] qed: Fix reading wrong value in loop condition Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 11/51] net/mlx4_core: Zero out lkey field in SW2HW_MPT fw command Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 12/51] net/mlx4_core: Fix uninitialized variable compilation warning Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 13/51] net/mlx4: Fix UBSAN warning of signed integer overflow Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 14/51] mtd: rawnand: qcom: Namespace prefix some commands Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 15/51] net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 16/51] iommu/vt-d: Use memunmap to free memremap Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 17/51] team: no need to do team_notify_peers or team_mcast_rejoin when disabling port Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 18/51] net: amd: add missing of_node_put() Greg Kroah-Hartman
2018-12-11 15:41 ` Greg Kroah-Hartman [this message]
2018-12-11 15:41 ` [PATCH 4.9 20/51] ARC: [zebu] Remove CONFIG_INITRAMFS_SOURCE from defconfigs Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 21/51] usb: quirk: add no-LPM quirk on SanDisk Ultra Flair device Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 22/51] usb: appledisplay: Add 27" Apple Cinema Display Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 23/51] USB: check usb_get_extra_descriptor for proper size Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 24/51] ALSA: usb-audio: Fix UAF decrement if card has no live interfaces in card.c Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 25/51] ALSA: hda: Add support for AMD Stoney Ridge Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 26/51] ALSA: pcm: Fix starvation on down_write_nonblock() Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 27/51] ALSA: pcm: Call snd_pcm_unlink() conditionally at closing Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 28/51] ALSA: pcm: Fix interval evaluation with openmin/max Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 29/51] ALSA: hda/realtek - Fix speaker output regression on Thinkpad T570 Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 30/51] virtio/s390: avoid race on vcdev->config Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 31/51] virtio/s390: fix race in ccw_io_helper() Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 32/51] SUNRPC: Fix leak of krb5p encode pages Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 33/51] dmaengine: cppi41: delete channel from pending list when stop channel Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 34/51] xhci: Prevent U1/U2 link pm states if exit latency is too long Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 35/51] sr: pass down correctly sized SCSI sense buffer Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 36/51] swiotlb: clean up reporting Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 37/51] vsock: lookup and setup guest_cid inside vhost_vsock_lock Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 38/51] vhost/vsock: fix use-after-free in network stack callers Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 39/51] Staging: lustre: remove two build warnings Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 40/51] cifs: Fix separator when building path from dentry Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 41/51] staging: rtl8712: Fix possible buffer overrun Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 42/51] tty: serial: 8250_mtk: always resume the device in probe Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 43/51] tty: do not set TTY_IO_ERROR flag if console port Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 44/51] kgdboc: fix KASAN global-out-of-bounds bug in param_set_kgdboc_var() Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 45/51] mac80211_hwsim: Timer should be initialized before device registered Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 46/51] mac80211: Clear beacon_int in ieee80211_do_stop Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 47/51] mac80211: ignore tx status for PS stations in ieee80211_tx_status_ext Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 48/51] mac80211: fix reordering of buffered broadcast packets Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 49/51] mac80211: ignore NullFunc frames in the duplicate detection Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 50/51] kbuild: fix linker feature test macros when cross compiling with Clang Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.9 51/51] kbuild: allow to use GCC toolchain not in Clang search path Greg Kroah-Hartman
2018-12-11 20:33 ` [PATCH 4.9 00/51] 4.9.145-stable review kernelci.org bot
2018-12-11 23:57 ` shuah
2018-12-12  6:45 ` Naresh Kamboju
2018-12-12 18:49 ` Guenter Roeck

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:28240ce475d dfblob:3af727d95c1 )
 OR (
bs:"[PATCH 4.9 19/51] mm: dont warn about allocations which stall for too long" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181211151614.632099430@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=amit@kernel.org \
    --cc=dave.hansen@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.com \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=sashal@kernel.org \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yuwang.yuwang@alibaba-inc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox