From: Jan Stancek <jstancek@redhat.com>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: mhocko@suse.com, tj@kernel.org, clameter@sgi.com,
js1304@gmail.com, arekm@maven.pl, akpm@linux-foundation.org,
torvalds@linux-foundation.org, linux-mm@kvack.org
Subject: Re: [LTP] [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
Date: Fri, 29 Jan 2016 02:32:53 -0500 (EST)
Message-ID: <443846857.13955817.1454052773098.JavaMail.zimbra@redhat.com>
In-Reply-To: <201601290048.IHF21869.OSJOQVOMLFFFHt@I-love.SAKURA.ne.jp>
----- Original Message -----
> From: "Tetsuo Handa" <penguin-kernel@I-love.SAKURA.ne.jp>
> To: mhocko@suse.com, jstancek@redhat.com
> Cc: tj@kernel.org, clameter@sgi.com, js1304@gmail.com, arekm@maven.pl, akpm@linux-foundation.org,
> torvalds@linux-foundation.org, linux-mm@kvack.org
> Sent: Thursday, 28 January, 2016 4:48:36 PM
> Subject: Re: [LTP] [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
>
> Tetsuo Handa wrote:
> > Inviting people who were involved in commit 373ccbe5927034b5 "mm, vmstat: allow
> > WQ concurrency to discover memory reclaim doesn't make any progress".
> >
> > In this thread, Jan hit an OOM stall where free memory does not increase
> > even after the OOM victim and dying tasks have terminated. I'm wondering why
> > such a thing can happen. Jan came up with a reproducer and I tried it.
> >
> > I'm observing that the vmstat_update workqueue item remains pending forever.
> > Didn't we make sure that vmstat_update is processed when memory allocation
> > is stalling?
>
> I confirmed that a forced sleep patch solves this problem.
>
> ----------------------------------------
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 7340353..b986216 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -979,6 +979,12 @@ long wait_iff_congested(struct zone *zone, int sync, long timeout)
> */
> if (atomic_read(&nr_wb_congested[sync]) == 0 ||
> !test_bit(ZONE_CONGESTED, &zone->flags)) {
> + const struct memalloc_info *m = &current->memalloc;
> + if (m->valid && time_after_eq(jiffies, m->start + 30 * HZ)) {
> + pr_err("********** %s(%u) Forced sleep **********\n",
> + current->comm, current->pid);
> + schedule_timeout_uninterruptible(HZ);
> + }
>
> /*
> * Memory allocation/reclaim might be called from a WQ
> ----------------------------------------
>
> ----------------------------------------
> [ 939.038719] Showing busy workqueues and worker pools:
> [ 939.040519] workqueue events: flags=0x0
> [ 939.042142] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [ 939.044350] pending: vmpressure_work_fn(delay=20659)
> [ 939.046302] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
> [ 939.048392] pending: vmw_fb_dirty_flush [vmwgfx](delay=42), vmstat_shepherd(delay=10)
> [ 939.050946] workqueue events_power_efficient: flags=0x80
> [ 939.052844] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [ 939.054980] pending: fb_flashcursor(delay=20573)
> [ 939.056939] workqueue events_freezable_power_: flags=0x84
> [ 939.058872] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [ 939.060994] in-flight: 9571:disk_events_workfn(delay=20719)
> [ 939.063069] workqueue vmstat: flags=0xc
> [ 939.064667] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [ 939.066795] pending: vmstat_update(delay=20016)
> [ 939.068752] workqueue xfs-eofblocks/sda1: flags=0xc
> [ 939.070546] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [ 939.072675] pending: xfs_eofblocks_worker(delay=5574)
> [ 939.074660] pool 4: cpus=2 node=0 flags=0x0 nice=0 workers=8 idle: 20 10098 10100 505 10099 10364 10363
> [ 948.026046] ********** a.out(10423) Forced sleep **********
> [ 948.036318] ********** a.out(10424) Forced sleep **********
> [ 948.323267] ********** kworker/2:3(9571) Forced sleep **********
> [ 949.030045] a.out invoked oom-killer: gfp_mask=0x24280ca, order=0, oom_score_adj=0
> [ 949.032320] a.out cpuset=/ mems_allowed=0
> [ 949.033976] CPU: 3 PID: 10423 Comm: a.out Not tainted 4.4.0+ #39
> ----------------------------------------
> [ 1255.809372] Showing busy workqueues and worker pools:
> [ 1255.811163] workqueue events: flags=0x0
> [ 1255.812744] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.814877] pending: vmpressure_work_fn(delay=10713)
> [ 1255.816837] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.818960] pending: vmw_fb_dirty_flush [vmwgfx](delay=42)
> [ 1255.821025] workqueue events_power_efficient: flags=0x80
> [ 1255.822937] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.825136] pending: fb_flashcursor(delay=20673)
> [ 1255.827069] workqueue events_freezable_power_: flags=0x84
> [ 1255.828953] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.831050] in-flight: 20:disk_events_workfn(delay=20777)
> [ 1255.833063] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.835168] pending: disk_events_workfn(delay=7)
> [ 1255.837084] workqueue vmstat: flags=0xc
> [ 1255.838707] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.840827] pending: vmstat_update(delay=19787)
> [ 1255.842794] pool 4: cpus=2 node=0 flags=0x0 nice=0 workers=3 idle: 9571 10098
> [ 1265.036032] ********** kworker/2:0(20) Forced sleep **********
> [ 1265.038131] a.out invoked oom-killer: gfp_mask=0x24280ca, order=0, oom_score_adj=0
> [ 1265.041018] a.out cpuset=/ mems_allowed=0
> [ 1265.043008] CPU: 2 PID: 10622 Comm: a.out Not tainted 4.4.0+ #39
> ----------------------------------------
>
> In the post "[PATCH 1/2] mm, oom: introduce oom reaper", Andrew Morton said
> that "schedule_timeout() in state TASK_RUNNING doesn't do anything".
>
> Looking at commit 373ccbe5927034b5, it is indeed using schedule_timeout(1)
> instead of schedule_timeout_*(1). What!? We meant to force the kworker to
> sleep, but the kworker did not sleep at all? That explains why the
> forced sleep patch above solves the OOM livelock.
>
> Jan, can you reproduce your problem with below patch applied?
I took v4.5-rc1, applied your memalloc patch and then the patch below.
I have mixed results so far. The first attempt hung after ~15 minutes,
the second is still running (for 12+ hours).
The way it hung is different from previous runs; I don't recall seeing
messages like these before:
SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
NMI watchdog: Watchdog detected hard LOCKUP on cpu 0
Full log from the run that hung:
http://jan.stancek.eu/tmp/oom_hangs/console.log.4-v4.5-rc1_and_wait_iff_congested_patch.txt
I'll let it run through the weekend.
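
As a side note on the schedule_timeout() point quoted above, here is a
minimal illustrative sketch (not taken from either patch; the function
names are made up) of why the unpatched code never really sleeps:

----------
/*
 * schedule_timeout() arms a timer and calls schedule(), but it only
 * blocks if the caller has already moved the task out of TASK_RUNNING.
 * With the task still runnable it returns almost immediately.
 */
#include <linux/sched.h>

static void yield_but_do_not_sleep(void)
{
	schedule_timeout(1);	/* state is TASK_RUNNING: no real sleep */
}

static void sleep_for_one_jiffy(void)
{
	/* schedule_timeout_uninterruptible(1) is shorthand for: */
	__set_current_state(TASK_UNINTERRUPTIBLE);
	schedule_timeout(1);	/* now the task really sleeps for one tick */
}
----------
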
Regards,
Jan
>
> ----------
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 7340353..cbe6f0b 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -989,7 +989,7 @@ long wait_iff_congested(struct zone *zone, int sync, long timeout)
> * here rather than calling cond_resched().
> */
> if (current->flags & PF_WQ_WORKER)
> - schedule_timeout(1);
> + schedule_timeout_uninterruptible(1);
> else
> cond_resched();
>
> ----------
>