From: Jan Stancek <jstancek@redhat.com>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: mhocko@suse.com, tj@kernel.org, clameter@sgi.com,
	js1304@gmail.com, arekm@maven.pl, akpm@linux-foundation.org,
	torvalds@linux-foundation.org, linux-mm@kvack.org
Subject: Re: [LTP] [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
Date: Fri, 29 Jan 2016 02:32:53 -0500 (EST)
Message-ID: <443846857.13955817.1454052773098.JavaMail.zimbra@redhat.com>
In-Reply-To: <201601290048.IHF21869.OSJOQVOMLFFFHt@I-love.SAKURA.ne.jp>



----- Original Message -----
> From: "Tetsuo Handa" <penguin-kernel@I-love.SAKURA.ne.jp>
> To: mhocko@suse.com, jstancek@redhat.com
> Cc: tj@kernel.org, clameter@sgi.com, js1304@gmail.com, arekm@maven.pl, akpm@linux-foundation.org,
> torvalds@linux-foundation.org, linux-mm@kvack.org
> Sent: Thursday, 28 January, 2016 4:48:36 PM
> Subject: Re: [LTP] [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
> 
> Tetsuo Handa wrote:
> > Inviting people who were involved in commit 373ccbe5927034b5 ("mm, vmstat:
> > allow WQ concurrency to discover memory reclaim doesn't make any progress").
> > 
> > In this thread, Jan hit an OOM stall where free memory does not increase
> > even after the OOM victim and dying tasks have terminated. I'm wondering
> > why such a thing can happen. Jan provided a reproducer and I tried it.
> > 
> > I'm observing that the vmstat_update workqueue item remains pending
> > forever. Didn't we make sure that vmstat_update is processed when memory
> > allocation is stalling?
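> > 
> > (For context, a simplified sketch of how vmstat_update is kept running:
> > it is a per-CPU delayed work item on the dedicated "vmstat" workqueue
> > that reschedules itself, roughly as in mm/vmstat.c of v4.4; trimmed
> > for illustration:)
> > 
> > ----------
> > static void vmstat_update(struct work_struct *w)
> > {
> > 	if (refresh_cpu_vm_stats(true)) {
> > 		/* Counters changed; keep the per-CPU updater running. */
> > 		queue_delayed_work_on(smp_processor_id(), vmstat_wq,
> > 			this_cpu_ptr(&vmstat_work),
> > 			round_jiffies_relative(sysctl_stat_interval));
> > 	}
> > 	/* ... otherwise the shepherd re-arms it when updates resume ... */
> > }
> > ----------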
> 
> I confirmed that a forced sleep patch solves this problem.
> 
> ----------------------------------------
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 7340353..b986216 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -979,6 +979,12 @@ long wait_iff_congested(struct zone *zone, int sync, long timeout)
>  	 */
>  	if (atomic_read(&nr_wb_congested[sync]) == 0 ||
>  	    !test_bit(ZONE_CONGESTED, &zone->flags)) {
> +		const struct memalloc_info *m = &current->memalloc;
> +		if (m->valid && time_after_eq(jiffies, m->start + 30 * HZ)) {
> +			pr_err("********** %s(%u) Forced sleep **********\n",
> +			       current->comm, current->pid);
> +			schedule_timeout_uninterruptible(HZ);
> +		}
>  
>  		/*
>  		 * Memory allocation/reclaim might be called from a WQ
> ----------------------------------------
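> 
> (Note: current->memalloc above comes from my separate memalloc debugging
> patch, which is not upstream. As a hypothetical sketch, the fields the
> hunk relies on look roughly like this:)
> 
> ----------
> /* Hypothetical sketch; only the fields used above are shown. */
> struct memalloc_info {
> 	bool valid;		/* task is currently inside the allocator */
> 	unsigned long start;	/* jiffies when the allocation attempt began */
> };
> ----------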
> 
> ----------------------------------------
> [  939.038719] Showing busy workqueues and worker pools:
> [  939.040519] workqueue events: flags=0x0
> [  939.042142]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [  939.044350]     pending: vmpressure_work_fn(delay=20659)
> [  939.046302]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
> [  939.048392]     pending: vmw_fb_dirty_flush [vmwgfx](delay=42), vmstat_shepherd(delay=10)
> [  939.050946] workqueue events_power_efficient: flags=0x80
> [  939.052844]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [  939.054980]     pending: fb_flashcursor(delay=20573)
> [  939.056939] workqueue events_freezable_power_: flags=0x84
> [  939.058872]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [  939.060994]     in-flight: 9571:disk_events_workfn(delay=20719)
> [  939.063069] workqueue vmstat: flags=0xc
> [  939.064667]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [  939.066795]     pending: vmstat_update(delay=20016)
> [  939.068752] workqueue xfs-eofblocks/sda1: flags=0xc
> [  939.070546]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [  939.072675]     pending: xfs_eofblocks_worker(delay=5574)
> [  939.074660] pool 4: cpus=2 node=0 flags=0x0 nice=0 workers=8 idle: 20 10098 10100 505 10099 10364 10363
> [  948.026046] ********** a.out(10423) Forced sleep **********
> [  948.036318] ********** a.out(10424) Forced sleep **********
> [  948.323267] ********** kworker/2:3(9571) Forced sleep **********
> [  949.030045] a.out invoked oom-killer: gfp_mask=0x24280ca, order=0, oom_score_adj=0
> [  949.032320] a.out cpuset=/ mems_allowed=0
> [  949.033976] CPU: 3 PID: 10423 Comm: a.out Not tainted 4.4.0+ #39
> ----------------------------------------
> [ 1255.809372] Showing busy workqueues and worker pools:
> [ 1255.811163] workqueue events: flags=0x0
> [ 1255.812744]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.814877]     pending: vmpressure_work_fn(delay=10713)
> [ 1255.816837]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.818960]     pending: vmw_fb_dirty_flush [vmwgfx](delay=42)
> [ 1255.821025] workqueue events_power_efficient: flags=0x80
> [ 1255.822937]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.825136]     pending: fb_flashcursor(delay=20673)
> [ 1255.827069] workqueue events_freezable_power_: flags=0x84
> [ 1255.828953]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.831050]     in-flight: 20:disk_events_workfn(delay=20777)
> [ 1255.833063]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.835168]     pending: disk_events_workfn(delay=7)
> [ 1255.837084] workqueue vmstat: flags=0xc
> [ 1255.838707]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.840827]     pending: vmstat_update(delay=19787)
> [ 1255.842794] pool 4: cpus=2 node=0 flags=0x0 nice=0 workers=3 idle: 9571 10098
> [ 1265.036032] ********** kworker/2:0(20) Forced sleep **********
> [ 1265.038131] a.out invoked oom-killer: gfp_mask=0x24280ca, order=0, oom_score_adj=0
> [ 1265.041018] a.out cpuset=/ mems_allowed=0
> [ 1265.043008] CPU: 2 PID: 10622 Comm: a.out Not tainted 4.4.0+ #39
> ----------------------------------------
> 
> In the post "[PATCH 1/2] mm, oom: introduce oom reaper", Andrew Morton said
> that "schedule_timeout() in state TASK_RUNNING doesn't do anything".
> 
> Looking at commit 373ccbe5927034b5, it is indeed using schedule_timeout(1)
> instead of schedule_timeout_*(1). What!? We meant to force the kworker to
> sleep, but the kworker did not sleep at all? That would explain why the
> forced sleep patch above resolves the OOM livelock.
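> 
> For reference, a minimal sketch of why the distinction matters (simplified
> from kernel/time/timer.c of this era; not the verbatim source):
> 
> ----------
> signed long schedule_timeout_uninterruptible(signed long timeout)
> {
> 	/* Go off the runqueue: mark the task as sleeping first. */
> 	__set_current_state(TASK_UNINTERRUPTIBLE);
> 	return schedule_timeout(timeout);
> }
> ----------
> 
> A bare schedule_timeout(1) issued while the task is still in TASK_RUNNING
> arms a timer and calls schedule(), but the scheduler keeps a TASK_RUNNING
> task on the runqueue, so the kworker never really sleeps and the workqueue
> code never gets a chance to start another worker.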
> 
> Jan, can you reproduce your problem with the patch below applied?

I took v4.5-rc1, applied your memalloc patch and then the patch below.

I have mixed results so far. The first attempt hung after ~15 minutes;
the second is still running (12+ hours so far).

The way it hung is different from previous ones; I don't recall seeing
messages like these before:
  SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
  NMI watchdog: Watchdog detected hard LOCKUP on cpu 0

Full log from the run that hung:
  http://jan.stancek.eu/tmp/oom_hangs/console.log.4-v4.5-rc1_and_wait_iff_congested_patch.txt

I'll let it run through the weekend.

Regards,
Jan

> 
> ----------
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 7340353..cbe6f0b 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -989,7 +989,7 @@ long wait_iff_congested(struct zone *zone, int sync, long timeout)
>  		 * here rather than calling cond_resched().
>  		 */
>  		if (current->flags & PF_WQ_WORKER)
> -			schedule_timeout(1);
> +			schedule_timeout_uninterruptible(1);
>  		else
>  			cond_resched();
>  
> ----------
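> 
> With TASK_UNINTERRUPTIBLE set, schedule() really takes the worker off the
> runqueue, and that is where workqueue concurrency management is notified.
> A simplified sketch of the hook in __schedule() (trimmed from
> kernel/sched/core.c around v4.4, for illustration only):
> 
> ----------
> 	if (prev->state) {	/* i.e. not TASK_RUNNING */
> 		deactivate_task(rq, prev, DEQUEUE_SLEEP);
> 		prev->on_rq = 0;
> 
> 		/*
> 		 * A truly sleeping worker lets the pool wake up (or
> 		 * create) another worker, which can then process pending
> 		 * items such as vmstat_update.
> 		 */
> 		if (prev->flags & PF_WQ_WORKER) {
> 			struct task_struct *to_wakeup;
> 
> 			to_wakeup = wq_worker_sleeping(prev, cpu);
> 			if (to_wakeup)
> 				try_to_wake_up_local(to_wakeup);
> 		}
> 	}
> ----------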
> 

