[PATCH] mm, vmscan: add cond_resched into shrink_node_memcg

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Michal Hocko <mhocko@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>, Boris Zhmurov <bb@kernelpanic.ru>,
	"Christopher S. Aker" <caker@theshore.net>,
	Donald Buczek <buczek@molgen.mpg.de>,
	Paul Menzel <pmenzel@molgen.mpg.de>
Subject: [PATCH] mm, vmscan: add cond_resched into shrink_node_memcg
Date: Fri,  2 Dec 2016 10:58:41 +0100	[thread overview]
Message-ID: <20161202095841.16648-1-mhocko@kernel.org> (raw)

From: Michal Hocko <mhocko@suse.com>

Boris Zhmurov has reported RCU stalls during the kswapd reclaim:
17511.573645] INFO: rcu_sched detected stalls on CPUs/tasks:
[17511.573699]  23-...: (22 ticks this GP) idle=92f/140000000000000/0 softirq=2638404/2638404 fqs=23
[17511.573740]  (detected by 4, t=6389 jiffies, g=786259, c=786258, q=42115)
[17511.573776] Task dump for CPU 23:
[17511.573777] kswapd1         R  running task        0   148      2 0x00000008
[17511.573781]  0000000000000000 ffff8efe5f491400 ffff8efe44523e68 ffff8f16a7f49000
[17511.573782]  0000000000000000 ffffffffafb67482 0000000000000000 0000000000000000
[17511.573784]  0000000000000000 0000000000000000 ffff8efe44523e58 00000000016dbbee
[17511.573786] Call Trace:
[17511.573796]  [<ffffffffafb67482>] ? shrink_node+0xd2/0x2f0
[17511.573798]  [<ffffffffafb683ab>] ? kswapd+0x2cb/0x6a0
[17511.573800]  [<ffffffffafb680e0>] ? mem_cgroup_shrink_node+0x160/0x160
[17511.573806]  [<ffffffffafa8b63d>] ? kthread+0xbd/0xe0
[17511.573810]  [<ffffffffafa2967a>] ? __switch_to+0x1fa/0x5c0
[17511.573813]  [<ffffffffaff9095f>] ? ret_from_fork+0x1f/0x40
[17511.573815]  [<ffffffffafa8b580>] ? kthread_create_on_node+0x180/0x180

a closer code inspection has shown that we might indeed miss all the
scheduling points in the reclaim path if no pages can be isolated from
the LRU list. This is a pathological case but other reports from Donald
Buczek have shown that we might indeed hit such a path:
        clusterd-989   [009] .... 118023.654491: mm_vmscan_direct_reclaim_end: nr_reclaimed=193
         kswapd1-86    [001] dN.. 118023.987475: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239830 nr_taken=0 file=1
         kswapd1-86    [001] dN.. 118024.320968: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239844 nr_taken=0 file=1
         kswapd1-86    [001] dN.. 118024.654375: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239858 nr_taken=0 file=1
         kswapd1-86    [001] dN.. 118024.987036: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239872 nr_taken=0 file=1
         kswapd1-86    [001] dN.. 118025.319651: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239886 nr_taken=0 file=1
         kswapd1-86    [001] dN.. 118025.652248: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239900 nr_taken=0 file=1
         kswapd1-86    [001] dN.. 118025.984870: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239914 nr_taken=0 file=1
[...]
         kswapd1-86    [001] dN.. 118084.274403: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4241133 nr_taken=0 file=1

this is minute long snapshot which didn't take a single page from the
LRU. It is not entirely clear why only 1303 pages have been scanned
during that time (maybe there was a heavy IRQ activity interfering).

In any case it looks like we can really hit long periods without
scheduling on non preemptive kernels so an explicit cond_resched() in
shrink_node_memcg which is independent on the reclaim operation is due.

Reported-and-tested-by: Boris Zhmurov <bb@kernelpanic.ru>
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Reported-by: "Christopher S. Aker" <caker@theshore.net>
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---

Hi,
there were multiple reportes of the similar RCU stalls. Only Boris has
confirmed that this patch helps in his workload. Others might see a
slightly different issue and that should be investigated if it is the
case. As pointed out by Paul [1] cond_resched might be not sufficient
to silence RCU stalls because that would require a real scheduling.
This is a separate problem, though, and Paul is working with Peter [2]
to resolve it.

Anyway, I believe that this patch should be a good start because it
really seems that nr_taken=0 during the LRU isolation can be triggered
in the real life. All reporters are agreeing to start seeing this issue
when moving on to 4.8 kernel which might be just a coincidence or a
different behavior of some subsystem. Well, MM has moved from zone to
node reclaim but I couldn't have found any direct relation to that
change.

[1] http://lkml.kernel.org/r/20161130142955.GS3924@linux.vnet.ibm.com
[2] http://lkml.kernel.org/r/20161201124024.GB3924@linux.vnet.ibm.com

 mm/vmscan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c05f00042430..c4abf08861d2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2362,6 +2362,8 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
 			}
 		}
 
+		cond_resched();
+
 		if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
 			continue;
 
-- 
2.10.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next             reply	other threads:[~2016-12-02  9:58 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-02  9:58 Michal Hocko [this message]
2016-12-05 12:44 ` [PATCH] mm, vmscan: add cond_resched into shrink_node_memcg Balbir Singh
2016-12-05 12:49   ` Michal Hocko
2016-12-05 16:16     ` Paul E. McKenney
2016-12-09 10:13 ` Donald Buczek

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:c05f0004243 dfblob:c4abf08861d )
 OR (
bs:"[PATCH] mm, vmscan: add cond_resched into shrink_node_memcg" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161202095841.16648-1-mhocko@kernel.org \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bb@kernelpanic.ru \
    --cc=buczek@molgen.mpg.de \
    --cc=caker@theshore.net \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.com \
    --cc=pmenzel@molgen.mpg.de \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).