From: Michal Hocko <mhocko@suse.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@redhat.com>,
Peter Xu <peterx@redhat.com>
Subject: Re: [patch 0/2] mm: too_many_isolated can stall due to out of sync VM counters
Date: Tue, 14 Nov 2023 09:20:09 +0100
Message-ID: <ZVMtuYLviLYqAI7x@tiehlicka>
In-Reply-To: <20231113233420.446465795@redhat.com>
On Mon 13-11-23 20:34:20, Marcelo Tosatti wrote:
> A customer reported seeing processes hung at too_many_isolated,
> while analysis indicated that the problem occurred due to out
> of sync per-CPU stats (see below).
>
> Fix is to use node_page_state_snapshot to avoid the stale values.
>
> 2136 static unsigned long
> 2137 shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
> 2138 struct scan_control *sc, enum lru_list lru)
> 2139 {
> :
> 2145 bool file = is_file_lru(lru);
> :
> 2147 struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> :
> 2150 while (unlikely(too_many_isolated(pgdat, file, sc))) {
> 2151 if (stalled)
> 2152 return 0;
> 2153
> 2154 /* wait a bit for the reclaimer. */
> 2155 msleep(100); <--- some processes were sleeping here, with pending SIGKILL.
> 2156 stalled = true;
> 2157
> 2158 /* We are about to die and free our memory. Return now. */
> 2159 if (fatal_signal_pending(current))
> 2160 return SWAP_CLUSTER_MAX;
> 2161 }
>
> msleep() must be called only when there are too many isolated pages:
What do you mean here?
> 2019 static int too_many_isolated(struct pglist_data *pgdat, int file,
> 2020 struct scan_control *sc)
> 2021 {
> :
> 2030 if (file) {
> 2031 inactive = node_page_state(pgdat, NR_INACTIVE_FILE);
> 2032 isolated = node_page_state(pgdat, NR_ISOLATED_FILE);
> 2033 } else {
> :
> 2046 return isolated > inactive;
>
> The return value was true since:
>
> crash> p ((struct pglist_data *) 0xffff00817fffe580)->vm_stat[NR_INACTIVE_FILE]
> $8 = {
> counter = 1
> }
> crash> p ((struct pglist_data *) 0xffff00817fffe580)->vm_stat[NR_ISOLATED_FILE]
> $9 = {
> counter = 2
>
> while per_cpu stats had:
>
> crash> p ((struct pglist_data *) 0xffff00817fffe580)->per_cpu_nodestats
> $85 = (struct per_cpu_nodestat *) 0xffff8000118832e0
> crash> p/x 0xffff8000118832e0 + __per_cpu_offset[42]
> $86 = 0xffff00917fcc32e0
> crash> p ((struct per_cpu_nodestat *) 0xffff00917fcc32e0)->vm_node_stat_diff[NR_ISOLATED_FILE]
> $87 = -1 '\377'
>
> crash> p/x 0xffff8000118832e0 + __per_cpu_offset[44]
> $89 = 0xffff00917fe032e0
> crash> p ((struct per_cpu_nodestat *) 0xffff00917fe032e0)->vm_node_stat_diff[NR_ISOLATED_FILE]
> $91 = -1 '\377'
This doesn't really tell us much. How far out of sync are they
cumulatively, summed over all CPUs?
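A minimal userspace sketch of the cumulative figure in question: the global counter plus the sum of every CPU's pending delta. This is a model, not kernel code. The struct and function names are hypothetical, and clamping the result at zero is a modelling choice.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; these are not the kernel's data structures. */
struct node_stat_model {
	long global;             /* models pgdat->vm_stat[item] */
	const signed char *diff; /* models each CPU's vm_node_stat_diff[item] */
	size_t nr_cpus;
};

/* Sum of all per-CPU deltas not yet folded into the global counter. */
static long pending_drift(const struct node_stat_model *s)
{
	long drift = 0;

	assert(s != NULL);
	for (size_t cpu = 0; cpu < s->nr_cpus; cpu++)
		drift += s->diff[cpu];
	return drift;
}

/* Global value plus pending deltas, clamped at zero (an assumption). */
static long snapshot_value(const struct node_stat_model *s)
{
	long v = s->global + pending_drift(s);

	return v < 0 ? 0 : v;
}
```

Fed only the two deltas quoted above (-1 on CPUs 42 and 44) and a global NR_ISOLATED_FILE of 2, and assuming every other CPU holds 0, which the dump does not show, this model gives a drift of -2 and a snapshot of 0.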
> It seems that processes were trapped in direct reclaim/compaction loop
> because these nodes had few free pages lower than watermark min.
>
> crash> kmem -z | grep -A 3 Normal
> :
> NODE: 4 ZONE: 1 ADDR: ffff00817fffec40 NAME: "Normal"
> SIZE: 8454144 PRESENT: 98304 MIN/LOW/HIGH: 68/166/264
> VM_STAT:
> NR_FREE_PAGES: 68
> --
> NODE: 5 ZONE: 1 ADDR: ffff00897fffec40 NAME: "Normal"
> SIZE: 118784 MIN/LOW/HIGH: 82/200/318
> VM_STAT:
> NR_FREE_PAGES: 45
> --
> NODE: 6 ZONE: 1 ADDR: ffff00917fffec40 NAME: "Normal"
> SIZE: 118784 MIN/LOW/HIGH: 82/200/318
> VM_STAT:
> NR_FREE_PAGES: 53
> --
> NODE: 7 ZONE: 1 ADDR: ffff00997fbbec40 NAME: "Normal"
> SIZE: 118784 MIN/LOW/HIGH: 82/200/318
> VM_STAT:
> NR_FREE_PAGES: 52
How have you concluded that too_many_isolated is at the root of this
issue? With a very low NR_FREE_PAGES and many contending allocations, the
system could easily be stuck in reclaim. What are the other reclaim
characteristics? Is direct reclaim successful?
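One way to approach the last question, sketched under assumptions: compare the pgscan_direct and pgsteal_direct counters from /proc/vmstat. The helper names below are hypothetical, and the code parses text already read into a buffer rather than opening /proc/vmstat itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up "name value" in /proc/vmstat-style text.
 * Returns 0 and stores the value on success, -1 if the name is absent.
 */
static int vmstat_lookup(const char *text, const char *name,
			 unsigned long long *val)
{
	const char *line = text;
	size_t len;

	assert(text != NULL && name != NULL && val != NULL);
	len = strlen(name);
	while (*line) {
		/* Require the full field name followed by a space. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ' &&
		    sscanf(line + len, "%llu", val) == 1)
			return 0;
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;
	}
	return -1;
}

/* Pages reclaimed per 100 pages scanned by direct reclaim, or -1. */
static int direct_reclaim_efficiency(const char *text)
{
	unsigned long long scan, steal;

	if (vmstat_lookup(text, "pgscan_direct", &scan) ||
	    vmstat_lookup(text, "pgsteal_direct", &steal) || scan == 0)
		return -1;
	return (int)(steal * 100 / scan);
}
```

These counters are cumulative since boot, so two samples taken some seconds apart would need to be differenced. A growing pgscan_direct with almost no growth in pgsteal_direct would suggest that reclaim itself is failing, not just the isolated-pages check.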
--
Michal Hocko
SUSE Labs