From: Oleg Nesterov <oleg@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Mel Gorman <mgorman@techsingularity.net>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: zone_reclaimable() leads to livelock in __alloc_pages_slowpath()
Date: Sun, 29 May 2016 23:25:40 +0200
Message-ID: <20160529212540.GA15180@redhat.com>
In-Reply-To: <20160525120957.GH20132@dhcp22.suse.cz>
Sorry for the delay,
On 05/25, Michal Hocko wrote:
>
> On Wed 25-05-16 00:43:41, Oleg Nesterov wrote:
> >
> > But. It _seems to me_ that the kernel "leaks" some pages in LRU_INACTIVE_FILE
> > list because inactive_file_is_low() returns the wrong value. And do not even
> > ask me why I think so, unlikely I will be able to explain ;) to remind, I never
> > tried to read vmscan.c before.
No, this is not because of inactive_file_is_low(), but
> >
> > But. if I change lruvec_lru_size()
> >
> > - return zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru);
> > + return zone_page_state_snapshot(lruvec_zone(lruvec), NR_LRU_BASE + lru);
> >
> > the problem goes away too.
Yes,
> This is a bit surprising but my testing shows that the result shouldn't
> make much difference. I can see some discrepancies between lru_vec size
> and zone_reclaimable_pages but they are too small to actually matter.
Yes, the difference is small but it does matter.
I do not pretend I understand all of this, but it seems I finally understand
what is going on on my system when it hangs. At least, I see why the change in
lruvec_lru_size() or calculate_normal_threshold() makes a difference.
This single change in get_scan_count() under the for_each_evictable_lru() loop

	- size = lruvec_lru_size(lruvec, lru);
	+ size = zone_page_state_snapshot(lruvec_zone(lruvec), NR_LRU_BASE + lru);

fixes the problem too.
Without this change shrink*() continues to scan the LRU_ACTIVE_FILE list
while it is empty. LRU_INACTIVE_FILE is not empty (just a few pages), but
we do not even try to scan it because lruvec_lru_size() returns zero.
Then later we recheck zone_reclaimable(), and it notices the INACTIVE_FILE
counter because it uses the _snapshot variant. This leads to the livelock.
I guess this doesn't really matter, but in my particular case these
ACTIVE/INACTIVE counters were screwed by the recent putback_inactive_pages()
logic. The pages we "leak" in INACTIVE list were recently moved from ACTIVE
to INACTIVE list, and this updated only the per-cpu ->vm_stat_diff[] counters,
so the "non snapshot" lruvec_lru_size() in get_scan_count() sees the "old"
numbers.
I even added more printk's, and yes when the system hangs I have something
like, say,

	->vm_stat[ACTIVE]        = NR;	// small number
	->vm_stat_diff[ACTIVE]   = -NR;	// so it is actually zero but
					// get_scan_count() sees NR

	->vm_stat[INACTIVE]      = 0;	// this is what get_scan_count() sees
	->vm_stat_diff[INACTIVE] = NR;	// and this is what zone_reclaimable()
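
To make the numbers above concrete, here is a small userspace model of the
two read paths. It is only a sketch, not kernel code: struct zone_model, the
MODEL_* names and the model_*() helpers are made up for illustration, and the
assumption is that the plain read looks at the global counter only while the
_snapshot read also adds the per-cpu deltas that were not folded in yet.

```c
#include <assert.h>
#include <stdio.h>

#define MODEL_NR_CPUS 2

/* Hypothetical stand-ins for the two file LRU counters. */
enum model_item { MODEL_INACTIVE_FILE, MODEL_ACTIVE_FILE, MODEL_NR_ITEMS };

/* Hypothetical model of a zone: one global counter per item plus small
 * per-cpu deltas that have not been folded into the global counter. */
struct zone_model {
	long vm_stat[MODEL_NR_ITEMS];
	signed char vm_stat_diff[MODEL_NR_CPUS][MODEL_NR_ITEMS];
};

/* Cheap read: global counter only, clamped at zero. */
static unsigned long model_page_state(const struct zone_model *z,
				      enum model_item item)
{
	long x = z->vm_stat[item];

	return x < 0 ? 0UL : (unsigned long)x;
}

/* Precise read: global counter plus every per-cpu delta, clamped at zero. */
static unsigned long model_page_state_snapshot(const struct zone_model *z,
					       enum model_item item)
{
	long x = z->vm_stat[item];

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		x += z->vm_stat_diff[cpu][item];
	return x < 0 ? 0UL : (unsigned long)x;
}

/* Move nr pages from ACTIVE to INACTIVE on one cpu. Only the per-cpu
 * deltas change; this model never folds them into vm_stat[]. */
static void model_deactivate(struct zone_model *z, int cpu, int nr)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	assert(nr >= 0 && nr <= 100);	/* keep the deltas inside signed char */

	z->vm_stat_diff[cpu][MODEL_ACTIVE_FILE] =
		(signed char)(z->vm_stat_diff[cpu][MODEL_ACTIVE_FILE] - nr);
	z->vm_stat_diff[cpu][MODEL_INACTIVE_FILE] =
		(signed char)(z->vm_stat_diff[cpu][MODEL_INACTIVE_FILE] + nr);
}

/* Build the state from the printk output: nr active pages accounted
 * globally, then all of them moved to the inactive list on cpu 0. */
static void model_hang_state(struct zone_model *z, int nr)
{
	*z = (struct zone_model){ 0 };
	z->vm_stat[MODEL_ACTIVE_FILE] = nr;
	model_deactivate(z, 0, nr);
}

/* Print what each read path reports for both lists. */
static void model_report(const struct zone_model *z)
{
	printf("ACTIVE:   plain=%lu snapshot=%lu\n",
	       model_page_state(z, MODEL_ACTIVE_FILE),
	       model_page_state_snapshot(z, MODEL_ACTIVE_FILE));
	printf("INACTIVE: plain=%lu snapshot=%lu\n",
	       model_page_state(z, MODEL_INACTIVE_FILE),
	       model_page_state_snapshot(z, MODEL_INACTIVE_FILE));
}
```

Calling model_hang_state(&z, NR) and then model_report(&z) from a main()
shows the disagreement: the plain read reports NR active and zero inactive
pages, the snapshot read reports the opposite. In this model a scanner that
sizes the lists with the plain read never touches the inactive list, while a
check based on the snapshot read keeps saying there is something to reclaim.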
Oleg.