From: Oleg Nesterov <oleg@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: zone_reclaimable() leads to livelock in __alloc_pages_slowpath()
Date: Sun, 29 May 2016 23:25:40 +0200
Message-ID: <20160529212540.GA15180@redhat.com>
In-Reply-To: <20160525120957.GH20132@dhcp22.suse.cz>

Sorry for the delay,

On 05/25, Michal Hocko wrote:
>
> On Wed 25-05-16 00:43:41, Oleg Nesterov wrote:
> >
> > But. It _seems to me_ that the kernel "leaks" some pages in LRU_INACTIVE_FILE
> > list because inactive_file_is_low() returns the wrong value. And do not even
> > ask me why I think so, unlikely I will be able to explain ;) to remind, I never
> > tried to read vmscan.c before.

No, this is not because of inactive_file_is_low(), but

> >
> > But. if I change lruvec_lru_size()
> >
> > 	-       return zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru);
> > 	+       return zone_page_state_snapshot(lruvec_zone(lruvec), NR_LRU_BASE + lru);
> >
> > the problem goes away too.

Yes,

> This is a bit surprising but my testing shows that the result shouldn't
> make much difference. I can see some discrepancies between lru_vec size
> and zone_reclaimable_pages but they are too small to actually matter.

Yes, the difference is small but it does matter.

I do not pretend to understand all of this, but it seems I finally understand
what is going on on my system when it hangs, or at least why the change in
lruvec_lru_size() or calculate_normal_threshold() makes a difference.

This single change in get_scan_count(), under the for_each_evictable_lru() loop,

	-	size = lruvec_lru_size(lruvec, lru);
	+	size = zone_page_state_snapshot(lruvec_zone(lruvec), NR_LRU_BASE + lru);

fixes the problem too.

Without this change shrink*() continues to scan the LRU_ACTIVE_FILE list
even though it is empty. LRU_INACTIVE_FILE is not empty (just a few pages), but
we do not even try to scan it because lruvec_lru_size() returns zero.

Then later we recheck zone_reclaimable(), and it notices the INACTIVE_FILE
counter because it uses the _snapshot variant; this leads to the livelock.
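
The disagreement between the two readers can be sketched in userspace C.
This is only a toy model, not kernel code: struct toy_counter, NR_CPUS = 4
and all toy_*() names are hypothetical, and toy_mod_state() assumes the
delta always stays below the fold threshold. The plain read models
zone_page_state() (global counter only), the snapshot read models
zone_page_state_snapshot() (global counter plus pending per-cpu deltas).

```c
#include <assert.h>

#define NR_CPUS 4	/* assumption: arbitrary CPU count for the model */

/* One zone counter: a global value plus per-cpu deltas not yet folded in. */
struct toy_counter {
	long global;		/* models zone->vm_stat[item] */
	long diff[NR_CPUS];	/* models the per-cpu vm_stat_diff[item] */
};

/* Cheap read: looks at the global part only. */
static long toy_page_state(const struct toy_counter *c)
{
	long x = c->global;

	return x < 0 ? 0 : x;
}

/* Precise read: global part plus every pending per-cpu delta. */
static long toy_page_state_snapshot(const struct toy_counter *c)
{
	long x = c->global;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		x += c->diff[cpu];
	return x < 0 ? 0 : x;
}

/* Per-cpu update; assumed small enough that it is never folded. */
static void toy_mod_state(struct toy_counter *c, int cpu, long delta)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	c->diff[cpu] += delta;
}

/*
 * Returns 1 when the scanner's view says "nothing to scan" while the
 * reclaimable check's view says "there are still pages".
 */
static int toy_livelock_condition(const struct toy_counter *inactive)
{
	return toy_page_state(inactive) == 0 &&
	       toy_page_state_snapshot(inactive) > 0;
}
```

After toy_mod_state(&inactive, 1, 5) on a zeroed counter, the plain read
still returns 0 while the snapshot returns 5, so toy_livelock_condition()
becomes true: one side never scans the list, the other keeps reporting it
as reclaimable.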

I guess this doesn't really matter, but in my particular case these
ACTIVE/INACTIVE counters were screwed by the recent putback_inactive_pages()
logic. The pages we "leak" in INACTIVE list were recently moved from ACTIVE
to INACTIVE list, and this updated only the per-cpu ->vm_stat_diff[] counters,
so the "non snapshot" lruvec_lru_size() in get_scan_count() sees the "old"
numbers.

I even added more printk's, and yes when the system hangs I have something
like, say,

	->vm_stat[ACTIVE] 	 = NR;		// small number
	->vm_stat_diff[ACTIVE]	 = -NR;		// so it is actually zero but
						// get_scan_count() sees NR

	->vm_stat[INACTIVE]	 = 0;		// this is what get_scan_count() sees
	->vm_stat_diff[INACTIVE] = NR;		// and this is what zone_reclaimable() sees
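
The numbers above can be reproduced with a small userspace sketch. It is a
simplified model under stated assumptions, not the kernel implementation:
struct toy_zone, toy_mod() and toy_deactivate() are hypothetical names, only
one CPU's deltas are modelled, and the fold rule (push the per-cpu delta into
the global counter once its magnitude exceeds a threshold) is a simplified
rendering of the per-cpu counter update.

```c
#include <assert.h>
#include <stdlib.h>

enum { TOY_ACTIVE, TOY_INACTIVE, TOY_NR_ITEMS };

struct toy_zone {
	long vm_stat[TOY_NR_ITEMS];		/* global counters */
	long vm_stat_diff[TOY_NR_ITEMS];	/* one CPU's pending deltas */
	long threshold;				/* fold when |diff| exceeds this */
};

/* Apply delta to the per-cpu diff; fold into the global counter if large. */
static void toy_mod(struct toy_zone *z, int item, long delta)
{
	long x;

	assert(item >= 0 && item < TOY_NR_ITEMS);
	x = z->vm_stat_diff[item] + delta;
	if (labs(x) > z->threshold) {
		z->vm_stat[item] += x;
		x = 0;
	}
	z->vm_stat_diff[item] = x;
}

/* Move nr pages from the ACTIVE list to the INACTIVE list. */
static void toy_deactivate(struct toy_zone *z, long nr)
{
	toy_mod(z, TOY_ACTIVE, -nr);
	toy_mod(z, TOY_INACTIVE, nr);
}
```

Starting from vm_stat[TOY_ACTIVE] = 3 with threshold = 10, toy_deactivate(&z, 3)
leaves vm_stat = {3, 0} and vm_stat_diff = {-3, 3}, which is the state shown
above with NR = 3. With threshold = 2 the same move is folded immediately and
vm_stat becomes {0, 3}, which is one way to see why changing the threshold
makes a difference.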

Oleg.
