From: Minchan Kim <minchan.kim@gmail.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
David Rientjes <rientjes@google.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Rik van Riel <riel@redhat.com>, Oleg Nesterov <oleg@redhat.com>,
linux-mm <linux-mm@kvack.org>, Andrey Vagin <avagin@openvz.org>,
Hugh Dickins <hughd@google.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Nick Piggin <npiggin@kernel.dk>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH 1/5] vmscan: remove all_unreclaimable check from direct reclaim path completely
Date: Tue, 22 Mar 2011 23:49:50 +0900
Message-ID: <20110322144950.GA2628@barrios-desktop>
In-Reply-To: <20110322200523.B061.A69D9226@jp.fujitsu.com>
Hi Kosaki,
On Tue, Mar 22, 2011 at 08:05:55PM +0900, KOSAKI Motohiro wrote:
> all_unreclaimable check in direct reclaim has been introduced at 2.6.19
> by following commit.
>
> 2006 Sep 25; commit 408d8544; oom: use unreclaimable info
>
> And it went through strange history. firstly, following commit broke
> the logic unintentionally.
>
> 2008 Apr 29; commit a41f24ea; page allocator: smarter retry of
> costly-order allocations
>
> Two years later, I've found obvious meaningless code fragment and
> restored original intention by following commit.
>
> 2010 Jun 04; commit bb21c7ce; vmscan: fix do_try_to_free_pages()
> return value when priority==0
>
> But, the logic didn't works when 32bit highmem system goes hibernation
> and Minchan slightly changed the algorithm and fixed it .
>
> 2010 Sep 22: commit d1908362: vmscan: check all_unreclaimable
> in direct reclaim path
>
> But, recently, Andrey Vagin found the new corner case. Look,
>
> struct zone {
> ..
> int all_unreclaimable;
> ..
> unsigned long pages_scanned;
> ..
> }
>
> zone->all_unreclaimable and zone->pages_scanned are neigher atomic
> variables nor protected by lock. Therefore a zone can become a state
> of zone->page_scanned=0 and zone->all_unreclaimable=1. In this case,
That is possible, although very rare.
> current all_unreclaimable() return false even though
> zone->all_unreclaimabe=1.
The case is very rare since we reset zone->all_unreclaimable to zero
right before resetting zone->pages_scanned to zero.
But I admit it's possible.
CPU 0                                   CPU 1
free_pcppages_bulk                      balance_pgdat
  zone->all_unreclaimable = 0
                                        zone->all_unreclaimable = 1
  zone->pages_scanned = 0
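The interleaving above can be replayed deterministically in userspace.
This is only a sketch: struct zone_race_model and the three step helpers
are hypothetical stand-ins, not kernel code, and the steps run in the racy
order on a single thread instead of on two CPUs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two struct zone fields quoted above. */
struct zone_race_model {
	int all_unreclaimable;
	unsigned long pages_scanned;
};

/* Step 1, CPU 0 (free_pcppages_bulk): clear the flag first. */
static void cpu0_clear_flag(struct zone_race_model *z)
{
	z->all_unreclaimable = 0;
}

/* Step 2, CPU 1 (balance_pgdat): mark the zone unreclaimable. */
static void cpu1_set_flag(struct zone_race_model *z)
{
	z->all_unreclaimable = 1;
}

/* Step 3, CPU 0 (free_pcppages_bulk): reset the scan counter. */
static void cpu0_reset_scanned(struct zone_race_model *z)
{
	z->pages_scanned = 0;
}

/* Replay the three steps in the order shown in the diagram. */
static struct zone_race_model replay_race(unsigned long scanned_before)
{
	struct zone_race_model z = {
		.all_unreclaimable = 0,
		.pages_scanned = scanned_before,
	};

	cpu0_clear_flag(&z);
	assert(z.all_unreclaimable == 0);	/* window opens here */
	cpu1_set_flag(&z);
	cpu0_reset_scanned(&z);
	return z;
}

/* True when the flag is set although the scan counter is zero. */
static bool flag_set_but_counter_zero(const struct zone_race_model *z)
{
	return z->all_unreclaimable == 1 && z->pages_scanned == 0;
}
```

Calling replay_race() with any starting counter ends with the flag set and
the counter at zero, which is the state described in the quoted text.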
>
> Is this ignorable minor issue? No. Unfortunatelly, x86 has very
> small dma zone and it become zone->all_unreclamble=1 easily. and
> if it becase all_unreclaimable, it never return all_unreclaimable=0
The verb in the line above is very important: did you mean "return" or "reset"?
I can't understand your point because of the typo. Please correct it.
> beucase it typicall don't have reclaimable pages.
If the DMA zone has very few or zero reclaimable pages,
zone_reclaimable() can easily return false, so all_unreclaimable() could return
true. Eventually the oom-killer might work.
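To make that concrete, here is a small userspace model. It assumes that
zone_reclaimable() compares pages_scanned against six times the number of
reclaimable pages, which is how I recall it in mm/vmscan.c; please check the
tree. The struct, the _model names and the reclaimable_pages field are
hypothetical, and the populated_zone() and cpuset checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-zone state that the check looks at. */
struct zone_scan_model {
	unsigned long pages_scanned;
	unsigned long reclaimable_pages; /* stand-in for zone_reclaimable_pages() */
};

/* Assumed rule: reclaimable while scanned < 6 * reclaimable pages. */
static bool zone_reclaimable_model(const struct zone_scan_model *z)
{
	return z->pages_scanned < z->reclaimable_pages * 6;
}

/* Mirrors the loop in the removed all_unreclaimable(), minus the filters. */
static bool all_unreclaimable_model(const struct zone_scan_model *zones,
				    size_t nr)
{
	assert(zones != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (zone_reclaimable_model(&zones[i]))
			return false;
	}
	return true;
}
```

Under that assumption, a zone with zero reclaimable pages is never
reclaimable, even with pages_scanned == 0, so all_unreclaimable_model()
returns true for it. A zone with only a few reclaimable pages and
pages_scanned == 0 still counts as reclaimable.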
In my test I saw the livelock too, so apparently we do have a problem.
I couldn't dig into it recently because of other urgent work.
I think you know the root cause, but the description in this patch isn't
enough to persuade me.
Could you explain the root cause in detail?
>
> Eventually, oom-killer never works on such systems. Let's remove
> this problematic logic completely.
>
> Reported-by: Andrey Vagin <avagin@openvz.org>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
> mm/vmscan.c | 36 +-----------------------------------
> 1 files changed, 1 insertions(+), 35 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 060e4c1..254aada 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1989,33 +1989,6 @@ static bool zone_reclaimable(struct zone *zone)
> }
>
> /*
> - * As hibernation is going on, kswapd is freezed so that it can't mark
> - * the zone into all_unreclaimable. It can't handle OOM during hibernation.
> - * So let's check zone's unreclaimable in direct reclaim as well as kswapd.
> - */
> -static bool all_unreclaimable(struct zonelist *zonelist,
> - struct scan_control *sc)
> -{
> - struct zoneref *z;
> - struct zone *zone;
> - bool all_unreclaimable = true;
> -
> - for_each_zone_zonelist_nodemask(zone, z, zonelist,
> - gfp_zone(sc->gfp_mask), sc->nodemask) {
> - if (!populated_zone(zone))
> - continue;
> - if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
> - continue;
> - if (zone_reclaimable(zone)) {
> - all_unreclaimable = false;
> - break;
> - }
> - }
> -
> - return all_unreclaimable;
> -}
> -
> -/*
> * This is the main entry point to direct page reclaim.
> *
> * If a full scan of the inactive list fails to free enough memory then we
> @@ -2105,14 +2078,7 @@ out:
> delayacct_freepages_end();
> put_mems_allowed();
>
> - if (sc->nr_reclaimed)
> - return sc->nr_reclaimed;
> -
> - /* top priority shrink_zones still had more to do? don't OOM, then */
> - if (scanning_global_lru(sc) && !all_unreclaimable(zonelist, sc))
> - return 1;
> -
> - return 0;
> + return sc->nr_reclaimed;
> }
>
> unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> --
> 1.6.5.2
>
>
>
--
Kind regards,
Minchan Kim