From: Ying Han <yinghan@google.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>,
Johannes Weiner <hannes@cmpxchg.org>, Mel Gorman <mel@csn.ul.ie>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Rik van Riel <riel@redhat.com>,
Minchan Kim <minchan.kim@gmail.com>,
Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Nick Piggin <npiggin@gmail.com>
Subject: Re: [RFC PATCH] do_try_to_free_pages() might enter infinite loop
Date: Wed, 13 Jun 2012 22:25:07 -0700 [thread overview]
Message-ID: <CALWz4iyAFHicoLq_rqLux_YOFaUGXpzCfPn3bde1xLge1vVkfg@mail.gmail.com> (raw)
In-Reply-To: <CAHGf_=pY41-BMph5qCZ1ZwQy+Or1xi90FARZk9X8=kvKNr-DVA@mail.gmail.com>
On Mon, Jun 11, 2012 at 4:37 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
> Sigh, fix Nick's e-mail address.
>
>
> <full quote intentionally>
>
>> On Mon, Apr 23, 2012 at 4:56 PM, Ying Han <yinghan@google.com> wrote:
>>> This is not a patch intended to be merged at all, but an attempt to
>>> understand a piece of logic in global direct reclaim.
>>>
>>> In global direct reclaim, if reclaim fails at priority 0 and
>>> zone->all_unreclaimable is not set, direct reclaim starts over from
>>> DEF_PRIORITY. In some extreme cases we have seen the system hang, very
>>> likely because direct reclaim enters an infinite loop.
>>>
>>> There have been several patches trying to fix similar issues, and the
>>> latest one has a good summary of all the efforts:
>>>
>>> commit 929bea7c714220fc76ce3f75bef9056477c28e74
>>> Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>>> Date: Thu Apr 14 15:22:12 2011 -0700
>>>
>>> vmscan: all_unreclaimable() use zone->all_unreclaimable as a name
>>>
>>> Kosaki explained the problem caused by zone->all_unreclaimable being out
>>> of sync with zone->pages_scanned, where the latter was being checked by
>>> direct reclaim. However, even after that patch the problem remains:
>>> setting zone->all_unreclaimable is still asynchronous with whether the
>>> zone is actually reclaimable or not.
>>>
>>> The zone->all_unreclaimable flag is set by kswapd based on
>>> zone->pages_scanned in zone_reclaimable(). Is it possible to have
>>> zone->all_unreclaimable == false while the zone is actually
>>> unreclaimable?
>>
>> I'm coming back to this very old thread. :-(
>> I could reproduce this issue by using memory hotplug. Can anyone
>> review the following patch?
>>
>>
>> From 767b9ff5b53a34cb95e59a7c230aef3fda07be49 Mon Sep 17 00:00:00 2001
>> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Date: Mon, 11 Jun 2012 18:48:03 -0400
>> Subject: [PATCH] mm, vmscan: fix do_try_to_free_pages() livelock
>>
>> Currently, do_try_to_free_pages() can enter a livelock, because vmscan
>> now has two conflicting policies:
>>
>> 1) kswapd goes to sleep when it cannot reclaim any page even after
>>    reaching priority 0, to avoid an infinite loop in kswapd(). That
>>    is, kswapd assumes direct reclaim will make enough free pages,
>>    either through regular page reclaim or via the oom-killer.
>>    This creates a kswapd -> direct-reclaim dependency.
>> 2) direct reclaim keeps reclaiming, without invoking the oom-killer,
>>    until kswapd sets zone->all_unreclaimable, to avoid oom-killing
>>    too early.
>>    This creates a direct-reclaim -> kswapd dependency.
>>
>> In the worst case, direct reclaim may continue page reclaim forever
>> while kswapd is asleep and no other thread wakes it up.
>>
>> We cannot simply set zone->all_unreclaimable from the direct reclaim
>> path, because that path takes no locks and setting the flag there
>> would be racy. This patch therefore removes the zone->all_unreclaimable
>> field completely and recalculates the state every time it is needed.
>>
>> Note: we cannot have direct reclaim check zone->pages_scanned directly
>> while kswapd continues to use zone->all_unreclaimable, because that is
>> racy. Commit 929bea7c71 (vmscan: all_unreclaimable() use
>> zone->all_unreclaimable as a name) describes the details.
>>
>> Reported-by: Aaditya Kumar <aaditya.kumar.30@gmail.com>
>> Reported-by: Ying Han <yinghan@google.com>
>> Cc: Nick Piggin <npiggin@gmail.com>
>> Cc: Rik van Riel <riel@redhat.com>
>> Cc: Michal Hocko <mhocko@suse.cz>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Mel Gorman <mel@csn.ul.ie>
>> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> Cc: Minchan Kim <minchan.kim@gmail.com>
>> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> ---
>> include/linux/mm_inline.h | 19 +++++++++++++++++
>> include/linux/mmzone.h | 2 +-
>> include/linux/vmstat.h | 1 -
>> mm/page-writeback.c | 2 +
>> mm/page_alloc.c | 5 +--
>> mm/vmscan.c | 48 ++++++++++++--------------------------------
>> mm/vmstat.c | 3 +-
>> 7 files changed, 39 insertions(+), 41 deletions(-)
>>
>> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
>> index 1397ccf..04f32e1 100644
>> --- a/include/linux/mm_inline.h
>> +++ b/include/linux/mm_inline.h
>> @@ -2,6 +2,7 @@
>> #define LINUX_MM_INLINE_H
>>
>> #include <linux/huge_mm.h>
>> +#include <linux/swap.h>
>>
>> /**
>> * page_is_file_cache - should the page be on a file LRU or anon LRU?
>> @@ -99,4 +100,22 @@ static __always_inline enum lru_list
>> page_lru(struct page *page)
>> return lru;
>> }
>>
>> +static inline unsigned long zone_reclaimable_pages(struct zone *zone)
>> +{
>> + int nr;
>> +
>> + nr = zone_page_state(zone, NR_ACTIVE_FILE) +
>> + zone_page_state(zone, NR_INACTIVE_FILE);
>> +
>> + if (nr_swap_pages > 0)
>> + nr += zone_page_state(zone, NR_ACTIVE_ANON) +
>> + zone_page_state(zone, NR_INACTIVE_ANON);
>> +
>> + return nr;
>> +}
>> +
>> +static inline bool zone_reclaimable(struct zone *zone)
>> +{
>> + return zone->pages_scanned < zone_reclaimable_pages(zone) * 6;
>> +}
>> #endif
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 2427706..9d2a720 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -368,7 +368,7 @@ struct zone {
>> * free areas of different sizes
>> */
>> spinlock_t lock;
>> - int all_unreclaimable; /* All pages pinned */
>> +
>> #ifdef CONFIG_MEMORY_HOTPLUG
>> /* see spanned/present_pages for more description */
>> seqlock_t span_seqlock;
>> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
>> index 65efb92..9607256 100644
>> --- a/include/linux/vmstat.h
>> +++ b/include/linux/vmstat.h
>> @@ -140,7 +140,6 @@ static inline unsigned long
>> zone_page_state_snapshot(struct zone *zone,
>> }
>>
>> extern unsigned long global_reclaimable_pages(void);
>> -extern unsigned long zone_reclaimable_pages(struct zone *zone);
>>
>> #ifdef CONFIG_NUMA
>> /*
>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>> index 93d8d2f..d2d957f 100644
>> --- a/mm/page-writeback.c
>> +++ b/mm/page-writeback.c
>> @@ -34,6 +34,8 @@
>> #include <linux/syscalls.h>
>> #include <linux/buffer_head.h> /* __set_page_dirty_buffers */
>> #include <linux/pagevec.h>
>> +#include <linux/mm_inline.h>
>> +
>> #include <trace/events/writeback.h>
>>
>> /*
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 4403009..5716b00 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -59,6 +59,7 @@
>> #include <linux/prefetch.h>
>> #include <linux/migrate.h>
>> #include <linux/page-debug-flags.h>
>> +#include <linux/mm_inline.h>
>>
>> #include <asm/tlbflush.h>
>> #include <asm/div64.h>
>> @@ -638,7 +639,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>> int to_free = count;
>>
>> spin_lock(&zone->lock);
>> - zone->all_unreclaimable = 0;
>> zone->pages_scanned = 0;
>>
>> while (to_free) {
>> @@ -680,7 +680,6 @@ static void free_one_page(struct zone *zone,
>> struct page *page, int order,
>> int migratetype)
>> {
>> spin_lock(&zone->lock);
>> - zone->all_unreclaimable = 0;
>> zone->pages_scanned = 0;
>>
>> __free_one_page(page, zone, order, migratetype);
>> @@ -2870,7 +2869,7 @@ void show_free_areas(unsigned int filter)
>> K(zone_page_state(zone, NR_BOUNCE)),
>> K(zone_page_state(zone, NR_WRITEBACK_TEMP)),
>> zone->pages_scanned,
>> - (zone->all_unreclaimable ? "yes" : "no")
>> + (zone_reclaimable(zone) ? "yes" : "no")
>> );
>> printk("lowmem_reserve[]:");
>> for (i = 0; i < MAX_NR_ZONES; i++)
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index eeb3bc9..033671c 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1592,7 +1592,7 @@ static void get_scan_count(struct lruvec
>> *lruvec, struct scan_control *sc,
>> * latencies, so it's better to scan a minimum amount there as
>> * well.
>> */
>> - if (current_is_kswapd() && zone->all_unreclaimable)
>> + if (current_is_kswapd() && !zone_reclaimable(zone))
>> force_scan = true;
>> if (!global_reclaim(sc))
>> force_scan = true;
>> @@ -1936,8 +1936,8 @@ static bool shrink_zones(struct zonelist
>> *zonelist, struct scan_control *sc)
>> if (global_reclaim(sc)) {
>> if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
>> continue;
>> - if (zone->all_unreclaimable &&
>> - sc->priority != DEF_PRIORITY)
>> + if (!zone_reclaimable(zone) &&
>> + sc->priority != DEF_PRIORITY)
>> continue; /* Let kswapd poll it */
>> if (COMPACTION_BUILD) {
>> /*
>> @@ -1975,11 +1975,6 @@ static bool shrink_zones(struct zonelist
>> *zonelist, struct scan_control *sc)
>> return aborted_reclaim;
>> }
>>
>> -static bool zone_reclaimable(struct zone *zone)
>> -{
>> - return zone->pages_scanned < zone_reclaimable_pages(zone) * 6;
>> -}
>> -
>> /* All zones in zonelist are unreclaimable? */
>> static bool all_unreclaimable(struct zonelist *zonelist,
>> struct scan_control *sc)
>> @@ -1993,7 +1988,7 @@ static bool all_unreclaimable(struct zonelist *zonelist,
>> continue;
>> if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
>> continue;
>> - if (!zone->all_unreclaimable)
>> + if (zone_reclaimable(zone))
>> return false;
>> }
>>
>> @@ -2299,7 +2294,7 @@ static bool sleeping_prematurely(pg_data_t
>> *pgdat, int order, long remaining,
>> * they must be considered balanced here as well if kswapd
>> * is to sleep
>> */
>> - if (zone->all_unreclaimable) {
>> + if (zone_reclaimable(zone)) {
>> balanced += zone->present_pages;
>> continue;
>> }
>> @@ -2393,8 +2388,7 @@ loop_again:
>> if (!populated_zone(zone))
>> continue;
>>
>> - if (zone->all_unreclaimable &&
>> - sc.priority != DEF_PRIORITY)
>> + if (!zone_reclaimable(zone) && sc.priority != DEF_PRIORITY)
>> continue;
>>
>> /*
>> @@ -2443,14 +2437,13 @@ loop_again:
>> */
>> for (i = 0; i <= end_zone; i++) {
>> struct zone *zone = pgdat->node_zones + i;
>> - int nr_slab, testorder;
>> + int testorder;
>> unsigned long balance_gap;
>>
>> if (!populated_zone(zone))
>> continue;
>>
>> - if (zone->all_unreclaimable &&
>> - sc.priority != DEF_PRIORITY)
>> + if (!zone_reclaimable(zone) && sc.priority != DEF_PRIORITY)
>> continue;
>>
>> sc.nr_scanned = 0;
>> @@ -2497,12 +2490,11 @@ loop_again:
>> shrink_zone(zone, &sc);
>>
>> reclaim_state->reclaimed_slab = 0;
>> - nr_slab = shrink_slab(&shrink, sc.nr_scanned, lru_pages);
>> + shrink_slab(&shrink, sc.nr_scanned, lru_pages);
>> sc.nr_reclaimed += reclaim_state->reclaimed_slab;
>> total_scanned += sc.nr_scanned;
>>
>> - if (nr_slab == 0 && !zone_reclaimable(zone))
>> - zone->all_unreclaimable = 1;
>> +
>> }
>>
>> /*
>> @@ -2514,7 +2506,7 @@ loop_again:
>> total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
>> sc.may_writepage = 1;
>>
>> - if (zone->all_unreclaimable) {
>> + if (!zone_reclaimable(zone)) {
>> if (end_zone && end_zone == i)
>> end_zone--;
>> continue;
>> @@ -2616,7 +2608,7 @@ out:
>> if (!populated_zone(zone))
>> continue;
>>
>> - if (zone->all_unreclaimable &&
>> + if (!zone_reclaimable(zone) &&
>> sc.priority != DEF_PRIORITY)
>> continue;
>>
>> @@ -2850,20 +2842,6 @@ unsigned long global_reclaimable_pages(void)
>> return nr;
>> }
>>
>> -unsigned long zone_reclaimable_pages(struct zone *zone)
>> -{
>> - int nr;
>> -
>> - nr = zone_page_state(zone, NR_ACTIVE_FILE) +
>> - zone_page_state(zone, NR_INACTIVE_FILE);
>> -
>> - if (nr_swap_pages > 0)
>> - nr += zone_page_state(zone, NR_ACTIVE_ANON) +
>> - zone_page_state(zone, NR_INACTIVE_ANON);
>> -
>> - return nr;
>> -}
>> -
>> #ifdef CONFIG_HIBERNATION
>> /*
>> * Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
>> @@ -3158,7 +3136,7 @@ int zone_reclaim(struct zone *zone, gfp_t
>> gfp_mask, unsigned int order)
>> zone_page_state(zone, NR_SLAB_RECLAIMABLE) <= zone->min_slab_pages)
>> return ZONE_RECLAIM_FULL;
>>
>> - if (zone->all_unreclaimable)
>> + if (!zone_reclaimable(zone))
>> return ZONE_RECLAIM_FULL;
>>
>> /*
>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>> index 1bbbbd9..94b9d4c 100644
>> --- a/mm/vmstat.c
>> +++ b/mm/vmstat.c
>> @@ -19,6 +19,7 @@
>> #include <linux/math64.h>
>> #include <linux/writeback.h>
>> #include <linux/compaction.h>
>> +#include <linux/mm_inline.h>
>>
>> #ifdef CONFIG_VM_EVENT_COUNTERS
>> DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
>> @@ -1022,7 +1023,7 @@ static void zoneinfo_show_print(struct seq_file
>> *m, pg_data_t *pgdat,
>> "\n all_unreclaimable: %u"
>> "\n start_pfn: %lu"
>> "\n inactive_ratio: %u",
>> - zone->all_unreclaimable,
>> + !zone_reclaimable(zone),
>> zone->zone_start_pfn,
>> zone->inactive_ratio);
>> seq_putc(m, '\n');
>> --
>> 1.7.1
Thank you for looking into this; I like the idea of removing the
zone->all_unreclaimable flag. We've recently seen the same problem
triggered in production again, but I don't have a simple workload to
reproduce it manually.
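
For reference, the heuristic that both kswapd and (after this patch)
direct reclaim rely on is compact enough to state on its own. Below is a
minimal userspace sketch, with a hypothetical zone_model struct standing
in for struct zone; it mirrors the patch's logic but is an illustration,
not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the handful of struct zone fields the
 * heuristic reads; field names mirror the kernel counters. */
struct zone_model {
	unsigned long pages_scanned;
	unsigned long nr_active_file;
	unsigned long nr_inactive_file;
	unsigned long nr_active_anon;
	unsigned long nr_inactive_anon;
};

/* Pages on the LRU lists that reclaim could plausibly free. Anon
 * pages only count when swap is available, as in the patch. */
static unsigned long zone_reclaimable_pages(const struct zone_model *z,
					    bool have_swap)
{
	unsigned long nr = z->nr_active_file + z->nr_inactive_file;

	if (have_swap)
		nr += z->nr_active_anon + z->nr_inactive_anon;
	return nr;
}

/* The zone is still considered reclaimable while the pages scanned so
 * far stay below six times the reclaimable pages; once we have scanned
 * everything six times over without progress, give up on the zone. */
static bool zone_reclaimable(const struct zone_model *z, bool have_swap)
{
	return z->pages_scanned < zone_reclaimable_pages(z, have_swap) * 6;
}
```

Recomputing this predicate on every check is what lets the patch drop
the racy zone->all_unreclaimable flag: there is no stored state to go
stale between kswapd and direct reclaim.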
--Ying
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
2012-04-23 20:56 [RFC PATCH] do_try_to_free_pages() might enter infinite loop Ying Han
2012-04-23 22:20 ` KOSAKI Motohiro
2012-04-23 23:18 ` Ying Han
2012-04-23 23:19 ` Ying Han
2012-04-24 1:31 ` Minchan Kim
2012-04-24 2:06 ` Ying Han
2012-04-24 16:36 ` Ying Han
2012-04-24 16:38 ` Rik van Riel
2012-04-24 16:45 ` KOSAKI Motohiro
2012-04-24 17:22 ` Ying Han
2012-04-24 17:17 ` Ying Han
2012-04-24 5:36 ` Nick Piggin
2012-04-24 18:37 ` Ying Han
2012-05-01 3:34 ` Nick Piggin
2012-05-01 16:18 ` Ying Han
2012-05-01 16:20 ` Ying Han
2012-05-01 17:06 ` Rik van Riel
2012-05-02 3:25 ` Nick Piggin
2012-06-11 23:33 ` KOSAKI Motohiro
2012-06-11 23:37 ` KOSAKI Motohiro
2012-06-14 5:25 ` Ying Han [this message]
2012-06-12 0:53 ` Rik van Riel