From mboxrd@z Thu Jan 1 00:00:00 1970
From: akpm@linux-foundation.org
Subject: [merged] mm-vmscan-fix-do_try_to_free_pages-livelock.patch removed from -mm tree
Date: Thu, 12 Sep 2013 12:45:01 -0700
Message-ID: <523219bd.DMkUN47FiapiWSq1%akpm@linux-foundation.org>
Reply-To: linux-kernel@vger.kernel.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mail.linuxfoundation.org ([140.211.169.12]:58502 "EHLO
	mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756123Ab3ILTpD (ORCPT );
	Thu, 12 Sep 2013 15:45:03 -0400
Sender: mm-commits-owner@vger.kernel.org
List-Id: mm-commits@vger.kernel.org
To: mm-commits@vger.kernel.org, zhangwm@marvell.com, yinghan@google.com,
	riel@redhat.com, npiggin@gmail.com, minchan@kernel.org, mhocko@suse.cz,
	mel@csn.ul.ie, lliubbo@gmail.com, linux@arm.linux.org.uk,
	kosaki.motohiro@jp.fujitsu.com, kamezawa.hiroyu@jp.fujitsu.com,
	hannes@cmpxchg.org, cl@linux.com, aaditya.kumar.30@gmail.com,
	cldu@marvell.com

The patch titled
     Subject: mm: vmscan: fix do_try_to_free_pages() livelock
has been removed from the -mm tree.
Its filename was
     mm-vmscan-fix-do_try_to_free_pages-livelock.patch

This patch was dropped because it was merged into mainline or a subsystem tree

------------------------------------------------------
From: Lisa Du
Subject: mm: vmscan: fix do_try_to_free_pages() livelock

This patch is based on KOSAKI's work, with a little more description
added; please refer to https://lkml.org/lkml/2012/6/14/74.

I found that the system can enter a state in which a zone has lots of free
pages, but only order-0 and order-1 pages, which means the zone is heavily
fragmented. A high-order allocation can then cause a long stall (e.g. 60
seconds) in the direct reclaim path, especially in an environment with no
swap and no compaction. This problem happened on v3.4, but the issue still
seems to live in the current tree. The reason is that
do_try_to_free_pages() enters a livelock:

kswapd will go to sleep if the zones have been fully scanned and are still
not balanced, as kswapd thinks there's little point in trying all over
again and wants to avoid an infinite loop. Instead it changes the order
from high-order to order-0, because kswapd thinks order-0 is the most
important. See 73ce02e9 for details. If the watermarks are OK, kswapd will
go back to sleep and may leave zone->all_unreclaimable = 0. It assumes
high-order users can still perform direct reclaim if they wish.

Direct reclaim continues to reclaim for a high order which is not a
COSTLY_ORDER, without invoking the oom-killer, until kswapd turns on
zone->all_unreclaimable. This is to avoid a too-early oom-kill. So direct
reclaim depends on kswapd to break this loop.

In the worst case, direct reclaim may continue to reclaim pages forever
while kswapd sleeps forever, until something like a watchdog detects it
and finally kills the process. As described in:
http://thread.gmane.org/gmane.linux.kernel.mm/103737

We can't turn on zone->all_unreclaimable from the direct reclaim path
because the direct reclaim path doesn't take any lock, so doing it this way
is racy.
Thus this patch removes the zone->all_unreclaimable field completely and
recalculates the zone's reclaimable state every time.

Note: we can't take the approach in which direct reclaim looks at
zone->pages_scanned directly while kswapd continues to use
zone->all_unreclaimable, because it is racy. Commit 929bea7c71 (vmscan:
all_unreclaimable() use zone->all_unreclaimable as a name) describes the
details.

[akpm@linux-foundation.org: uninline zone_reclaimable_pages() and zone_reclaimable()]
Cc: Aaditya Kumar
Cc: Ying Han
Cc: Nick Piggin
Acked-by: Rik van Riel
Cc: Mel Gorman
Cc: KAMEZAWA Hiroyuki
Cc: Christoph Lameter
Cc: Bob Liu
Cc: Neil Zhang
Cc: Russell King - ARM Linux
Reviewed-by: Michal Hocko
Acked-by: Minchan Kim
Acked-by: Johannes Weiner
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Lisa Du
Signed-off-by: Andrew Morton
---

 include/linux/mm_inline.h |    1 
 include/linux/mmzone.h    |    1 
 include/linux/vmstat.h    |    1 
 mm/internal.h             |    2 +
 mm/migrate.c              |    2 -
 mm/page-writeback.c       |    3 +
 mm/page_alloc.c           |    5 +-
 mm/vmscan.c               |   66 ++++++++++++++++--------------------
 mm/vmstat.c               |    5 ++
 9 files changed, 44 insertions(+), 42 deletions(-)

diff -puN include/linux/mm_inline.h~mm-vmscan-fix-do_try_to_free_pages-livelock include/linux/mm_inline.h
--- a/include/linux/mm_inline.h~mm-vmscan-fix-do_try_to_free_pages-livelock
+++ a/include/linux/mm_inline.h
@@ -2,6 +2,7 @@
 #define LINUX_MM_INLINE_H
 
 #include
+#include
 
 /**
  * page_is_file_cache - should the page be on a file LRU or anon LRU?
diff -puN include/linux/mmzone.h~mm-vmscan-fix-do_try_to_free_pages-livelock include/linux/mmzone.h
--- a/include/linux/mmzone.h~mm-vmscan-fix-do_try_to_free_pages-livelock
+++ a/include/linux/mmzone.h
@@ -353,7 +353,6 @@ struct zone {
 	 * free areas of different sizes
 	 */
 	spinlock_t lock;
-	int all_unreclaimable; /* All pages pinned */
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
 	/* Set to true when the PG_migrate_skip bits should be cleared */
 	bool compact_blockskip_flush;
diff -puN include/linux/vmstat.h~mm-vmscan-fix-do_try_to_free_pages-livelock include/linux/vmstat.h
--- a/include/linux/vmstat.h~mm-vmscan-fix-do_try_to_free_pages-livelock
+++ a/include/linux/vmstat.h
@@ -143,7 +143,6 @@ static inline unsigned long zone_page_st
 }
 
 extern unsigned long global_reclaimable_pages(void);
-extern unsigned long zone_reclaimable_pages(struct zone *zone);
 
 #ifdef CONFIG_NUMA
 /*
diff -puN mm/internal.h~mm-vmscan-fix-do_try_to_free_pages-livelock mm/internal.h
--- a/mm/internal.h~mm-vmscan-fix-do_try_to_free_pages-livelock
+++ a/mm/internal.h
@@ -85,6 +85,8 @@ extern unsigned long highest_memmap_pfn;
  */
 extern int isolate_lru_page(struct page *page);
 extern void putback_lru_page(struct page *page);
+extern unsigned long zone_reclaimable_pages(struct zone *zone);
+extern bool zone_reclaimable(struct zone *zone);
 
 /*
  * in mm/rmap.c:
diff -puN mm/migrate.c~mm-vmscan-fix-do_try_to_free_pages-livelock mm/migrate.c
--- a/mm/migrate.c~mm-vmscan-fix-do_try_to_free_pages-livelock
+++ a/mm/migrate.c
@@ -1471,7 +1471,7 @@ static bool migrate_balanced_pgdat(struc
 		if (!populated_zone(zone))
 			continue;
 
-		if (zone->all_unreclaimable)
+		if (!zone_reclaimable(zone))
 			continue;
 
 		/* Avoid waking kswapd by allocating pages_to_migrate pages. */
diff -puN mm/page-writeback.c~mm-vmscan-fix-do_try_to_free_pages-livelock mm/page-writeback.c
--- a/mm/page-writeback.c~mm-vmscan-fix-do_try_to_free_pages-livelock
+++ a/mm/page-writeback.c
@@ -36,8 +36,11 @@
 #include
 #include
 #include
+#include
 #include
 
+#include "internal.h"
+
 /*
  * Sleep at most 200ms at a time in balance_dirty_pages().
  */
diff -puN mm/page_alloc.c~mm-vmscan-fix-do_try_to_free_pages-livelock mm/page_alloc.c
--- a/mm/page_alloc.c~mm-vmscan-fix-do_try_to_free_pages-livelock
+++ a/mm/page_alloc.c
@@ -56,6 +56,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -647,7 +648,6 @@ static void free_pcppages_bulk(struct zo
 	int to_free = count;
 
 	spin_lock(&zone->lock);
-	zone->all_unreclaimable = 0;
 	zone->pages_scanned = 0;
 
 	while (to_free) {
@@ -696,7 +696,6 @@ static void free_one_page(struct zone *z
 				int migratetype)
 {
 	spin_lock(&zone->lock);
-	zone->all_unreclaimable = 0;
 	zone->pages_scanned = 0;
 
 	__free_one_page(page, zone, order, migratetype);
@@ -3164,7 +3163,7 @@ void show_free_areas(unsigned int filter
 			K(zone_page_state(zone, NR_FREE_CMA_PAGES)),
 			K(zone_page_state(zone, NR_WRITEBACK_TEMP)),
 			zone->pages_scanned,
-			(zone->all_unreclaimable ? "yes" : "no")
+			(!zone_reclaimable(zone) ? "yes" : "no")
 			);
 		printk("lowmem_reserve[]:");
 		for (i = 0; i < MAX_NR_ZONES; i++)
diff -puN mm/vmscan.c~mm-vmscan-fix-do_try_to_free_pages-livelock mm/vmscan.c
--- a/mm/vmscan.c~mm-vmscan-fix-do_try_to_free_pages-livelock
+++ a/mm/vmscan.c
@@ -146,6 +146,25 @@ static bool global_reclaim(struct scan_c
 }
 #endif
 
+unsigned long zone_reclaimable_pages(struct zone *zone)
+{
+	int nr;
+
+	nr = zone_page_state(zone, NR_ACTIVE_FILE) +
+	     zone_page_state(zone, NR_INACTIVE_FILE);
+
+	if (get_nr_swap_pages() > 0)
+		nr += zone_page_state(zone, NR_ACTIVE_ANON) +
+		      zone_page_state(zone, NR_INACTIVE_ANON);
+
+	return nr;
+}
+
+bool zone_reclaimable(struct zone *zone)
+{
+	return zone->pages_scanned < zone_reclaimable_pages(zone) * 6;
+}
+
 static unsigned long get_lru_size(struct lruvec *lruvec, enum lru_list lru)
 {
 	if (!mem_cgroup_disabled())
@@ -1789,7 +1808,7 @@ static void get_scan_count(struct lruvec
 	 * latencies, so it's better to scan a minimum amount there as
 	 * well.
 	 */
-	if (current_is_kswapd() && zone->all_unreclaimable)
+	if (current_is_kswapd() && !zone_reclaimable(zone))
 		force_scan = true;
 	if (!global_reclaim(sc))
 		force_scan = true;
@@ -2244,8 +2263,8 @@ static bool shrink_zones(struct zonelist
 		if (global_reclaim(sc)) {
 			if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
 				continue;
-			if (zone->all_unreclaimable &&
-					sc->priority != DEF_PRIORITY)
+			if (sc->priority != DEF_PRIORITY &&
+			    !zone_reclaimable(zone))
 				continue;	/* Let kswapd poll it */
 			if (IS_ENABLED(CONFIG_COMPACTION)) {
 				/*
@@ -2283,11 +2302,6 @@ static bool shrink_zones(struct zonelist
 	return aborted_reclaim;
 }
 
-static bool zone_reclaimable(struct zone *zone)
-{
-	return zone->pages_scanned < zone_reclaimable_pages(zone) * 6;
-}
-
 /* All zones in zonelist are unreclaimable? */
 static bool all_unreclaimable(struct zonelist *zonelist,
 		struct scan_control *sc)
@@ -2301,7 +2315,7 @@ static bool all_unreclaimable(struct zon
 			continue;
 		if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
 			continue;
-		if (!zone->all_unreclaimable)
+		if (zone_reclaimable(zone))
 			return false;
 	}
 
@@ -2712,7 +2726,7 @@ static bool pgdat_balanced(pg_data_t *pg
 		 * DEF_PRIORITY. Effectively, it considers them balanced so
 		 * they must be considered balanced here as well!
 		 */
-		if (zone->all_unreclaimable) {
+		if (!zone_reclaimable(zone)) {
 			balanced_pages += zone->managed_pages;
 			continue;
 		}
@@ -2773,7 +2787,6 @@ static bool kswapd_shrink_zone(struct zo
 			       unsigned long lru_pages,
 			       unsigned long *nr_attempted)
 {
-	unsigned long nr_slab;
 	int testorder = sc->order;
 	unsigned long balance_gap;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
@@ -2818,15 +2831,12 @@ static bool kswapd_shrink_zone(struct zo
 	shrink_zone(zone, sc);
 
 	reclaim_state->reclaimed_slab = 0;
-	nr_slab = shrink_slab(&shrink, sc->nr_scanned, lru_pages);
+	shrink_slab(&shrink, sc->nr_scanned, lru_pages);
 	sc->nr_reclaimed += reclaim_state->reclaimed_slab;
 
 	/* Account for the number of pages attempted to reclaim */
 	*nr_attempted += sc->nr_to_reclaim;
 
-	if (nr_slab == 0 && !zone_reclaimable(zone))
-		zone->all_unreclaimable = 1;
-
 	zone_clear_flag(zone, ZONE_WRITEBACK);
 
 	/*
@@ -2835,7 +2845,7 @@ static bool kswapd_shrink_zone(struct zo
 	 * BDIs but as pressure is relieved, speculatively avoid congestion
 	 * waits.
 	 */
-	if (!zone->all_unreclaimable &&
+	if (zone_reclaimable(zone) &&
 	    zone_balanced(zone, testorder, 0, classzone_idx)) {
 		zone_clear_flag(zone, ZONE_CONGESTED);
 		zone_clear_flag(zone, ZONE_TAIL_LRU_DIRTY);
@@ -2901,8 +2911,8 @@ static unsigned long balance_pgdat(pg_da
 			if (!populated_zone(zone))
 				continue;
 
-			if (zone->all_unreclaimable &&
-			    sc.priority != DEF_PRIORITY)
+			if (sc.priority != DEF_PRIORITY &&
+			    !zone_reclaimable(zone))
 				continue;
 
 			/*
@@ -2980,8 +2990,8 @@ static unsigned long balance_pgdat(pg_da
 			if (!populated_zone(zone))
 				continue;
 
-			if (zone->all_unreclaimable &&
-			    sc.priority != DEF_PRIORITY)
+			if (sc.priority != DEF_PRIORITY &&
+			    !zone_reclaimable(zone))
 				continue;
 
 			sc.nr_scanned = 0;
@@ -3265,20 +3275,6 @@ unsigned long global_reclaimable_pages(v
 	return nr;
 }
 
-unsigned long zone_reclaimable_pages(struct zone *zone)
-{
-	int nr;
-
-	nr = zone_page_state(zone, NR_ACTIVE_FILE) +
-	     zone_page_state(zone, NR_INACTIVE_FILE);
-
-	if (get_nr_swap_pages() > 0)
-		nr += zone_page_state(zone, NR_ACTIVE_ANON) +
-		      zone_page_state(zone, NR_INACTIVE_ANON);
-
-	return nr;
-}
-
 #ifdef CONFIG_HIBERNATION
 /*
  * Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
@@ -3576,7 +3572,7 @@ int zone_reclaim(struct zone *zone, gfp_
 	    zone_page_state(zone, NR_SLAB_RECLAIMABLE) <= zone->min_slab_pages)
 		return ZONE_RECLAIM_FULL;
 
-	if (zone->all_unreclaimable)
+	if (!zone_reclaimable(zone))
 		return ZONE_RECLAIM_FULL;
 
 	/*
diff -puN mm/vmstat.c~mm-vmscan-fix-do_try_to_free_pages-livelock mm/vmstat.c
--- a/mm/vmstat.c~mm-vmscan-fix-do_try_to_free_pages-livelock
+++ a/mm/vmstat.c
@@ -19,6 +19,9 @@
 #include
 #include
 #include
+#include
+
+#include "internal.h"
 
 #ifdef CONFIG_VM_EVENT_COUNTERS
 DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
@@ -1088,7 +1091,7 @@ static void zoneinfo_show_print(struct s
 		   "\n all_unreclaimable: %u"
 		   "\n start_pfn: %lu"
 		   "\n inactive_ratio: %u",
-		   zone->all_unreclaimable,
+		   !zone_reclaimable(zone),
 		   zone->zone_start_pfn,
 		   zone->inactive_ratio);
 	seq_putc(m, '\n');
_

Patches currently in -mm which might be from cldu@marvell.com are

origin.patch
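
For readers who want to poke at the heuristic outside the kernel tree, here
is a minimal userspace sketch in C of the check that replaces the
zone->all_unreclaimable flag. struct zone_model, the model_* function names
and the nr_swap_pages parameter are illustrative stand-ins and not kernel
APIs; only the comparison itself (pages_scanned against six times the
reclaimable page count, with anon pages counted only when swap is available)
is taken from the patch above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few per-zone counters the check reads. */
struct zone_model {
	unsigned long pages_scanned;	/* scanned since a page was last freed */
	unsigned long nr_active_file;
	unsigned long nr_inactive_file;
	unsigned long nr_active_anon;
	unsigned long nr_inactive_anon;
};

/*
 * Models zone_reclaimable_pages(): file pages always count, anon pages
 * count only when there is swap space to write them to. nr_swap_pages
 * stands in for the kernel's get_nr_swap_pages().
 */
static unsigned long model_reclaimable_pages(const struct zone_model *z,
					     long nr_swap_pages)
{
	unsigned long nr = z->nr_active_file + z->nr_inactive_file;

	if (nr_swap_pages > 0)
		nr += z->nr_active_anon + z->nr_inactive_anon;

	return nr;
}

/*
 * Models zone_reclaimable(): the state is recomputed on every call
 * instead of being cached in a flag that only kswapd may set.
 */
static bool model_zone_reclaimable(const struct zone_model *z,
				   long nr_swap_pages)
{
	return z->pages_scanned < model_reclaimable_pages(z, nr_swap_pages) * 6;
}
```

With 20 reclaimable file pages and no swap, the model reports the zone as
unreclaimable once pages_scanned reaches 120. Because free_pcppages_bulk()
and free_one_page() still reset zone->pages_scanned to 0 in the patch, any
freed page makes the zone reclaimable again without a separate flag to clear.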