* [PATCH 0/10] Reduce system disruption due to kswapd V3
@ 2013-04-11 19:57 Mel Gorman
  2013-04-11 19:57 ` [PATCH 01/10] mm: vmscan: Limit the number of pages kswapd reclaims at each priority Mel Gorman
                   ` (9 more replies)
  0 siblings, 10 replies; 44+ messages in thread
From: Mel Gorman @ 2013-04-11 19:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML, Mel Gorman

Big change is again related to proportional reclaim.

Changelog since V2
o Preserve ratio properly for proportional scanning (kamezawa)

Changelog since V1
o Rename ZONE_DIRTY to ZONE_TAIL_LRU_DIRTY (andi)
o Reformat comment in shrink_page_list (andi)
o Clarify some comments (dhillf)
o Rework how the proportional scanning is preserved
o Add PageReclaim check before kswapd starts writeback
o Reset sc.nr_reclaimed on every full zone scan

Kswapd and page reclaim behaviour has been screwy in one way or the other for a long time. Very broadly speaking it worked in the far past because machines were limited in memory so it did not have that many pages to scan and it stalled in congestion_wait() frequently to prevent it going completely nuts. In recent times it has behaved very unsatisfactorily with some of the problems compounded by the removal of stall logic and the introduction of transparent hugepage support with high-order reclaims.

There are many variations of bugs that are rooted in this area. One example is reports of large copy operations or backups causing the machine to grind to a halt or applications being pushed to swap. Sometimes in low memory situations a large percentage of memory suddenly gets reclaimed. In other cases an application starts and kswapd hits 100% CPU usage for prolonged periods of time and so on. There is now talk of introducing features like an extra free kbytes tunable to work around aspects of the problem instead of trying to deal with it.
It's compounded by the problem that it can be very workload and machine specific.

This series aims at addressing some of the worst of these problems without attempting to fundamentally alter how page reclaim works.

Patches 1-2 limit the number of pages kswapd reclaims while still obeying the anon/file proportion of the LRUs it should be scanning.

Patches 3-4 control how and when kswapd raises its scanning priority and delete the scanning restart logic which is tricky to follow.

Patch 5 notes that it is too easy for kswapd to reach priority 0 when scanning and then reclaim the world. Down with that sort of thing.

Patch 6 notes that kswapd starts writeback based on scanning priority which is not necessarily related to dirty pages. It will have kswapd write back pages if a number of unqueued dirty pages have been recently encountered at the tail of the LRU.

Patch 7 notes that sometimes kswapd should stall waiting on IO to complete to reduce LRU churn and the likelihood that it'll reclaim young clean pages or push applications to swap. It will cause kswapd to block on IO if it detects that pages being reclaimed under writeback are recycling through the LRU before the IO completes.

Patch 8 shrinks slab just once per priority scanned or if a zone is otherwise unreclaimable to avoid hammering slab when kswapd has to skip a large number of pages.

Patches 9-10 are cosmetic but balance_pgdat() might be easier to follow.

This was tested using memcached+memcachetest while some background IO was in progress, as implemented by the parallel IO tests in MM Tests. memcachetest benchmarks how many operations/second memcached can service and it is run multiple times. It starts with no background IO and then re-runs the test with larger amounts of IO in the background to roughly simulate a large copy in progress. The expectation is that the IO should have little or no impact on memcachetest which is running entirely in memory.

                         3.9.0-rc6           3.9.0-rc6
                           vanilla    lessdisrupt-v3r6
Ops memcachetest-0M       10868.00 (  0.00%)   10932.00 (  0.59%)
Ops memcachetest-749M     10976.00 (  0.00%)   10986.00 (  0.09%)
Ops memcachetest-2498M     3406.00 (  0.00%)   10871.00 (219.17%)
Ops memcachetest-4246M     2402.00 (  0.00%)   10936.00 (355.29%)
Ops io-duration-0M            0.00 (  0.00%)       0.00 (  0.00%)
Ops io-duration-749M         15.00 (  0.00%)       9.00 ( 40.00%)
Ops io-duration-2498M       107.00 (  0.00%)      27.00 ( 74.77%)
Ops io-duration-4246M       193.00 (  0.00%)      47.00 ( 75.65%)
Ops swaptotal-0M              0.00 (  0.00%)       0.00 (  0.00%)
Ops swaptotal-749M       155965.00 (  0.00%)      25.00 ( 99.98%)
Ops swaptotal-2498M      335917.00 (  0.00%)     287.00 ( 99.91%)
Ops swaptotal-4246M      463021.00 (  0.00%)       0.00 (  0.00%)
Ops swapin-0M                 0.00 (  0.00%)       0.00 (  0.00%)
Ops swapin-749M               0.00 (  0.00%)       0.00 (  0.00%)
Ops swapin-2498M         139128.00 (  0.00%)       0.00 (  0.00%)
Ops swapin-4246M         156276.00 (  0.00%)       0.00 (  0.00%)
Ops minorfaults-0M      1677257.00 (  0.00%) 1642376.00 (  2.08%)
Ops minorfaults-749M    1819566.00 (  0.00%) 1572243.00 ( 13.59%)
Ops minorfaults-2498M   1842140.00 (  0.00%) 1652508.00 ( 10.29%)
Ops minorfaults-4246M   1796116.00 (  0.00%) 1651464.00 (  8.05%)
Ops majorfaults-0M            6.00 (  0.00%)       6.00 (  0.00%)
Ops majorfaults-749M         55.00 (  0.00%)      49.00 ( 10.91%)
Ops majorfaults-2498M     20936.00 (  0.00%)     110.00 ( 99.47%)
Ops majorfaults-4246M     22487.00 (  0.00%)     185.00 ( 99.18%)

Note how the vanilla kernel's performance collapses when there is enough IO taking place in the background. This drop in performance is part of what users complain of when they start backups. Note how the swapin and major fault figures indicate that processes were being pushed to swap prematurely. With the series applied, there is no noticeable performance drop and while there is still some swap activity, it's tiny.

                               3.9.0-rc6         3.9.0-rc6
                                 vanilla  lessdisrupt-v3r6
Page Ins                         1281068             89224
Page Outs                       15697620          11478616
Swap Ins                          295654                 0
Swap Outs                         659499               312
Direct pages scanned                   0             78668
Kswapd pages scanned             7166977           4416457
Kswapd pages reclaimed           1185518           1051751
Direct pages reclaimed                 0             72993
Kswapd efficiency                    16%               23%
Kswapd velocity                 5558.640          3420.614
Direct efficiency                   100%               92%
Direct velocity                    0.000            60.930
Percentage direct scans               0%                1%
Page writes by reclaim           2044715           2922251
Page writes file                 1385216           2921939
Page writes anon                  659499               312
Page reclaim immediate              4040               218
Page rescued immediate                 0                 0
Slabs scanned                      35456             26624
Direct inode steals                    0                 0
Kswapd inode steals                19898              1420
Kswapd skipped wait                    0                 0
THP fault alloc                       11                51
THP collapse alloc                   574               609
THP splits                             9                 6
THP fault fallback                     0                 0
THP collapse fail                      0                 0
Compaction stalls                      0                 0
Compaction success                     0                 0
Compaction failures                    0                 0
Page migrate success                   0                 0
Page migrate failure                   0                 0
Compaction pages isolated              0                 0
Compaction migrate scanned             0                 0
Compaction free scanned                0                 0
Compaction cost                        0                 0
NUMA PTE updates                       0                 0
NUMA hint faults                       0                 0
NUMA hint local faults                 0                 0
NUMA pages migrated                    0                 0
AutoNUMA cost                          0                 0

Note that kswapd efficiency is slightly improved. Unfortunately, also note that there is a small amount of direct reclaim due to kswapd no longer reclaiming the world. Using ftrace it would appear that the direct reclaim stalls are mostly harmless with the vast bulk of the stalls incurred by dd

      2 gzip-3111
      5 memcachetest-12607
     26 tclsh-3109
     67 tee-3110
     89 flush-8:0-286
   2055 dd-12795

There is a risk that kswapd not reclaiming the world may mean that it stays awake balancing zones, does not stall on the appropriate events and continually scans pages it cannot reclaim consuming CPU. This will be visible as continued high CPU usage but in my own tests I only saw a single spike lasting less than a second and I did not observe any problems related to reclaim while running the series on my desktop.

include/linux/mmzone.h | 17 ++ mm/vmscan.c | 461 ++++++++++++++++++++++++++++++------------------- 2 files changed, 305 insertions(+), 173 deletions(-) -- 1.8.1.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
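
The derived rows in the reclaim statistics of the cover letter can be reproduced from the raw counters. What follows is a minimal sketch, not MM Tests code. It assumes (the posting does not state this) that "efficiency" is pages reclaimed as a truncated whole-number percentage of pages scanned, that a scanner which scanned nothing is reported as 100%, and that "Percentage direct scans" is the direct-reclaim share of all pages scanned. The helper names are made up for illustration.

```c
/* Hypothetical helpers: names and formulas are assumptions, not MM Tests code. */

/* Pages reclaimed as a whole-number percentage of pages scanned. */
static unsigned long long reclaim_efficiency_pct(unsigned long long scanned,
						 unsigned long long reclaimed)
{
	if (scanned == 0)
		return 100;	/* assumed convention for an idle scanner */
	return reclaimed * 100 / scanned;
}

/* Share of all scanning that was performed by direct reclaim. */
static unsigned long long direct_scan_pct(unsigned long long direct_scanned,
					  unsigned long long kswapd_scanned)
{
	unsigned long long total = direct_scanned + kswapd_scanned;

	if (total == 0)
		return 0;
	return direct_scanned * 100 / total;
}
```

Under those assumptions, reclaim_efficiency_pct(7166977, 1185518) gives 16 and reclaim_efficiency_pct(4416457, 1051751) gives 23, which matches the "Kswapd efficiency" row for the vanilla and patched kernels respectively.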
* [PATCH 01/10] mm: vmscan: Limit the number of pages kswapd reclaims at each priority 2013-04-11 19:57 [PATCH 0/10] Reduce system disruption due to kswapd V3 Mel Gorman @ 2013-04-11 19:57 ` Mel Gorman 2013-04-11 19:57 ` [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd Mel Gorman ` (8 subsequent siblings) 9 siblings, 0 replies; 44+ messages in thread From: Mel Gorman @ 2013-04-11 19:57 UTC (permalink / raw) To: Andrew Morton Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML, Mel Gorman The number of pages kswapd can reclaim is bound by the number of pages it scans which is related to the size of the zone and the scanning priority. In many cases the priority remains low because it's reset every SWAP_CLUSTER_MAX reclaimed pages but in the event kswapd scans a large number of pages it cannot reclaim, it will raise the priority and potentially discard a large percentage of the zone as sc->nr_to_reclaim is ULONG_MAX. The user-visible effect is a reclaim "spike" where a large percentage of memory is suddenly freed. It would be bad enough if this was just unused memory but because of how anon/file pages are balanced it is possible that applications get pushed to swap unnecessarily. This patch limits the number of pages kswapd will reclaim to the high watermark. Reclaim will still overshoot due to it not being a hard limit as shrink_lruvec() will ignore the sc.nr_to_reclaim at DEF_PRIORITY but it prevents kswapd reclaiming the world at higher priorities. The number of pages it reclaims is not adjusted for high-order allocations as kswapd will reclaim excessively if it is to balance zones for high-order allocations. 
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/vmscan.c | 53 +++++++++++++++++++++++++++++------------------------
 1 file changed, 29 insertions(+), 24 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 88c5fed..4835a7a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2593,6 +2593,32 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
 }
 
 /*
+ * kswapd shrinks the zone by the number of pages required to reach
+ * the high watermark.
+ */
+static void kswapd_shrink_zone(struct zone *zone,
+			       struct scan_control *sc,
+			       unsigned long lru_pages)
+{
+	unsigned long nr_slab;
+	struct reclaim_state *reclaim_state = current->reclaim_state;
+	struct shrink_control shrink = {
+		.gfp_mask = sc->gfp_mask,
+	};
+
+	/* Reclaim above the high watermark. */
+	sc->nr_to_reclaim = max(SWAP_CLUSTER_MAX, high_wmark_pages(zone));
+	shrink_zone(zone, sc);
+
+	reclaim_state->reclaimed_slab = 0;
+	nr_slab = shrink_slab(&shrink, sc->nr_scanned, lru_pages);
+	sc->nr_reclaimed += reclaim_state->reclaimed_slab;
+
+	if (nr_slab == 0 && !zone_reclaimable(zone))
+		zone->all_unreclaimable = 1;
+}
+
+/*
  * For kswapd, balance_pgdat() will work across all this node's zones until
  * they are all at high_wmark_pages(zone).
  *
@@ -2619,27 +2645,16 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 	bool pgdat_is_balanced = false;
 	int i;
 	int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
-	unsigned long total_scanned;
-	struct reclaim_state *reclaim_state = current->reclaim_state;
 	unsigned long nr_soft_reclaimed;
 	unsigned long nr_soft_scanned;
 	struct scan_control sc = {
 		.gfp_mask = GFP_KERNEL,
 		.may_unmap = 1,
 		.may_swap = 1,
-		/*
-		 * kswapd doesn't want to be bailed out while reclaim. because
-		 * we want to put equal scanning pressure on each zone.
-		 */
-		.nr_to_reclaim = ULONG_MAX,
 		.order = order,
 		.target_mem_cgroup = NULL,
 	};
-	struct shrink_control shrink = {
-		.gfp_mask = sc.gfp_mask,
-	};
 loop_again:
-	total_scanned = 0;
 	sc.priority = DEF_PRIORITY;
 	sc.nr_reclaimed = 0;
 	sc.may_writepage = !laptop_mode;
@@ -2710,7 +2725,7 @@ loop_again:
 		 */
 		for (i = 0; i <= end_zone; i++) {
 			struct zone *zone = pgdat->node_zones + i;
-			int nr_slab, testorder;
+			int testorder;
 			unsigned long balance_gap;
 
 			if (!populated_zone(zone))
@@ -2730,7 +2745,6 @@ loop_again:
 							order, sc.gfp_mask,
 							&nr_soft_scanned);
 			sc.nr_reclaimed += nr_soft_reclaimed;
-			total_scanned += nr_soft_scanned;
 
 			/*
 			 * We put equal pressure on every zone, unless
@@ -2759,17 +2773,8 @@ loop_again:
 
 			if ((buffer_heads_over_limit && is_highmem_idx(i)) ||
 				    !zone_balanced(zone, testorder,
-						   balance_gap, end_zone)) {
-				shrink_zone(zone, &sc);
-
-				reclaim_state->reclaimed_slab = 0;
-				nr_slab = shrink_slab(&shrink, sc.nr_scanned, lru_pages);
-				sc.nr_reclaimed += reclaim_state->reclaimed_slab;
-				total_scanned += sc.nr_scanned;
-
-				if (nr_slab == 0 && !zone_reclaimable(zone))
-					zone->all_unreclaimable = 1;
-			}
+						   balance_gap, end_zone))
+				kswapd_shrink_zone(zone, &sc, lru_pages);
 
 			/*
 			 * If we're getting trouble reclaiming, start doing
-- 
1.8.1.4

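
The reclaim target that patch 01/10 sets in kswapd_shrink_zone() is a clamp: the zone's high watermark, but never less than one reclaim batch. Below is a small userspace model of that one line, offered as a sketch rather than kernel code. kswapd_reclaim_target() is a hypothetical stand-in, and the value 32 for SWAP_CLUSTER_MAX is assumed from kernels of this era rather than taken from the patch.

```c
/* Userspace model only; the value 32 is an assumption about this kernel era. */
#define SWAP_CLUSTER_MAX 32UL

/*
 * Hypothetical stand-in for
 *   sc->nr_to_reclaim = max(SWAP_CLUSTER_MAX, high_wmark_pages(zone));
 * taking the zone's high watermark (in pages) as a plain number.
 */
static unsigned long kswapd_reclaim_target(unsigned long high_wmark)
{
	return high_wmark > SWAP_CLUSTER_MAX ? high_wmark : SWAP_CLUSTER_MAX;
}
```

A zone with a tiny watermark still gets a full batch of 32 pages as its target, while a zone with a high watermark of 4096 pages gets 4096 instead of the previous ULONG_MAX.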
* [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd 2013-04-11 19:57 [PATCH 0/10] Reduce system disruption due to kswapd V3 Mel Gorman 2013-04-11 19:57 ` [PATCH 01/10] mm: vmscan: Limit the number of pages kswapd reclaims at each priority Mel Gorman @ 2013-04-11 19:57 ` Mel Gorman 2013-04-18 15:01 ` Johannes Weiner 2013-04-11 19:57 ` [PATCH 03/10] mm: vmscan: Flatten kswapd priority loop Mel Gorman ` (7 subsequent siblings) 9 siblings, 1 reply; 44+ messages in thread From: Mel Gorman @ 2013-04-11 19:57 UTC (permalink / raw) To: Andrew Morton Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML, Mel Gorman Simplistically, the anon and file LRU lists are scanned proportionally depending on the value of vm.swappiness although there are other factors taken into account by get_scan_count(). The patch "mm: vmscan: Limit the number of pages kswapd reclaims" limits the number of pages kswapd reclaims but it breaks this proportional scanning and may evenly shrink anon/file LRUs regardless of vm.swappiness. This patch preserves the proportional scanning and reclaim. It does mean that kswapd will reclaim more than requested but the number of pages will be related to the high watermark. 
[mhocko@suse.cz: Correct proportional reclaim for memcg and simplify]
[kamezawa.hiroyu@jp.fujitsu.com: Recalculate scan based on target]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
---
 mm/vmscan.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 56 insertions(+), 8 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4835a7a..a6bca2c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1821,17 +1821,24 @@ out:
 static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 {
 	unsigned long nr[NR_LRU_LISTS];
+	unsigned long targets[NR_LRU_LISTS];
 	unsigned long nr_to_scan;
 	enum lru_list lru;
 	unsigned long nr_reclaimed = 0;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
 	struct blk_plug plug;
+	bool scan_adjusted = false;
 
 	get_scan_count(lruvec, sc, nr);
 
+	/* Record the original scan target for proportional adjustments later */
+	memcpy(targets, nr, sizeof(nr));
+
 	blk_start_plug(&plug);
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
 					nr[LRU_INACTIVE_FILE]) {
+		unsigned long nr_anon, nr_file, percentage;
+
 		for_each_evictable_lru(lru) {
 			if (nr[lru]) {
 				nr_to_scan = min(nr[lru], SWAP_CLUSTER_MAX);
@@ -1841,17 +1848,58 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 							    lruvec, sc);
 			}
 		}
+
+		if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
+			continue;
+
 		/*
-		 * On large memory systems, scan >> priority can become
-		 * really large. This is fine for the starting priority;
-		 * we want to put equal scanning pressure on each zone.
-		 * However, if the VM has a harder time of freeing pages,
-		 * with multiple processes reclaiming pages, the total
-		 * freeing target can get unreasonably large.
+		 * For global direct reclaim, reclaim only the number of pages
+		 * requested. Less care is taken to scan proportionally as it
+		 * is more important to minimise direct reclaim stall latency
+		 * than it is to properly age the LRU lists.
 		 */
-		if (nr_reclaimed >= nr_to_reclaim &&
-		    sc->priority < DEF_PRIORITY)
+		if (global_reclaim(sc) && !current_is_kswapd())
 			break;
+
+		/*
+		 * For kswapd and memcg, reclaim at least the number of pages
+		 * requested. Ensure that the anon and file LRUs shrink
+		 * proportionally what was requested by get_scan_count(). We
+		 * stop reclaiming one LRU and reduce the amount scanning
+		 * proportional to the original scan target.
+		 */
+		nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
+		nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
+
+		if (nr_file > nr_anon) {
+			unsigned long scan_target = targets[LRU_INACTIVE_ANON] +
+						targets[LRU_ACTIVE_ANON] + 1;
+			lru = LRU_BASE;
+			percentage = nr_anon * 100 / scan_target;
+		} else {
+			unsigned long scan_target = targets[LRU_INACTIVE_FILE] +
+						targets[LRU_ACTIVE_FILE] + 1;
+			lru = LRU_FILE;
+			percentage = nr_file * 100 / scan_target;
+		}
+
+		/* Stop scanning the smaller of the LRU */
+		nr[lru] = 0;
+		nr[lru + LRU_ACTIVE] = 0;
+
+		/*
+		 * Recalculate the other LRU scan count based on its original
+		 * scan target and the percentage scanning already complete
+		 */
+		lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE;
+		nr[lru] = targets[lru] * (100 - percentage) / 100;
+		nr[lru] -= min(nr[lru], (targets[lru] - nr[lru]));
+
+		lru += LRU_ACTIVE;
+		nr[lru] = targets[lru] * (100 - percentage) / 100;
+		nr[lru] -= min(nr[lru], (targets[lru] - nr[lru]));
+
+		scan_adjusted = true;
 	}
 	blk_finish_plug(&plug);
 	sc->nr_reclaimed += nr_reclaimed;
-- 
1.8.1.4

* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd
  2013-04-11 19:57 ` [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd Mel Gorman
@ 2013-04-18 15:01   ` Johannes Weiner
  2013-04-18 15:58     ` Mel Gorman
  0 siblings, 1 reply; 44+ messages in thread
From: Johannes Weiner @ 2013-04-18 15:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML

On Thu, Apr 11, 2013 at 08:57:50PM +0100, Mel Gorman wrote:
> @@ -1841,17 +1848,58 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
>  							    lruvec, sc);
>  			}
>  		}
> +
> +		if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
> +			continue;
> +
>  		/*
> -		 * On large memory systems, scan >> priority can become
> -		 * really large. This is fine for the starting priority;
> -		 * we want to put equal scanning pressure on each zone.
> -		 * However, if the VM has a harder time of freeing pages,
> -		 * with multiple processes reclaiming pages, the total
> -		 * freeing target can get unreasonably large.
> +		 * For global direct reclaim, reclaim only the number of pages
> +		 * requested. Less care is taken to scan proportionally as it
> +		 * is more important to minimise direct reclaim stall latency
> +		 * than it is to properly age the LRU lists.
>  		 */
> -		if (nr_reclaimed >= nr_to_reclaim &&
> -		    sc->priority < DEF_PRIORITY)
> +		if (global_reclaim(sc) && !current_is_kswapd())
>  			break;
> +
> +		/*
> +		 * For kswapd and memcg, reclaim at least the number of pages
> +		 * requested. Ensure that the anon and file LRUs shrink
> +		 * proportionally what was requested by get_scan_count(). We
> +		 * stop reclaiming one LRU and reduce the amount scanning
> +		 * proportional to the original scan target.
> +		 */
> +		nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
> +		nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
> +
> +		if (nr_file > nr_anon) {
> +			unsigned long scan_target = targets[LRU_INACTIVE_ANON] +
> +						targets[LRU_ACTIVE_ANON] + 1;
> +			lru = LRU_BASE;
> +			percentage = nr_anon * 100 / scan_target;
> +		} else {
> +			unsigned long scan_target = targets[LRU_INACTIVE_FILE] +
> +						targets[LRU_ACTIVE_FILE] + 1;
> +			lru = LRU_FILE;
> +			percentage = nr_file * 100 / scan_target;
> +		}
> +
> +		/* Stop scanning the smaller of the LRU */
> +		nr[lru] = 0;
> +		nr[lru + LRU_ACTIVE] = 0;
> +
> +		/*
> +		 * Recalculate the other LRU scan count based on its original
> +		 * scan target and the percentage scanning already complete
> +		 */
> +		lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE;
> +		nr[lru] = targets[lru] * (100 - percentage) / 100;
> +		nr[lru] -= min(nr[lru], (targets[lru] - nr[lru]));

This doesn't seem right.  Say percentage is 60, then

	nr[lru] = targets[lru] * (100 - percentage) / 100;

sets nr[lru] to 40% of targets[lru], and so in

	nr[lru] -= min(nr[lru], (targets[lru] - nr[lru]));

targets[lru] - nr[lru] is 60% of targets[lru], making it bigger than
nr[lru], which is in turn subtracted from itself, i.e. it leaves the
remaining type at 0 if >= 50% of the other type were scanned, and at
half of the inverted scan percentage if less than 50% were scanned.

Would this be more sensible?

	already_scanned = targets[lru] - nr[lru];
	nr[lru] = targets[lru] * percentage / 100; /* adjusted original target */
	nr[lru] -= min(nr[lru], already_scanned); /* minus work already done */

* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd
  2013-04-18 15:01   ` Johannes Weiner
@ 2013-04-18 15:58     ` Mel Gorman
  0 siblings, 0 replies; 44+ messages in thread
From: Mel Gorman @ 2013-04-18 15:58 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML

On Thu, Apr 18, 2013 at 08:01:05AM -0700, Johannes Weiner wrote:
> On Thu, Apr 11, 2013 at 08:57:50PM +0100, Mel Gorman wrote:
> > @@ -1841,17 +1848,58 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
> >  							    lruvec, sc);
> >  			}
> >  		}
> > +
> > +		if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
> > +			continue;
> > +
> >  		/*
> > -		 * On large memory systems, scan >> priority can become
> > -		 * really large. This is fine for the starting priority;
> > -		 * we want to put equal scanning pressure on each zone.
> > -		 * However, if the VM has a harder time of freeing pages,
> > -		 * with multiple processes reclaiming pages, the total
> > -		 * freeing target can get unreasonably large.
> > +		 * For global direct reclaim, reclaim only the number of pages
> > +		 * requested. Less care is taken to scan proportionally as it
> > +		 * is more important to minimise direct reclaim stall latency
> > +		 * than it is to properly age the LRU lists.
> >  		 */
> > -		if (nr_reclaimed >= nr_to_reclaim &&
> > -		    sc->priority < DEF_PRIORITY)
> > +		if (global_reclaim(sc) && !current_is_kswapd())
> >  			break;
> > +
> > +		/*
> > +		 * For kswapd and memcg, reclaim at least the number of pages
> > +		 * requested. Ensure that the anon and file LRUs shrink
> > +		 * proportionally what was requested by get_scan_count(). We
> > +		 * stop reclaiming one LRU and reduce the amount scanning
> > +		 * proportional to the original scan target.
> > +		 */
> > +		nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
> > +		nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
> > +
> > +		if (nr_file > nr_anon) {
> > +			unsigned long scan_target = targets[LRU_INACTIVE_ANON] +
> > +						targets[LRU_ACTIVE_ANON] + 1;
> > +			lru = LRU_BASE;
> > +			percentage = nr_anon * 100 / scan_target;
> > +		} else {
> > +			unsigned long scan_target = targets[LRU_INACTIVE_FILE] +
> > +						targets[LRU_ACTIVE_FILE] + 1;
> > +			lru = LRU_FILE;
> > +			percentage = nr_file * 100 / scan_target;
> > +		}
> > +
> > +		/* Stop scanning the smaller of the LRU */
> > +		nr[lru] = 0;
> > +		nr[lru + LRU_ACTIVE] = 0;
> > +
> > +		/*
> > +		 * Recalculate the other LRU scan count based on its original
> > +		 * scan target and the percentage scanning already complete
> > +		 */
> > +		lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE;
> > +		nr[lru] = targets[lru] * (100 - percentage) / 100;
> > +		nr[lru] -= min(nr[lru], (targets[lru] - nr[lru]));
>
> This doesn't seem right.  Say percentage is 60, then
>
> 	nr[lru] = targets[lru] * (100 - percentage) / 100;
>
> sets nr[lru] to 40% of targets[lru], and so in
>
> 	nr[lru] -= min(nr[lru], (targets[lru] - nr[lru]));
>
> targets[lru] - nr[lru] is 60% of targets[lru], making it bigger than
> nr[lru], which is in turn subtracted from itself, i.e. it leaves the
> remaining type at 0 if >= 50% of the other type were scanned, and at
> half of the inverted scan percentage if less than 50% were scanned.
>
> Would this be more sensible?
>
> 	already_scanned = targets[lru] - nr[lru];
> 	nr[lru] = targets[lru] * percentage / 100; /* adjusted original target */
> 	nr[lru] -= min(nr[lru], already_scanned); /* minus work already done */

Bah, yes, that was the intent as I was writing it. It's not what came
out my fingers. Thanks for the bashing with a clue stick.

-- 
Mel Gorman
SUSE Labs

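
The arithmetic discussed in this exchange can be checked in isolation. The sketch below is a userspace model with hypothetical function names, not the kernel code: remaining_as_posted() repeats the two statements from the posted patch, and remaining_as_suggested() repeats the three statements proposed in the reply. It only illustrates the effect of taking the "already scanned" amount before nr[lru] is overwritten; it makes no claim about which scale factor (percentage or 100 - percentage) ends up in the final version of the series.

```c
/* Userspace model; function names are hypothetical, not from mm/vmscan.c. */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * As posted: nr is overwritten first, so "target - nr" no longer reflects
 * how much of the list was really scanned. nr_left is therefore ignored.
 */
static unsigned long remaining_as_posted(unsigned long target,
					 unsigned long nr_left,
					 unsigned long percentage)
{
	unsigned long nr;

	(void)nr_left;	/* real progress never enters the calculation */
	nr = target * (100 - percentage) / 100;
	nr -= min_ul(nr, target - nr);
	return nr;
}

/* As suggested in the reply: capture progress before overwriting nr. */
static unsigned long remaining_as_suggested(unsigned long target,
					    unsigned long nr_left,
					    unsigned long percentage)
{
	unsigned long already_scanned = target - nr_left;
	unsigned long nr;

	nr = target * percentage / 100;		/* adjusted original target */
	nr -= min_ul(nr, already_scanned);	/* minus work already done */
	return nr;
}
```

With a target of 1000 pages and a percentage of 60, the posted form returns 0 regardless of how many pages are still unscanned, which is the collapse described in the reply. The suggested form returns 500 when 100 pages have already been scanned and 0 when 900 have.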
* [PATCH 03/10] mm: vmscan: Flatten kswapd priority loop
  2013-04-11 19:57 [PATCH 0/10] Reduce system disruption due to kswapd V3 Mel Gorman
  2013-04-11 19:57 ` [PATCH 01/10] mm: vmscan: Limit the number of pages kswapd reclaims at each priority Mel Gorman
  2013-04-11 19:57 ` [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd Mel Gorman
@ 2013-04-11 19:57 ` Mel Gorman
  2013-04-18 15:02   ` Johannes Weiner
  2013-04-11 19:57 ` [PATCH 04/10] mm: vmscan: Decide whether to compact the pgdat based on reclaim progress Mel Gorman
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Mel Gorman @ 2013-04-11 19:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML, Mel Gorman

kswapd stops raising the scanning priority when at least SWAP_CLUSTER_MAX pages have been reclaimed or the pgdat is considered balanced. It then rechecks if it needs to restart at DEF_PRIORITY and whether high-order reclaim needs to be reset. This is not wrong per se but it is confusing to follow and forcing kswapd to stay at DEF_PRIORITY may require several restarts before it has scanned enough pages to meet the high watermark even at 100% efficiency. This patch irons out the logic a bit by controlling when priority is raised and removing the "goto loop_again".

This patch has kswapd raise the scanning priority until it is scanning enough pages that it could meet the high watermark in one shrink of the LRU lists if it is able to reclaim at 100% efficiency. It will not raise the scanning priority higher unless it is failing to reclaim any pages.

To avoid infinite looping for high-order allocation requests kswapd will not reclaim for high-order allocations when it has reclaimed at least twice the number of pages as the allocation request.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/vmscan.c | 86 +++++++++++++++++++++++++++++--------------------------------
 1 file changed, 41 insertions(+), 45 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a6bca2c..f979a67 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2643,8 +2643,12 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
 /*
  * kswapd shrinks the zone by the number of pages required to reach
  * the high watermark.
+ *
+ * Returns true if kswapd scanned at least the requested number of pages to
+ * reclaim. This is used to determine if the scanning priority needs to be
+ * raised.
  */
-static void kswapd_shrink_zone(struct zone *zone,
+static bool kswapd_shrink_zone(struct zone *zone,
 			       struct scan_control *sc,
 			       unsigned long lru_pages)
 {
@@ -2664,6 +2668,8 @@ static void kswapd_shrink_zone(struct zone *zone,
 
 	if (nr_slab == 0 && !zone_reclaimable(zone))
 		zone->all_unreclaimable = 1;
+
+	return sc->nr_scanned >= sc->nr_to_reclaim;
 }
 
 /*
@@ -2690,26 +2696,26 @@ static void kswapd_shrink_zone(struct zone *zone,
 static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 							int *classzone_idx)
 {
-	bool pgdat_is_balanced = false;
 	int i;
 	int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
 	unsigned long nr_soft_reclaimed;
 	unsigned long nr_soft_scanned;
 	struct scan_control sc = {
 		.gfp_mask = GFP_KERNEL,
+		.priority = DEF_PRIORITY,
 		.may_unmap = 1,
 		.may_swap = 1,
+		.may_writepage = !laptop_mode,
 		.order = order,
 		.target_mem_cgroup = NULL,
 	};
-loop_again:
-	sc.priority = DEF_PRIORITY;
-	sc.nr_reclaimed = 0;
-	sc.may_writepage = !laptop_mode;
 	count_vm_event(PAGEOUTRUN);
 
 	do {
 		unsigned long lru_pages = 0;
+		bool raise_priority = true;
+
+		sc.nr_reclaimed = 0;
 
 		/*
 		 * Scan in the highmem->dma direction for the highest
@@ -2751,10 +2757,8 @@ loop_again:
 			}
 		}
 
-		if (i < 0) {
-			pgdat_is_balanced = true;
+		if (i < 0)
 			goto out;
-		}
 
 		for (i = 0; i <= end_zone; i++) {
 			struct zone *zone = pgdat->node_zones + i;
@@ -2821,8 +2825,16 @@ loop_again:
 
 			if ((buffer_heads_over_limit && is_highmem_idx(i)) ||
 				    !zone_balanced(zone, testorder,
-						   balance_gap, end_zone))
-				kswapd_shrink_zone(zone, &sc, lru_pages);
+						   balance_gap, end_zone)) {
+				/*
+				 * There should be no need to raise the
+				 * scanning priority if enough pages are
+				 * already being scanned that high
+				 * watermark would be met at 100% efficiency.
+				 */
+				if (kswapd_shrink_zone(zone, &sc, lru_pages))
+					raise_priority = false;
+			}
 
 			/*
 			 * If we're getting trouble reclaiming, start doing
@@ -2857,46 +2869,29 @@ loop_again:
 				pfmemalloc_watermark_ok(pgdat))
 			wake_up(&pgdat->pfmemalloc_wait);
 
-		if (pgdat_balanced(pgdat, order, *classzone_idx)) {
-			pgdat_is_balanced = true;
-			break;		/* kswapd: all done */
-		}
-
 		/*
-		 * We do this so kswapd doesn't build up large priorities for
-		 * example when it is freeing in parallel with allocators. It
-		 * matches the direct reclaim path behaviour in terms of impact
-		 * on zone->*_priority.
+		 * Fragmentation may mean that the system cannot be rebalanced
+		 * for high-order allocations in all zones. If twice the
+		 * allocation size has been reclaimed and the zones are still
+		 * not balanced then recheck the watermarks at order-0 to
+		 * prevent kswapd reclaiming excessively. Assume that a
+		 * process requested a high-order can direct reclaim/compact.
 		 */
-		if (sc.nr_reclaimed >= SWAP_CLUSTER_MAX)
-			break;
-	} while (--sc.priority >= 0);
-
-out:
-	if (!pgdat_is_balanced) {
-		cond_resched();
+		if (order && sc.nr_reclaimed >= 2UL << order)
+			order = sc.order = 0;
 
-		try_to_freeze();
+		/* Check if kswapd should be suspending */
+		if (try_to_freeze() || kthread_should_stop())
+			break;
 
 		/*
-		 * Fragmentation may mean that the system cannot be
-		 * rebalanced for high-order allocations in all zones.
-		 * At this point, if nr_reclaimed < SWAP_CLUSTER_MAX,
-		 * it means the zones have been fully scanned and are still
-		 * not balanced. For high-order allocations, there is
-		 * little point trying all over again as kswapd may
-		 * infinite loop.
-		 *
-		 * Instead, recheck all watermarks at order-0 as they
-		 * are the most important. If watermarks are ok, kswapd will go
-		 * back to sleep. High-order users can still perform direct
-		 * reclaim if they wish.
+		 * Raise priority if scanning rate is too low or there was no
+		 * progress in reclaiming pages
 		 */
-		if (sc.nr_reclaimed < SWAP_CLUSTER_MAX)
-			order = sc.order = 0;
-
-		goto loop_again;
-	}
+		if (raise_priority || !sc.nr_reclaimed)
+			sc.priority--;
+	} while (sc.priority >= 0 &&
+		 !pgdat_balanced(pgdat, order, *classzone_idx));
 
 	/*
 	 * If kswapd was reclaiming at a higher order, it has the option of
@@ -2925,6 +2920,7 @@ out:
 			compact_pgdat(pgdat, order);
 	}
 
+out:
 	/*
 	 * Return the order we were reclaiming at so prepare_kswapd_sleep()
 	 * makes a decision on the order we were last reclaiming at. However,
-- 
1.8.1.4

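
The two per-pass decisions that patch 03/10 introduces in balance_pgdat() can be modelled on their own. This is a sketch under stated assumptions, not kernel code: struct pass_result, next_order() and next_priority() are hypothetical names, and scanned_enough stands for "kswapd_shrink_zone() returned true for at least one zone in this pass", which is what clears raise_priority in the patch.

```c
#include <stdbool.h>

/* Illustrative summary of one pass over the zones; not a kernel structure. */
struct pass_result {
	bool scanned_enough;		/* some zone scanned >= its reclaim target */
	unsigned long nr_reclaimed;	/* pages reclaimed during this pass */
};

/* Fall back to order-0 once twice the high-order request was reclaimed. */
static int next_order(int order, unsigned long nr_reclaimed)
{
	if (order && nr_reclaimed >= (2UL << order))
		return 0;
	return order;
}

/*
 * "Raising" the priority means decrementing the number. It happens when no
 * zone scanned enough pages or when the pass reclaimed nothing at all.
 */
static int next_priority(int priority, const struct pass_result *res)
{
	bool raise_priority = !res->scanned_enough;

	if (raise_priority || !res->nr_reclaimed)
		priority--;
	return priority;
}
```

For an order-3 request (8 pages), next_order() keeps order 3 until 16 pages have been reclaimed in a pass and then drops to order 0. next_priority() leaves the priority unchanged only when the pass both scanned enough and reclaimed something.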
* Re: [PATCH 03/10] mm: vmscan: Flatten kswapd priority loop
From: Johannes Weiner @ 2013-04-18 15:02 UTC
To: Mel Gorman
Cc: Andrew Morton, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML

On Thu, Apr 11, 2013 at 08:57:51PM +0100, Mel Gorman wrote:
> kswapd stops raising the scanning priority when at least SWAP_CLUSTER_MAX
> pages have been reclaimed or the pgdat is considered balanced. It then
> rechecks if it needs to restart at DEF_PRIORITY and whether high-order
> reclaim needs to be reset. This is not wrong per se but it is confusing
> to follow and forcing kswapd to stay at DEF_PRIORITY may require several
> restarts before it has scanned enough pages to meet the high watermark even
> at 100% efficiency. This patch irons out the logic a bit by controlling
> when priority is raised and removing the "goto loop_again".
>
> This patch has kswapd raise the scanning priority until it is scanning
> enough pages that it could meet the high watermark in one shrink of the
> LRU lists if it is able to reclaim at 100% efficiency. It will not raise
> the scanning priority higher unless it is failing to reclaim any pages.
>
> To avoid infinite looping for high-order allocation requests kswapd will
> not reclaim for high-order allocations when it has reclaimed at least
> twice the number of pages as the allocation request.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
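The control flow that patch 03 describes can be restated as a small userspace sketch. This is not kernel code: struct pass_result, next_priority() and next_order() are hypothetical names invented for illustration, and only the two conditions are taken from the diff (DEF_PRIORITY is 12 in the kernel).

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_PRIORITY 12 /* starting (lowest-pressure) scan priority */

/* Hypothetical summary of one pass over the zones of a node. */
struct pass_result {
	bool scanned_enough;        /* a zone scanned at least its reclaim target */
	unsigned long nr_reclaimed; /* pages reclaimed during this pass */
};

/*
 * Mirrors "if (raise_priority || !sc.nr_reclaimed) sc.priority--;".
 * raise_priority starts out true on every pass and is cleared once
 * kswapd_shrink_zone() reports that enough pages were scanned.
 */
static int next_priority(int priority, struct pass_result r)
{
	bool raise_priority = !r.scanned_enough;

	assert(priority >= 0 && priority <= DEF_PRIORITY);
	if (raise_priority || r.nr_reclaimed == 0)
		priority--; /* a numerically lower priority scans more pages */
	return priority;
}

/*
 * Mirrors "if (order && sc.nr_reclaimed >= 2UL << order) order = 0;":
 * fall back to order-0 checks once twice the request size was reclaimed.
 */
static int next_order(int order, unsigned long nr_reclaimed)
{
	if (order && nr_reclaimed >= (2UL << order))
		return 0;
	return order;
}
```

With these helpers a pass that scanned enough and reclaimed something keeps the priority unchanged, while a pass that reclaimed nothing always lowers it, which is the behaviour the changelog describes.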
* [PATCH 04/10] mm: vmscan: Decide whether to compact the pgdat based on reclaim progress
From: Mel Gorman @ 2013-04-11 19:57 UTC
To: Andrew Morton
Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML, Mel Gorman

In the past, kswapd decided whether to compact memory after the pgdat
was considered balanced. This more or less worked but it is late to make
such a decision and does not fit well now that kswapd decides whether to
exit the zone scanning loop depending on reclaim progress.

This patch will compact a pgdat if at least the requested number of pages
were reclaimed from unbalanced zones for a given priority. If any zone is
currently balanced, kswapd will not call compaction as it is expected the
necessary pages are already available.

Signed-off-by: Mel Gorman <mgorman@suse.de> --- mm/vmscan.c | 59 ++++++++++++++++++++++++++++++----------------------------- 1 file changed, 30 insertions(+), 29 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index f979a67..25d89af 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2650,7 +2650,8 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining, */ static bool kswapd_shrink_zone(struct zone *zone, struct scan_control *sc, - unsigned long lru_pages) + unsigned long lru_pages, + unsigned long *nr_attempted) { unsigned long nr_slab; struct reclaim_state *reclaim_state = current->reclaim_state; @@ -2666,6 +2667,9 @@ static bool kswapd_shrink_zone(struct zone *zone, nr_slab = shrink_slab(&shrink, sc->nr_scanned, lru_pages); sc->nr_reclaimed += reclaim_state->reclaimed_slab; + /* Account for the number of pages attempted to reclaim */ + *nr_attempted += sc->nr_to_reclaim; + if (nr_slab == 0 && !zone_reclaimable(zone)) zone->all_unreclaimable = 1; @@ -2713,7 +2717,9 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, do { unsigned long lru_pages = 0; + unsigned long nr_attempted = 0; bool raise_priority = true; + bool pgdat_needs_compaction = (order > 0); sc.nr_reclaimed = 0; @@ -2763,7 +2769,21 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, for (i = 0; i <= end_zone; i++) { struct zone *zone = pgdat->node_zones + i; + if (!populated_zone(zone)) + continue; + lru_pages += zone_reclaimable_pages(zone); + + /* + * If any zone is currently balanced then kswapd will + * not call compaction as it is expected that the + * necessary pages are already available. + */ + if (pgdat_needs_compaction && + zone_watermark_ok(zone, order, + low_wmark_pages(zone), + *classzone_idx, 0)) + pgdat_needs_compaction = false; } /* @@ -2832,7 +2852,8 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, * already being scanned that high * watermark would be met at 100% efficiency. 
 */ - if (kswapd_shrink_zone(zone, &sc, lru_pages)) + if (kswapd_shrink_zone(zone, &sc, lru_pages, + &nr_attempted)) raise_priority = false; } @@ -2885,6 +2906,13 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, break; /* + * Compact if necessary and kswapd is reclaiming at least the + * high watermark number of pages as requested + */ + if (pgdat_needs_compaction && sc.nr_reclaimed > nr_attempted) + compact_pgdat(pgdat, order); + + /* * Raise priority if scanning rate is too low or there was no * progress in reclaiming pages */ @@ -2893,33 +2921,6 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, } while (sc.priority >= 0 && !pgdat_balanced(pgdat, order, *classzone_idx)); - /* - * If kswapd was reclaiming at a higher order, it has the option of - * sleeping without all zones being balanced. Before it does, it must - * ensure that the watermarks for order-0 on *all* zones are met and - * that the congestion flags are cleared. The congestion flag must - * be cleared as kswapd is the only mechanism that clears the flag - * and it is potentially going to sleep here. - */ - if (order) { - int zones_need_compaction = 1; - - for (i = 0; i <= end_zone; i++) { - struct zone *zone = pgdat->node_zones + i; - - if (!populated_zone(zone)) - continue; - - /* Check if the memory needs to be defragmented. */ - if (zone_watermark_ok(zone, order, - low_wmark_pages(zone), *classzone_idx, 0)) - zones_need_compaction = 0; - } - - if (zones_need_compaction) - compact_pgdat(pgdat, order); - } - out: /* * Return the order we were reclaiming at so prepare_kswapd_sleep() -- 1.8.1.4
* Re: [PATCH 04/10] mm: vmscan: Decide whether to compact the pgdat based on reclaim progress
From: Johannes Weiner @ 2013-04-18 15:09 UTC
To: Mel Gorman
Cc: Andrew Morton, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML

On Thu, Apr 11, 2013 at 08:57:52PM +0100, Mel Gorman wrote:
> @@ -2763,7 +2769,21 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
>  		for (i = 0; i <= end_zone; i++) {
>  			struct zone *zone = pgdat->node_zones + i;
>
> +			if (!populated_zone(zone))
> +				continue;
> +
>  			lru_pages += zone_reclaimable_pages(zone);
> +
> +			/*
> +			 * If any zone is currently balanced then kswapd will
> +			 * not call compaction as it is expected that the
> +			 * necessary pages are already available.
> +			 */
> +			if (pgdat_needs_compaction &&
> +					zone_watermark_ok(zone, order,
> +						low_wmark_pages(zone),
> +						*classzone_idx, 0))
> +				pgdat_needs_compaction = false;
>  		}
>
>  		/*
> @@ -2832,7 +2852,8 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
>  				 * already being scanned that high
>  				 * watermark would be met at 100% efficiency.
>  				 */
> -				if (kswapd_shrink_zone(zone, &sc, lru_pages))
> +				if (kswapd_shrink_zone(zone, &sc, lru_pages,
> +						       &nr_attempted))
>  					raise_priority = false;
>  			}

There is the odd chance that the watermark is met after reclaim; would
it make sense to defer the pgdat_needs_compaction check?

Not really a big deal, though, so:

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
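The compaction decision in patch 04 can be sketched in userspace as follows. The struct zone_state type and both function names are hypothetical stand-ins; the low_wmark_ok field abstracts the zone_watermark_ok() call against low_wmark_pages(zone), and only the two boolean conditions come from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-zone summary; the kernel works on struct zone. */
struct zone_state {
	bool populated;    /* stand-in for populated_zone(zone) */
	bool low_wmark_ok; /* stand-in for zone_watermark_ok() at the
	                    * requested order and the low watermark */
};

/*
 * Mirrors the per-zone loop: compaction is only considered for
 * high-order requests, and any populated zone that already meets its
 * low watermark at that order cancels it.
 */
static bool pgdat_needs_compaction(const struct zone_state *zones,
				   size_t nr_zones, int order)
{
	bool needs = order > 0;
	size_t i;

	assert(zones != NULL || nr_zones == 0);
	for (i = 0; i < nr_zones; i++) {
		if (!zones[i].populated)
			continue;
		if (needs && zones[i].low_wmark_ok)
			needs = false;
	}
	return needs;
}

/*
 * Mirrors "if (pgdat_needs_compaction && sc.nr_reclaimed > nr_attempted)".
 */
static bool should_compact(bool needs_compaction,
			   unsigned long nr_reclaimed,
			   unsigned long nr_attempted)
{
	return needs_compaction && nr_reclaimed > nr_attempted;
}
```

Because the flag is computed before the zones are shrunk, the sketch has the same property raised in the review above: a zone that only becomes balanced as a result of the reclaim pass does not cancel compaction for that pass.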
* [PATCH 05/10] mm: vmscan: Do not allow kswapd to scan at maximum priority
From: Mel Gorman @ 2013-04-11 19:57 UTC
To: Andrew Morton
Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML, Mel Gorman

Page reclaim at priority 0 will scan the entire LRU as priority 0 is
considered to be a near OOM condition. Kswapd can reach priority 0 quite
easily if it is encountering a large number of pages it cannot reclaim
such as pages under writeback. When this happens, kswapd reclaims very
aggressively even though there may be no real risk of allocation failure
or OOM.

This patch prevents kswapd reaching priority 0 and trying to reclaim
the world. Direct reclaimers will still reach priority 0 in the event
of an OOM situation.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 25d89af..bc4c2a7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2918,7 +2918,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 		 */
 		if (raise_priority || !sc.nr_reclaimed)
 			sc.priority--;
-	} while (sc.priority >= 0 &&
+	} while (sc.priority >= 1 &&
 		 !pgdat_balanced(pgdat, order, *classzone_idx));
 
 out:
-- 
1.8.1.4
* Re: [PATCH 05/10] mm: vmscan: Do not allow kswapd to scan at maximum priority
From: Johannes Weiner @ 2013-04-18 15:11 UTC
To: Mel Gorman
Cc: Andrew Morton, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML

On Thu, Apr 11, 2013 at 08:57:53PM +0100, Mel Gorman wrote:
> Page reclaim at priority 0 will scan the entire LRU as priority 0 is
> considered to be a near OOM condition. Kswapd can reach priority 0 quite
> easily if it is encountering a large number of pages it cannot reclaim
> such as pages under writeback. When this happens, kswapd reclaims very
> aggressively even though there may be no real risk of allocation failure
> or OOM.
>
> This patch prevents kswapd reaching priority 0 and trying to reclaim
> the world. Direct reclaimers will still reach priority 0 in the event
> of an OOM situation.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Acked-by: Rik van Riel <riel@redhat.com>
> Reviewed-by: Michal Hocko <mhocko@suse.cz>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
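To see why priority 0 amounts to "reclaim the world", a rough arithmetic sketch helps. It assumes the per-pass scan target is approximately the LRU size shifted right by the priority, which is how the scan count was derived in kernels of this era, and it ignores the anon/file proportioning handled elsewhere in the series. scan_target() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

#define DEF_PRIORITY 12 /* starting scan priority in the kernel */

/*
 * Approximate number of pages scanned from an LRU of lru_size pages
 * at a given priority (assumption: target = lru_size >> priority).
 */
static unsigned long scan_target(unsigned long lru_size, int priority)
{
	assert(priority >= 0 && priority <= DEF_PRIORITY);
	return lru_size >> priority;
}
```

For an LRU of 1048576 pages (4GB of 4K pages) this gives 256 pages at DEF_PRIORITY, half the list at priority 1, and the entire list at priority 0. With the loop condition changed to "sc.priority >= 1", kswapd stops short of the last step while direct reclaim can still take it.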
* [PATCH 06/10] mm: vmscan: Have kswapd writeback pages based on dirty pages encountered, not priority
From: Mel Gorman @ 2013-04-11 19:57 UTC
To: Andrew Morton
Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML, Mel Gorman

Currently kswapd queues dirty pages for writeback if scanning at an elevated
priority but the priority kswapd scans at is not related to the number
of unqueued dirty pages encountered. Since commit "mm: vmscan: Flatten kswapd
priority loop", the priority is related to the size of the LRU and the
zone watermark which is no indication as to whether kswapd should write
pages or not.

This patch tracks if an excessive number of unqueued dirty pages are being
encountered at the end of the LRU. If so, it indicates that dirty pages
are being recycled before flusher threads can clean them and flags the
zone so that kswapd will start writing pages until the zone is balanced.

Signed-off-by: Mel Gorman <mgorman@suse.de> --- include/linux/mmzone.h | 9 +++++++++ mm/vmscan.c | 31 +++++++++++++++++++++++++------ 2 files changed, 34 insertions(+), 6 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index c74092e..ecf0c7d 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -495,6 +495,10 @@ typedef enum { ZONE_CONGESTED, /* zone has many dirty pages backed by * a congested BDI */ + ZONE_TAIL_LRU_DIRTY, /* reclaim scanning has recently found + * many dirty file pages at the tail + * of the LRU. + */ } zone_flags_t; static inline void zone_set_flag(struct zone *zone, zone_flags_t flag) @@ -517,6 +521,11 @@ static inline int zone_is_reclaim_congested(const struct zone *zone) return test_bit(ZONE_CONGESTED, &zone->flags); } +static inline int zone_is_reclaim_dirty(const struct zone *zone) +{ + return test_bit(ZONE_TAIL_LRU_DIRTY, &zone->flags); +} + static inline int zone_is_reclaim_locked(const struct zone *zone) { return test_bit(ZONE_RECLAIM_LOCKED, &zone->flags); diff --git a/mm/vmscan.c b/mm/vmscan.c index bc4c2a7..22e8ca9 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -675,13 +675,14 @@ static unsigned long shrink_page_list(struct list_head *page_list, struct zone *zone, struct scan_control *sc, enum ttu_flags ttu_flags, - unsigned long *ret_nr_dirty, + unsigned long *ret_nr_unqueued_dirty, unsigned long *ret_nr_writeback, bool force_reclaim) { LIST_HEAD(ret_pages); LIST_HEAD(free_pages); int pgactivate = 0; + unsigned long nr_unqueued_dirty = 0; unsigned long nr_dirty = 0; unsigned long nr_congested = 0; unsigned long nr_reclaimed = 0; @@ -807,14 +808,17 @@ static unsigned long shrink_page_list(struct list_head *page_list, if (PageDirty(page)) { nr_dirty++; + if (!PageWriteback(page)) + nr_unqueued_dirty++; + /* * Only kswapd can writeback filesystem pages to - * avoid risk of stack overflow but do not writeback - * unless under significant pressure. 
+ * avoid risk of stack overflow but only writeback + * if many dirty pages have been encountered. */ if (page_is_file_cache(page) && (!current_is_kswapd() || - sc->priority >= DEF_PRIORITY - 2)) { + !zone_is_reclaim_dirty(zone))) { /* * Immediately reclaim when written back. * Similar in principal to deactivate_page() @@ -959,7 +963,7 @@ keep: list_splice(&ret_pages, page_list); count_vm_events(PGACTIVATE, pgactivate); mem_cgroup_uncharge_end(); - *ret_nr_dirty += nr_dirty; + *ret_nr_unqueued_dirty += nr_unqueued_dirty; *ret_nr_writeback += nr_writeback; return nr_reclaimed; } @@ -1372,6 +1376,15 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, (nr_taken >> (DEF_PRIORITY - sc->priority))) wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10); + /* + * Similarly, if many dirty pages are encountered that are not + * currently being written then flag that kswapd should start + * writing back pages. + */ + if (global_reclaim(sc) && nr_dirty && + nr_dirty >= (nr_taken >> (DEF_PRIORITY - sc->priority))) + zone_set_flag(zone, ZONE_TAIL_LRU_DIRTY); + trace_mm_vmscan_lru_shrink_inactive(zone->zone_pgdat->node_id, zone_idx(zone), nr_scanned, nr_reclaimed, @@ -2758,8 +2771,12 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, end_zone = i; break; } else { - /* If balanced, clear the congested flag */ + /* + * If balanced, clear the dirty and congested + * flags + */ zone_clear_flag(zone, ZONE_CONGESTED); + zone_clear_flag(zone, ZONE_TAIL_LRU_DIRTY); } } @@ -2877,8 +2894,10 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, * possible there are dirty pages backed by * congested BDIs but as pressure is relieved, * speculatively avoid congestion waits + * or writing pages from kswapd context. */ zone_clear_flag(zone, ZONE_CONGESTED); + zone_clear_flag(zone, ZONE_TAIL_LRU_DIRTY); } /* -- 1.8.1.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
* Re: [PATCH 06/10] mm: vmscan: Have kswapd writeback pages based on dirty pages encountered, not priority
From: Johannes Weiner @ 2013-04-18 15:16 UTC
To: Mel Gorman
Cc: Andrew Morton, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML

On Thu, Apr 11, 2013 at 08:57:54PM +0100, Mel Gorman wrote:
> Currently kswapd queues dirty pages for writeback if scanning at an elevated
> priority but the priority kswapd scans at is not related to the number
> of unqueued dirty pages encountered. Since commit "mm: vmscan: Flatten kswapd
> priority loop", the priority is related to the size of the LRU and the
> zone watermark which is no indication as to whether kswapd should write
> pages or not.
>
> This patch tracks if an excessive number of unqueued dirty pages are being
> encountered at the end of the LRU. If so, it indicates that dirty pages
> are being recycled before flusher threads can clean them and flags the
> zone so that kswapd will start writing pages until the zone is balanced.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
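The threshold that sets ZONE_TAIL_LRU_DIRTY in patch 06 can be worked through with a small userspace sketch. The function names are hypothetical; the comparison is the one in the shrink_inactive_list() hunk (without its global_reclaim() check), and the second helper restates the page_is_file_cache() condition from shrink_page_list().

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_PRIORITY 12 /* starting scan priority in the kernel */

/*
 * Mirrors "nr_dirty && nr_dirty >= (nr_taken >> (DEF_PRIORITY - priority))".
 * At DEF_PRIORITY every isolated page must be unqueued dirty; each
 * step down in priority halves the fraction required.
 */
static bool tail_lru_dirty(unsigned long nr_unqueued_dirty,
			   unsigned long nr_taken, int priority)
{
	assert(priority >= 0 && priority <= DEF_PRIORITY);
	return nr_unqueued_dirty &&
	       nr_unqueued_dirty >= (nr_taken >> (DEF_PRIORITY - priority));
}

/*
 * A dirty file page is written from reclaim context only by kswapd,
 * and only when the zone has been flagged; otherwise it is skipped.
 */
static bool may_write_file_page(bool is_kswapd, bool zone_flagged_dirty)
{
	return is_kswapd && zone_flagged_dirty;
}
```

For a batch of 32 isolated pages the zone is flagged at DEF_PRIORITY only if all 32 are unqueued dirty, at priority 10 if at least 8 are, and at very low priorities if any single page is.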
* [PATCH 07/10] mm: vmscan: Block kswapd if it is encountering pages under writeback 2013-04-11 19:57 [PATCH 0/10] Reduce system disruption due to kswapd V3 Mel Gorman ` (5 preceding siblings ...) 2013-04-11 19:57 ` [PATCH 06/10] mm: vmscan: Have kswapd writeback pages based on dirty pages encountered, not priority Mel Gorman @ 2013-04-11 19:57 ` Mel Gorman 2013-04-11 19:57 ` [PATCH 08/10] mm: vmscan: Have kswapd shrink slab only once per priority Mel Gorman ` (2 subsequent siblings) 9 siblings, 0 replies; 44+ messages in thread From: Mel Gorman @ 2013-04-11 19:57 UTC (permalink / raw) To: Andrew Morton Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML, Mel Gorman Historically, kswapd used to congestion_wait() at higher priorities if it was not making forward progress. This made no sense as the failure to make progress could be completely independent of IO. It was later replaced by wait_iff_congested() and removed entirely by commit 258401a6 (mm: don't wait on congested zones in balance_pgdat()) as it was duplicating logic in shrink_inactive_list(). This is problematic. If kswapd encounters many pages under writeback and it continues to scan until it reaches the high watermark then it will quickly skip over the pages under writeback and reclaim clean young pages or push applications out to swap. The use of wait_iff_congested() is not suited to kswapd as it will only stall if the underlying BDI is really congested or a direct reclaimer was unable to write to the underlying BDI. kswapd bypasses the BDI congestion as it sets PF_SWAPWRITE but even if this was taken into account then it would cause direct reclaimers to stall on writeback which is not desirable. This patch sets a ZONE_WRITEBACK flag if direct reclaim or kswapd is encountering too many pages under writeback. 
If this flag is set and kswapd encounters a PageReclaim page under writeback then it'll assume that the LRU lists are being recycled too quickly before IO can complete and block waiting for some IO to complete. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> --- include/linux/mmzone.h | 8 ++++++ mm/vmscan.c | 78 ++++++++++++++++++++++++++++++++++++-------------- 2 files changed, 64 insertions(+), 22 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index ecf0c7d..264e203 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -499,6 +499,9 @@ typedef enum { * many dirty file pages at the tail * of the LRU. */ + ZONE_WRITEBACK, /* reclaim scanning has recently found + * many pages under writeback + */ } zone_flags_t; static inline void zone_set_flag(struct zone *zone, zone_flags_t flag) @@ -526,6 +529,11 @@ static inline int zone_is_reclaim_dirty(const struct zone *zone) return test_bit(ZONE_TAIL_LRU_DIRTY, &zone->flags); } +static inline int zone_is_reclaim_writeback(const struct zone *zone) +{ + return test_bit(ZONE_WRITEBACK, &zone->flags); +} + static inline int zone_is_reclaim_locked(const struct zone *zone) { return test_bit(ZONE_RECLAIM_LOCKED, &zone->flags); diff --git a/mm/vmscan.c b/mm/vmscan.c index 22e8ca9..a20f2a9 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -723,25 +723,51 @@ static unsigned long shrink_page_list(struct list_head *page_list, may_enter_fs = (sc->gfp_mask & __GFP_FS) || (PageSwapCache(page) && (sc->gfp_mask & __GFP_IO)); + /* + * If a page at the tail of the LRU is under writeback, there + * are three cases to consider. + * + * 1) If reclaim is encountering an excessive number of pages + * under writeback and this page is both under writeback and + * PageReclaim then it indicates that pages are being queued + * for IO but are being recycled through the LRU before the + * IO can complete. 
In this case, wait on the IO to complete + * and then clear the ZONE_WRITEBACK flag to recheck if the + * condition exists. + * + * 2) Global reclaim encounters a page, memcg encounters a + * page that is not marked for immediate reclaim or + * the caller does not have __GFP_IO. In this case mark + * the page for immediate reclaim and continue scanning. + * + * __GFP_IO is checked because a loop driver thread might + * enter reclaim, and deadlock if it waits on a page for + * which it is needed to do the write (loop masks off + * __GFP_IO|__GFP_FS for this reason); but more thought + * would probably show more reasons. + * + * Don't require __GFP_FS, since we're not going into the + * FS, just waiting on its writeback completion. Worryingly, + * ext4 gfs2 and xfs allocate pages with + * grab_cache_page_write_begin(,,AOP_FLAG_NOFS), so testing + * may_enter_fs here is liable to OOM on them. + * + * 3) memcg encounters a page that is not already marked + * PageReclaim. memcg does not have any dirty pages + * throttling so we could easily OOM just because too many + * pages are in writeback and there is nothing else to + * reclaim. Wait for the writeback to complete. + */ if (PageWriteback(page)) { - /* - * memcg doesn't have any dirty pages throttling so we - * could easily OOM just because too many pages are in - * writeback and there is nothing else to reclaim. - * - * Check __GFP_IO, certainly because a loop driver - * thread might enter reclaim, and deadlock if it waits - * on a page for which it is needed to do the write - * (loop masks off __GFP_IO|__GFP_FS for this reason); - * but more thought would probably show more reasons. - * - * Don't require __GFP_FS, since we're not going into - * the FS, just waiting on its writeback completion. - * Worryingly, ext4 gfs2 and xfs allocate pages with - * grab_cache_page_write_begin(,,AOP_FLAG_NOFS), so - * testing may_enter_fs here is liable to OOM on them. 
- */ - if (global_reclaim(sc) || + /* Case 1 above */ + if (current_is_kswapd() && + PageReclaim(page) && + zone_is_reclaim_writeback(zone)) { + wait_on_page_writeback(page); + zone_clear_flag(zone, ZONE_WRITEBACK); + + /* Case 2 above */ + } else if (global_reclaim(sc) || !PageReclaim(page) || !(sc->gfp_mask & __GFP_IO)) { /* * This is slightly racy - end_page_writeback() @@ -756,9 +782,13 @@ static unsigned long shrink_page_list(struct list_head *page_list, */ SetPageReclaim(page); nr_writeback++; + goto keep_locked; + + /* Case 3 above */ + } else { + wait_on_page_writeback(page); } - wait_on_page_writeback(page); } if (!force_reclaim) @@ -1373,8 +1403,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, * isolated page is PageWriteback */ if (nr_writeback && nr_writeback >= - (nr_taken >> (DEF_PRIORITY - sc->priority))) + (nr_taken >> (DEF_PRIORITY - sc->priority))) { wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10); + zone_set_flag(zone, ZONE_WRITEBACK); + } /* * Similarly, if many dirty pages are encountered that are not @@ -2658,8 +2690,8 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining, * the high watermark. * * Returns true if kswapd scanned at least the requested number of pages to - * reclaim. This is used to determine if the scanning priority needs to be - * raised. + * reclaim or if the lack of progress was due to pages under writeback. + * This is used to determine if the scanning priority needs to be raised. */ static bool kswapd_shrink_zone(struct zone *zone, struct scan_control *sc, @@ -2686,6 +2718,8 @@ static bool kswapd_shrink_zone(struct zone *zone, if (nr_slab == 0 && !zone_reclaimable(zone)) zone->all_unreclaimable = 1; + zone_clear_flag(zone, ZONE_WRITEBACK); + return sc->nr_scanned >= sc->nr_to_reclaim; } -- 1.8.1.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
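The three-way branch that patch 07 adds for pages under writeback can be restated as a pure decision function. This sketch follows the conditions in the diff rather than the wording of the comment block; struct wb_ctx, enum wb_action and writeback_action() are hypothetical names, and each boolean stands in for the kernel predicate noted beside it.

```c
#include <assert.h>
#include <stdbool.h>

/* What reclaim does with a page found under writeback. */
enum wb_action {
	WB_WAIT_AND_CLEAR_FLAG,   /* case 1: wait, then clear ZONE_WRITEBACK */
	WB_MARK_RECLAIM_AND_SKIP, /* case 2: SetPageReclaim, keep the page */
	WB_WAIT                   /* case 3: wait for writeback to finish */
};

/* Hypothetical snapshot of the state the branch looks at. */
struct wb_ctx {
	bool is_kswapd;      /* current_is_kswapd() */
	bool page_reclaim;   /* PageReclaim(page) */
	bool zone_writeback; /* zone_is_reclaim_writeback(zone) */
	bool global_reclaim; /* global_reclaim(sc) */
	bool gfp_io;         /* sc->gfp_mask & __GFP_IO */
};

static enum wb_action writeback_action(struct wb_ctx c)
{
	/* Pages are recycling through the LRU faster than IO completes. */
	if (c.is_kswapd && c.page_reclaim && c.zone_writeback)
		return WB_WAIT_AND_CLEAR_FLAG;

	/* Mark for immediate reclaim once IO finishes and move on. */
	if (c.global_reclaim || !c.page_reclaim || !c.gfp_io)
		return WB_MARK_RECLAIM_AND_SKIP;

	/* Remaining case: memcg reclaim with __GFP_IO on a marked page. */
	return WB_WAIT;
}
```

Going by these conditions, kswapd only blocks when the zone flag is set and the page has already been around the LRU once, and a direct reclaimer doing global reclaim never waits here.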
* [PATCH 08/10] mm: vmscan: Have kswapd shrink slab only once per priority
From: Mel Gorman @ 2013-04-11 19:57 UTC
To: Andrew Morton
Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML, Mel Gorman

If kswapd fails to make progress but continues to shrink slab then it'll
either discard all of slab or consume CPU uselessly scanning shrinkers.
This patch causes kswapd to only call the shrinkers once per priority.

Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: Rik van Riel <riel@redhat.com> --- mm/vmscan.c | 28 +++++++++++++++++++++------- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index a20f2a9..0fa588d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2696,9 +2696,10 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining, static bool kswapd_shrink_zone(struct zone *zone, struct scan_control *sc, unsigned long lru_pages, + bool shrinking_slab, unsigned long *nr_attempted) { - unsigned long nr_slab; + unsigned long nr_slab = 0; struct reclaim_state *reclaim_state = current->reclaim_state; struct shrink_control shrink = { .gfp_mask = sc->gfp_mask, @@ -2708,9 +2709,15 @@ static bool kswapd_shrink_zone(struct zone *zone, sc->nr_to_reclaim = max(SWAP_CLUSTER_MAX, high_wmark_pages(zone)); shrink_zone(zone, sc); - reclaim_state->reclaimed_slab = 0; - nr_slab = shrink_slab(&shrink, sc->nr_scanned, lru_pages); - sc->nr_reclaimed += reclaim_state->reclaimed_slab; + /* + * Slabs are shrunk for each zone once per priority or if the zone + * being balanced is otherwise unreclaimable + */ + if (shrinking_slab || !zone_reclaimable(zone)) { + reclaim_state->reclaimed_slab = 0; + nr_slab = shrink_slab(&shrink, sc->nr_scanned, lru_pages); + sc->nr_reclaimed += reclaim_state->reclaimed_slab; + } /* Account for the number of pages attempted to reclaim */ *nr_attempted += sc->nr_to_reclaim; @@ -2751,6 +2758,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */ unsigned long nr_soft_reclaimed; unsigned long nr_soft_scanned; + bool shrinking_slab = true; struct scan_control sc = { .gfp_mask = GFP_KERNEL, .priority = DEF_PRIORITY, @@ -2903,8 +2911,9 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, * already being scanned that high * watermark would be met at 100% efficiency. 
*/ - if (kswapd_shrink_zone(zone, &sc, lru_pages, - &nr_attempted)) + if (kswapd_shrink_zone(zone, &sc, + lru_pages, shrinking_slab, + &nr_attempted)) raise_priority = false; } @@ -2943,6 +2952,9 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, pfmemalloc_watermark_ok(pgdat)) wake_up(&pgdat->pfmemalloc_wait); + /* Only shrink slab once per priority */ + shrinking_slab = false; + /* * Fragmentation may mean that the system cannot be rebalanced * for high-order allocations in all zones. If twice the @@ -2969,8 +2981,10 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, * Raise priority if scanning rate is too low or there was no * progress in reclaiming pages */ - if (raise_priority || !sc.nr_reclaimed) + if (raise_priority || !sc.nr_reclaimed) { sc.priority--; + shrinking_slab = true; + } } while (sc.priority >= 1 && !pgdat_balanced(pgdat, order, *classzone_idx)); -- 1.8.1.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 08/10] mm: vmscan: Have kswapd shrink slab only once per priority
  2013-04-11 19:57 ` [PATCH 08/10] mm: vmscan: Have kswapd shrink slab only once per priority Mel Gorman
@ 2013-04-18 16:43   ` Johannes Weiner
  0 siblings, 0 replies; 44+ messages in thread
From: Johannes Weiner @ 2013-04-18 16:43 UTC
To: Mel Gorman
Cc: Andrew Morton, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML

On Thu, Apr 11, 2013 at 08:57:56PM +0100, Mel Gorman wrote:
> If kswapd fails to make progress but continues to shrink slab then it'll
> either discard all of slab or consume CPU uselessly scanning shrinkers.
> This patch causes kswapd to only call the shrinkers once per priority.

But the priority level changes _only_ when kswapd is not making
progress, so I don't see how this fixes this case.

On the other hand, what about shrinkable memory like dentries and
inodes that build up during a streaming IO load like a backup program?

Kswapd may be cooperating with the page allocator and never change
priority as it reclaims the continuous file page stream, but it won't
do the same for the stream of slab memory.

So if anything, I would expect us to lay off slab memory when lru
reclaim is struggling, but receive continuous aging and pushback as
long as lru reclaim is comfortably running alongside the workload.
* [PATCH 09/10] mm: vmscan: Check if kswapd should writepage once per pgdat scan
  2013-04-11 19:57 [PATCH 0/10] Reduce system disruption due to kswapd V3 Mel Gorman
                   ` (7 preceding siblings ...)
  2013-04-11 19:57 ` [PATCH 08/10] mm: vmscan: Have kswapd shrink slab only once per priority Mel Gorman
@ 2013-04-11 19:57 ` Mel Gorman
  2013-04-18 16:44   ` Johannes Weiner
  2013-04-11 19:57 ` [PATCH 10/10] mm: vmscan: Move logic from balance_pgdat() to kswapd_shrink_zone() Mel Gorman
  9 siblings, 1 reply; 44+ messages in thread
From: Mel Gorman @ 2013-04-11 19:57 UTC
To: Andrew Morton
Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML, Mel Gorman

Currently kswapd checks if it should start writepage as it shrinks
each zone without taking into consideration if the zone is balanced or
not. This is not wrong as such but it does not make much sense either.
This patch checks once per pgdat scan if kswapd should be writing pages.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
---
 mm/vmscan.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0fa588d..d45f6e2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2846,6 +2846,13 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 		}

 		/*
+		 * If we're getting trouble reclaiming, start doing writepage
+		 * even in laptop mode.
+		 */
+		if (sc.priority < DEF_PRIORITY - 2)
+			sc.may_writepage = 1;
+
+		/*
 		 * Now scan the zone in the dma->highmem direction, stopping
 		 * at the last zone which needs scanning.
 		 *
@@ -2917,13 +2924,6 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 					raise_priority = false;
 			}

-			/*
-			 * If we're getting trouble reclaiming, start doing
-			 * writepage even in laptop mode.
-			 */
-			if (sc.priority < DEF_PRIORITY - 2)
-				sc.may_writepage = 1;
-
 			if (zone->all_unreclaimable) {
 				if (end_zone && end_zone == i)
 					end_zone--;
--
1.8.1.4
* Re: [PATCH 09/10] mm: vmscan: Check if kswapd should writepage once per pgdat scan
  2013-04-11 19:57 ` [PATCH 09/10] mm: vmscan: Check if kswapd should writepage once per pgdat scan Mel Gorman
@ 2013-04-18 16:44   ` Johannes Weiner
  0 siblings, 0 replies; 44+ messages in thread
From: Johannes Weiner @ 2013-04-18 16:44 UTC
To: Mel Gorman
Cc: Andrew Morton, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML

On Thu, Apr 11, 2013 at 08:57:57PM +0100, Mel Gorman wrote:
> Currently kswapd checks if it should start writepage as it shrinks
> each zone without taking into consideration if the zone is balanced or
> not. This is not wrong as such but it does not make much sense either.
> This patch checks once per pgdat scan if kswapd should be writing pages.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Michal Hocko <mhocko@suse.cz>
> Acked-by: Rik van Riel <riel@redhat.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
* [PATCH 10/10] mm: vmscan: Move logic from balance_pgdat() to kswapd_shrink_zone()
  2013-04-11 19:57 [PATCH 0/10] Reduce system disruption due to kswapd V3 Mel Gorman
                   ` (8 preceding siblings ...)
  2013-04-11 19:57 ` [PATCH 09/10] mm: vmscan: Check if kswapd should writepage once per pgdat scan Mel Gorman
@ 2013-04-11 19:57 ` Mel Gorman
  2013-04-18 16:56   ` Johannes Weiner
  9 siblings, 1 reply; 44+ messages in thread
From: Mel Gorman @ 2013-04-11 19:57 UTC
To: Andrew Morton
Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML, Mel Gorman

balance_pgdat() is very long and some of the logic can and should
be internal to kswapd_shrink_zone(). Move it so the flow of
balance_pgdat() is marginally easier to follow.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/vmscan.c | 112 +++++++++++++++++++++++++++++------------------------------
 1 file changed, 55 insertions(+), 57 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index d45f6e2..2f1adf6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2694,19 +2694,54 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
  * This is used to determine if the scanning priority needs to be raised.
  */
 static bool kswapd_shrink_zone(struct zone *zone,
+			       int classzone_idx,
 			       struct scan_control *sc,
 			       unsigned long lru_pages,
 			       bool shrinking_slab,
 			       unsigned long *nr_attempted)
 {
+	int testorder = sc->order;
 	unsigned long nr_slab = 0;
+	unsigned long balance_gap;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	struct shrink_control shrink = {
 		.gfp_mask = sc->gfp_mask,
 	};
+	bool lowmem_pressure;

 	/* Reclaim above the high watermark. */
 	sc->nr_to_reclaim = max(SWAP_CLUSTER_MAX, high_wmark_pages(zone));
+
+	/*
+	 * Kswapd reclaims only single pages with compaction enabled. Trying
+	 * too hard to reclaim until contiguous free pages have become
+	 * available can hurt performance by evicting too much useful data
+	 * from memory. Do not reclaim more than needed for compaction.
+	 */
+	if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
+			compaction_suitable(zone, sc->order) !=
+				COMPACT_SKIPPED)
+		testorder = 0;
+
+	/*
+	 * We put equal pressure on every zone, unless one zone has way too
+	 * many pages free already. The "too many pages" is defined as the
+	 * high wmark plus a "gap" where the gap is either the low
+	 * watermark or 1% of the zone, whichever is smaller.
+	 */
+	balance_gap = min(low_wmark_pages(zone),
+		(zone->managed_pages + KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
+		KSWAPD_ZONE_BALANCE_GAP_RATIO);
+
+	/*
+	 * If there is no low memory pressure or the zone is balanced then no
+	 * reclaim is necessary
+	 */
+	lowmem_pressure = (buffer_heads_over_limit && is_highmem(zone));
+	if (!lowmem_pressure && zone_balanced(zone, testorder,
+						balance_gap, classzone_idx))
+		return true;
+
 	shrink_zone(zone, sc);

 	/*
@@ -2727,6 +2762,18 @@ static bool kswapd_shrink_zone(struct zone *zone,

 	zone_clear_flag(zone, ZONE_WRITEBACK);

+	/*
+	 * If a zone reaches its high watermark, consider it to be no longer
+	 * congested. It's possible there are dirty pages backed by congested
+	 * BDIs but as pressure is relieved, speculatively avoid congestion
+	 * waits.
+	 */
+	if (!zone->all_unreclaimable &&
+	    zone_balanced(zone, testorder, 0, classzone_idx)) {
+		zone_clear_flag(zone, ZONE_CONGESTED);
+		zone_clear_flag(zone, ZONE_TAIL_LRU_DIRTY);
+	}
+
 	return sc->nr_scanned >= sc->nr_to_reclaim;
 }

@@ -2863,8 +2910,6 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 		 */
 		for (i = 0; i <= end_zone; i++) {
 			struct zone *zone = pgdat->node_zones + i;
-			int testorder;
-			unsigned long balance_gap;

 			if (!populated_zone(zone))
 				continue;
@@ -2885,62 +2930,15 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 			sc.nr_reclaimed += nr_soft_reclaimed;

 			/*
-			 * We put equal pressure on every zone, unless
-			 * one zone has way too many pages free
-			 * already. The "too many pages" is defined
-			 * as the high wmark plus a "gap" where the
-			 * gap is either the low watermark or 1%
-			 * of the zone, whichever is smaller.
-			 */
-			balance_gap = min(low_wmark_pages(zone),
-				(zone->managed_pages +
-					KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
-				KSWAPD_ZONE_BALANCE_GAP_RATIO);
-			/*
-			 * Kswapd reclaims only single pages with compaction
-			 * enabled. Trying too hard to reclaim until contiguous
-			 * free pages have become available can hurt performance
-			 * by evicting too much useful data from memory.
-			 * Do not reclaim more than needed for compaction.
+			 * There should be no need to raise the scanning
+			 * priority if enough pages are already being scanned
+			 * that that high watermark would be met at 100%
+			 * efficiency.
 			 */
-			testorder = order;
-			if (IS_ENABLED(CONFIG_COMPACTION) && order &&
-					compaction_suitable(zone, order) !=
-						COMPACT_SKIPPED)
-				testorder = 0;
-
-			if ((buffer_heads_over_limit && is_highmem_idx(i)) ||
-			    !zone_balanced(zone, testorder,
-					   balance_gap, end_zone)) {
-				/*
-				 * There should be no need to raise the
-				 * scanning priority if enough pages are
-				 * already being scanned that high
-				 * watermark would be met at 100% efficiency.
-				 */
-				if (kswapd_shrink_zone(zone, &sc,
-						lru_pages, shrinking_slab,
-						&nr_attempted))
-					raise_priority = false;
-			}
-
-			if (zone->all_unreclaimable) {
-				if (end_zone && end_zone == i)
-					end_zone--;
-				continue;
-			}
-
-			if (zone_balanced(zone, testorder, 0, end_zone))
-				/*
-				 * If a zone reaches its high watermark,
-				 * consider it to be no longer congested. It's
-				 * possible there are dirty pages backed by
-				 * congested BDIs but as pressure is relieved,
-				 * speculatively avoid congestion waits
-				 * or writing pages from kswapd context.
-				 */
-				zone_clear_flag(zone, ZONE_CONGESTED);
-				zone_clear_flag(zone, ZONE_TAIL_LRU_DIRTY);
+			if (kswapd_shrink_zone(zone, end_zone, &sc,
+					lru_pages, shrinking_slab,
+					&nr_attempted))
+				raise_priority = false;
 		}

 		/*
--
1.8.1.4
* Re: [PATCH 10/10] mm: vmscan: Move logic from balance_pgdat() to kswapd_shrink_zone()
  2013-04-11 19:57 ` [PATCH 10/10] mm: vmscan: Move logic from balance_pgdat() to kswapd_shrink_zone() Mel Gorman
@ 2013-04-18 16:56   ` Johannes Weiner
  0 siblings, 0 replies; 44+ messages in thread
From: Johannes Weiner @ 2013-04-18 16:56 UTC
To: Mel Gorman
Cc: Andrew Morton, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, dormando, Michal Hocko, Kamezawa Hiroyuki, Linux-MM, LKML

On Thu, Apr 11, 2013 at 08:57:58PM +0100, Mel Gorman wrote:
> balance_pgdat() is very long and some of the logic can and should
> be internal to kswapd_shrink_zone(). Move it so the flow of
> balance_pgdat() is marginally easier to follow.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
* [PATCH 0/10] Reduce system disruption due to kswapd V2
@ 2013-04-09 11:06 Mel Gorman
  2013-04-09 11:06 ` [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd Mel Gorman
  0 siblings, 1 reply; 44+ messages in thread
From: Mel Gorman @ 2013-04-09 11:06 UTC
To: Andrew Morton
Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya, Michal Hocko, Linux-MM, LKML, Mel Gorman

Posting V2 of this series got delayed due to trying to pin down an unrelated
regression in 3.9-rc where interactive performance is shot to hell. That
problem still has not been identified as it's resisting attempts to be
reproducible by a script for the purposes of bisection.

For those that looked at V1, the most important difference in this version
is how patch 2 preserves the proportional scanning of anon/file LRUs.

The series is against 3.9-rc6.

Changelog since V1
o Rename ZONE_DIRTY to ZONE_TAIL_LRU_DIRTY			(andi)
o Reformat comment in shrink_page_list				(andi)
o Clarify some comments						(dhillf)
o Rework how the proportional scanning is preserved
o Add PageReclaim check before kswapd starts writeback
o Reset sc.nr_reclaimed on every full zone scan

Kswapd and page reclaim behaviour has been screwy in one way or the other
for a long time. Very broadly speaking it worked in the far past because
machines were limited in memory so it did not have that many pages to scan
and it stalled congestion_wait() frequently to prevent it going completely
nuts. In recent times it has behaved very unsatisfactorily with some of
the problems compounded by the removal of stall logic and the introduction
of transparent hugepage support with high-order reclaims.

There are many variations of bugs that are rooted in this area. One example
is reports of large copy operations or backups causing the machine to
grind to a halt or applications pushed to swap.
Sometimes in low memory situations a large percentage of memory suddenly
gets reclaimed. In other cases an application starts and kswapd hits 100%
CPU usage for prolonged periods of time and so on. There is now talk of
introducing features like an extra free kbytes tunable to work around
aspects of the problem instead of trying to deal with it. It's compounded
by the problem that it can be very workload and machine specific.

This series aims at addressing some of the worst of these problems without
attempting to fundamentally alter how page reclaim works.

Patches 1-2 limit the number of pages kswapd reclaims while still obeying
	the anon/file proportion of the LRUs it should be scanning.

Patches 3-4 control how and when kswapd raises its scanning priority and
	delete the scanning restart logic which is tricky to follow.

Patch 5 notes that it is too easy for kswapd to reach priority 0 when
	scanning and then reclaim the world. Down with that sort of thing.

Patch 6 notes that kswapd starts writeback based on scanning priority which
	is not necessarily related to dirty pages. It will have kswapd
	writeback pages if a number of unqueued dirty pages have been
	recently encountered at the tail of the LRU.

Patch 7 notes that sometimes kswapd should stall waiting on IO to complete
	to reduce LRU churn and the likelihood that it'll reclaim young
	clean pages or push applications to swap. It will cause kswapd
	to block on IO if it detects that pages being reclaimed under
	writeback are recycling through the LRU before the IO completes.

Patch 8 shrinks slab just once per priority scanned or if a zone is otherwise
	unreclaimable to avoid hammering slab when kswapd has to skip a
	large number of pages.

Patches 9-10 are cosmetic but balance_pgdat() might be easier to follow.

This was tested using memcached+memcachetest while some background IO
was in progress as implemented by the parallel IO tests implemented in
MM Tests.
memcachetest benchmarks how many operations/second memcached can service
and it is run multiple times. It starts with no background IO and then
re-runs the test with larger amounts of IO in the background to roughly
simulate a large copy in progress. The expectation is that the IO should
have little or no impact on memcachetest which is running entirely in
memory.

                                         3.9.0-rc6                   3.9.0-rc6
                                           vanilla           lessdisrupt-v2r11
Ops memcachetest-0M             11106.00 (  0.00%)          10997.00 ( -0.98%)
Ops memcachetest-749M           10960.00 (  0.00%)          11032.00 (  0.66%)
Ops memcachetest-2498M           2588.00 (  0.00%)          10948.00 (323.03%)
Ops memcachetest-4246M           2401.00 (  0.00%)          10960.00 (356.48%)
Ops io-duration-0M                  0.00 (  0.00%)              0.00 (  0.00%)
Ops io-duration-749M               15.00 (  0.00%)              8.00 ( 46.67%)
Ops io-duration-2498M             112.00 (  0.00%)             25.00 ( 77.68%)
Ops io-duration-4246M             170.00 (  0.00%)             45.00 ( 73.53%)
Ops swaptotal-0M                    0.00 (  0.00%)              0.00 (  0.00%)
Ops swaptotal-749M             161678.00 (  0.00%)             16.00 ( 99.99%)
Ops swaptotal-2498M            471903.00 (  0.00%)            192.00 ( 99.96%)
Ops swaptotal-4246M            444010.00 (  0.00%)           1323.00 ( 99.70%)
Ops swapin-0M                       0.00 (  0.00%)              0.00 (  0.00%)
Ops swapin-749M                   789.00 (  0.00%)              0.00 (  0.00%)
Ops swapin-2498M               196496.00 (  0.00%)            192.00 ( 99.90%)
Ops swapin-4246M               168269.00 (  0.00%)            154.00 ( 99.91%)
Ops minorfaults-0M            1596126.00 (  0.00%)        1521332.00 (  4.69%)
Ops minorfaults-749M          1766556.00 (  0.00%)        1596350.00 (  9.63%)
Ops minorfaults-2498M         1661445.00 (  0.00%)        1598762.00 (  3.77%)
Ops minorfaults-4246M         1628375.00 (  0.00%)        1597624.00 (  1.89%)
Ops majorfaults-0M                  9.00 (  0.00%)              0.00 (  0.00%)
Ops majorfaults-749M              154.00 (  0.00%)            101.00 ( 34.42%)
Ops majorfaults-2498M           27214.00 (  0.00%)            165.00 ( 99.39%)
Ops majorfaults-4246M           23229.00 (  0.00%)            114.00 ( 99.51%)

Note how the vanilla kernel's performance collapses when there is enough
IO taking place in the background. This drop in performance is part of
what users complain of when they start backups.
Note how the swapin and major fault figures indicate that processes were
being pushed to swap prematurely. With the series applied, there is no
noticeable performance drop and while there is still some swap activity,
it's tiny.

                             3.9.0-rc6          3.9.0-rc6
                               vanilla  lessdisrupt-v2r11
Page Ins                       9094288             346092
Page Outs                     62897388           47599884
Swap Ins                       2243749              19389
Swap Outs                      3953966             142258
Direct pages scanned                 0            2262897
Kswapd pages scanned          55530838           75725437
Kswapd pages reclaimed         6682620            1814689
Direct pages reclaimed               0            2187167
Kswapd efficiency                  12%                 2%
Kswapd velocity              10537.501          14377.501
Direct efficiency                 100%                96%
Direct velocity                  0.000            429.642
Percentage direct scans             0%                 2%
Page writes by reclaim        10835163           72419297
Page writes file               6881197           72277039
Page writes anon               3953966             142258
Page reclaim immediate           11463               8199
Page rescued immediate               0                  0
Slabs scanned                    38144              30592
Direct inode steals                  0                  0
Kswapd inode steals              11383                791
Kswapd skipped wait                  0                  0
THP fault alloc                     10                111
THP collapse alloc                2782               1779
THP splits                          10                 27
THP fault fallback                   0                  5
THP collapse fail                    0                 21
Compaction stalls                    0                 89
Compaction success                   0                 53
Compaction failures                  0                 36
Page migrate success                 0              37062
Page migrate failure                 0                  0
Compaction pages isolated            0              83481
Compaction migrate scanned           0              80830
Compaction free scanned              0            2660824
Compaction cost                      0                 40
NUMA PTE updates                     0                  0
NUMA hint faults                     0                  0
NUMA hint local faults               0                  0
NUMA pages migrated                  0                  0
AutoNUMA cost                        0                  0

Note that while there is no noticeable performance drop and swap activity is
massively reduced there are processes that direct reclaim as a consequence
of the series due to kswapd not reclaiming the world. ftrace was not enabled
for this particular test to avoid disruption but on a similar test with
ftrace I found that the vast bulk of the direct reclaims were in the dd
processes.
The top direct reclaimers were;

     11 ps-13204
     12 top-13198
     15 memcachetest-11712
     20 gzip-3126
     67 tclsh-3124
     80 memcachetest-12924
    191 flush-8:0-292
    338 tee-3125
   2184 dd-12135
  10751 dd-13124

While processes did stall, it was mostly the "correct" processes that
stalled.

There is also still a risk that kswapd not reclaiming the world may mean
that it stays awake balancing zones, does not stall on the appropriate
events and continually scans pages it cannot reclaim consuming CPU. This
will be visible as continued high CPU usage but in my own tests I only
saw a single spike lasting less than a second and I did not observe any
problems related to reclaim while running the series on my desktop.

 include/linux/mmzone.h |  17 ++
 mm/vmscan.c            | 449 ++++++++++++++++++++++++++++++-------------------
 2 files changed, 293 insertions(+), 173 deletions(-)

--
1.8.1.4
* [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd
  2013-04-09 11:06 [PATCH 0/10] Reduce system disruption due to kswapd V2 Mel Gorman
@ 2013-04-09 11:06 ` Mel Gorman
  2013-04-10  7:16   ` Kamezawa Hiroyuki
  0 siblings, 1 reply; 44+ messages in thread
From: Mel Gorman @ 2013-04-09 11:06 UTC
To: Andrew Morton
Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya, Michal Hocko, Linux-MM, LKML, Mel Gorman

Simplistically, the anon and file LRU lists are scanned proportionally
depending on the value of vm.swappiness although there are other factors
taken into account by get_scan_count(). The patch "mm: vmscan: Limit
the number of pages kswapd reclaims" limits the number of pages kswapd
reclaims but it breaks this proportional scanning and may evenly shrink
anon/file LRUs regardless of vm.swappiness.

This patch preserves the proportional scanning and reclaim. It does mean
that kswapd will reclaim more than requested but the number of pages will
be related to the high watermark.

[mhocko@suse.cz: Correct proportional reclaim for memcg and simplify]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
---
 mm/vmscan.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 46 insertions(+), 8 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4835a7a..0742c45 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1825,13 +1825,21 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 	enum lru_list lru;
 	unsigned long nr_reclaimed = 0;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
+	unsigned long nr_anon_scantarget, nr_file_scantarget;
 	struct blk_plug plug;
+	bool scan_adjusted = false;

 	get_scan_count(lruvec, sc, nr);

+	/* Record the original scan target for proportional adjustments later */
+	nr_file_scantarget = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE] + 1;
+	nr_anon_scantarget = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON] + 1;
+
 	blk_start_plug(&plug);
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
 					nr[LRU_INACTIVE_FILE]) {
+		unsigned long nr_anon, nr_file, percentage;
+
 		for_each_evictable_lru(lru) {
 			if (nr[lru]) {
 				nr_to_scan = min(nr[lru], SWAP_CLUSTER_MAX);
@@ -1841,17 +1849,47 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 							    lruvec, sc);
 			}
 		}
+
+		if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
+			continue;
+
 		/*
-		 * On large memory systems, scan >> priority can become
-		 * really large. This is fine for the starting priority;
-		 * we want to put equal scanning pressure on each zone.
-		 * However, if the VM has a harder time of freeing pages,
-		 * with multiple processes reclaiming pages, the total
-		 * freeing target can get unreasonably large.
+		 * For global direct reclaim, reclaim only the number of pages
+		 * requested. Less care is taken to scan proportionally as it
+		 * is more important to minimise direct reclaim stall latency
+		 * than it is to properly age the LRU lists.
 		 */
-		if (nr_reclaimed >= nr_to_reclaim &&
-		    sc->priority < DEF_PRIORITY)
+		if (global_reclaim(sc) && !current_is_kswapd())
 			break;
+
+		/*
+		 * For kswapd and memcg, reclaim at least the number of pages
+		 * requested. Ensure that the anon and file LRUs shrink
+		 * proportionally what was requested by get_scan_count(). We
+		 * stop reclaiming one LRU and reduce the amount scanning
+		 * proportional to the original scan target.
+		 */
+		nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
+		nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
+
+		if (nr_file > nr_anon) {
+			lru = LRU_BASE;
+			percentage = nr_anon * 100 / nr_anon_scantarget;
+		} else {
+			lru = LRU_FILE;
+			percentage = nr_file * 100 / nr_file_scantarget;
+		}
+
+		/* Stop scanning the smaller of the LRU */
+		nr[lru] = 0;
+		nr[lru + LRU_ACTIVE] = 0;
+
+		/* Reduce scanning of the other LRU proportionally */
+		lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE;
+		nr[lru] = nr[lru] * percentage / 100;;
+		nr[lru + LRU_ACTIVE] = nr[lru + LRU_ACTIVE] * percentage / 100;
+
+		scan_adjusted = true;
 	}
 	blk_finish_plug(&plug);
 	sc->nr_reclaimed += nr_reclaimed;
--
1.8.1.4
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd
  2013-04-09 11:06 ` [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd Mel Gorman
@ 2013-04-10  7:16   ` Kamezawa Hiroyuki
  2013-04-10 14:08     ` Mel Gorman
  0 siblings, 1 reply; 44+ messages in thread
From: Kamezawa Hiroyuki @ 2013-04-10 7:16 UTC
To: Mel Gorman
Cc: Andrew Morton, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya, Michal Hocko, Linux-MM, LKML

(2013/04/09 20:06), Mel Gorman wrote:
> Simplistically, the anon and file LRU lists are scanned proportionally
> depending on the value of vm.swappiness although there are other factors
> taken into account by get_scan_count(). The patch "mm: vmscan: Limit
> the number of pages kswapd reclaims" limits the number of pages kswapd
> reclaims but it breaks this proportional scanning and may evenly shrink
> anon/file LRUs regardless of vm.swappiness.
>
> This patch preserves the proportional scanning and reclaim. It does mean
> that kswapd will reclaim more than requested but the number of pages will
> be related to the high watermark.
>
> [mhocko@suse.cz: Correct proportional reclaim for memcg and simplify]
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Acked-by: Rik van Riel <riel@redhat.com>
> ---
>  mm/vmscan.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 46 insertions(+), 8 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4835a7a..0742c45 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1825,13 +1825,21 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
>  	enum lru_list lru;
>  	unsigned long nr_reclaimed = 0;
>  	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
> +	unsigned long nr_anon_scantarget, nr_file_scantarget;
>  	struct blk_plug plug;
> +	bool scan_adjusted = false;
>
>  	get_scan_count(lruvec, sc, nr);
>
> +	/* Record the original scan target for proportional adjustments later */
> +	nr_file_scantarget = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE] + 1;
> +	nr_anon_scantarget = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON] + 1;
> +

I'm sorry I couldn't understand the calc...

Assume here
	nr_file_scantarget = 100
	nr_anon_file_target = 100.

>  	blk_start_plug(&plug);
>  	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
>  					nr[LRU_INACTIVE_FILE]) {
> +		unsigned long nr_anon, nr_file, percentage;
> +
>  		for_each_evictable_lru(lru) {
>  			if (nr[lru]) {
>  				nr_to_scan = min(nr[lru], SWAP_CLUSTER_MAX);
> @@ -1841,17 +1849,47 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
>  							    lruvec, sc);
>  			}
>  		}
> +
> +		if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
> +			continue;
> +
>  		/*
> -		 * On large memory systems, scan >> priority can become
> -		 * really large. This is fine for the starting priority;
> -		 * we want to put equal scanning pressure on each zone.
> -		 * However, if the VM has a harder time of freeing pages,
> -		 * with multiple processes reclaiming pages, the total
> -		 * freeing target can get unreasonably large.
> +		 * For global direct reclaim, reclaim only the number of pages
> +		 * requested. Less care is taken to scan proportionally as it
> +		 * is more important to minimise direct reclaim stall latency
> +		 * than it is to properly age the LRU lists.
>  		 */
> -		if (nr_reclaimed >= nr_to_reclaim &&
> -		    sc->priority < DEF_PRIORITY)
> +		if (global_reclaim(sc) && !current_is_kswapd())
>  			break;
> +
> +		/*
> +		 * For kswapd and memcg, reclaim at least the number of pages
> +		 * requested. Ensure that the anon and file LRUs shrink
> +		 * proportionally what was requested by get_scan_count(). We
> +		 * stop reclaiming one LRU and reduce the amount scanning
> +		 * proportional to the original scan target.
> +		 */
> +		nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
> +		nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
> +

Then, nr_file = 80, nr_anon=70.

> +		if (nr_file > nr_anon) {
> +			lru = LRU_BASE;
> +			percentage = nr_anon * 100 / nr_anon_scantarget;
> +		} else {
> +			lru = LRU_FILE;
> +			percentage = nr_file * 100 / nr_file_scantarget;
> +		}

the percentage will be 70.

> +
> +		/* Stop scanning the smaller of the LRU */
> +		nr[lru] = 0;
> +		nr[lru + LRU_ACTIVE] = 0;
> +

this will stop anon scan.

> +		/* Reduce scanning of the other LRU proportionally */
> +		lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE;
> +		nr[lru] = nr[lru] * percentage / 100;;
> +		nr[lru + LRU_ACTIVE] = nr[lru + LRU_ACTIVE] * percentage / 100;
> +

finally, in the next iteration,

	nr[file] = 80 * 0.7 = 56.

After loop, anon-scan is 30 pages , file-scan is 76(20+56) pages..

I think the calc here should be

	nr[lru] = nr_lru_scantarget * percentage / 100 - nr[lru]

	Here, 80-70=10 more pages to scan..should be proportional.

Am I misunderstanding ?

Thanks,
-Kame
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd
  2013-04-10  7:16 ` Kamezawa Hiroyuki
@ 2013-04-10 14:08   ` Mel Gorman
  2013-04-11  0:14     ` Kamezawa Hiroyuki
  0 siblings, 1 reply; 44+ messages in thread
From: Mel Gorman @ 2013-04-10 14:08 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Andrew Morton, Jiri Slaby, Valdis Kletnieks, Rik van Riel,
	Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya,
	Michal Hocko, Linux-MM, LKML

On Wed, Apr 10, 2013 at 04:16:47PM +0900, Kamezawa Hiroyuki wrote:
> (2013/04/09 20:06), Mel Gorman wrote:
> > Simplistically, the anon and file LRU lists are scanned proportionally
> > depending on the value of vm.swappiness although there are other factors
> > taken into account by get_scan_count(). The patch "mm: vmscan: Limit
> > the number of pages kswapd reclaims" limits the number of pages kswapd
> > reclaims but it breaks this proportional scanning and may evenly shrink
> > anon/file LRUs regardless of vm.swappiness.
> >
> > This patch preserves the proportional scanning and reclaim. It does mean
> > that kswapd will reclaim more than requested but the number of pages will
> > be related to the high watermark.
> >
> > [mhocko@suse.cz: Correct proportional reclaim for memcg and simplify]
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > Acked-by: Rik van Riel <riel@redhat.com>
> > ---
> >  mm/vmscan.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++--------
> >  1 file changed, 46 insertions(+), 8 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 4835a7a..0742c45 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1825,13 +1825,21 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
> >  	enum lru_list lru;
> >  	unsigned long nr_reclaimed = 0;
> >  	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
> > +	unsigned long nr_anon_scantarget, nr_file_scantarget;
> >  	struct blk_plug plug;
> > +	bool scan_adjusted = false;
> >
> >  	get_scan_count(lruvec, sc, nr);
> >
> > +	/* Record the original scan target for proportional adjustments later */
> > +	nr_file_scantarget = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE] + 1;
> > +	nr_anon_scantarget = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON] + 1;
> > +
>
> I'm sorry I couldn't understand the calc...
>
> Assume here
>         nr_file_scantarget = 100
>         nr_anon_file_target = 100.
>

I think you might have meant nr_anon_scantarget here instead of
nr_anon_file_target.

>
> >  	blk_start_plug(&plug);
> >  	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
> >  					nr[LRU_INACTIVE_FILE]) {
> > +		unsigned long nr_anon, nr_file, percentage;
> > +
> >  		for_each_evictable_lru(lru) {
> >  			if (nr[lru]) {
> >  				nr_to_scan = min(nr[lru], SWAP_CLUSTER_MAX);
> > @@ -1841,17 +1849,47 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
> >  							    lruvec, sc);
> >  			}
> >  		}
> > +
> > +		if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
> > +			continue;
> > +
> >  		/*
> > -		 * On large memory systems, scan >> priority can become
> > -		 * really large. This is fine for the starting priority;
> > -		 * we want to put equal scanning pressure on each zone.
> > -		 * However, if the VM has a harder time of freeing pages,
> > -		 * with multiple processes reclaiming pages, the total
> > -		 * freeing target can get unreasonably large.
> > +		 * For global direct reclaim, reclaim only the number of pages
> > +		 * requested. Less care is taken to scan proportionally as it
> > +		 * is more important to minimise direct reclaim stall latency
> > +		 * than it is to properly age the LRU lists.
> >  		 */
> > -		if (nr_reclaimed >= nr_to_reclaim &&
> > -					sc->priority < DEF_PRIORITY)
> > +		if (global_reclaim(sc) && !current_is_kswapd())
> >  			break;
> > +
> > +		/*
> > +		 * For kswapd and memcg, reclaim at least the number of pages
> > +		 * requested. Ensure that the anon and file LRUs shrink
> > +		 * proportionally what was requested by get_scan_count(). We
> > +		 * stop reclaiming one LRU and reduce the amount scanning
> > +		 * proportional to the original scan target.
> > +		 */
> > +		nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
> > +		nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
> > +
>
> Then, nr_file = 80, nr_anon=70.
>

As we scan evenly in SWAP_CLUSTER_MAX groups of pages, this wouldn't happen
but for the purposes of discussion, let's assume it did.

>
> > +		if (nr_file > nr_anon) {
> > +			lru = LRU_BASE;
> > +			percentage = nr_anon * 100 / nr_anon_scantarget;
> > +		} else {
> > +			lru = LRU_FILE;
> > +			percentage = nr_file * 100 / nr_file_scantarget;
> > +		}
>
> the percentage will be 70.
>

Yes.

> > +
> > +		/* Stop scanning the smaller of the LRU */
> > +		nr[lru] = 0;
> > +		nr[lru + LRU_ACTIVE] = 0;
> > +
>
> this will stop anon scan.
>

Yes.

> > +		/* Reduce scanning of the other LRU proportionally */
> > +		lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE;
> > +		nr[lru] = nr[lru] * percentage / 100;;
> > +		nr[lru + LRU_ACTIVE] = nr[lru + LRU_ACTIVE] * percentage / 100;
> > +
>
> finally, in the next iteration,
>
>               nr[file] = 80 * 0.7 = 56.
>
> After loop, anon-scan is 30 pages , file-scan is 76(20+56) pages..
>

Well spotted, this would indeed reclaim too many pages from the other
LRU. I wanted to avoid recording the original scan targets as it's an
extra 40 bytes on the stack but it's unavoidable.

> I think the calc here should be
>
>    nr[lru] = nr_lru_scantarget * percentage / 100 - nr[lru]
>
>    Here, 80-70=10 more pages to scan..should be proportional.
>

nr[lru] at the end there is pages remaining to be scanned not pages
scanned already. Did you mean something like this?

nr[lru] = scantarget[lru] * percentage / 100 - (scantarget[lru] - nr[lru])

With care taken to ensure we do not underflow? Something like

	unsigned long nr[NR_LRU_LISTS];
	unsigned long targets[NR_LRU_LISTS];

	...

	memcpy(targets, nr, sizeof(nr));

	...

	nr[lru] = targets[lru] * percentage / 100;
	nr[lru] -= min(nr[lru], (targets[lru] - nr[lru]));

	lru += LRU_ACTIVE;
	nr[lru] = targets[lru] * percentage / 100;
	nr[lru] -= min(nr[lru], (targets[lru] - nr[lru]));

?

-- 
Mel Gorman
SUSE Labs
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd
  2013-04-10 14:08 ` Mel Gorman
@ 2013-04-11  0:14   ` Kamezawa Hiroyuki
  2013-04-11  9:09     ` Mel Gorman
  0 siblings, 1 reply; 44+ messages in thread
From: Kamezawa Hiroyuki @ 2013-04-11 0:14 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Jiri Slaby, Valdis Kletnieks, Rik van Riel,
	Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya,
	Michal Hocko, Linux-MM, LKML

(2013/04/10 23:08), Mel Gorman wrote:
> On Wed, Apr 10, 2013 at 04:16:47PM +0900, Kamezawa Hiroyuki wrote:
>> (2013/04/09 20:06), Mel Gorman wrote:
>>> Simplistically, the anon and file LRU lists are scanned proportionally
>>> depending on the value of vm.swappiness although there are other factors
>>> taken into account by get_scan_count(). The patch "mm: vmscan: Limit
>>> the number of pages kswapd reclaims" limits the number of pages kswapd
>>> reclaims but it breaks this proportional scanning and may evenly shrink
>>> anon/file LRUs regardless of vm.swappiness.
>>>
>>> This patch preserves the proportional scanning and reclaim. It does mean
>>> that kswapd will reclaim more than requested but the number of pages will
>>> be related to the high watermark.
>>>
>>> [mhocko@suse.cz: Correct proportional reclaim for memcg and simplify]
>>> Signed-off-by: Mel Gorman <mgorman@suse.de>
>>> Acked-by: Rik van Riel <riel@redhat.com>
>>> ---
>>>  mm/vmscan.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++--------
>>>  1 file changed, 46 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 4835a7a..0742c45 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -1825,13 +1825,21 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
>>>  	enum lru_list lru;
>>>  	unsigned long nr_reclaimed = 0;
>>>  	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
>>> +	unsigned long nr_anon_scantarget, nr_file_scantarget;
>>>  	struct blk_plug plug;
>>> +	bool scan_adjusted = false;
>>>
>>>  	get_scan_count(lruvec, sc, nr);
>>>
>>> +	/* Record the original scan target for proportional adjustments later */
>>> +	nr_file_scantarget = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE] + 1;
>>> +	nr_anon_scantarget = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON] + 1;
>>> +
>>
>> I'm sorry I couldn't understand the calc...
>>
>> Assume here
>>          nr_file_scantarget = 100
>>          nr_anon_file_target = 100.
>>
>
> I think you might have meant nr_anon_scantarget here instead of
> nr_anon_file_target.
>
>>
>>>  	blk_start_plug(&plug);
>>>  	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
>>>  					nr[LRU_INACTIVE_FILE]) {
>>> +		unsigned long nr_anon, nr_file, percentage;
>>> +
>>>  		for_each_evictable_lru(lru) {
>>>  			if (nr[lru]) {
>>>  				nr_to_scan = min(nr[lru], SWAP_CLUSTER_MAX);
>>> @@ -1841,17 +1849,47 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
>>>  							    lruvec, sc);
>>>  			}
>>>  		}
>>> +
>>> +		if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
>>> +			continue;
>>> +
>>>  		/*
>>> -		 * On large memory systems, scan >> priority can become
>>> -		 * really large. This is fine for the starting priority;
>>> -		 * we want to put equal scanning pressure on each zone.
>>> -		 * However, if the VM has a harder time of freeing pages,
>>> -		 * with multiple processes reclaiming pages, the total
>>> -		 * freeing target can get unreasonably large.
>>> +		 * For global direct reclaim, reclaim only the number of pages
>>> +		 * requested. Less care is taken to scan proportionally as it
>>> +		 * is more important to minimise direct reclaim stall latency
>>> +		 * than it is to properly age the LRU lists.
>>>  		 */
>>> -		if (nr_reclaimed >= nr_to_reclaim &&
>>> -					sc->priority < DEF_PRIORITY)
>>> +		if (global_reclaim(sc) && !current_is_kswapd())
>>>  			break;
>>> +
>>> +		/*
>>> +		 * For kswapd and memcg, reclaim at least the number of pages
>>> +		 * requested. Ensure that the anon and file LRUs shrink
>>> +		 * proportionally what was requested by get_scan_count(). We
>>> +		 * stop reclaiming one LRU and reduce the amount scanning
>>> +		 * proportional to the original scan target.
>>> +		 */
>>> +		nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
>>> +		nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
>>> +
>>
>> Then, nr_file = 80, nr_anon=70.
>>
>
> As we scan evenly in SWAP_CLUSTER_MAX groups of pages, this wouldn't happen
> but for the purposes of discussion, let's assume it did.
>
>>
>>> +		if (nr_file > nr_anon) {
>>> +			lru = LRU_BASE;
>>> +			percentage = nr_anon * 100 / nr_anon_scantarget;
>>> +		} else {
>>> +			lru = LRU_FILE;
>>> +			percentage = nr_file * 100 / nr_file_scantarget;
>>> +		}
>>
>> the percentage will be 70.
>>
>
> Yes.
>
>>> +
>>> +		/* Stop scanning the smaller of the LRU */
>>> +		nr[lru] = 0;
>>> +		nr[lru + LRU_ACTIVE] = 0;
>>> +
>>
>> this will stop anon scan.
>>
>
> Yes.
>
>>> +		/* Reduce scanning of the other LRU proportionally */
>>> +		lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE;
>>> +		nr[lru] = nr[lru] * percentage / 100;;
>>> +		nr[lru + LRU_ACTIVE] = nr[lru + LRU_ACTIVE] * percentage / 100;
>>> +
>>
>> finally, in the next iteration,
>>
>>                nr[file] = 80 * 0.7 = 56.
>>
>> After loop, anon-scan is 30 pages , file-scan is 76(20+56) pages..
>>
>
> Well spotted, this would indeed reclaim too many pages from the other
> LRU. I wanted to avoid recording the original scan targets as it's an
> extra 40 bytes on the stack but it's unavoidable.
>
>> I think the calc here should be
>>
>>     nr[lru] = nr_lru_scantarget * percentage / 100 - nr[lru]
>>
>>     Here, 80-70=10 more pages to scan..should be proportional.
>>
>
> nr[lru] at the end there is pages remaining to be scanned not pages
> scanned already.

yes.

> Did you mean something like this?
>
> nr[lru] = scantarget[lru] * percentage / 100 - (scantarget[lru] - nr[lru])
>

For clarification, this "percentage" means the ratio of remaining scan target of
another LRU. So, *scanned* percentage is "100 - percentage", right?

If I understand the changelog correctly, you'd like to keep

   scantarget[anon] : scantarget[file]
	== really_scanned_num[anon] : really_scanned_num[file]

even if we stop scanning in the middle of scantarget. And you introduced "percentage"
to make sure that both scantarget should be done in the same ratio.

So...another lru should scan scantarget[x] * (100 - percentage)/100 in total.

   nr[lru] = scantarget[lru] * (100 - percentage)/100 - (scantarget[lru] - nr[lru])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^   ^^^^^^^^^^^^^^^^^^^^^^^^^
              proportionally adjusted scan target         already scanned num

           = nr[lru] - scantarget[lru] * percentage/100.

This means to avoid scanning the amount of pages in the ratio which another lru didn't scan.

> With care taken to ensure we do not underflow?

yes.

Regards,
-Kame

> Something like
>
> 	unsigned long nr[NR_LRU_LISTS];
> 	unsigned long targets[NR_LRU_LISTS];
>
> 	...
>
> 	memcpy(targets, nr, sizeof(nr));
>
> 	...
>
> 	nr[lru] = targets[lru] * percentage / 100;
> 	nr[lru] -= min(nr[lru], (targets[lru] - nr[lru]));
>
> 	lru += LRU_ACTIVE;
> 	nr[lru] = targets[lru] * percentage / 100;
> 	nr[lru] -= min(nr[lru], (targets[lru] - nr[lru]));
>
> ?
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd
  2013-04-11  0:14 ` Kamezawa Hiroyuki
@ 2013-04-11  9:09   ` Mel Gorman
  0 siblings, 0 replies; 44+ messages in thread
From: Mel Gorman @ 2013-04-11 9:09 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Andrew Morton, Jiri Slaby, Valdis Kletnieks, Rik van Riel,
	Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya,
	Michal Hocko, Linux-MM, LKML

On Thu, Apr 11, 2013 at 09:14:19AM +0900, Kamezawa Hiroyuki wrote:
> >
> >nr[lru] at the end there is pages remaining to be scanned not pages
> >scanned already.
>
> yes.
>
> >Did you mean something like this?
> >
> >nr[lru] = scantarget[lru] * percentage / 100 - (scantarget[lru] - nr[lru])
> >
>
> For clarification, this "percentage" means the ratio of remaining scan target of
> another LRU. So, *scanned* percentage is "100 - percentage", right ?
>

Yes, correct.

> If I understand the changelog correctly, you'd like to keep
>
>    scantarget[anon] : scantarget[file]
> 	== really_scanned_num[anon] : really_scanned_num[file]
>

Yes.

> even if we stop scanning in the middle of scantarget. And you introduced "percentage"
> to make sure that both scantarget should be done in the same ratio.
>

Yes.

> So...another lru should scan scantarget[x] * (100 - percentage)/100 in total.
>
>    nr[lru] = scantarget[lru] * (100 - percentage)/100 - (scantarget[lru] - nr[lru])
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^   ^^^^^^^^^^^^^^^^^^^^^^^^^
>               proportionally adjusted scan target         already scanned num
>
>            = nr[lru] - scantarget[lru] * percentage/100.
>

Yes, you are completely correct. This preserves the original ratio of
anon:file scanning properly.

-- 
Mel Gorman
SUSE Labs
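The formula agreed in the exchange above can be checked with a small
user-space sketch. This is only an illustration, not the code that was
merged: the helper names remaining_percentage() and adjust_remaining() are
hypothetical, and the zero-target guard and the clamp to zero are
assumptions standing in for the "care taken to ensure we do not underflow"
mentioned in the thread.

```c
/*
 * Percentage of the original scan target that is still unscanned on the
 * LRU that is about to be stopped. Hypothetical helper; a zero target is
 * treated as "nothing left" to avoid dividing by zero.
 */
unsigned long remaining_percentage(unsigned long left, unsigned long target)
{
	if (target == 0)
		return 0;
	return left * 100 / target;
}

/*
 * Apply nr[lru] = nr[lru] - scantarget[lru] * percentage / 100 to the LRU
 * that keeps scanning. "percentage" is the unscanned share of the *other*
 * LRU, so the same share of this LRU's target is skipped. The result is
 * clamped to zero because the counters are unsigned.
 */
unsigned long adjust_remaining(unsigned long target, unsigned long left,
			       unsigned long percentage)
{
	unsigned long skip = target * percentage / 100;

	return left > skip ? left - skip : 0;
}
```

With the numbers used in the discussion (both targets 100, 70 anon pages
and 80 file pages left), remaining_percentage(70, 100) is 70 and
adjust_remaining(100, 80, 70) is 10. The file LRU has already scanned 20
pages and scans 10 more, so both lists end at 30 pages scanned and the
original 100:100 ratio is kept.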
* [RFC PATCH 0/8] Reduce system disruption due to kswapd
@ 2013-03-17 13:04 Mel Gorman
  2013-03-17 13:04 ` [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd Mel Gorman
  0 siblings, 1 reply; 44+ messages in thread
From: Mel Gorman @ 2013-03-17 13:04 UTC (permalink / raw)
  To: Linux-MM
  Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic,
	Johannes Weiner, dormando, Satoru Moriya, Michal Hocko, LKML,
	Mel Gorman

Kswapd and page reclaim behaviour has been screwy in one way or the other
for a long time. Very broadly speaking it worked in the far past because
machines were limited in memory so it did not have that many pages to scan
and it stalled congestion_wait() frequently to prevent it going completely
nuts. In recent times it has behaved very unsatisfactorily with some of
the problems compounded by the removal of stall logic and the introduction
of transparent hugepage support with high-order reclaims.

There are many variations of bugs that are rooted in this area. One example
is reports of large copy operations or backups causing the machine to
grind to a halt or applications being pushed to swap. Sometimes in low
memory situations a large percentage of memory suddenly gets reclaimed. In
other cases an application starts and kswapd hits 100% CPU usage for
prolonged periods of time and so on. There is now talk of introducing
features like an extra free kbytes tunable to work around aspects of the
problem instead of trying to deal with it. It's compounded by the problem
that it can be very workload and machine specific.

This RFC is aimed at investigating if kswapd can address these various
problems in a relatively straightforward fashion without a fundamental
rewrite.

Patches 1-2 limit the number of pages kswapd reclaims while still obeying
	the anon/file proportion of the LRUs it should be scanning.

Patches 3-4 control how and when kswapd raises its scanning priority and
	delete the scanning restart logic which is tricky to follow.

Patch 5 notes that it is too easy for kswapd to reach priority 0 when
	scanning and then reclaim the world. Down with that sort of thing.

Patch 6 notes that kswapd starts writeback based on scanning priority which
	is not necessarily related to dirty pages. It will have kswapd
	writeback pages if a number of unqueued dirty pages have been
	recently encountered at the tail of the LRU.

Patch 7 notes that sometimes kswapd should stall waiting on IO to complete
	to reduce LRU churn and the likelihood that it'll reclaim young
	clean pages or push applications to swap. It will cause kswapd
	to block on IO if it detects that pages being reclaimed under
	writeback are recycling through the LRU before the IO completes.

Patch 8 shrinks slab just once per priority scanned or if a zone is otherwise
	unreclaimable to avoid hammering slab when kswapd has to skip a
	large number of pages.

Patches 9-10 are cosmetic but balance_pgdat() might be easier to follow.

This was tested using memcached+memcachetest while some background IO
was in progress as implemented by the parallel IO tests in MM Tests.
memcachetest benchmarks how many operations/second memcached can service
and it is run multiple times. It starts with no background IO and then
re-runs the test with larger amounts of IO in the background to roughly
simulate a large copy in progress. The expectation is that the IO should
have little or no impact on memcachetest which is running entirely in
memory.

Ordinarily this test is run a number of times for each amount of IO and
the worst result is reported but these results are based on just one run
as a quick test. ftrace was also running so there were additional sources
of interference and the results would be more variable than normal. More
comprehensive tests are queued but they'll take quite some time to
complete.
Kernel baseline is v3.9-rc2 and the following kernels were tested

vanilla			3.9-rc2
flatten-v1r8		Patches 1-4
limitprio-v1r8		Patches 1-5
write-v1r8		Patches 1-6
block-v1r8		Patches 1-7
tidy-v1r8		Patches 1-10

                                    3.9.0-rc2             3.9.0-rc2             3.9.0-rc2             3.9.0-rc2             3.9.0-rc2
                                      vanilla          flatten-v1r8        limitprio-v1r8            block-v1r8             tidy-v1r8
Ops memcachetest-0M       10932.00 (   0.00%)   10898.00 (  -0.31%)   10903.00 (  -0.27%)   10911.00 (  -0.19%)   10916.00 (  -0.15%)
Ops memcachetest-749M      7816.00 (   0.00%)   10715.00 (  37.09%)   11006.00 (  40.81%)   10903.00 (  39.50%)   10856.00 (  38.89%)
Ops memcachetest-2498M     3974.00 (   0.00%)    3190.00 ( -19.73%)   11623.00 ( 192.48%)   11142.00 ( 180.37%)   10930.00 ( 175.04%)
Ops memcachetest-4246M     2355.00 (   0.00%)    2915.00 (  23.78%)   12619.00 ( 435.84%)   11212.00 ( 376.09%)   10904.00 ( 363.01%)
Ops io-duration-0M            0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)
Ops io-duration-749M         31.00 (   0.00%)      16.00 (  48.39%)       9.00 (  70.97%)       9.00 (  70.97%)       8.00 (  74.19%)
Ops io-duration-2498M        89.00 (   0.00%)     111.00 ( -24.72%)      27.00 (  69.66%)      28.00 (  68.54%)      27.00 (  69.66%)
Ops io-duration-4246M       182.00 (   0.00%)     165.00 (   9.34%)      49.00 (  73.08%)      46.00 (  74.73%)      45.00 (  75.27%)
Ops swaptotal-0M              0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)
Ops swaptotal-749M       219394.00 (   0.00%)  162045.00 (  26.14%)       0.00 (   0.00%)       0.00 (   0.00%)      16.00 (  99.99%)
Ops swaptotal-2498M      312904.00 (   0.00%)  389809.00 ( -24.58%)     334.00 (  99.89%)    1233.00 (  99.61%)       8.00 ( 100.00%)
Ops swaptotal-4246M      471517.00 (   0.00%)  395170.00 (  16.19%)       0.00 (   0.00%)    1117.00 (  99.76%)      29.00 (  99.99%)
Ops swapin-0M                 0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)
Ops swapin-749M           62057.00 (   0.00%)    5954.00 (  90.41%)       0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)
Ops swapin-2498M         143617.00 (   0.00%)  154592.00 (  -7.64%)       0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)
Ops swapin-4246M         160417.00 (   0.00%)  125904.00 (  21.51%)       0.00 (   0.00%)      13.00 (  99.99%)       0.00 (   0.00%)
Ops minorfaults-0M      1683549.00 (   0.00%) 1685771.00 (  -0.13%) 1675398.00 (   0.48%) 1723245.00 (  -2.36%) 1683717.00 (  -0.01%)
Ops minorfaults-749M    1788977.00 (   0.00%) 1871737.00 (  -4.63%) 1617193.00 (   9.60%) 1610892.00 (   9.95%) 1682760.00 (   5.94%)
Ops minorfaults-2498M   1836894.00 (   0.00%) 1796566.00 (   2.20%) 1677878.00 (   8.66%) 1685741.00 (   8.23%) 1609514.00 (  12.38%)
Ops minorfaults-4246M   1797685.00 (   0.00%) 1819832.00 (  -1.23%) 1689258.00 (   6.03%) 1690695.00 (   5.95%) 1684430.00 (   6.30%)
Ops majorfaults-0M            5.00 (   0.00%)       7.00 ( -40.00%)       5.00 (   0.00%)      24.00 (-380.00%)       9.00 ( -80.00%)
Ops majorfaults-749M      10310.00 (   0.00%)     876.00 (  91.50%)      73.00 (  99.29%)      63.00 (  99.39%)      90.00 (  99.13%)
Ops majorfaults-2498M     20809.00 (   0.00%)   22377.00 (  -7.54%)     102.00 (  99.51%)     110.00 (  99.47%)      55.00 (  99.74%)
Ops majorfaults-4246M     23228.00 (   0.00%)   20270.00 (  12.73%)     196.00 (  99.16%)     222.00 (  99.04%)     102.00 (  99.56%)

Note how the vanilla kernel's performance is ruined by the parallel IO
with performance of 10932 ops/sec dropping to 2355 ops/sec. Note that
this is likely due to the swap activity and major faults as memcached
is pushed to swap prematurely.

flatten-v1r8 overall reduces the amount of reclaim but it's a minor
improvement.

limitprio-v1r8 almost eliminates the impact the parallel IO has on the
memcachetest workload. The ops/sec remain above 10K ops/sec and there is
no swapin activity.

The remainder of the series has very little impact on the memcachetest
workload but the impact on kswapd is visible in the vmstat figures.
                                  3.9.0-rc2      3.9.0-rc2      3.9.0-rc2      3.9.0-rc2      3.9.0-rc2
                                    vanilla   flatten-v1r8 limitprio-v1r8     block-v1r8      tidy-v1r8
Page Ins                            1567012        1238608          90388         103832          75684
Page Outs                          12837552       15223512       12726464       13613400       12668604
Swap Ins                             366362         286798              0             13              0
Swap Outs                            637724         660574            334           2337             53
Direct pages scanned                      0              0              0         196955         292532
Kswapd pages scanned               11763732        4389473      207629411       22337712        3885443
Kswapd pages reclaimed              1262812        1186228        1228379         971375         685338
Direct pages reclaimed                    0              0              0         186053         267255
Kswapd efficiency                       10%            27%             0%             4%            17%
Kswapd velocity                    9111.544       3407.923     161226.742      17342.002       3009.265
Direct efficiency                      100%           100%           100%            94%            91%
Direct velocity                       0.000          0.000          0.000        152.907        226.565
Percentage direct scans                  0%             0%             0%             0%             7%
Page writes by reclaim              2858699        1159073       42498573       21198413        3018972
Page writes file                    2220975         498499       42498239       21196076        3018919
Page writes anon                     637724         660574            334           2337             53
Page reclaim immediate                 6243            125          69598           1056           4370
Page rescued immediate                    0              0              0              0              0
Slabs scanned                         35328          39296          32000          62080          25600
Direct inode steals                       0              0              0              0              0
Kswapd inode steals                   16899           5491           6375          19957            907
Kswapd skipped wait                       0              0              0              0              0
THP fault alloc                          14              7             10             50              7
THP collapse alloc                      491            465            637            709            629
THP splits                               10             12              5              7              5
THP fault fallback                        0              0              0              0              0
THP collapse fail                         0              0              0              0              0
Compaction stalls                         0              0              0             81              3
Compaction success                        0              0              0             74              0
Compaction failures                       0              0              0              7              3
Page migrate success                      0              0              0          43855              0
Page migrate failure                      0              0              0              0              0
Compaction pages isolated                 0              0              0          97582              0
Compaction migrate scanned                0              0              0         111419              0
Compaction free scanned                   0              0              0         324617              0
Compaction cost                           0              0              0             48              0

While limitprio-v1r8 improves the performance of memcachetest, note what it
does to kswapd activity, apparently scanning on average 162K pages/second. In
reality what happened was that there were spikes in reclaim activity but
nevertheless it's severe.

The patch that blocks kswapd when it encounters too many pages under
writeback severely reduces the amount of scanning activity.
Note that the full series also reduces the amount of slab shrinking and
heavily reduces the number of inodes reclaimed by kswapd.

Comments?

 include/linux/mmzone.h |  16 ++
 mm/vmscan.c            | 387 +++++++++++++++++++++++++++++--------------------
 2 files changed, 245 insertions(+), 158 deletions(-)

-- 
1.8.1.4
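The derived rows in the vmstat table can be reproduced from the raw
counters. The sketch below assumes, and this is an assumption rather than
something stated in the mail, that "efficiency" is pages reclaimed as a
truncated percentage of pages scanned (reported as 100% when nothing was
scanned) and that "Percentage direct scans" is the direct share of all
pages scanned. The function names are hypothetical. Velocity is left out
because it needs the test duration, which the mail does not give.

```c
/*
 * Reclaim efficiency as a whole-number percentage, truncated.
 * Assumed definition: reclaimed * 100 / scanned, with 100 reported when
 * no pages were scanned (as in the "Direct efficiency" row).
 * unsigned long long avoids overflow for counters above ~42 million.
 */
unsigned int reclaim_efficiency(unsigned long long reclaimed,
				unsigned long long scanned)
{
	if (scanned == 0)
		return 100;
	return (unsigned int)(reclaimed * 100 / scanned);
}

/*
 * Share of all scanned pages that were scanned by direct reclaim,
 * as a truncated percentage. Returns 0 when nothing was scanned.
 */
unsigned int direct_scan_percentage(unsigned long long direct_scanned,
				    unsigned long long kswapd_scanned)
{
	unsigned long long total = direct_scanned + kswapd_scanned;

	if (total == 0)
		return 0;
	return (unsigned int)(direct_scanned * 100 / total);
}
```

Under these assumptions the vanilla column gives
reclaim_efficiency(1262812, 11763732) = 10 and the limitprio-v1r8 column
gives reclaim_efficiency(1228379, 207629411) = 0, matching the reported
10% and 0%. The low limitprio-v1r8 figure reflects the very large number
of pages scanned for roughly the same number reclaimed.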
* [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd
  2013-03-17 13:04 [RFC PATCH 0/8] Reduce system disruption due to kswapd Mel Gorman
@ 2013-03-17 13:04 ` Mel Gorman
  2013-03-17 14:39   ` Andi Kleen
  ` (3 more replies)
  0 siblings, 4 replies; 44+ messages in thread
From: Mel Gorman @ 2013-03-17 13:04 UTC (permalink / raw)
  To: Linux-MM
  Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic,
	Johannes Weiner, dormando, Satoru Moriya, Michal Hocko, LKML,
	Mel Gorman

Simplistically, the anon and file LRU lists are scanned proportionally
depending on the value of vm.swappiness although there are other factors
taken into account by get_scan_count(). The patch "mm: vmscan: Limit
the number of pages kswapd reclaims" limits the number of pages kswapd
reclaims but it breaks this proportional scanning and may evenly shrink
anon/file LRUs regardless of vm.swappiness.

This patch preserves the proportional scanning and reclaim. It does mean
that kswapd will reclaim more than requested but the number of pages will
be related to the high watermark.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/vmscan.c | 52 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 41 insertions(+), 11 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4835a7a..182ff15 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1815,6 +1815,45 @@ out:
 	}
 }
 
+static void recalculate_scan_count(unsigned long nr_reclaimed,
+		unsigned long nr_to_reclaim,
+		unsigned long nr[NR_LRU_LISTS])
+{
+	enum lru_list l;
+
+	/*
+	 * For direct reclaim, reclaim the number of pages requested. Less
+	 * care is taken to ensure that scanning for each LRU is properly
+	 * proportional. This is unfortunate and is improper aging but
+	 * minimises the amount of time a process is stalled.
+	 */
+	if (!current_is_kswapd()) {
+		if (nr_reclaimed >= nr_to_reclaim) {
+			for_each_evictable_lru(l)
+				nr[l] = 0;
+		}
+		return;
+	}
+
+	/*
+	 * For kswapd, reclaim at least the number of pages requested.
+	 * However, ensure that LRUs shrink by the proportion requested
+	 * by get_scan_count() so vm.swappiness is obeyed.
+	 */
+	if (nr_reclaimed >= nr_to_reclaim) {
+		unsigned long min = ULONG_MAX;
+
+		/* Find the LRU with the fewest pages to reclaim */
+		for_each_evictable_lru(l)
+			if (nr[l] < min)
+				min = nr[l];
+
+		/* Normalise the scan counts so kswapd scans proportionally */
+		for_each_evictable_lru(l)
+			nr[l] -= min;
+	}
+}
+
 /*
  * This is a basic per-zone page freer.  Used by both kswapd and direct reclaim.
  */
@@ -1841,17 +1880,8 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 							    lruvec, sc);
 			}
 		}
-		/*
-		 * On large memory systems, scan >> priority can become
-		 * really large. This is fine for the starting priority;
-		 * we want to put equal scanning pressure on each zone.
-		 * However, if the VM has a harder time of freeing pages,
-		 * with multiple processes reclaiming pages, the total
-		 * freeing target can get unreasonably large.
-		 */
-		if (nr_reclaimed >= nr_to_reclaim &&
-		    sc->priority < DEF_PRIORITY)
-			break;
+
+		recalculate_scan_count(nr_reclaimed, nr_to_reclaim, nr);
 	}
 	blk_finish_plug(&plug);
 	sc->nr_reclaimed += nr_reclaimed;
-- 
1.8.1.4
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd
  2013-03-17 13:04 ` [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd Mel Gorman
@ 2013-03-17 14:39   ` Andi Kleen
  2013-03-17 15:08     ` Mel Gorman
  2013-03-21  1:10   ` Rik van Riel
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 44+ messages in thread
From: Andi Kleen @ 2013-03-17 14:39 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Rik van Riel,
	Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya,
	Michal Hocko, LKML

Mel Gorman <mgorman@suse.de> writes:
> +
> +	/*
> +	 * For direct reclaim, reclaim the number of pages requested. Less
> +	 * care is taken to ensure that scanning for each LRU is properly
> +	 * proportional. This is unfortunate and is improper aging but
> +	 * minimises the amount of time a process is stalled.
> +	 */
> +	if (!current_is_kswapd()) {
> +		if (nr_reclaimed >= nr_to_reclaim) {
> +			for_each_evictable_lru(l)

Don't we need some NUMA awareness here?
Similar below.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd
  2013-03-17 14:39 ` Andi Kleen
@ 2013-03-17 15:08   ` Mel Gorman
  0 siblings, 0 replies; 44+ messages in thread
From: Mel Gorman @ 2013-03-17 15:08 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Rik van Riel,
	Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya,
	Michal Hocko, LKML

On Sun, Mar 17, 2013 at 07:39:37AM -0700, Andi Kleen wrote:
> Mel Gorman <mgorman@suse.de> writes:
> > +
> > +	/*
> > +	 * For direct reclaim, reclaim the number of pages requested. Less
> > +	 * care is taken to ensure that scanning for each LRU is properly
> > +	 * proportional. This is unfortunate and is improper aging but
> > +	 * minimises the amount of time a process is stalled.
> > +	 */
> > +	if (!current_is_kswapd()) {
> > +		if (nr_reclaimed >= nr_to_reclaim) {
> > +			for_each_evictable_lru(l)
>
> Don't we need some NUMA awareness here?
> Similar below.
>

Of what sort? In this context we are usually dealing with a zone and in
the case of kswapd it is only ever dealing with a single node.

-- 
Mel Gorman
SUSE Labs
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd 2013-03-17 13:04 ` [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd Mel Gorman 2013-03-17 14:39 ` Andi Kleen @ 2013-03-21 1:10 ` Rik van Riel 2013-03-21 9:54 ` Mel Gorman 2013-03-21 14:01 ` Michal Hocko 2013-03-21 16:25 ` Johannes Weiner 3 siblings, 1 reply; 44+ messages in thread From: Rik van Riel @ 2013-03-21 1:10 UTC (permalink / raw) To: Mel Gorman Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya, Michal Hocko, LKML On 03/17/2013 09:04 AM, Mel Gorman wrote: > Simplistically, the anon and file LRU lists are scanned proportionally > depending on the value of vm.swappiness although there are other factors > taken into account by get_scan_count(). The patch "mm: vmscan: Limit > the number of pages kswapd reclaims" limits the number of pages kswapd > reclaims but it breaks this proportional scanning and may evenly shrink > anon/file LRUs regardless of vm.swappiness. > > This patch preserves the proportional scanning and reclaim. It does mean > that kswapd will reclaim more than requested but the number of pages will > be related to the high watermark. > > Signed-off-by: Mel Gorman <mgorman@suse.de> > --- > mm/vmscan.c | 52 +++++++++++++++++++++++++++++++++++++++++----------- > 1 file changed, 41 insertions(+), 11 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 4835a7a..182ff15 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1815,6 +1815,45 @@ out: > } > } > > +static void recalculate_scan_count(unsigned long nr_reclaimed, > + unsigned long nr_to_reclaim, > + unsigned long nr[NR_LRU_LISTS]) > +{ > + enum lru_list l; > + > + /* > + * For direct reclaim, reclaim the number of pages requested. Less > + * care is taken to ensure that scanning for each LRU is properly > + * proportional. This is unfortunate and is improper aging but > + * minimises the amount of time a process is stalled. 
> + */ > + if (!current_is_kswapd()) { > + if (nr_reclaimed >= nr_to_reclaim) { > + for_each_evictable_lru(l) > + nr[l] = 0; > + } > + return; > + } This part is obvious. > + /* > + * For kswapd, reclaim at least the number of pages requested. > + * However, ensure that LRUs shrink by the proportion requested > + * by get_scan_count() so vm.swappiness is obeyed. > + */ > + if (nr_reclaimed >= nr_to_reclaim) { > + unsigned long min = ULONG_MAX; > + > + /* Find the LRU with the fewest pages to reclaim */ > + for_each_evictable_lru(l) > + if (nr[l] < min) > + min = nr[l]; > + > + /* Normalise the scan counts so kswapd scans proportionally */ > + for_each_evictable_lru(l) > + nr[l] -= min; > + } > +} This part took me a bit longer to get. Before getting to this point, we scanned the LRUs evenly. By subtracting min from all of the LRUs, we end up stopping the scanning of the LRU where we have the fewest pages left to scan. This results in the scanning being concentrated where it should be - on the LRUs where we have not done nearly enough scanning yet. However, I am not sure how to document it better than your comment already has... Acked-by: Rik van Riel <riel@redhat.com> -- All rights reversed -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
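As a rough illustration of the normalisation step discussed in the message above, here is a minimal userspace sketch. It is not kernel code: the `demo_` names, the three-entry array and the sample numbers are assumptions made for the example, whereas the patch itself walks `nr[NR_LRU_LISTS]` with `for_each_evictable_lru()`.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for a subset of the kernel's evictable LRU lists. */
enum demo_lru {
	DEMO_INACTIVE_ANON,
	DEMO_ACTIVE_FILE,
	DEMO_INACTIVE_FILE,
	DEMO_NR_LRU
};

/*
 * Subtract the smallest remaining scan target from every list.
 * The list that held the minimum drops to zero and is no longer
 * scanned; the others keep only their "excess" over that minimum.
 * Returns the minimum that was subtracted.
 */
static unsigned long demo_normalise(unsigned long nr[DEMO_NR_LRU])
{
	unsigned long min = ULONG_MAX;
	int l;

	/* Find the list with the fewest pages left to scan. */
	for (l = 0; l < DEMO_NR_LRU; l++)
		if (nr[l] < min)
			min = nr[l];

	/* Remove that amount from every list. */
	for (l = 0; l < DEMO_NR_LRU; l++)
		nr[l] -= min;

	return min;
}
```

With made-up targets of 100, 400 and 1200 pages, the call subtracts 100 from each list, so scanning continues only on the two lists that still have work outstanding.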
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd 2013-03-21 1:10 ` Rik van Riel @ 2013-03-21 9:54 ` Mel Gorman 0 siblings, 0 replies; 44+ messages in thread From: Mel Gorman @ 2013-03-21 9:54 UTC (permalink / raw) To: Rik van Riel Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya, Michal Hocko, LKML On Wed, Mar 20, 2013 at 09:10:31PM -0400, Rik van Riel wrote: > On 03/17/2013 09:04 AM, Mel Gorman wrote: > >Simplistically, the anon and file LRU lists are scanned proportionally > >depending on the value of vm.swappiness although there are other factors > >taken into account by get_scan_count(). The patch "mm: vmscan: Limit > >the number of pages kswapd reclaims" limits the number of pages kswapd > >reclaims but it breaks this proportional scanning and may evenly shrink > >anon/file LRUs regardless of vm.swappiness. > > > >This patch preserves the proportional scanning and reclaim. It does mean > >that kswapd will reclaim more than requested but the number of pages will > >be related to the high watermark. > > > >Signed-off-by: Mel Gorman <mgorman@suse.de> > >--- > > mm/vmscan.c | 52 +++++++++++++++++++++++++++++++++++++++++----------- > > 1 file changed, 41 insertions(+), 11 deletions(-) > > > >diff --git a/mm/vmscan.c b/mm/vmscan.c > >index 4835a7a..182ff15 100644 > >--- a/mm/vmscan.c > >+++ b/mm/vmscan.c > >@@ -1815,6 +1815,45 @@ out: > > } > > } > > > >+static void recalculate_scan_count(unsigned long nr_reclaimed, > >+ unsigned long nr_to_reclaim, > >+ unsigned long nr[NR_LRU_LISTS]) > >+{ > >+ enum lru_list l; > >+ > >+ /* > >+ * For direct reclaim, reclaim the number of pages requested. Less > >+ * care is taken to ensure that scanning for each LRU is properly > >+ * proportional. This is unfortunate and is improper aging but > >+ * minimises the amount of time a process is stalled. 
> >+ */ > >+ if (!current_is_kswapd()) { > >+ if (nr_reclaimed >= nr_to_reclaim) { > >+ for_each_evictable_lru(l) > >+ nr[l] = 0; > >+ } > >+ return; > >+ } > > This part is obvious. > > >+ /* > >+ * For kswapd, reclaim at least the number of pages requested. > >+ * However, ensure that LRUs shrink by the proportion requested > >+ * by get_scan_count() so vm.swappiness is obeyed. > >+ */ > >+ if (nr_reclaimed >= nr_to_reclaim) { > >+ unsigned long min = ULONG_MAX; > >+ > >+ /* Find the LRU with the fewest pages to reclaim */ > >+ for_each_evictable_lru(l) > >+ if (nr[l] < min) > >+ min = nr[l]; > >+ > >+ /* Normalise the scan counts so kswapd scans proportionally */ > >+ for_each_evictable_lru(l) > >+ nr[l] -= min; > >+ } > >+} > > This part took me a bit longer to get. > > Before getting to this point, we scanned the LRUs evenly. > By subtracting min from all of the LRUs, we end up stopping > the scanning of the LRU where we have the fewest pages left > to scan. > > This results in the scanning being concentrated where it > should be - on the LRUs where we have not done nearly > enough scanning yet. > This is exactly what my intention was. It does mean that we potentially reclaim much more than required by sc->nr_to_reclaim but I did not think of a straight-forward way around that that would work in every case. > However, I am not sure how to document it better than > your comment already has... > > Acked-by: Rik van Riel <riel@redhat.com> > Thanks. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd 2013-03-17 13:04 ` [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd Mel Gorman 2013-03-17 14:39 ` Andi Kleen 2013-03-21 1:10 ` Rik van Riel @ 2013-03-21 14:01 ` Michal Hocko 2013-03-21 14:31 ` Mel Gorman 2013-03-21 16:25 ` Johannes Weiner 3 siblings, 1 reply; 44+ messages in thread From: Michal Hocko @ 2013-03-21 14:01 UTC (permalink / raw) To: Mel Gorman Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya, LKML On Sun 17-03-13 13:04:08, Mel Gorman wrote: > Simplistically, the anon and file LRU lists are scanned proportionally > depending on the value of vm.swappiness although there are other factors > taken into account by get_scan_count(). The patch "mm: vmscan: Limit > the number of pages kswapd reclaims" limits the number of pages kswapd > reclaims but it breaks this proportional scanning and may evenly shrink > anon/file LRUs regardless of vm.swappiness. > > This patch preserves the proportional scanning and reclaim. It does mean > that kswapd will reclaim more than requested but the number of pages will > be related to the high watermark. > > Signed-off-by: Mel Gorman <mgorman@suse.de> > --- > mm/vmscan.c | 52 +++++++++++++++++++++++++++++++++++++++++----------- > 1 file changed, 41 insertions(+), 11 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 4835a7a..182ff15 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1815,6 +1815,45 @@ out: > } > } > > +static void recalculate_scan_count(unsigned long nr_reclaimed, > + unsigned long nr_to_reclaim, > + unsigned long nr[NR_LRU_LISTS]) > +{ > + enum lru_list l; > + > + /* > + * For direct reclaim, reclaim the number of pages requested. Less > + * care is taken to ensure that scanning for each LRU is properly > + * proportional. 
This is unfortunate and is improper aging but > + * minimises the amount of time a process is stalled. > + */ > + if (!current_is_kswapd()) { > + if (nr_reclaimed >= nr_to_reclaim) { > + for_each_evictable_lru(l) > + nr[l] = 0; > + } > + return; Heh, this is nicely cryptically said what could be done in shrink_lruvec as if (!current_is_kswapd()) { if (nr_reclaimed >= nr_to_reclaim) break; } Besides that this is not memcg aware which I think it would break targeted reclaim which is kind of direct reclaim but it still would be good to stay proportional because it starts with DEF_PRIORITY. I would suggest moving this back to shrink_lruvec and update the test as follows: diff --git a/mm/vmscan.c b/mm/vmscan.c index 182ff15..5cf5a4b 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1822,23 +1822,9 @@ static void recalculate_scan_count(unsigned long nr_reclaimed, enum lru_list l; /* - * For direct reclaim, reclaim the number of pages requested. Less - * care is taken to ensure that scanning for each LRU is properly - * proportional. This is unfortunate and is improper aging but - * minimises the amount of time a process is stalled. - */ - if (!current_is_kswapd()) { - if (nr_reclaimed >= nr_to_reclaim) { - for_each_evictable_lru(l) - nr[l] = 0; - } - return; - } - - /* - * For kswapd, reclaim at least the number of pages requested. - * However, ensure that LRUs shrink by the proportion requested - * by get_scan_count() so vm.swappiness is obeyed. + * Reclaim at least the number of pages requested. However, + * ensure that LRUs shrink by the proportion requested by + * get_scan_count() so vm.swappiness is obeyed. */ if (nr_reclaimed >= nr_to_reclaim) { unsigned long min = ULONG_MAX; @@ -1881,6 +1867,18 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc) } } + /* + * For global direct reclaim, reclaim the number of + * pages requested. Less care is taken to ensure that + * scanning for each LRU is properly proportional. 
This + * is unfortunate and is improper aging but minimises + * the amount of time a process is stalled. + */ + if (global_reclaim(sc) && !current_is_kswapd()) { + if (nr_reclaimed >= nr_to_reclaim) + break; + } + recalculate_scan_count(nr_reclaimed, nr_to_reclaim, nr); } blk_finish_plug(&plug); > + } > + > + /* > + * For kswapd, reclaim at least the number of pages requested. > + * However, ensure that LRUs shrink by the proportion requested > + * by get_scan_count() so vm.swappiness is obeyed. > + */ > + if (nr_reclaimed >= nr_to_reclaim) { > + unsigned long min = ULONG_MAX; > + > + /* Find the LRU with the fewest pages to reclaim */ > + for_each_evictable_lru(l) > + if (nr[l] < min) > + min = nr[l]; > + > + /* Normalise the scan counts so kswapd scans proportionally */ > + for_each_evictable_lru(l) > + nr[l] -= min; > + } It looked scary at first glance but it makes sense. Every round (after we have reclaimed enough) one LRU is pulled out and others are proportionally inhibited. > +} > + > /* > * This is a basic per-zone page freer. Used by both kswapd and direct reclaim. > */ > @@ -1841,17 +1880,8 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc) > lruvec, sc); > } > } > - /* > - * On large memory systems, scan >> priority can become > - * really large. This is fine for the starting priority; > - * we want to put equal scanning pressure on each zone. > - * However, if the VM has a harder time of freeing pages, > - * with multiple processes reclaiming pages, the total > - * freeing target can get unreasonably large. > - */ > - if (nr_reclaimed >= nr_to_reclaim && > - sc->priority < DEF_PRIORITY) > - break; > + > + recalculate_scan_count(nr_reclaimed, nr_to_reclaim, nr); > } > blk_finish_plug(&plug); > sc->nr_reclaimed += nr_reclaimed; > -- > 1.8.1.4 > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd 2013-03-21 14:01 ` Michal Hocko @ 2013-03-21 14:31 ` Mel Gorman 2013-03-21 15:07 ` Michal Hocko 0 siblings, 1 reply; 44+ messages in thread From: Mel Gorman @ 2013-03-21 14:31 UTC (permalink / raw) To: Michal Hocko Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya, LKML On Thu, Mar 21, 2013 at 03:01:54PM +0100, Michal Hocko wrote: > On Sun 17-03-13 13:04:08, Mel Gorman wrote: > > Simplistically, the anon and file LRU lists are scanned proportionally > > depending on the value of vm.swappiness although there are other factors > > taken into account by get_scan_count(). The patch "mm: vmscan: Limit > > the number of pages kswapd reclaims" limits the number of pages kswapd > > reclaims but it breaks this proportional scanning and may evenly shrink > > anon/file LRUs regardless of vm.swappiness. > > > > This patch preserves the proportional scanning and reclaim. It does mean > > that kswapd will reclaim more than requested but the number of pages will > > be related to the high watermark. > > > > Signed-off-by: Mel Gorman <mgorman@suse.de> > > --- > > mm/vmscan.c | 52 +++++++++++++++++++++++++++++++++++++++++----------- > > 1 file changed, 41 insertions(+), 11 deletions(-) > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index 4835a7a..182ff15 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1815,6 +1815,45 @@ out: > > } > > } > > > > +static void recalculate_scan_count(unsigned long nr_reclaimed, > > + unsigned long nr_to_reclaim, > > + unsigned long nr[NR_LRU_LISTS]) > > +{ > > + enum lru_list l; > > + > > + /* > > + * For direct reclaim, reclaim the number of pages requested. Less > > + * care is taken to ensure that scanning for each LRU is properly > > + * proportional. This is unfortunate and is improper aging but > > + * minimises the amount of time a process is stalled. 
> > + */ > > + if (!current_is_kswapd()) { > > + if (nr_reclaimed >= nr_to_reclaim) { > > + for_each_evictable_lru(l) > > + nr[l] = 0; > > + } > > + return; > > Heh, this is nicely cryptically said what could be done in shrink_lruvec > as > if (!current_is_kswapd()) { > if (nr_reclaimed >= nr_to_reclaim) > break; > } > Pretty much. At one point during development, this function was more complex and it evolved into this without me rechecking if splitting it out still made sense. > Besides that this is not memcg aware which I think it would break > targeted reclaim which is kind of direct reclaim but it still would be > good to stay proportional because it starts with DEF_PRIORITY. > This does break memcg because it's a special sort of direct reclaim. > I would suggest moving this back to shrink_lruvec and update the test as > follows: I also noticed that we check whether the scan counts need to be normalised more than once and this reshuffling checks nr_reclaimed twice. How about this? diff --git a/mm/vmscan.c b/mm/vmscan.c index 182ff15..320a2f4 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1815,45 +1815,6 @@ out: } } -static void recalculate_scan_count(unsigned long nr_reclaimed, - unsigned long nr_to_reclaim, - unsigned long nr[NR_LRU_LISTS]) -{ - enum lru_list l; - - /* - * For direct reclaim, reclaim the number of pages requested. Less - * care is taken to ensure that scanning for each LRU is properly - * proportional. This is unfortunate and is improper aging but - * minimises the amount of time a process is stalled. - */ - if (!current_is_kswapd()) { - if (nr_reclaimed >= nr_to_reclaim) { - for_each_evictable_lru(l) - nr[l] = 0; - } - return; - } - - /* - * For kswapd, reclaim at least the number of pages requested. - * However, ensure that LRUs shrink by the proportion requested - * by get_scan_count() so vm.swappiness is obeyed. 
- */ - if (nr_reclaimed >= nr_to_reclaim) { - unsigned long min = ULONG_MAX; - - /* Find the LRU with the fewest pages to reclaim */ - for_each_evictable_lru(l) - if (nr[l] < min) - min = nr[l]; - - /* Normalise the scan counts so kswapd scans proportionally */ - for_each_evictable_lru(l) - nr[l] -= min; - } -} - /* * This is a basic per-zone page freer. Used by both kswapd and direct reclaim. */ @@ -1864,7 +1825,9 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc) enum lru_list lru; unsigned long nr_reclaimed = 0; unsigned long nr_to_reclaim = sc->nr_to_reclaim; + unsigned long min; struct blk_plug plug; + bool scan_adjusted = false; get_scan_count(lruvec, sc, nr); @@ -1881,7 +1844,33 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc) } } - recalculate_scan_count(nr_reclaimed, nr_to_reclaim, nr); + if (nr_reclaimed < nr_to_reclaim || scan_adjusted) + continue; + + /* + * For global direct reclaim, reclaim only the number of pages + * requested. Less care is taken to scan proportionally as it + * is more important to minimise direct reclaim stall latency + * than it is to properly age the LRU lists. + */ + if (global_reclaim(sc) && !current_is_kswapd()) + break; + + /* + * For kswapd and memcg, reclaim at least the number of pages + * requested. However, ensure that LRUs shrink by the + * proportion requested by get_scan_count() so vm.swappiness + * is obeyed. Find the smallest LRU list and normalise the + * scan counts so the fewest number of pages are reclaimed + * while still maintaining proportionality. + */ + min = ULONG_MAX; + for_each_evictable_lru(lru) + if (nr[lru] < min) + min = nr[lru]; + for_each_evictable_lru(lru) + nr[lru] -= min; + scan_adjusted = true; } blk_finish_plug(&plug); sc->nr_reclaimed += nr_reclaimed; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd 2013-03-21 14:31 ` Mel Gorman @ 2013-03-21 15:07 ` Michal Hocko 2013-03-21 15:34 ` Mel Gorman 0 siblings, 1 reply; 44+ messages in thread From: Michal Hocko @ 2013-03-21 15:07 UTC (permalink / raw) To: Mel Gorman Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya, LKML On Thu 21-03-13 14:31:15, Mel Gorman wrote: > On Thu, Mar 21, 2013 at 03:01:54PM +0100, Michal Hocko wrote: > > On Sun 17-03-13 13:04:08, Mel Gorman wrote: > > > Simplistically, the anon and file LRU lists are scanned proportionally > > > depending on the value of vm.swappiness although there are other factors > > > taken into account by get_scan_count(). The patch "mm: vmscan: Limit > > > the number of pages kswapd reclaims" limits the number of pages kswapd > > > reclaims but it breaks this proportional scanning and may evenly shrink > > > anon/file LRUs regardless of vm.swappiness. > > > > > > This patch preserves the proportional scanning and reclaim. It does mean > > > that kswapd will reclaim more than requested but the number of pages will > > > be related to the high watermark. > > > > > > Signed-off-by: Mel Gorman <mgorman@suse.de> > > > --- > > > mm/vmscan.c | 52 +++++++++++++++++++++++++++++++++++++++++----------- > > > 1 file changed, 41 insertions(+), 11 deletions(-) > > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > > index 4835a7a..182ff15 100644 > > > --- a/mm/vmscan.c > > > +++ b/mm/vmscan.c > > > @@ -1815,6 +1815,45 @@ out: > > > } > > > } > > > > > > +static void recalculate_scan_count(unsigned long nr_reclaimed, > > > + unsigned long nr_to_reclaim, > > > + unsigned long nr[NR_LRU_LISTS]) > > > +{ > > > + enum lru_list l; > > > + > > > + /* > > > + * For direct reclaim, reclaim the number of pages requested. Less > > > + * care is taken to ensure that scanning for each LRU is properly > > > + * proportional. 
This is unfortunate and is improper aging but > > > + * minimises the amount of time a process is stalled. > > > + */ > > > + if (!current_is_kswapd()) { > > > + if (nr_reclaimed >= nr_to_reclaim) { > > > + for_each_evictable_lru(l) > > > + nr[l] = 0; > > > + } > > > + return; > > > > Heh, this is nicely cryptically said what could be done in shrink_lruvec > > as > > if (!current_is_kswapd()) { > > if (nr_reclaimed >= nr_to_reclaim) > > break; > > } > > > > Pretty much. At one point during development, this function was more > complex and it evolved into this without me rechecking if splitting it > out still made sense. > > > Besides that this is not memcg aware which I think it would break > > targeted reclaim which is kind of direct reclaim but it still would be > > good to stay proportional because it starts with DEF_PRIORITY. > > > > This does break memcg because it's a special sort of direct reclaim. > > > I would suggest moving this back to shrink_lruvec and update the test as > > follows: > > I also noticed that we check whether the scan counts need to be > normalised more than once I didn't mind this because it "disqualified" at least one LRU every round which sounds reasonable to me because all LRUs would be scanned proportionally. E.g. if swappiness is 0 then nr[anon] would be 0 and then the active/inactive aging would break? Or am I missing something? > and this reshuffling checks nr_reclaimed twice. How about this? > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 182ff15..320a2f4 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1815,45 +1815,6 @@ out: > } > } > > -static void recalculate_scan_count(unsigned long nr_reclaimed, > - unsigned long nr_to_reclaim, > - unsigned long nr[NR_LRU_LISTS]) > -{ > - enum lru_list l; > - > - /* > - * For direct reclaim, reclaim the number of pages requested. Less > - * care is taken to ensure that scanning for each LRU is properly > - * proportional. 
This is unfortunate and is improper aging but > - * minimises the amount of time a process is stalled. > - */ > - if (!current_is_kswapd()) { > - if (nr_reclaimed >= nr_to_reclaim) { > - for_each_evictable_lru(l) > - nr[l] = 0; > - } > - return; > - } > - > - /* > - * For kswapd, reclaim at least the number of pages requested. > - * However, ensure that LRUs shrink by the proportion requested > - * by get_scan_count() so vm.swappiness is obeyed. > - */ > - if (nr_reclaimed >= nr_to_reclaim) { > - unsigned long min = ULONG_MAX; > - > - /* Find the LRU with the fewest pages to reclaim */ > - for_each_evictable_lru(l) > - if (nr[l] < min) > - min = nr[l]; > - > - /* Normalise the scan counts so kswapd scans proportionally */ > - for_each_evictable_lru(l) > - nr[l] -= min; > - } > -} > - > /* > * This is a basic per-zone page freer. Used by both kswapd and direct reclaim. > */ > @@ -1864,7 +1825,9 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc) > enum lru_list lru; > unsigned long nr_reclaimed = 0; > unsigned long nr_to_reclaim = sc->nr_to_reclaim; > + unsigned long min; > struct blk_plug plug; > + bool scan_adjusted = false; > > get_scan_count(lruvec, sc, nr); > > @@ -1881,7 +1844,33 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc) > } > } > > - recalculate_scan_count(nr_reclaimed, nr_to_reclaim, nr); > + if (nr_reclaimed < nr_to_reclaim || scan_adjusted) > + continue; > + > + /* > + * For global direct reclaim, reclaim only the number of pages > + * requested. Less care is taken to scan proportionally as it > + * is more important to minimise direct reclaim stall latency > + * than it is to properly age the LRU lists. > + */ > + if (global_reclaim(sc) && !current_is_kswapd()) > + break; > + > + /* > + * For kswapd and memcg, reclaim at least the number of pages > + * requested. However, ensure that LRUs shrink by the > + * proportion requested by get_scan_count() so vm.swappiness > + * is obeyed. 
Find the smallest LRU list and normalise the > + * scan counts so the fewest number of pages are reclaimed > + * while still maintaining proportionality. > + */ > + min = ULONG_MAX; > + for_each_evictable_lru(lru) > + if (nr[lru] < min) > + min = nr[lru]; > + for_each_evictable_lru(lru) > + nr[lru] -= min; > + scan_adjusted = true; > } > blk_finish_plug(&plug); > sc->nr_reclaimed += nr_reclaimed; -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd 2013-03-21 15:07 ` Michal Hocko @ 2013-03-21 15:34 ` Mel Gorman 2013-03-22 7:54 ` Michal Hocko 0 siblings, 1 reply; 44+ messages in thread From: Mel Gorman @ 2013-03-21 15:34 UTC (permalink / raw) To: Michal Hocko Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya, LKML On Thu, Mar 21, 2013 at 04:07:55PM +0100, Michal Hocko wrote: > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > > > index 4835a7a..182ff15 100644 > > > > --- a/mm/vmscan.c > > > > +++ b/mm/vmscan.c > > > > @@ -1815,6 +1815,45 @@ out: > > > > } > > > > } > > > > > > > > +static void recalculate_scan_count(unsigned long nr_reclaimed, > > > > + unsigned long nr_to_reclaim, > > > > + unsigned long nr[NR_LRU_LISTS]) > > > > +{ > > > > + enum lru_list l; > > > > + > > > > + /* > > > > + * For direct reclaim, reclaim the number of pages requested. Less > > > > + * care is taken to ensure that scanning for each LRU is properly > > > > + * proportional. This is unfortunate and is improper aging but > > > > + * minimises the amount of time a process is stalled. > > > > + */ > > > > + if (!current_is_kswapd()) { > > > > + if (nr_reclaimed >= nr_to_reclaim) { > > > > + for_each_evictable_lru(l) > > > > + nr[l] = 0; > > > > + } > > > > + return; > > > > > > Heh, this is nicely cryptically said what could be done in shrink_lruvec > > > as > > > if (!current_is_kswapd()) { > > > if (nr_reclaimed >= nr_to_reclaim) > > > break; > > > } > > > > > > > Pretty much. At one point during development, this function was more > > complex and it evolved into this without me rechecking if splitting it > > out still made sense. > > > > > Besides that this is not memcg aware which I think it would break > > > targeted reclaim which is kind of direct reclaim but it still would be > > > good to stay proportional because it starts with DEF_PRIORITY. 
> > > > > > > This does break memcg because it's a special sort of direct reclaim. > > > > > I would suggest moving this back to shrink_lruvec and update the test as > > > follows: > > > > I also noticed that we check whether the scan counts need to be > > normalised more than once > > I didn't mind this because it "disqualified" at least one LRU every > round which sounds reasonable to me because all LRUs would be scanned > proportionally. Once the scan count for one LRU is 0 then min will always be 0 and no further adjustment is made. It's just redundant to check again. > E.g. if swappiness is 0 then nr[anon] would be 0 and > then the active/inactive aging would break? Or am I missing something? > If swappiness is 0 and nr[anon] is zero then the number of pages to scan from every other LRU will never be adjusted. I do not see how this would affect active/inactive scanning but maybe I'm misunderstanding you. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
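The point made above, that once one scan count has reached 0 the minimum stays at 0 and any further adjustment is a no-op, can be checked with a small userspace sketch. This is only an illustration under assumptions: the `sketch_` names and the three-entry array are hypothetical, and the batch of 32 pages stands in for one `SWAP_CLUSTER_MAX` round of scanning.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_NR_LRU 3	/* hypothetical: three lists are enough here */

/*
 * Subtract the smallest scan target from every list.
 * Returns true if the counts changed, i.e. the minimum was non-zero.
 */
static bool sketch_adjust(unsigned long nr[SKETCH_NR_LRU])
{
	unsigned long min = ULONG_MAX;
	int l;

	for (l = 0; l < SKETCH_NR_LRU; l++)
		if (nr[l] < min)
			min = nr[l];

	for (l = 0; l < SKETCH_NR_LRU; l++)
		nr[l] -= min;

	/* A zero minimum means nothing was subtracted. */
	return min != 0;
}
```

The first call changes the counts and zeroes at least one list. Every later call finds a minimum of 0 and leaves the array untouched, which is why a single `scan_adjusted` flag in the reworked loop is enough to skip the recheck.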
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd 2013-03-21 15:34 ` Mel Gorman @ 2013-03-22 7:54 ` Michal Hocko 2013-03-22 8:37 ` Mel Gorman 0 siblings, 1 reply; 44+ messages in thread From: Michal Hocko @ 2013-03-22 7:54 UTC (permalink / raw) To: Mel Gorman Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya, LKML On Thu 21-03-13 15:34:42, Mel Gorman wrote: > On Thu, Mar 21, 2013 at 04:07:55PM +0100, Michal Hocko wrote: > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > > > > index 4835a7a..182ff15 100644 > > > > > --- a/mm/vmscan.c > > > > > +++ b/mm/vmscan.c > > > > > @@ -1815,6 +1815,45 @@ out: > > > > > } > > > > > } > > > > > > > > > > +static void recalculate_scan_count(unsigned long nr_reclaimed, > > > > > + unsigned long nr_to_reclaim, > > > > > + unsigned long nr[NR_LRU_LISTS]) > > > > > +{ > > > > > + enum lru_list l; > > > > > + > > > > > + /* > > > > > + * For direct reclaim, reclaim the number of pages requested. Less > > > > > + * care is taken to ensure that scanning for each LRU is properly > > > > > + * proportional. This is unfortunate and is improper aging but > > > > > + * minimises the amount of time a process is stalled. > > > > > + */ > > > > > + if (!current_is_kswapd()) { > > > > > + if (nr_reclaimed >= nr_to_reclaim) { > > > > > + for_each_evictable_lru(l) > > > > > + nr[l] = 0; > > > > > + } > > > > > + return; > > > > > > > > Heh, this is nicely cryptically said what could be done in shrink_lruvec > > > > as > > > > if (!current_is_kswapd()) { > > > > if (nr_reclaimed >= nr_to_reclaim) > > > > break; > > > > } > > > > > > > > > > Pretty much. At one point during development, this function was more > > > complex and it evolved into this without me rechecking if splitting it > > > out still made sense. 
> > > > > > > Besides that this is not memcg aware which I think it would break > > > > targeted reclaim which is kind of direct reclaim but it still would be > > > > good to stay proportional because it starts with DEF_PRIORITY. > > > > > > > > > > This does break memcg because it's a special sort of direct reclaim. > > > > > > > I would suggest moving this back to shrink_lruvec and update the test as > > > > follows: > > > > > > I also noticed that we check whether the scan counts need to be > > > normalised more than once > > > > I didn't mind this because it "disqualified" at least one LRU every > > round which sounds reasonable to me because all LRUs would be scanned > > proportionally. > > Once the scan count for one LRU is 0 then min will always be 0 and no > further adjustment is made. It's just redundant to check again.

Hmm, I was almost sure I wrote that min should be adjusted only if it is >0
in the first loop but it is not there...

So for real this time.
	for_each_evictable_lru(l)
		if (nr[l] && nr[l] < min)
			min = nr[l];

This should work, no? Every time you shrink all LRUs and you have
reclaimed enough already, you take the smallest LRU out of the game. This
should keep the proportions even.
-- 
Michal Hocko
SUSE Labs
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd 2013-03-22 7:54 ` Michal Hocko @ 2013-03-22 8:37 ` Mel Gorman 2013-03-22 10:04 ` Michal Hocko 0 siblings, 1 reply; 44+ messages in thread From: Mel Gorman @ 2013-03-22 8:37 UTC (permalink / raw) To: Michal Hocko Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya, LKML On Fri, Mar 22, 2013 at 08:54:27AM +0100, Michal Hocko wrote: > On Thu 21-03-13 15:34:42, Mel Gorman wrote: > > On Thu, Mar 21, 2013 at 04:07:55PM +0100, Michal Hocko wrote: > > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > > > > > index 4835a7a..182ff15 100644 > > > > > > --- a/mm/vmscan.c > > > > > > +++ b/mm/vmscan.c > > > > > > @@ -1815,6 +1815,45 @@ out: > > > > > > } > > > > > > } > > > > > > > > > > > > +static void recalculate_scan_count(unsigned long nr_reclaimed, > > > > > > + unsigned long nr_to_reclaim, > > > > > > + unsigned long nr[NR_LRU_LISTS]) > > > > > > +{ > > > > > > + enum lru_list l; > > > > > > + > > > > > > + /* > > > > > > + * For direct reclaim, reclaim the number of pages requested. Less > > > > > > + * care is taken to ensure that scanning for each LRU is properly > > > > > > + * proportional. This is unfortunate and is improper aging but > > > > > > + * minimises the amount of time a process is stalled. > > > > > > + */ > > > > > > + if (!current_is_kswapd()) { > > > > > > + if (nr_reclaimed >= nr_to_reclaim) { > > > > > > + for_each_evictable_lru(l) > > > > > > + nr[l] = 0; > > > > > > + } > > > > > > + return; > > > > > > > > > > Heh, this is nicely cryptically said what could be done in shrink_lruvec > > > > > as > > > > > if (!current_is_kswapd()) { > > > > > if (nr_reclaimed >= nr_to_reclaim) > > > > > break; > > > > > } > > > > > > > > > > > > > Pretty much. 
At one point during development, this function was more > > > > complex and it evolved into this without me rechecking if splitting it > > > > out still made sense. > > > > > > > > > Besides that this is not memcg aware which I think it would break > > > > > targeted reclaim which is kind of direct reclaim but it still would be > > > > > good to stay proportional because it starts with DEF_PRIORITY. > > > > > > > > > > > > > This does break memcg because it's a special sort of direct reclaim. > > > > > > > > > I would suggest moving this back to shrink_lruvec and update the test as > > > > > follows: > > > > > > > > I also noticed that we check whether the scan counts need to be > > > > normalised more than once > > > > > > I didn't mind this because it "disqualified" at least one LRU every > > > round which sounds reasonable to me because all LRUs would be scanned > > > proportionally. > > > > Once the scan count for one LRU is 0 then min will always be 0 and no > > further adjustment is made. It's just redundant to check again. > > Hmm, I was almost sure I wrote that min should be adjusted only if it is >0 > in the first loop but it is not there... > > So for real this time. > for_each_evictable_lru(l) > if (nr[l] && nr[l] < min) > min = nr[l]; > > This should work, no? Everytime you shrink all LRUs you and you have > reclaimed enough already you get the smallest LRU out of game. This > should keep proportions evenly. Lets say we started like this LRU_INACTIVE_ANON 60 LRU_ACTIVE_FILE 1000 LRU_INACTIVE_FILE 3000 and we've reclaimed nr_to_reclaim pages then we recalculate the number of pages to scan from each list as; LRU_INACTIVE_ANON 0 LRU_ACTIVE_FILE 940 LRU_INACTIVE_FILE 2940 We then shrink SWAP_CLUSTER_MAX from each LRU giving us this. 
LRU_INACTIVE_ANON      0
LRU_ACTIVE_FILE      908
LRU_INACTIVE_FILE   2908

Then under your suggestion this would be recalculated as

LRU_INACTIVE_ANON      0
LRU_ACTIVE_FILE        0
LRU_INACTIVE_FILE   2000

Another SWAP_CLUSTER_MAX is reclaimed and then we stop reclaiming. I
might still be missing the point of your suggestion but I do not think it
would preserve the proportion of pages we reclaim from the anon or file LRUs.

--
Mel Gorman
SUSE Labs
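The arithmetic in the example above can be replayed in a small userspace C sketch. This is not kernel code: the three-element array, the helper names `normalise_nonzero_min()` and `scan_one_batch()`, and the local `SWAP_CLUSTER_MAX` definition are assumptions made for illustration, and "scanning" is modelled as nothing more than decrementing the counters.

```c
#include <assert.h>
#include <stddef.h>

#define NR_LISTS 3            /* hypothetical: only the three lists used in the example */
#define SWAP_CLUSTER_MAX 32UL /* batch size assumed by the example's arithmetic */

/*
 * Variant under discussion: subtract the smallest non-zero scan count
 * from every non-zero list, which knocks the smallest list out.
 */
void normalise_nonzero_min(unsigned long nr[NR_LISTS])
{
	unsigned long min = 0;
	size_t i;

	for (i = 0; i < NR_LISTS; i++)
		if (nr[i] && (!min || nr[i] < min))
			min = nr[i];

	for (i = 0; i < NR_LISTS; i++)
		if (nr[i])
			nr[i] -= min;
}

/* Model one pass over the lists: take at most SWAP_CLUSTER_MAX from each. */
void scan_one_batch(unsigned long nr[NR_LISTS])
{
	size_t i;

	for (i = 0; i < NR_LISTS; i++) {
		unsigned long n = nr[i] < SWAP_CLUSTER_MAX ? nr[i] : SWAP_CLUSTER_MAX;

		nr[i] -= n;
	}
}
```

Starting from 60/1000/3000, one normalisation, one batch and a second normalisation leave only the largest list with work to do, which is the loss of the active/inactive file ratio that the example points at.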
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd 2013-03-22 8:37 ` Mel Gorman @ 2013-03-22 10:04 ` Michal Hocko 2013-03-22 10:47 ` Michal Hocko 0 siblings, 1 reply; 44+ messages in thread From: Michal Hocko @ 2013-03-22 10:04 UTC (permalink / raw) To: Mel Gorman Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya, LKML On Fri 22-03-13 08:37:04, Mel Gorman wrote: > On Fri, Mar 22, 2013 at 08:54:27AM +0100, Michal Hocko wrote: > > On Thu 21-03-13 15:34:42, Mel Gorman wrote: > > > On Thu, Mar 21, 2013 at 04:07:55PM +0100, Michal Hocko wrote: > > > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > > > > > > index 4835a7a..182ff15 100644 > > > > > > > --- a/mm/vmscan.c > > > > > > > +++ b/mm/vmscan.c > > > > > > > @@ -1815,6 +1815,45 @@ out: > > > > > > > } > > > > > > > } > > > > > > > > > > > > > > +static void recalculate_scan_count(unsigned long nr_reclaimed, > > > > > > > + unsigned long nr_to_reclaim, > > > > > > > + unsigned long nr[NR_LRU_LISTS]) > > > > > > > +{ > > > > > > > + enum lru_list l; > > > > > > > + > > > > > > > + /* > > > > > > > + * For direct reclaim, reclaim the number of pages requested. Less > > > > > > > + * care is taken to ensure that scanning for each LRU is properly > > > > > > > + * proportional. This is unfortunate and is improper aging but > > > > > > > + * minimises the amount of time a process is stalled. > > > > > > > + */ > > > > > > > + if (!current_is_kswapd()) { > > > > > > > + if (nr_reclaimed >= nr_to_reclaim) { > > > > > > > + for_each_evictable_lru(l) > > > > > > > + nr[l] = 0; > > > > > > > + } > > > > > > > + return; > > > > > > > > > > > > Heh, this is nicely cryptically said what could be done in shrink_lruvec > > > > > > as > > > > > > if (!current_is_kswapd()) { > > > > > > if (nr_reclaimed >= nr_to_reclaim) > > > > > > break; > > > > > > } > > > > > > > > > > > > > > > > Pretty much. 
At one point during development, this function was more > > > > > complex and it evolved into this without me rechecking if splitting it > > > > > out still made sense. > > > > > > > > > > > Besides that this is not memcg aware which I think it would break > > > > > > targeted reclaim which is kind of direct reclaim but it still would be > > > > > > good to stay proportional because it starts with DEF_PRIORITY. > > > > > > > > > > > > > > > > This does break memcg because it's a special sort of direct reclaim. > > > > > > > > > > > I would suggest moving this back to shrink_lruvec and update the test as > > > > > > follows: > > > > > > > > > > I also noticed that we check whether the scan counts need to be > > > > > normalised more than once > > > > > > > > I didn't mind this because it "disqualified" at least one LRU every > > > > round which sounds reasonable to me because all LRUs would be scanned > > > > proportionally. > > > > > > Once the scan count for one LRU is 0 then min will always be 0 and no > > > further adjustment is made. It's just redundant to check again. > > > > Hmm, I was almost sure I wrote that min should be adjusted only if it is >0 > > in the first loop but it is not there... > > > > So for real this time. > > for_each_evictable_lru(l) > > if (nr[l] && nr[l] < min) > > min = nr[l]; > > > > This should work, no? Everytime you shrink all LRUs you and you have > > reclaimed enough already you get the smallest LRU out of game. This > > should keep proportions evenly. > > Lets say we started like this > > LRU_INACTIVE_ANON 60 > LRU_ACTIVE_FILE 1000 > LRU_INACTIVE_FILE 3000 > > and we've reclaimed nr_to_reclaim pages then we recalculate the number > of pages to scan from each list as; > > LRU_INACTIVE_ANON 0 > LRU_ACTIVE_FILE 940 > LRU_INACTIVE_FILE 2940 > > We then shrink SWAP_CLUSTER_MAX from each LRU giving us this. 
>
> LRU_INACTIVE_ANON      0
> LRU_ACTIVE_FILE      908
> LRU_INACTIVE_FILE   2908
>
> Then under your suggestion this would be recalculated as
>
> LRU_INACTIVE_ANON      0
> LRU_ACTIVE_FILE        0
> LRU_INACTIVE_FILE   2000
>
> another SWAP_CLUSTER_MAX reclaims and then it stops we stop reclaiming. I
> might still be missing the point of your suggestion but I do not think it
> would preserve the proportion of pages we reclaim from the anon or file LRUs.

It wouldn't preserve the proportion precisely, because each reclaim round is
in SWAP_CLUSTER_MAX units, but it would reclaim bigger lists more than
smaller ones, which I thought was the whole point. So yes, using the word
"proportionally" is unfortunate but I couldn't find a better one.
--
Michal Hocko
SUSE Labs
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd 2013-03-22 10:04 ` Michal Hocko @ 2013-03-22 10:47 ` Michal Hocko 0 siblings, 0 replies; 44+ messages in thread From: Michal Hocko @ 2013-03-22 10:47 UTC (permalink / raw) To: Mel Gorman Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, Johannes Weiner, dormando, Satoru Moriya, LKML On Fri 22-03-13 11:04:49, Michal Hocko wrote: > On Fri 22-03-13 08:37:04, Mel Gorman wrote: > > On Fri, Mar 22, 2013 at 08:54:27AM +0100, Michal Hocko wrote: > > > On Thu 21-03-13 15:34:42, Mel Gorman wrote: > > > > On Thu, Mar 21, 2013 at 04:07:55PM +0100, Michal Hocko wrote: > > > > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > > > > > > > index 4835a7a..182ff15 100644 > > > > > > > > --- a/mm/vmscan.c > > > > > > > > +++ b/mm/vmscan.c > > > > > > > > @@ -1815,6 +1815,45 @@ out: > > > > > > > > } > > > > > > > > } > > > > > > > > > > > > > > > > +static void recalculate_scan_count(unsigned long nr_reclaimed, > > > > > > > > + unsigned long nr_to_reclaim, > > > > > > > > + unsigned long nr[NR_LRU_LISTS]) > > > > > > > > +{ > > > > > > > > + enum lru_list l; > > > > > > > > + > > > > > > > > + /* > > > > > > > > + * For direct reclaim, reclaim the number of pages requested. Less > > > > > > > > + * care is taken to ensure that scanning for each LRU is properly > > > > > > > > + * proportional. This is unfortunate and is improper aging but > > > > > > > > + * minimises the amount of time a process is stalled. 
> > > > > > > > + */ > > > > > > > > + if (!current_is_kswapd()) { > > > > > > > > + if (nr_reclaimed >= nr_to_reclaim) { > > > > > > > > + for_each_evictable_lru(l) > > > > > > > > + nr[l] = 0; > > > > > > > > + } > > > > > > > > + return; > > > > > > > > > > > > > > Heh, this is nicely cryptically said what could be done in shrink_lruvec > > > > > > > as > > > > > > > if (!current_is_kswapd()) { > > > > > > > if (nr_reclaimed >= nr_to_reclaim) > > > > > > > break; > > > > > > > } > > > > > > > > > > > > > > > > > > > Pretty much. At one point during development, this function was more > > > > > > complex and it evolved into this without me rechecking if splitting it > > > > > > out still made sense. > > > > > > > > > > > > > Besides that this is not memcg aware which I think it would break > > > > > > > targeted reclaim which is kind of direct reclaim but it still would be > > > > > > > good to stay proportional because it starts with DEF_PRIORITY. > > > > > > > > > > > > > > > > > > > This does break memcg because it's a special sort of direct reclaim. > > > > > > > > > > > > > I would suggest moving this back to shrink_lruvec and update the test as > > > > > > > follows: > > > > > > > > > > > > I also noticed that we check whether the scan counts need to be > > > > > > normalised more than once > > > > > > > > > > I didn't mind this because it "disqualified" at least one LRU every > > > > > round which sounds reasonable to me because all LRUs would be scanned > > > > > proportionally. > > > > > > > > Once the scan count for one LRU is 0 then min will always be 0 and no > > > > further adjustment is made. It's just redundant to check again. > > > > > > Hmm, I was almost sure I wrote that min should be adjusted only if it is >0 > > > in the first loop but it is not there... > > > > > > So for real this time. > > > for_each_evictable_lru(l) > > > if (nr[l] && nr[l] < min) > > > min = nr[l]; > > > > > > This should work, no? 
Everytime you shrink all LRUs you and you have
> > > reclaimed enough already you get the smallest LRU out of game. This
> > > should keep proportions evenly.
> >
> > Lets say we started like this
> >
> > LRU_INACTIVE_ANON     60
> > LRU_ACTIVE_FILE     1000
> > LRU_INACTIVE_FILE   3000
> >
> > and we've reclaimed nr_to_reclaim pages then we recalculate the number
> > of pages to scan from each list as;
> >
> > LRU_INACTIVE_ANON      0
> > LRU_ACTIVE_FILE      940
> > LRU_INACTIVE_FILE   2940
> >
> > We then shrink SWAP_CLUSTER_MAX from each LRU giving us this.
> >
> > LRU_INACTIVE_ANON      0
> > LRU_ACTIVE_FILE      908
> > LRU_INACTIVE_FILE   2908
> >
> > Then under your suggestion this would be recalculated as
> >
> > LRU_INACTIVE_ANON      0
> > LRU_ACTIVE_FILE        0
> > LRU_INACTIVE_FILE   2000
> >
> > another SWAP_CLUSTER_MAX reclaims and then it stops we stop reclaiming. I
> > might still be missing the point of your suggestion but I do not think it
> > would preserve the proportion of pages we reclaim from the anon or file LRUs.
>
> It wouldn't preserve proportion precisely because each reclaim round is
> in SWAP_CLUSTER_MAX units but it would reclaim bigger lists more than
> smaller ones which I thought was the whole point. So yes using word
> "proportionally" is unfortunate but I didn't find out better one.

OK, I have obviously missed that you are not breaking out of the loop if
scan_adjusted. Now that I am looking at the updated patch again you just
do
		if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
			continue;

So I thought you would just do one round of reclaim after
nr_reclaimed >= nr_to_reclaim, which didn't feel right to me.

Sorry about the confusion!
--
Michal Hocko
SUSE Labs
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd
  2013-03-17 13:04 ` [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd Mel Gorman
                   ` (2 preceding siblings ...)
  2013-03-21 14:01 ` Michal Hocko
@ 2013-03-21 16:25 ` Johannes Weiner
  2013-03-21 18:02   ` Mel Gorman
  3 siblings, 1 reply; 44+ messages in thread
From: Johannes Weiner @ 2013-03-21 16:25 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Rik van Riel,
	Zlatko Calusic, dormando, Satoru Moriya, Michal Hocko, LKML

On Sun, Mar 17, 2013 at 01:04:08PM +0000, Mel Gorman wrote:
> Simplistically, the anon and file LRU lists are scanned proportionally
> depending on the value of vm.swappiness although there are other factors
> taken into account by get_scan_count().  The patch "mm: vmscan: Limit
> the number of pages kswapd reclaims" limits the number of pages kswapd
> reclaims but it breaks this proportional scanning and may evenly shrink
> anon/file LRUs regardless of vm.swappiness.
>
> This patch preserves the proportional scanning and reclaim. It does mean
> that kswapd will reclaim more than requested but the number of pages will
> be related to the high watermark.

Swappiness is about page types, but this implementation compares all
LRUs against each other, and I'm not convinced that this makes sense
as there is no guaranteed balance between the inactive and active
lists.  For example, the active file LRU could get knocked out when
it's almost empty while the inactive file LRU has more easy cache than
the anon lists combined.

Would it be better to compare the sum of file pages with the sum of
anon pages and then knock out the smaller pair?
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd 2013-03-21 16:25 ` Johannes Weiner @ 2013-03-21 18:02 ` Mel Gorman 2013-03-22 16:53 ` Johannes Weiner 0 siblings, 1 reply; 44+ messages in thread From: Mel Gorman @ 2013-03-21 18:02 UTC (permalink / raw) To: Johannes Weiner Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, dormando, Satoru Moriya, Michal Hocko, LKML On Thu, Mar 21, 2013 at 12:25:18PM -0400, Johannes Weiner wrote: > On Sun, Mar 17, 2013 at 01:04:08PM +0000, Mel Gorman wrote: > > Simplistically, the anon and file LRU lists are scanned proportionally > > depending on the value of vm.swappiness although there are other factors > > taken into account by get_scan_count(). The patch "mm: vmscan: Limit > > the number of pages kswapd reclaims" limits the number of pages kswapd > > reclaims but it breaks this proportional scanning and may evenly shrink > > anon/file LRUs regardless of vm.swappiness. > > > > This patch preserves the proportional scanning and reclaim. It does mean > > that kswapd will reclaim more than requested but the number of pages will > > be related to the high watermark. > > Swappiness is about page types, but this implementation compares all > LRUs against each other, and I'm not convinced that this makes sense > as there is no guaranteed balance between the inactive and active > lists. For example, the active file LRU could get knocked out when > it's almost empty while the inactive file LRU has more easy cache than > the anon lists combined. > Ok, I see your point. I think Michal was making the same point but I failed to understand it the first time around. > Would it be better to compare the sum of file pages with the sum of > anon pages and then knock out the smaller pair? 
Yes, it makes more sense but the issue then becomes how we can do that
sensibly. The following is straightforward and roughly in line with your
suggestion but it does not preserve the scanning ratio between active and
inactive of the remaining LRU lists.

		/*
		 * For kswapd and memcg, reclaim at least the number of pages
		 * requested. Ensure that the anon and file LRUs shrink
		 * proportionally what was requested by get_scan_count(). We
		 * stop reclaiming one LRU and reduce the amount scanning
		 * required on the other.
		 */
		nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
		nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];

		if (nr_file > nr_anon) {
			nr[LRU_INACTIVE_FILE] -= min(nr_anon, nr[LRU_INACTIVE_FILE]);
			nr[LRU_ACTIVE_FILE] -= min(nr_anon, nr[LRU_ACTIVE_FILE]);
			nr[LRU_INACTIVE_ANON] = nr[LRU_ACTIVE_ANON] = 0;
		} else {
			nr[LRU_INACTIVE_ANON] -= min(nr_file, nr[LRU_INACTIVE_ANON]);
			nr[LRU_ACTIVE_ANON] -= min(nr_file, nr[LRU_ACTIVE_ANON]);
			nr[LRU_INACTIVE_FILE] = nr[LRU_ACTIVE_FILE] = 0;
		}
		scan_adjusted = true;

Preserving the ratio gets complicated and, to avoid excessive branching, it
ends up looking like the following untested code.

		/*
		 * For kswapd and memcg, reclaim at least the number of pages
		 * requested. Ensure that the anon and file LRUs shrink
		 * proportionally what was requested by get_scan_count(). We
		 * stop reclaiming one LRU and reduce the amount scanning
		 * required on the other preserving the ratio between the
		 * active/inactive lists.
		 *
		 * Start by preparing to shrink the larger of the LRUs by
		 * the size of the smaller list.
		 */
		nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
		nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
		nr_shrink = (nr_file > nr_anon) ? nr_anon : nr_file;
		lru = (nr_file > nr_anon) ? LRU_FILE : 0;

		/* Work out the ratio of the inactive/active list */
		top = min(nr[LRU_ACTIVE + lru], nr[lru]);
		bottom = max(nr[LRU_ACTIVE + lru], nr[lru]);
		percentage = top * 100 / bottom;
		nr_fraction = nr_shrink * percentage / 100;
		nr_remaining = nr_anon - nr_fraction;

		/* Reduce the remaining pages to scan proportionally */
		if (nr[LRU_ACTIVE + lru] > nr[lru]) {
			nr[LRU_ACTIVE + lru] -= min(nr_remaining, nr[LRU_ACTIVE + lru]);
			nr[lru] -= min(nr_fraction, nr[lru]);
		} else {
			nr[LRU_ACTIVE + lru] -= min(nr_fraction, nr[LRU_ACTIVE + lru]);
			nr[lru] -= min(nr_remaining, nr[lru]);
		}

		/* Stop scanning the smaller LRU */
		lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE;
		nr[LRU_ACTIVE + lru] = 0;
		nr[lru] = 0;

Is this what you had in mind or had you something simpler in mind?

--
Mel Gorman
SUSE Labs
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd 2013-03-21 18:02 ` Mel Gorman @ 2013-03-22 16:53 ` Johannes Weiner 2013-03-22 18:25 ` Mel Gorman 0 siblings, 1 reply; 44+ messages in thread From: Johannes Weiner @ 2013-03-22 16:53 UTC (permalink / raw) To: Mel Gorman Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, dormando, Satoru Moriya, Michal Hocko, LKML On Thu, Mar 21, 2013 at 06:02:38PM +0000, Mel Gorman wrote: > On Thu, Mar 21, 2013 at 12:25:18PM -0400, Johannes Weiner wrote: > > On Sun, Mar 17, 2013 at 01:04:08PM +0000, Mel Gorman wrote: > > > Simplistically, the anon and file LRU lists are scanned proportionally > > > depending on the value of vm.swappiness although there are other factors > > > taken into account by get_scan_count(). The patch "mm: vmscan: Limit > > > the number of pages kswapd reclaims" limits the number of pages kswapd > > > reclaims but it breaks this proportional scanning and may evenly shrink > > > anon/file LRUs regardless of vm.swappiness. > > > > > > This patch preserves the proportional scanning and reclaim. It does mean > > > that kswapd will reclaim more than requested but the number of pages will > > > be related to the high watermark. > > > > Swappiness is about page types, but this implementation compares all > > LRUs against each other, and I'm not convinced that this makes sense > > as there is no guaranteed balance between the inactive and active > > lists. For example, the active file LRU could get knocked out when > > it's almost empty while the inactive file LRU has more easy cache than > > the anon lists combined. > > > > Ok, I see your point. I think Michal was making the same point but I > failed to understand it the first time around. > > > Would it be better to compare the sum of file pages with the sum of > > anon pages and then knock out the smaller pair? 
>
> Yes, it makes more sense but the issue then becomes how can we do that
> sensibly, The following is straight-forward and roughly in line with your
> suggestion but it does not preseve the scanning ratio between active and
> inactive of the remaining LRU lists.

After thinking more about it, I wonder if subtracting absolute values
of one LRU goal from the other is right to begin with, because the
anon/file balance percentage is applied to individual LRU sizes, and
these sizes are not necessarily comparable.

Consider an unbalanced case of 64 file and 32768 anon pages targeted.
If the balance is 70% file and 30% anon, we will scan 70% of those 64
file pages and 30% of the 32768 anon pages.

Say we decide to bail after one iteration of 32 file pages reclaimed.
We would have scanned only 50% of the targeted file pages, but
subtracting those remaining 32 leaves us with 99% of the targeted
anon pages.

So would it make sense to determine the percentage scanned of the type
that we stop scanning, then scale the original goal of the remaining
LRUs to that percentage, and scan the remainder?

In the above example, we'd determine we scanned 50% of the targeted
file pages, so we reduce the anon inactive and active goals to 50% of
their original values, then scan the difference between those reduced
goals and the pages already scanned.
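The 64 / 32768 example can be checked with a short userspace C sketch that contrasts the two ways of cutting back the remaining scan goal. The helper names (`subtract_leftover()`, `percent_scanned()`, `scale_target()`) are hypothetical, each page type is treated as a single counter rather than an active/inactive pair, and the sketch assumes that 32 anon pages were scanned in the same iteration as the 32 file pages.

```c
#include <assert.h>

/* Absolute approach: take the stopped type's leftover off the other type. */
unsigned long subtract_leftover(unsigned long other_left,
				unsigned long stopped_left)
{
	return other_left - (stopped_left < other_left ? stopped_left : other_left);
}

/* Percentage of a scan target that has been scanned so far. */
unsigned long percent_scanned(unsigned long target, unsigned long left)
{
	assert(target > 0 && left <= target);
	return (target - left) * 100 / target;
}

/*
 * Relative approach: shrink the original target to pct percent, then
 * subtract the pages that were already scanned from it.
 */
unsigned long scale_target(unsigned long target, unsigned long left,
			   unsigned long pct)
{
	unsigned long goal, scanned;

	assert(left <= target);
	goal = target * pct / 100;
	scanned = target - left;

	return goal > scanned ? goal - scanned : 0;
}
```

With these assumptions, the absolute subtraction leaves 32704 of the 32768 anon pages still to be scanned, while scaling to the 50% of the file target already scanned leaves 16352.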
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd 2013-03-22 16:53 ` Johannes Weiner @ 2013-03-22 18:25 ` Mel Gorman 2013-03-22 19:09 ` Johannes Weiner 0 siblings, 1 reply; 44+ messages in thread From: Mel Gorman @ 2013-03-22 18:25 UTC (permalink / raw) To: Johannes Weiner Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, dormando, Satoru Moriya, Michal Hocko, LKML On Fri, Mar 22, 2013 at 12:53:49PM -0400, Johannes Weiner wrote: > On Thu, Mar 21, 2013 at 06:02:38PM +0000, Mel Gorman wrote: > > On Thu, Mar 21, 2013 at 12:25:18PM -0400, Johannes Weiner wrote: > > > On Sun, Mar 17, 2013 at 01:04:08PM +0000, Mel Gorman wrote: > > > > Simplistically, the anon and file LRU lists are scanned proportionally > > > > depending on the value of vm.swappiness although there are other factors > > > > taken into account by get_scan_count(). The patch "mm: vmscan: Limit > > > > the number of pages kswapd reclaims" limits the number of pages kswapd > > > > reclaims but it breaks this proportional scanning and may evenly shrink > > > > anon/file LRUs regardless of vm.swappiness. > > > > > > > > This patch preserves the proportional scanning and reclaim. It does mean > > > > that kswapd will reclaim more than requested but the number of pages will > > > > be related to the high watermark. > > > > > > Swappiness is about page types, but this implementation compares all > > > LRUs against each other, and I'm not convinced that this makes sense > > > as there is no guaranteed balance between the inactive and active > > > lists. For example, the active file LRU could get knocked out when > > > it's almost empty while the inactive file LRU has more easy cache than > > > the anon lists combined. > > > > > > > Ok, I see your point. I think Michal was making the same point but I > > failed to understand it the first time around. 
> >
> > > Would it be better to compare the sum of file pages with the sum of
> > > anon pages and then knock out the smaller pair?
> >
> > Yes, it makes more sense but the issue then becomes how can we do that
> > sensibly, The following is straight-forward and roughly in line with your
> > suggestion but it does not preseve the scanning ratio between active and
> > inactive of the remaining LRU lists.
>
> After thinking more about it, I wonder if subtracting absolute values
> of one LRU goal from the other is right to begin with, because the
> anon/file balance percentage is applied to individual LRU sizes, and
> these sizes are not necessarily comparable.
>

Good point and in itself it's not 100% clear that it's a good idea. If
swappiness reflected the ratio of anon/file pages that were reclaimed
then it would be very easy to reason about. By our current definition, the
rate at which anon or file pages get reclaimed adjusts as reclaim progresses.

> <Snipped the example>
>

I agree and I see your point.

> So would it make sense to determine the percentage scanned of the type
> that we stop scanning, then scale the original goal of the remaining
> LRUs to that percentage, and scan the remainder?
>

To preserve existing behaviour, that makes sense. I'm not convinced that
it's necessarily the best idea but altering it would be beyond the scope
of this series and bite off more than I'm willing to chew. This actually
simplifies things a bit and shrink_lruvec turns into the (untested) code
below. It does not do exact proportional scanning but I do not think that
is necessary either; it is a useful enough approximation. It still could
end up reclaiming much more than sc->nr_to_reclaim unfortunately but fixing
it requires reworking how kswapd scans at different priorities.

Is this closer to what you had in mind?

static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
{
	unsigned long nr[NR_LRU_LISTS];
	unsigned long nr_to_scan;
	enum lru_list lru;
	unsigned long nr_reclaimed = 0;
	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
	unsigned long nr_anon_scantarget, nr_file_scantarget;
	struct blk_plug plug;
	bool scan_adjusted = false;

	get_scan_count(lruvec, sc, nr);

	/* Record the original scan target for proportional adjustments later */
	nr_file_scantarget = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE] + 1;
	nr_anon_scantarget = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON] + 1;

	blk_start_plug(&plug);
	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
					nr[LRU_INACTIVE_FILE]) {
		unsigned long nr_anon, nr_file, percentage;

		for_each_evictable_lru(lru) {
			if (nr[lru]) {
				nr_to_scan = min(nr[lru], SWAP_CLUSTER_MAX);
				nr[lru] -= nr_to_scan;

				nr_reclaimed += shrink_list(lru, nr_to_scan,
							    lruvec, sc);
			}
		}

		if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
			continue;

		/*
		 * For global direct reclaim, reclaim only the number of pages
		 * requested. Less care is taken to scan proportionally as it
		 * is more important to minimise direct reclaim stall latency
		 * than it is to properly age the LRU lists.
		 */
		if (global_reclaim(sc) && !current_is_kswapd())
			break;

		/*
		 * For kswapd and memcg, reclaim at least the number of pages
		 * requested. Ensure that the anon and file LRUs shrink
		 * proportionally what was requested by get_scan_count(). We
		 * stop reclaiming one LRU and reduce the amount scanning
		 * proportional to the original scan target.
		 */
		nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
		nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];

		if (nr_file > nr_anon) {
			lru = LRU_BASE;
			percentage = nr_anon * 100 / nr_anon_scantarget;
		} else {
			lru = LRU_FILE;
			percentage = nr_file * 100 / nr_file_scantarget;
		}

		/* Stop scanning the smaller of the LRU */
		nr[lru] = 0;
		nr[lru + LRU_ACTIVE] = 0;

		/* Reduce scanning of the other LRU proportionally */
		lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE;
		nr[lru] = nr[lru] * percentage / 100;
		nr[lru + LRU_ACTIVE] = nr[lru + LRU_ACTIVE] * percentage / 100;

		scan_adjusted = true;
	}
	blk_finish_plug(&plug);
	sc->nr_reclaimed += nr_reclaimed;

	/*
	 * Even if we did not try to evict anon pages at all, we want to
	 * rebalance the anon lru active/inactive ratio.
	 */
	if (inactive_anon_is_low(lruvec))
		shrink_active_list(SWAP_CLUSTER_MAX, lruvec,
				   sc, LRU_ACTIVE_ANON);

	throttle_vm_writeout(sc->gfp_mask);
}

--
Mel Gorman
SUSE Labs
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd
  2013-03-22 18:25 ` Mel Gorman
@ 2013-03-22 19:09 ` Johannes Weiner
  2013-03-22 19:46 ` Mel Gorman
  0 siblings, 1 reply; 44+ messages in thread
From: Johannes Weiner @ 2013-03-22 19:09 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, dormando, Satoru Moriya, Michal Hocko, LKML

On Fri, Mar 22, 2013 at 06:25:56PM +0000, Mel Gorman wrote:
> On Fri, Mar 22, 2013 at 12:53:49PM -0400, Johannes Weiner wrote:
> > So would it make sense to determine the percentage scanned of the type
> > that we stop scanning, then scale the original goal of the remaining
> > LRUs to that percentage, and scan the remainder?
>
> To preserve existing behaviour, that makes sense. I'm not convinced that
> it's necessarily the best idea but altering it would be beyond the scope
> of this series and bite off more than I'm willing to chew. This actually
> simplifies things a bit and shrink_lruvec turns into the (untested) code
> below. It does not do exact proportional scanning but I do not think it's
> necessary to either and is a useful enough approximation. It still could
> end up reclaiming much more than sc->nr_to_reclaim unfortunately but fixing
> it requires reworking how kswapd scans at different priorities.

In which way does it not do exact proportional scanning? I commented
on one issue below, but maybe you were referring to something else.

Yes, it's a little unfortunate that we escalate to a gigantic scan
window first, and then have to contort ourselves in the process of
backing off gracefully after we reclaimed a few pages...

> Is this closer to what you had in mind?
>
> static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
> {
> 	unsigned long nr[NR_LRU_LISTS];
> 	unsigned long nr_to_scan;
> 	enum lru_list lru;
> 	unsigned long nr_reclaimed = 0;
> 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
> 	unsigned long nr_anon_scantarget, nr_file_scantarget;
> 	struct blk_plug plug;
> 	bool scan_adjusted = false;
>
> 	get_scan_count(lruvec, sc, nr);
>
> 	/* Record the original scan target for proportional adjustments later */
> 	nr_file_scantarget = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE] + 1;
> 	nr_anon_scantarget = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON] + 1;
>
> 	blk_start_plug(&plug);
> 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
> 					nr[LRU_INACTIVE_FILE]) {
> 		unsigned long nr_anon, nr_file, percentage;
>
> 		for_each_evictable_lru(lru) {
> 			if (nr[lru]) {
> 				nr_to_scan = min(nr[lru], SWAP_CLUSTER_MAX);
> 				nr[lru] -= nr_to_scan;
>
> 				nr_reclaimed += shrink_list(lru, nr_to_scan,
> 							    lruvec, sc);
> 			}
> 		}
>
> 		if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
> 			continue;
>
> 		/*
> 		 * For global direct reclaim, reclaim only the number of pages
> 		 * requested. Less care is taken to scan proportionally as it
> 		 * is more important to minimise direct reclaim stall latency
> 		 * than it is to properly age the LRU lists.
> 		 */
> 		if (global_reclaim(sc) && !current_is_kswapd())
> 			break;
>
> 		/*
> 		 * For kswapd and memcg, reclaim at least the number of pages
> 		 * requested. Ensure that the anon and file LRUs shrink
> 		 * proportionally what was requested by get_scan_count(). We
> 		 * stop reclaiming one LRU and reduce the amount scanning
> 		 * proportional to the original scan target.
> 		 */
> 		nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
> 		nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
>
> 		if (nr_file > nr_anon) {
> 			lru = LRU_BASE;
> 			percentage = nr_anon * 100 / nr_anon_scantarget;
> 		} else {
> 			lru = LRU_FILE;
> 			percentage = nr_file * 100 / nr_file_scantarget;
> 		}
>
> 		/* Stop scanning the smaller of the LRU */
> 		nr[lru] = 0;
> 		nr[lru + LRU_ACTIVE] = 0;
>
> 		/* Reduce scanning of the other LRU proportionally */
> 		lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE;
> 		nr[lru] = nr[lru] * percentage / 100;;
> 		nr[lru + LRU_ACTIVE] = nr[lru + LRU_ACTIVE] * percentage / 100;

The percentage is taken from the original goal but then applied to the
remainder of scan goal for the LRUs we continue scanning. The more
pages that have already been scanned, the more inaccurate this gets.
Is that what you had in mind with useful enough approximation?
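To make the concern about the percentage concrete, here is a small userspace sketch of the arithmetic (it is not kernel code). `draft_remainder()` mirrors the scaling in the quoted draft: the percentage is what is left of the stopped LRU type relative to its original target, and it is applied to what is left of the other type. `scaled_goal_remainder()` is one possible reading of the earlier suggestion to scale the original goal of the remaining LRUs to the percentage scanned of the type that stops, and then scan the remainder. Both function names and all parameter names are hypothetical, and the draft's `+ 1` on the scan targets is replaced by an assertion.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the quoted draft: "percentage" is the part
 * of the stopped type's original target that is still unscanned, and it is
 * applied to the remaining goal of the type that keeps being scanned.
 */
unsigned long draft_remainder(unsigned long stopped_left,
			      unsigned long stopped_target,
			      unsigned long other_left)
{
	unsigned long percentage;

	assert(stopped_target > 0);
	percentage = stopped_left * 100 / stopped_target;
	return other_left * percentage / 100;
}

/*
 * Hypothetical helper for the "scale the original goal" reading: work out
 * what percentage of the stopped type was scanned, scale the other type's
 * original target to that percentage, then subtract what was already
 * scanned of the other type.
 */
unsigned long scaled_goal_remainder(unsigned long stopped_left,
				    unsigned long stopped_target,
				    unsigned long other_left,
				    unsigned long other_target)
{
	unsigned long scanned_pct, goal, done;

	assert(stopped_target > 0);
	assert(stopped_left <= stopped_target);
	assert(other_left <= other_target);

	scanned_pct = (stopped_target - stopped_left) * 100 / stopped_target;
	goal = other_target * scanned_pct / 100;
	done = other_target - other_left;

	return goal > done ? goal - done : 0;
}
```

With an anon target of 200 and a file target of 1000, stopping anon after 100 pages of each type were scanned leaves 450 file pages under the draft scaling and 400 under the scaled-goal reading. Stopping after 160 pages of each type leaves 168 versus 640, so in this sketch the two diverge further as more pages have already been scanned.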
* Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd
  2013-03-22 19:09 ` Johannes Weiner
@ 2013-03-22 19:46 ` Mel Gorman
  0 siblings, 0 replies; 44+ messages in thread
From: Mel Gorman @ 2013-03-22 19:46 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Linux-MM, Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic, dormando, Satoru Moriya, Michal Hocko, LKML

On Fri, Mar 22, 2013 at 03:09:02PM -0400, Johannes Weiner wrote:
> > To preserve existing behaviour, that makes sense. I'm not convinced that
> > it's necessarily the best idea but altering it would be beyond the scope
> > of this series and bite off more than I'm willing to chew. This actually
> > simplifies things a bit and shrink_lruvec turns into the (untested) code
> > below. It does not do exact proportional scanning but I do not think it's
> > necessary to either and is a useful enough approximation. It still could
> > end up reclaiming much more than sc->nr_to_reclaim unfortunately but fixing
> > it requires reworking how kswapd scans at different priorities.
>
> In which way does it not do exact proportional scanning? I commented
> on one issue below, but maybe you were referring to something else.
>

You guessed what I was referring to correctly.

> Yes, it's a little unfortunate that we escalate to a gigantic scan
> window first, and then have to contort ourselves in the process of
> backing off gracefully after we reclaimed a few pages...
>

The next patch "mm: vmscan: Flatten kswapd priority loop" mitigates the
problem slightly by improving how kswapd controls when priority gets raised.
It's not perfect though, lots of pages under writeback at the tail of
the LRU will still raise the priority quickly.

> > Is this closer to what you had in mind?
> >
> > static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
> > {
> > 	unsigned long nr[NR_LRU_LISTS];
> > 	unsigned long nr_to_scan;
> > 	enum lru_list lru;
> > 	unsigned long nr_reclaimed = 0;
> > 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
> > 	unsigned long nr_anon_scantarget, nr_file_scantarget;
> > 	struct blk_plug plug;
> > 	bool scan_adjusted = false;
> >
> > 	get_scan_count(lruvec, sc, nr);
> >
> > 	/* Record the original scan target for proportional adjustments later */
> > 	nr_file_scantarget = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE] + 1;
> > 	nr_anon_scantarget = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON] + 1;
> >
> > 	blk_start_plug(&plug);
> > 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
> > 					nr[LRU_INACTIVE_FILE]) {
> > 		unsigned long nr_anon, nr_file, percentage;
> >
> > 		for_each_evictable_lru(lru) {
> > 			if (nr[lru]) {
> > 				nr_to_scan = min(nr[lru], SWAP_CLUSTER_MAX);
> > 				nr[lru] -= nr_to_scan;
> >
> > 				nr_reclaimed += shrink_list(lru, nr_to_scan,
> > 							    lruvec, sc);
> > 			}
> > 		}
> >
> > 		if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
> > 			continue;
> >
> > 		/*
> > 		 * For global direct reclaim, reclaim only the number of pages
> > 		 * requested. Less care is taken to scan proportionally as it
> > 		 * is more important to minimise direct reclaim stall latency
> > 		 * than it is to properly age the LRU lists.
> > 		 */
> > 		if (global_reclaim(sc) && !current_is_kswapd())
> > 			break;
> >
> > 		/*
> > 		 * For kswapd and memcg, reclaim at least the number of pages
> > 		 * requested. Ensure that the anon and file LRUs shrink
> > 		 * proportionally what was requested by get_scan_count(). We
> > 		 * stop reclaiming one LRU and reduce the amount scanning
> > 		 * proportional to the original scan target.
> > 		 */
> > 		nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
> > 		nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
> >
> > 		if (nr_file > nr_anon) {
> > 			lru = LRU_BASE;
> > 			percentage = nr_anon * 100 / nr_anon_scantarget;
> > 		} else {
> > 			lru = LRU_FILE;
> > 			percentage = nr_file * 100 / nr_file_scantarget;
> > 		}
> >
> > 		/* Stop scanning the smaller of the LRU */
> > 		nr[lru] = 0;
> > 		nr[lru + LRU_ACTIVE] = 0;
> >
> > 		/* Reduce scanning of the other LRU proportionally */
> > 		lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE;
> > 		nr[lru] = nr[lru] * percentage / 100;;
> > 		nr[lru + LRU_ACTIVE] = nr[lru + LRU_ACTIVE] * percentage / 100;
>
> The percentage is taken from the original goal but then applied to the
> remainder of scan goal for the LRUs we continue scanning. The more
> pages that have already been scanned, the more inaccurate this gets.
> Is that what you had in mind with useful enough approximation?

Yes. I could record the original scan rates, recalculate as a percentage
and then do something like

nr[lru] = min(nr[lru], origin_nr[lru] * percentage / 100)

but it was not obvious that the result would be any better.

-- 
Mel Gorman
SUSE Labs
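The `min()` alternative mentioned in the reply can be sketched in userspace as follows. `origin_nr` appears in the message only as a name, so how it would be recorded is an assumption here, as is the helper `clamp_to_origin()` and its parameters. `percentage` is taken to be the value computed in the quoted draft.

```c
#include <assert.h>

/*
 * Hypothetical helper: cap the remaining scan goal of one LRU list at
 * `percentage` percent of its originally recorded goal (origin_nr), i.e.
 * min(nr_left, origin_nr * percentage / 100).
 */
unsigned long clamp_to_origin(unsigned long nr_left,
			      unsigned long origin_nr,
			      unsigned long percentage)
{
	unsigned long cap;

	assert(percentage <= 100);
	assert(nr_left <= origin_nr);

	cap = origin_nr * percentage / 100;
	return nr_left < cap ? nr_left : cap;
}
```

For an original goal of 1000 pages and a percentage of 50, a list with 900 pages left would be cut to 500, while a list with only 300 left would keep its 300. The draft's scaling of the remainder would give 450 and 150 for the same inputs, which is one way to compare the two approaches.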