* [RFC PATCH 0/2] Removal of lumpy reclaim
@ 2012-03-28 16:06 Mel Gorman
  2012-03-28 16:06 ` [PATCH 1/2] mm: vmscan: Remove " Mel Gorman
  ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Mel Gorman @ 2012-03-28 16:06 UTC (permalink / raw)
  To: Linux-MM, LKML
  Cc: Andrew Morton, Rik van Riel, Konstantin Khlebnikov, Hugh Dickins,
      Mel Gorman

(cc'ing active people in the thread "[patch 68/92] mm: forbid lumpy-reclaim
in shrink_active_list()")

In the interest of keeping my fingers from the flames at LSF/MM, I'm
releasing an RFC for lumpy reclaim removal. The first patch removes lumpy
reclaim itself and the second removes reclaim_mode_t. They can be merged
together but the resulting patch is harder to review. The patches are based
on commit e22057c8599373e5caef0bc42bdb95d2a361ab0d, which is after Andrew's
tree was merged but before 3.4-rc1 was released.

Roughly 1K of text and over 200 lines of code are removed, and struct
scan_control is smaller.

   text    data     bss      dec    hex filename
6723455 1931304 2260992 10915751 a68fa7 vmlinux-3.3.0-git
6722303 1931304 2260992 10914599 a68b27 vmlinux-3.3.0-lumpyremove-v1

There are behaviour changes caused by the series, with details in the
patches themselves. I ran some preliminary tests but coverage is shaky due
to time constraints. The kernels tested were

3.2.0              Vanilla 3.2.0 kernel
3.3.0-git          Commit e22057c which will be part of 3.4-rc1
3.3.0-lumpyremove  These two patches

fs-mark running in a threaded configuration showed nothing useful.

postmark had interesting results. I know postmark is not very useful as a
mail server benchmark but it pushes page reclaim in a manner that is useful
from a testing perspective. Regressions in page reclaim can result in
regressions in postmark when the working set size (WSS) for postmark is
larger than physical memory.
POSTMARK
                                      3.2.0-vanilla         3.3.0-git    lumpyremove-v1r3
Transactions per second:              16.00 ( 0.00%)    19.00 ( 18.75%)    19.00 ( 18.75%)
Data megabytes read per second:       18.62 ( 0.00%)    23.18 ( 24.49%)    22.56 ( 21.16%)
Data megabytes written per second:    35.49 ( 0.00%)    44.18 ( 24.49%)    42.99 ( 21.13%)
Files created alone per second:       26.00 ( 0.00%)    35.00 ( 34.62%)    34.00 ( 30.77%)
Files create/transact per second:      8.00 ( 0.00%)     9.00 ( 12.50%)     9.00 ( 12.50%)
Files deleted alone per second:      680.00 ( 0.00%)  6124.00 (800.59%)  2041.00 (200.15%)
Files delete/transact per second:      8.00 ( 0.00%)     9.00 ( 12.50%)     9.00 ( 12.50%)

MMTests Statistics: duration
Sys Time Running Test (seconds)         119.61    111.16    111.40
User+Sys Time Running Test (seconds)    153.19    144.13    143.29
Total Elapsed Time (seconds)           1171.34    940.97    966.97

MMTests Statistics: vmstat
Page Ins                             13797412  13734736  13731792
Page Outs                            43284036  42959856  42744668
Swap Ins                                 7751         0         0
Swap Outs                                9617         0         0
Direct pages scanned                   334395         0         0
Kswapd pages scanned                  9664358   9933599   9929577
Kswapd pages reclaimed                9621893   9932913   9928893
Direct pages reclaimed                 334395         0         0
Kswapd efficiency                         99%       99%       99%
Kswapd velocity                      8250.686 10556.765 10268.754
Direct efficiency                        100%      100%      100%
Direct velocity                       285.481     0.000     0.000
Percentage direct scans                    3%        0%        0%
Page writes by reclaim                   9619         0         0
Page writes file                            2         0         0
Page writes anon                         9617         0         0
Page reclaim immediate                      7         0         0
Page rescued immediate                      0         0         0
Slabs scanned                           38912     38912     38912
Direct inode steals                         0         0         0
Kswapd inode steals                    154304    160972    158444
Kswapd skipped wait                         0         0         0
THP fault alloc                             4         4         4
THP collapse alloc                          0         0         0
THP splits                                  3         0         0
THP fault fallback                          0         0         0
THP collapse fail                           0         0         0
Compaction stalls                           1         0         0
Compaction success                          1         0         0
Compaction failures                         0         0         0
Compaction pages moved                      0         0         0
Compaction move failure                     0         0         0

It looks like 3.3.0-git is better in general, although that "Files deleted
alone per second" figure looks like an anomaly. Fully removing lumpy reclaim
affects things a bit, but not enough to be of concern as monitoring was
running at the same time, which disrupts results. Dirty pages were not being
encountered at the end of the LRU, so the behaviour change related to THP
allocations stalling on dirty pages would not be triggered.

Note that swap in/out, direct reclaim and page writes from reclaim dropped
to 0 between 3.2.0 and 3.3.0-git. According to a range of results I have for
mainline kernels between 2.6.32 and 3.3.0 on a different machine, this swap
in/out and direct reclaim problem was introduced after 3.0 and fixed by
3.3.0, with 3.1.x and 3.2.x both showing swap in/out, direct reclaim and
page writes from reclaim. If I had to guess, it was fixed by commits
e0887c19, fe4b1b24 and 0cee34fd, but I did not double check[1].

Removing lumpy reclaim does not make an obvious difference here, but note
that THP was barely used at all in this benchmark. Benchmarks that stress
both page reclaim and THP at the same time in a meaningful manner are thin
on the ground.

A benchmark where dd writes a large file also showed nothing interesting,
but I was not really expecting it to. The test looks for problems related to
a large linear writer and removing lumpy reclaim was unlikely to affect it.

I ran a benchmark that stressed high-order allocation. This is a very
artificial load but it was used in the past to evaluate lumpy reclaim and
compaction. Generally I look at allocation success rates and latency
figures.
STRESS-HIGHALLOC
                 3.2.0-vanilla        3.3.0-git   lumpyremove-v1r3
Pass 1          82.00 ( 0.00%)    27.00 (-55.00%)    32.00 (-50.00%)
Pass 2          70.00 ( 0.00%)    37.00 (-33.00%)    40.00 (-30.00%)
while Rested    90.00 ( 0.00%)    88.00 ( -2.00%)    88.00 ( -2.00%)

MMTests Statistics: duration
Sys Time Running Test (seconds)         735.12    688.13    683.91
User+Sys Time Running Test (seconds)   2764.46   3278.45   3271.41
Total Elapsed Time (seconds)           1204.41   1140.29   1137.58

MMTests Statistics: vmstat
Page Ins                              5426648   2840348   2695120
Page Outs                             7206376   7854516   7860408
Swap Ins                                36799         0         0
Swap Outs                               76903         4         0
Direct pages scanned                    31981     43749    160647
Kswapd pages scanned                 26658682   1285341   1195956
Kswapd pages reclaimed                2248583   1271621   1178420
Direct pages reclaimed                   6397     14416     94093
Kswapd efficiency                          8%       98%       98%
Kswapd velocity                     22134.225  1127.205  1051.316
Direct efficiency                         20%       32%       58%
Direct velocity                        26.553    38.367   141.218
Percentage direct scans                    0%        3%       11%
Page writes by reclaim                6530481         4         0
Page writes file                      6453578         0         0
Page writes anon                        76903         4         0
Page reclaim immediate                 256742     17832     61576
Page rescued immediate                      0         0         0
Slabs scanned                         1073152    971776    975872
Direct inode steals                         0    196279    205178
Kswapd inode steals                    139260     70390     64323
Kswapd skipped wait                     21711         1         0
THP fault alloc                             1       126       143
THP collapse alloc                        324       294       224
THP splits                                 32         8        10
THP fault fallback                          0         0         0
THP collapse fail                           5         6         7
Compaction stalls                         364      1312      1324
Compaction success                        255       343       366
Compaction failures                       109       969       958
Compaction pages moved                 265107   3952630   4489215
Compaction move failure                  7493     26038     24739

Success rates are completely hosed for 3.4-rc1, which is almost certainly
due to [fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. I
expected this would happen for kswapd and impair allocation success rates
(https://lkml.org/lkml/2012/1/25/166) but I did not anticipate this much of
a difference: 95% less scanning and 43% less reclaim by kswapd.

In comparison, reclaim/compaction is not aggressive and gives up easily,
which is the intended behaviour. hugetlbfs uses __GFP_REPEAT and would be
much more aggressive about reclaim/compaction than THP allocations are.
The stress test above allocates like neither THP nor hugetlbfs, but is much
closer to THP. Mainline is now impaired in terms of high-order allocation
under heavy load, although I do not know to what degree as I did not test
with __GFP_REPEAT. Still, keep it in mind for bugs related to hugepage pool
resizing, THP allocation and high-order atomic allocation failures from
network devices.

Despite this, I think we should merge the patches in this series. The
stress tests were very useful when the main user was hugetlb pool resizing
and when rattling out bugs in memory compaction, but they are now too
unrealistic to draw solid conclusions from. They need to be replaced, but
that should not delay the lumpy reclaim removal.

I'd appreciate it if people took a look at the patches to see if there is
anything I missed.

[1] Where are these results, you say? They are generated using MMTests to
    see what negative trends could be identified. They are still in the
    process of running and I've had limited time to dig through the data.

 include/trace/events/vmscan.h |   40 ++-----
 mm/vmscan.c                   |  263 ++++-------------------------------------
 2 files changed, 37 insertions(+), 266 deletions(-)

--
1.7.9.2
^ permalink raw reply	[flat|nested] 14+ messages in thread
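A note on the metrics quoted in the cover letter: the "efficiency" and
"velocity" figures appear to be simple ratios of the vmstat counters and the
elapsed time; that is an assumption about how MMTests generates the report
rather than something stated in the thread. A minimal C sketch of the assumed
arithmetic, using the 3.2.0-vanilla kswapd numbers from the postmark report
above:

	#include <stdio.h>

	int main(void)
	{
		/* Kswapd figures for 3.2.0-vanilla from the postmark report */
		unsigned long scanned = 9664358, reclaimed = 9621893;
		double elapsed = 1171.34;	/* Total Elapsed Time (seconds) */

		/* Efficiency: pages reclaimed as a percentage of pages scanned */
		printf("Kswapd efficiency %lu%%\n", 100 * reclaimed / scanned);

		/* Velocity: pages scanned per second of elapsed test time */
		printf("Kswapd velocity   %.3f\n", scanned / elapsed);
		return 0;
	}

This prints "Kswapd efficiency 99%" and "Kswapd velocity 8250.686", matching
the report, so a low efficiency such as the 8% in the stress-highalloc run
means most pages scanned by kswapd could not be reclaimed at that point.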
* [PATCH 1/2] mm: vmscan: Remove lumpy reclaim 2012-03-28 16:06 [RFC PATCH 0/2] Removal of lumpy reclaim Mel Gorman @ 2012-03-28 16:06 ` Mel Gorman 2012-04-06 23:52 ` Ying Han 2012-03-28 16:06 ` [PATCH 2/2] mm: vmscan: Remove reclaim_mode_t Mel Gorman 2012-04-06 19:34 ` [RFC PATCH 0/2] Removal of lumpy reclaim Andrew Morton 2 siblings, 1 reply; 14+ messages in thread From: Mel Gorman @ 2012-03-28 16:06 UTC (permalink / raw) To: Linux-MM, LKML Cc: Andrew Morton, Rik van Riel, Konstantin Khlebnikov, Hugh Dickins, Mel Gorman Lumpy reclaim had a purpose but in the mind of some, it was to kick the system so hard it trashed. For others the purpose was to complicate vmscan.c. Over time it was giving softer shoes and a nicer attitude but memory compaction needs to step up and replace it so this patch sends lumpy reclaim to the farm. Here are the important notes related to the patch. 1. The tracepoint format changes for isolating LRU pages. 2. This patch stops reclaim/compaction entering sync reclaim as this was only intended for lumpy reclaim and an oversight. Page migration has its own logic for stalling on writeback pages if necessary and memory compaction is already using it. This is a behaviour change. 3. RECLAIM_MODE_SYNC no longer exists. pageout() does not stall on PageWriteback with CONFIG_COMPACTION has been this way for a while. I am calling it out in case this is a surpise to people. This behaviour avoids a situation where we wait on a page being written back to slow storage like USB. Currently we depend on wait_iff_congested() for throttling if if too many dirty pages are scanned. 4. Reclaim/compaction can no longer queue dirty pages in pageout() if the underlying BDI is congested. Lumpy reclaim used this logic and reclaim/compaction was using it in error. This is a behaviour change. Signed-off-by: Mel Gorman <mgorman@suse.de> --- include/trace/events/vmscan.h | 36 ++----- mm/vmscan.c | 209 +++-------------------------------------- 2 files changed, 22 insertions(+), 223 deletions(-) diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h index f64560e..6f60b33 100644 --- a/include/trace/events/vmscan.h +++ b/include/trace/events/vmscan.h @@ -13,7 +13,7 @@ #define RECLAIM_WB_ANON 0x0001u #define RECLAIM_WB_FILE 0x0002u #define RECLAIM_WB_MIXED 0x0010u -#define RECLAIM_WB_SYNC 0x0004u +#define RECLAIM_WB_SYNC 0x0004u /* Unused, all reclaim async */ #define RECLAIM_WB_ASYNC 0x0008u #define show_reclaim_flags(flags) \ @@ -27,13 +27,13 @@ #define trace_reclaim_flags(page, sync) ( \ (page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \ - (sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \ + (RECLAIM_WB_ASYNC) \ ) #define trace_shrink_flags(file, sync) ( \ - (sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_MIXED : \ - (file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON)) | \ - (sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \ + ( \ + (file ? 
RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \ + (RECLAIM_WB_ASYNC) \ ) TRACE_EVENT(mm_vmscan_kswapd_sleep, @@ -263,22 +263,16 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template, unsigned long nr_requested, unsigned long nr_scanned, unsigned long nr_taken, - unsigned long nr_lumpy_taken, - unsigned long nr_lumpy_dirty, - unsigned long nr_lumpy_failed, isolate_mode_t isolate_mode, int file), - TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode, file), + TP_ARGS(order, nr_requested, nr_scanned, nr_taken, isolate_mode, file), TP_STRUCT__entry( __field(int, order) __field(unsigned long, nr_requested) __field(unsigned long, nr_scanned) __field(unsigned long, nr_taken) - __field(unsigned long, nr_lumpy_taken) - __field(unsigned long, nr_lumpy_dirty) - __field(unsigned long, nr_lumpy_failed) __field(isolate_mode_t, isolate_mode) __field(int, file) ), @@ -288,22 +282,16 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template, __entry->nr_requested = nr_requested; __entry->nr_scanned = nr_scanned; __entry->nr_taken = nr_taken; - __entry->nr_lumpy_taken = nr_lumpy_taken; - __entry->nr_lumpy_dirty = nr_lumpy_dirty; - __entry->nr_lumpy_failed = nr_lumpy_failed; __entry->isolate_mode = isolate_mode; __entry->file = file; ), - TP_printk("isolate_mode=%d order=%d nr_requested=%lu nr_scanned=%lu nr_taken=%lu contig_taken=%lu contig_dirty=%lu contig_failed=%lu file=%d", + TP_printk("isolate_mode=%d order=%d nr_requested=%lu nr_scanned=%lu nr_taken=%lu file=%d", __entry->isolate_mode, __entry->order, __entry->nr_requested, __entry->nr_scanned, __entry->nr_taken, - __entry->nr_lumpy_taken, - __entry->nr_lumpy_dirty, - __entry->nr_lumpy_failed, __entry->file) ); @@ -313,13 +301,10 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate, unsigned long nr_requested, unsigned long nr_scanned, unsigned long nr_taken, - unsigned long nr_lumpy_taken, - unsigned long nr_lumpy_dirty, - unsigned long nr_lumpy_failed, isolate_mode_t isolate_mode, int file), - TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode, file) + TP_ARGS(order, nr_requested, nr_scanned, nr_taken, isolate_mode, file) ); @@ -329,13 +314,10 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_memcg_isolate, unsigned long nr_requested, unsigned long nr_scanned, unsigned long nr_taken, - unsigned long nr_lumpy_taken, - unsigned long nr_lumpy_dirty, - unsigned long nr_lumpy_failed, isolate_mode_t isolate_mode, int file), - TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode, file) + TP_ARGS(order, nr_requested, nr_scanned, nr_taken, isolate_mode, file) ); diff --git a/mm/vmscan.c b/mm/vmscan.c index 33c332b..68319e4 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -56,19 +56,11 @@ /* * reclaim_mode determines how the inactive list is shrunk * RECLAIM_MODE_SINGLE: Reclaim only order-0 pages - * RECLAIM_MODE_ASYNC: Do not block - * RECLAIM_MODE_SYNC: Allow blocking e.g. 
call wait_on_page_writeback - * RECLAIM_MODE_LUMPYRECLAIM: For high-order allocations, take a reference - * page from the LRU and reclaim all pages within a - * naturally aligned range * RECLAIM_MODE_COMPACTION: For high-order allocations, reclaim a number of * order-0 pages and then compact the zone */ typedef unsigned __bitwise__ reclaim_mode_t; #define RECLAIM_MODE_SINGLE ((__force reclaim_mode_t)0x01u) -#define RECLAIM_MODE_ASYNC ((__force reclaim_mode_t)0x02u) -#define RECLAIM_MODE_SYNC ((__force reclaim_mode_t)0x04u) -#define RECLAIM_MODE_LUMPYRECLAIM ((__force reclaim_mode_t)0x08u) #define RECLAIM_MODE_COMPACTION ((__force reclaim_mode_t)0x10u) struct scan_control { @@ -364,37 +356,23 @@ out: return ret; } -static void set_reclaim_mode(int priority, struct scan_control *sc, - bool sync) +static void set_reclaim_mode(int priority, struct scan_control *sc) { - reclaim_mode_t syncmode = sync ? RECLAIM_MODE_SYNC : RECLAIM_MODE_ASYNC; - /* - * Initially assume we are entering either lumpy reclaim or - * reclaim/compaction.Depending on the order, we will either set the - * sync mode or just reclaim order-0 pages later. - */ - if (COMPACTION_BUILD) - sc->reclaim_mode = RECLAIM_MODE_COMPACTION; - else - sc->reclaim_mode = RECLAIM_MODE_LUMPYRECLAIM; - - /* - * Avoid using lumpy reclaim or reclaim/compaction if possible by - * restricting when its set to either costly allocations or when + * Restrict reclaim/compaction to costly allocations or when * under memory pressure */ - if (sc->order > PAGE_ALLOC_COSTLY_ORDER) - sc->reclaim_mode |= syncmode; - else if (sc->order && priority < DEF_PRIORITY - 2) - sc->reclaim_mode |= syncmode; + if (COMPACTION_BUILD && sc->order && + (sc->order > PAGE_ALLOC_COSTLY_ORDER || + priority < DEF_PRIORITY - 2)) + sc->reclaim_mode = RECLAIM_MODE_COMPACTION; else - sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC; + sc->reclaim_mode = RECLAIM_MODE_SINGLE; } static void reset_reclaim_mode(struct scan_control *sc) { - sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC; + sc->reclaim_mode = RECLAIM_MODE_SINGLE; } static inline int is_page_cache_freeable(struct page *page) @@ -416,10 +394,6 @@ static int may_write_to_queue(struct backing_dev_info *bdi, return 1; if (bdi == current->backing_dev_info) return 1; - - /* lumpy reclaim for hugepage often need a lot of write */ - if (sc->order > PAGE_ALLOC_COSTLY_ORDER) - return 1; return 0; } @@ -710,10 +684,6 @@ static enum page_references page_check_references(struct page *page, referenced_ptes = page_referenced(page, 1, mz->mem_cgroup, &vm_flags); referenced_page = TestClearPageReferenced(page); - /* Lumpy reclaim - ignore references */ - if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM) - return PAGEREF_RECLAIM; - /* * Mlock lost the isolation race with us. Let try_to_unmap() * move the page to the unevictable list. @@ -813,19 +783,8 @@ static unsigned long shrink_page_list(struct list_head *page_list, if (PageWriteback(page)) { nr_writeback++; - /* - * Synchronous reclaim cannot queue pages for - * writeback due to the possibility of stack overflow - * but if it encounters a page under writeback, wait - * for the IO to complete. 
- */ - if ((sc->reclaim_mode & RECLAIM_MODE_SYNC) && - may_enter_fs) - wait_on_page_writeback(page); - else { - unlock_page(page); - goto keep_lumpy; - } + unlock_page(page); + goto keep; } references = page_check_references(page, mz, sc); @@ -908,7 +867,7 @@ static unsigned long shrink_page_list(struct list_head *page_list, goto activate_locked; case PAGE_SUCCESS: if (PageWriteback(page)) - goto keep_lumpy; + goto keep; if (PageDirty(page)) goto keep; @@ -1007,8 +966,6 @@ activate_locked: keep_locked: unlock_page(page); keep: - reset_reclaim_mode(sc); -keep_lumpy: list_add(&page->lru, &ret_pages); VM_BUG_ON(PageLRU(page) || PageUnevictable(page)); } @@ -1064,11 +1021,7 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file) if (!all_lru_mode && !!page_is_file_cache(page) != file) return ret; - /* - * When this function is being called for lumpy reclaim, we - * initially look into all LRU pages, active, inactive and - * unevictable; only give shrink_page_list evictable pages. - */ + /* Do not give back unevictable pages for compaction */ if (PageUnevictable(page)) return ret; @@ -1153,9 +1106,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, struct lruvec *lruvec; struct list_head *src; unsigned long nr_taken = 0; - unsigned long nr_lumpy_taken = 0; - unsigned long nr_lumpy_dirty = 0; - unsigned long nr_lumpy_failed = 0; unsigned long scan; int lru = LRU_BASE; @@ -1168,10 +1118,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) { struct page *page; - unsigned long pfn; - unsigned long end_pfn; - unsigned long page_pfn; - int zone_id; page = lru_to_page(src); prefetchw_prev_lru_page(page, src, flags); @@ -1193,84 +1139,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, default: BUG(); } - - if (!sc->order || !(sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)) - continue; - - /* - * Attempt to take all pages in the order aligned region - * surrounding the tag page. Only take those pages of - * the same active state as that tag page. We may safely - * round the target page pfn down to the requested order - * as the mem_map is guaranteed valid out to MAX_ORDER, - * where that page is in a different zone we will detect - * it from its zone id and abort this block scan. - */ - zone_id = page_zone_id(page); - page_pfn = page_to_pfn(page); - pfn = page_pfn & ~((1 << sc->order) - 1); - end_pfn = pfn + (1 << sc->order); - for (; pfn < end_pfn; pfn++) { - struct page *cursor_page; - - /* The target page is in the block, ignore it. */ - if (unlikely(pfn == page_pfn)) - continue; - - /* Avoid holes within the zone. */ - if (unlikely(!pfn_valid_within(pfn))) - break; - - cursor_page = pfn_to_page(pfn); - - /* Check that we have not crossed a zone boundary. */ - if (unlikely(page_zone_id(cursor_page) != zone_id)) - break; - - /* - * If we don't have enough swap space, reclaiming of - * anon page which don't already have a swap slot is - * pointless. 
- */ - if (nr_swap_pages <= 0 && PageSwapBacked(cursor_page) && - !PageSwapCache(cursor_page)) - break; - - if (__isolate_lru_page(cursor_page, mode, file) == 0) { - unsigned int isolated_pages; - - mem_cgroup_lru_del(cursor_page); - list_move(&cursor_page->lru, dst); - isolated_pages = hpage_nr_pages(cursor_page); - nr_taken += isolated_pages; - nr_lumpy_taken += isolated_pages; - if (PageDirty(cursor_page)) - nr_lumpy_dirty += isolated_pages; - scan++; - pfn += isolated_pages - 1; - } else { - /* - * Check if the page is freed already. - * - * We can't use page_count() as that - * requires compound_head and we don't - * have a pin on the page here. If a - * page is tail, we may or may not - * have isolated the head, so assume - * it's not free, it'd be tricky to - * track the head status without a - * page pin. - */ - if (!PageTail(cursor_page) && - !atomic_read(&cursor_page->_count)) - continue; - break; - } - } - - /* If we break out of the loop above, lumpy reclaim failed */ - if (pfn < end_pfn) - nr_lumpy_failed++; } *nr_scanned = scan; @@ -1278,7 +1146,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, trace_mm_vmscan_lru_isolate(sc->order, nr_to_scan, scan, nr_taken, - nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, mode, file); return nr_taken; } @@ -1454,47 +1321,6 @@ update_isolated_counts(struct mem_cgroup_zone *mz, } /* - * Returns true if a direct reclaim should wait on pages under writeback. - * - * If we are direct reclaiming for contiguous pages and we do not reclaim - * everything in the list, try again and wait for writeback IO to complete. - * This will stall high-order allocations noticeably. Only do that when really - * need to free the pages under high memory pressure. - */ -static inline bool should_reclaim_stall(unsigned long nr_taken, - unsigned long nr_freed, - int priority, - struct scan_control *sc) -{ - int lumpy_stall_priority; - - /* kswapd should not stall on sync IO */ - if (current_is_kswapd()) - return false; - - /* Only stall on lumpy reclaim */ - if (sc->reclaim_mode & RECLAIM_MODE_SINGLE) - return false; - - /* If we have reclaimed everything on the isolated list, no stall */ - if (nr_freed == nr_taken) - return false; - - /* - * For high-order allocations, there are two stall thresholds. - * High-cost allocations stall immediately where as lower - * order allocations such as stacks require the scanning - * priority to be much higher before stalling. - */ - if (sc->order > PAGE_ALLOC_COSTLY_ORDER) - lumpy_stall_priority = DEF_PRIORITY; - else - lumpy_stall_priority = DEF_PRIORITY / 3; - - return priority <= lumpy_stall_priority; -} - -/* * shrink_inactive_list() is a helper for shrink_zone(). 
It returns the number * of reclaimed pages */ @@ -1522,9 +1348,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz, return SWAP_CLUSTER_MAX; } - set_reclaim_mode(priority, sc, false); - if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM) - isolate_mode |= ISOLATE_ACTIVE; + set_reclaim_mode(priority, sc); lru_add_drain(); @@ -1556,13 +1380,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz, nr_reclaimed = shrink_page_list(&page_list, mz, sc, priority, &nr_dirty, &nr_writeback); - /* Check if we should syncronously wait for writeback */ - if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) { - set_reclaim_mode(priority, sc, true); - nr_reclaimed += shrink_page_list(&page_list, mz, sc, - priority, &nr_dirty, &nr_writeback); - } - spin_lock_irq(&zone->lru_lock); reclaim_stat->recent_scanned[0] += nr_anon; -- 1.7.9.2 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] mm: vmscan: Remove lumpy reclaim 2012-03-28 16:06 ` [PATCH 1/2] mm: vmscan: Remove " Mel Gorman @ 2012-04-06 23:52 ` Ying Han 2012-04-10 8:24 ` Mel Gorman 0 siblings, 1 reply; 14+ messages in thread From: Ying Han @ 2012-04-06 23:52 UTC (permalink / raw) To: Mel Gorman Cc: Linux-MM, LKML, Andrew Morton, Rik van Riel, Konstantin Khlebnikov, Hugh Dickins On Wed, Mar 28, 2012 at 9:06 AM, Mel Gorman <mgorman@suse.de> wrote: > Lumpy reclaim had a purpose but in the mind of some, it was to kick > the system so hard it trashed. For others the purpose was to complicate > vmscan.c. Over time it was giving softer shoes and a nicer attitude but > memory compaction needs to step up and replace it so this patch sends > lumpy reclaim to the farm. > > Here are the important notes related to the patch. > > 1. The tracepoint format changes for isolating LRU pages. > > 2. This patch stops reclaim/compaction entering sync reclaim as this > was only intended for lumpy reclaim and an oversight. Page migration > has its own logic for stalling on writeback pages if necessary and > memory compaction is already using it. This is a behaviour change. > > 3. RECLAIM_MODE_SYNC no longer exists. pageout() does not stall > on PageWriteback with CONFIG_COMPACTION has been this way for a while. > I am calling it out in case this is a surpise to people. Mel, Can you point me the commit making that change? I am looking at v3.4-rc1 where set_reclaim_mode() still set RECLAIM_MODE_SYNC for COMPACTION_BUILD. --Ying This behaviour > avoids a situation where we wait on a page being written back to > slow storage like USB. Currently we depend on wait_iff_congested() > for throttling if if too many dirty pages are scanned. > > 4. Reclaim/compaction can no longer queue dirty pages in pageout() > if the underlying BDI is congested. Lumpy reclaim used this logic and > reclaim/compaction was using it in error. This is a behaviour change. > > Signed-off-by: Mel Gorman <mgorman@suse.de> > --- > include/trace/events/vmscan.h | 36 ++----- > mm/vmscan.c | 209 +++-------------------------------------- > 2 files changed, 22 insertions(+), 223 deletions(-) > > diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h > index f64560e..6f60b33 100644 > --- a/include/trace/events/vmscan.h > +++ b/include/trace/events/vmscan.h > @@ -13,7 +13,7 @@ > #define RECLAIM_WB_ANON 0x0001u > #define RECLAIM_WB_FILE 0x0002u > #define RECLAIM_WB_MIXED 0x0010u > -#define RECLAIM_WB_SYNC 0x0004u > +#define RECLAIM_WB_SYNC 0x0004u /* Unused, all reclaim async */ > #define RECLAIM_WB_ASYNC 0x0008u > > #define show_reclaim_flags(flags) \ > @@ -27,13 +27,13 @@ > > #define trace_reclaim_flags(page, sync) ( \ > (page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \ > - (sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \ > + (RECLAIM_WB_ASYNC) \ > ) > > #define trace_shrink_flags(file, sync) ( \ > - (sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_MIXED : \ > - (file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON)) | \ > - (sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \ > + ( \ > + (file ? 
RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \ > + (RECLAIM_WB_ASYNC) \ > ) > > TRACE_EVENT(mm_vmscan_kswapd_sleep, > @@ -263,22 +263,16 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template, > unsigned long nr_requested, > unsigned long nr_scanned, > unsigned long nr_taken, > - unsigned long nr_lumpy_taken, > - unsigned long nr_lumpy_dirty, > - unsigned long nr_lumpy_failed, > isolate_mode_t isolate_mode, > int file), > > - TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode, file), > + TP_ARGS(order, nr_requested, nr_scanned, nr_taken, isolate_mode, file), > > TP_STRUCT__entry( > __field(int, order) > __field(unsigned long, nr_requested) > __field(unsigned long, nr_scanned) > __field(unsigned long, nr_taken) > - __field(unsigned long, nr_lumpy_taken) > - __field(unsigned long, nr_lumpy_dirty) > - __field(unsigned long, nr_lumpy_failed) > __field(isolate_mode_t, isolate_mode) > __field(int, file) > ), > @@ -288,22 +282,16 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template, > __entry->nr_requested = nr_requested; > __entry->nr_scanned = nr_scanned; > __entry->nr_taken = nr_taken; > - __entry->nr_lumpy_taken = nr_lumpy_taken; > - __entry->nr_lumpy_dirty = nr_lumpy_dirty; > - __entry->nr_lumpy_failed = nr_lumpy_failed; > __entry->isolate_mode = isolate_mode; > __entry->file = file; > ), > > - TP_printk("isolate_mode=%d order=%d nr_requested=%lu nr_scanned=%lu nr_taken=%lu contig_taken=%lu contig_dirty=%lu contig_failed=%lu file=%d", > + TP_printk("isolate_mode=%d order=%d nr_requested=%lu nr_scanned=%lu nr_taken=%lu file=%d", > __entry->isolate_mode, > __entry->order, > __entry->nr_requested, > __entry->nr_scanned, > __entry->nr_taken, > - __entry->nr_lumpy_taken, > - __entry->nr_lumpy_dirty, > - __entry->nr_lumpy_failed, > __entry->file) > ); > > @@ -313,13 +301,10 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate, > unsigned long nr_requested, > unsigned long nr_scanned, > unsigned long nr_taken, > - unsigned long nr_lumpy_taken, > - unsigned long nr_lumpy_dirty, > - unsigned long nr_lumpy_failed, > isolate_mode_t isolate_mode, > int file), > > - TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode, file) > + TP_ARGS(order, nr_requested, nr_scanned, nr_taken, isolate_mode, file) > > ); > > @@ -329,13 +314,10 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_memcg_isolate, > unsigned long nr_requested, > unsigned long nr_scanned, > unsigned long nr_taken, > - unsigned long nr_lumpy_taken, > - unsigned long nr_lumpy_dirty, > - unsigned long nr_lumpy_failed, > isolate_mode_t isolate_mode, > int file), > > - TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode, file) > + TP_ARGS(order, nr_requested, nr_scanned, nr_taken, isolate_mode, file) > > ); > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 33c332b..68319e4 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -56,19 +56,11 @@ > /* > * reclaim_mode determines how the inactive list is shrunk > * RECLAIM_MODE_SINGLE: Reclaim only order-0 pages > - * RECLAIM_MODE_ASYNC: Do not block > - * RECLAIM_MODE_SYNC: Allow blocking e.g. 
call wait_on_page_writeback > - * RECLAIM_MODE_LUMPYRECLAIM: For high-order allocations, take a reference > - * page from the LRU and reclaim all pages within a > - * naturally aligned range > * RECLAIM_MODE_COMPACTION: For high-order allocations, reclaim a number of > * order-0 pages and then compact the zone > */ > typedef unsigned __bitwise__ reclaim_mode_t; > #define RECLAIM_MODE_SINGLE ((__force reclaim_mode_t)0x01u) > -#define RECLAIM_MODE_ASYNC ((__force reclaim_mode_t)0x02u) > -#define RECLAIM_MODE_SYNC ((__force reclaim_mode_t)0x04u) > -#define RECLAIM_MODE_LUMPYRECLAIM ((__force reclaim_mode_t)0x08u) > #define RECLAIM_MODE_COMPACTION ((__force reclaim_mode_t)0x10u) > > struct scan_control { > @@ -364,37 +356,23 @@ out: > return ret; > } > > -static void set_reclaim_mode(int priority, struct scan_control *sc, > - bool sync) > +static void set_reclaim_mode(int priority, struct scan_control *sc) > { > - reclaim_mode_t syncmode = sync ? RECLAIM_MODE_SYNC : RECLAIM_MODE_ASYNC; > - > /* > - * Initially assume we are entering either lumpy reclaim or > - * reclaim/compaction.Depending on the order, we will either set the > - * sync mode or just reclaim order-0 pages later. > - */ > - if (COMPACTION_BUILD) > - sc->reclaim_mode = RECLAIM_MODE_COMPACTION; > - else > - sc->reclaim_mode = RECLAIM_MODE_LUMPYRECLAIM; > - > - /* > - * Avoid using lumpy reclaim or reclaim/compaction if possible by > - * restricting when its set to either costly allocations or when > + * Restrict reclaim/compaction to costly allocations or when > * under memory pressure > */ > - if (sc->order > PAGE_ALLOC_COSTLY_ORDER) > - sc->reclaim_mode |= syncmode; > - else if (sc->order && priority < DEF_PRIORITY - 2) > - sc->reclaim_mode |= syncmode; > + if (COMPACTION_BUILD && sc->order && > + (sc->order > PAGE_ALLOC_COSTLY_ORDER || > + priority < DEF_PRIORITY - 2)) > + sc->reclaim_mode = RECLAIM_MODE_COMPACTION; > else > - sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC; > + sc->reclaim_mode = RECLAIM_MODE_SINGLE; > } > > static void reset_reclaim_mode(struct scan_control *sc) > { > - sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC; > + sc->reclaim_mode = RECLAIM_MODE_SINGLE; > } > > static inline int is_page_cache_freeable(struct page *page) > @@ -416,10 +394,6 @@ static int may_write_to_queue(struct backing_dev_info *bdi, > return 1; > if (bdi == current->backing_dev_info) > return 1; > - > - /* lumpy reclaim for hugepage often need a lot of write */ > - if (sc->order > PAGE_ALLOC_COSTLY_ORDER) > - return 1; > return 0; > } > > @@ -710,10 +684,6 @@ static enum page_references page_check_references(struct page *page, > referenced_ptes = page_referenced(page, 1, mz->mem_cgroup, &vm_flags); > referenced_page = TestClearPageReferenced(page); > > - /* Lumpy reclaim - ignore references */ > - if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM) > - return PAGEREF_RECLAIM; > - > /* > * Mlock lost the isolation race with us. Let try_to_unmap() > * move the page to the unevictable list. > @@ -813,19 +783,8 @@ static unsigned long shrink_page_list(struct list_head *page_list, > > if (PageWriteback(page)) { > nr_writeback++; > - /* > - * Synchronous reclaim cannot queue pages for > - * writeback due to the possibility of stack overflow > - * but if it encounters a page under writeback, wait > - * for the IO to complete. 
> - */ > - if ((sc->reclaim_mode & RECLAIM_MODE_SYNC) && > - may_enter_fs) > - wait_on_page_writeback(page); > - else { > - unlock_page(page); > - goto keep_lumpy; > - } > + unlock_page(page); > + goto keep; > } > > references = page_check_references(page, mz, sc); > @@ -908,7 +867,7 @@ static unsigned long shrink_page_list(struct list_head *page_list, > goto activate_locked; > case PAGE_SUCCESS: > if (PageWriteback(page)) > - goto keep_lumpy; > + goto keep; > if (PageDirty(page)) > goto keep; > > @@ -1007,8 +966,6 @@ activate_locked: > keep_locked: > unlock_page(page); > keep: > - reset_reclaim_mode(sc); > -keep_lumpy: > list_add(&page->lru, &ret_pages); > VM_BUG_ON(PageLRU(page) || PageUnevictable(page)); > } > @@ -1064,11 +1021,7 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file) > if (!all_lru_mode && !!page_is_file_cache(page) != file) > return ret; > > - /* > - * When this function is being called for lumpy reclaim, we > - * initially look into all LRU pages, active, inactive and > - * unevictable; only give shrink_page_list evictable pages. > - */ > + /* Do not give back unevictable pages for compaction */ > if (PageUnevictable(page)) > return ret; > > @@ -1153,9 +1106,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, > struct lruvec *lruvec; > struct list_head *src; > unsigned long nr_taken = 0; > - unsigned long nr_lumpy_taken = 0; > - unsigned long nr_lumpy_dirty = 0; > - unsigned long nr_lumpy_failed = 0; > unsigned long scan; > int lru = LRU_BASE; > > @@ -1168,10 +1118,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, > > for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) { > struct page *page; > - unsigned long pfn; > - unsigned long end_pfn; > - unsigned long page_pfn; > - int zone_id; > > page = lru_to_page(src); > prefetchw_prev_lru_page(page, src, flags); > @@ -1193,84 +1139,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, > default: > BUG(); > } > - > - if (!sc->order || !(sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)) > - continue; > - > - /* > - * Attempt to take all pages in the order aligned region > - * surrounding the tag page. Only take those pages of > - * the same active state as that tag page. We may safely > - * round the target page pfn down to the requested order > - * as the mem_map is guaranteed valid out to MAX_ORDER, > - * where that page is in a different zone we will detect > - * it from its zone id and abort this block scan. > - */ > - zone_id = page_zone_id(page); > - page_pfn = page_to_pfn(page); > - pfn = page_pfn & ~((1 << sc->order) - 1); > - end_pfn = pfn + (1 << sc->order); > - for (; pfn < end_pfn; pfn++) { > - struct page *cursor_page; > - > - /* The target page is in the block, ignore it. */ > - if (unlikely(pfn == page_pfn)) > - continue; > - > - /* Avoid holes within the zone. */ > - if (unlikely(!pfn_valid_within(pfn))) > - break; > - > - cursor_page = pfn_to_page(pfn); > - > - /* Check that we have not crossed a zone boundary. */ > - if (unlikely(page_zone_id(cursor_page) != zone_id)) > - break; > - > - /* > - * If we don't have enough swap space, reclaiming of > - * anon page which don't already have a swap slot is > - * pointless. 
> - */ > - if (nr_swap_pages <= 0 && PageSwapBacked(cursor_page) && > - !PageSwapCache(cursor_page)) > - break; > - > - if (__isolate_lru_page(cursor_page, mode, file) == 0) { > - unsigned int isolated_pages; > - > - mem_cgroup_lru_del(cursor_page); > - list_move(&cursor_page->lru, dst); > - isolated_pages = hpage_nr_pages(cursor_page); > - nr_taken += isolated_pages; > - nr_lumpy_taken += isolated_pages; > - if (PageDirty(cursor_page)) > - nr_lumpy_dirty += isolated_pages; > - scan++; > - pfn += isolated_pages - 1; > - } else { > - /* > - * Check if the page is freed already. > - * > - * We can't use page_count() as that > - * requires compound_head and we don't > - * have a pin on the page here. If a > - * page is tail, we may or may not > - * have isolated the head, so assume > - * it's not free, it'd be tricky to > - * track the head status without a > - * page pin. > - */ > - if (!PageTail(cursor_page) && > - !atomic_read(&cursor_page->_count)) > - continue; > - break; > - } > - } > - > - /* If we break out of the loop above, lumpy reclaim failed */ > - if (pfn < end_pfn) > - nr_lumpy_failed++; > } > > *nr_scanned = scan; > @@ -1278,7 +1146,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, > trace_mm_vmscan_lru_isolate(sc->order, > nr_to_scan, scan, > nr_taken, > - nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, > mode, file); > return nr_taken; > } > @@ -1454,47 +1321,6 @@ update_isolated_counts(struct mem_cgroup_zone *mz, > } > > /* > - * Returns true if a direct reclaim should wait on pages under writeback. > - * > - * If we are direct reclaiming for contiguous pages and we do not reclaim > - * everything in the list, try again and wait for writeback IO to complete. > - * This will stall high-order allocations noticeably. Only do that when really > - * need to free the pages under high memory pressure. > - */ > -static inline bool should_reclaim_stall(unsigned long nr_taken, > - unsigned long nr_freed, > - int priority, > - struct scan_control *sc) > -{ > - int lumpy_stall_priority; > - > - /* kswapd should not stall on sync IO */ > - if (current_is_kswapd()) > - return false; > - > - /* Only stall on lumpy reclaim */ > - if (sc->reclaim_mode & RECLAIM_MODE_SINGLE) > - return false; > - > - /* If we have reclaimed everything on the isolated list, no stall */ > - if (nr_freed == nr_taken) > - return false; > - > - /* > - * For high-order allocations, there are two stall thresholds. > - * High-cost allocations stall immediately where as lower > - * order allocations such as stacks require the scanning > - * priority to be much higher before stalling. > - */ > - if (sc->order > PAGE_ALLOC_COSTLY_ORDER) > - lumpy_stall_priority = DEF_PRIORITY; > - else > - lumpy_stall_priority = DEF_PRIORITY / 3; > - > - return priority <= lumpy_stall_priority; > -} > - > -/* > * shrink_inactive_list() is a helper for shrink_zone(). 
It returns the number > * of reclaimed pages > */ > @@ -1522,9 +1348,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz, > return SWAP_CLUSTER_MAX; > } > > - set_reclaim_mode(priority, sc, false); > - if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM) > - isolate_mode |= ISOLATE_ACTIVE; > + set_reclaim_mode(priority, sc); > > lru_add_drain(); > > @@ -1556,13 +1380,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz, > nr_reclaimed = shrink_page_list(&page_list, mz, sc, priority, > &nr_dirty, &nr_writeback); > > - /* Check if we should syncronously wait for writeback */ > - if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) { > - set_reclaim_mode(priority, sc, true); > - nr_reclaimed += shrink_page_list(&page_list, mz, sc, > - priority, &nr_dirty, &nr_writeback); > - } > - > spin_lock_irq(&zone->lru_lock); > > reclaim_stat->recent_scanned[0] += nr_anon; > -- > 1.7.9.2 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] mm: vmscan: Remove lumpy reclaim
  2012-04-06 23:52   ` Ying Han
@ 2012-04-10  8:24     ` Mel Gorman
  2012-04-10  9:29       ` Mel Gorman
  0 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2012-04-10 8:24 UTC (permalink / raw)
  To: Ying Han
  Cc: Linux-MM, LKML, Andrew Morton, Rik van Riel, Konstantin Khlebnikov,
      Hugh Dickins

On Fri, Apr 06, 2012 at 04:52:09PM -0700, Ying Han wrote:
> On Wed, Mar 28, 2012 at 9:06 AM, Mel Gorman <mgorman@suse.de> wrote:
> > Lumpy reclaim had a purpose but in the mind of some, it was to kick
> > the system so hard it trashed. For others the purpose was to complicate
> > vmscan.c. Over time it was giving softer shoes and a nicer attitude but
> > memory compaction needs to step up and replace it so this patch sends
> > lumpy reclaim to the farm.
> >
> > Here are the important notes related to the patch.
> >
> > 1. The tracepoint format changes for isolating LRU pages.
> >
> > 2. This patch stops reclaim/compaction entering sync reclaim as this
> > was only intended for lumpy reclaim and an oversight. Page migration
> > has its own logic for stalling on writeback pages if necessary and
> > memory compaction is already using it. This is a behaviour change.
> >
> > 3. RECLAIM_MODE_SYNC no longer exists. pageout() does not stall
> > on PageWriteback with CONFIG_COMPACTION has been this way for a while.
> > I am calling it out in case this is a surpise to people.
>
> Mel,
>
> Can you point me the commit making that change? I am looking at
> v3.4-rc1 where set_reclaim_mode() still set RECLAIM_MODE_SYNC for
> COMPACTION_BUILD.
>

You're right.

There is only one call site that passes sync==true for set_reclaim_mode() in
vmscan.c and that is only if should_reclaim_stall() returns true. It had the
comment "Only stall on lumpy reclaim" but the comment is not accurate and
that misled me.

Thanks, I'll revisit the patch.

--
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread
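For reference, the call site being discussed is in the shrink_inactive_list()
hunk that patch 1 removes (excerpted from the diff earlier in this thread):

	/* Check if we should syncronously wait for writeback */
	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
		set_reclaim_mode(priority, sc, true);
		nr_reclaimed += shrink_page_list(&page_list, mz, sc,
					priority, &nr_dirty, &nr_writeback);
	}

This is the only place that passes sync==true to set_reclaim_mode(); every
other caller requests asynchronous reclaim.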
* Re: [PATCH 1/2] mm: vmscan: Remove lumpy reclaim
  2012-04-10  8:24     ` Mel Gorman
@ 2012-04-10  9:29       ` Mel Gorman
  2012-04-10 17:25         ` Ying Han
  0 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2012-04-10 9:29 UTC (permalink / raw)
  To: Ying Han
  Cc: Linux-MM, LKML, Andrew Morton, Rik van Riel, Konstantin Khlebnikov,
      Hugh Dickins

On Tue, Apr 10, 2012 at 09:24:54AM +0100, Mel Gorman wrote:
> On Fri, Apr 06, 2012 at 04:52:09PM -0700, Ying Han wrote:
> > On Wed, Mar 28, 2012 at 9:06 AM, Mel Gorman <mgorman@suse.de> wrote:
> > > Lumpy reclaim had a purpose but in the mind of some, it was to kick
> > > the system so hard it trashed. For others the purpose was to complicate
> > > vmscan.c. Over time it was giving softer shoes and a nicer attitude but
> > > memory compaction needs to step up and replace it so this patch sends
> > > lumpy reclaim to the farm.
> > >
> > > Here are the important notes related to the patch.
> > >
> > > 1. The tracepoint format changes for isolating LRU pages.
> > >
> > > 2. This patch stops reclaim/compaction entering sync reclaim as this
> > > was only intended for lumpy reclaim and an oversight. Page migration
> > > has its own logic for stalling on writeback pages if necessary and
> > > memory compaction is already using it. This is a behaviour change.
> > >
> > > 3. RECLAIM_MODE_SYNC no longer exists. pageout() does not stall
> > > on PageWriteback with CONFIG_COMPACTION has been this way for a while.
> > > I am calling it out in case this is a surpise to people.
> >
> > Mel,
> >
> > Can you point me the commit making that change? I am looking at
> > v3.4-rc1 where set_reclaim_mode() still set RECLAIM_MODE_SYNC for
> > COMPACTION_BUILD.
> >
>
> You're right.
>
> There is only one call site that passes sync==true for set_reclaim_mode() in
> vmscan.c and that is only if should_reclaim_stall() returns true. It had the
> comment "Only stall on lumpy reclaim" but the comment is not accurate
> and that misled me.
>
> Thanks, I'll revisit the patch.
>

Just to be clear, I think the patch is right in that stalling on page
writeback was intended just for lumpy reclaim. I've split out the patch
that stops reclaim/compaction entering sync reclaim but the end result
of the series is the same. Unfortunately we do not have tracing to record
how often reclaim waited on writeback during compaction, so my historical
data does not indicate how often it happened. However, it may partially
explain occasional complaints about interactivity during heavy writeback
when THP is enabled (the bulk of the stalls were due to something else, but
on rare occasions disabling THP was reported to make a small unquantifiable
difference). I'll enable ftrace to record how often mm_vmscan_writepage()
used RECLAIM_MODE_SYNC during tests for this series and include that
information in the changelog.

--
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] mm: vmscan: Remove lumpy reclaim
  2012-04-10  9:29       ` Mel Gorman
@ 2012-04-10 17:25         ` Ying Han
  0 siblings, 0 replies; 14+ messages in thread
From: Ying Han @ 2012-04-10 17:25 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, Andrew Morton, Rik van Riel, Konstantin Khlebnikov,
      Hugh Dickins

On Tue, Apr 10, 2012 at 2:29 AM, Mel Gorman <mgorman@suse.de> wrote:
> On Tue, Apr 10, 2012 at 09:24:54AM +0100, Mel Gorman wrote:
>> On Fri, Apr 06, 2012 at 04:52:09PM -0700, Ying Han wrote:
>> > On Wed, Mar 28, 2012 at 9:06 AM, Mel Gorman <mgorman@suse.de> wrote:
>> > > Lumpy reclaim had a purpose but in the mind of some, it was to kick
>> > > the system so hard it trashed. For others the purpose was to complicate
>> > > vmscan.c. Over time it was giving softer shoes and a nicer attitude but
>> > > memory compaction needs to step up and replace it so this patch sends
>> > > lumpy reclaim to the farm.
>> > >
>> > > Here are the important notes related to the patch.
>> > >
>> > > 1. The tracepoint format changes for isolating LRU pages.
>> > >
>> > > 2. This patch stops reclaim/compaction entering sync reclaim as this
>> > > was only intended for lumpy reclaim and an oversight. Page migration
>> > > has its own logic for stalling on writeback pages if necessary and
>> > > memory compaction is already using it. This is a behaviour change.
>> > >
>> > > 3. RECLAIM_MODE_SYNC no longer exists. pageout() does not stall
>> > > on PageWriteback with CONFIG_COMPACTION has been this way for a while.
>> > > I am calling it out in case this is a surpise to people.
>> >
>> > Mel,
>> >
>> > Can you point me the commit making that change? I am looking at
>> > v3.4-rc1 where set_reclaim_mode() still set RECLAIM_MODE_SYNC for
>> > COMPACTION_BUILD.
>> >
>>
>> You're right.
>>
>> There is only one call site that passes sync==true for set_reclaim_mode() in
>> vmscan.c and that is only if should_reclaim_stall() returns true. It had the
>> comment "Only stall on lumpy reclaim" but the comment is not accurate
>> and that misled me.
>>
>> Thanks, I'll revisit the patch.
>>
>
> Just to be clear, I think the patch is right in that stalling on page
> writeback was intended just for lumpy reclaim.

I see a mismatch between the comment "Only stall on lumpy reclaim" and the
actual implementation in should_reclaim_stall(). Not sure what is intended,
but based on the code, both lumpy and compaction reclaim will be stalled
under PageWriteback.

> I've split out the patch
> that stops reclaim/compaction entering sync reclaim but the end result
> of the series is the same.

I think that makes sense to me for compaction due to its migrating page
nature.

> Unfortunately we do not have tracing to record
> how often reclaim waited on writeback during compaction, so my historical
> data does not indicate how often it happened. However, it may partially
> explain occasional complaints about interactivity during heavy writeback
> when THP is enabled (the bulk of the stalls were due to something else, but
> on rare occasions disabling THP was reported to make a small unquantifiable
> difference). I'll enable ftrace to record how often mm_vmscan_writepage()
> used RECLAIM_MODE_SYNC during tests for this series and include that
> information in the changelog.

Thanks for looking into it.

--Ying

> --
> Mel Gorman
> SUSE Labs
^ permalink raw reply	[flat|nested] 14+ messages in thread
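The mismatch Ying Han points out is visible in the should_reclaim_stall()
hunk removed by patch 1 earlier in the thread: despite the "Only stall on
lumpy reclaim" comment, the function only bails out when RECLAIM_MODE_SINGLE
is set, so both lumpy reclaim and reclaim/compaction callers can fall through
to the stall (excerpt from the removed code):

	/* kswapd should not stall on sync IO */
	if (current_is_kswapd())
		return false;

	/* Only stall on lumpy reclaim */
	if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
		return false;

	/* If we have reclaimed everything on the isolated list, no stall */
	if (nr_freed == nr_taken)
		return false;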
* [PATCH 2/2] mm: vmscan: Remove reclaim_mode_t 2012-03-28 16:06 [RFC PATCH 0/2] Removal of lumpy reclaim Mel Gorman 2012-03-28 16:06 ` [PATCH 1/2] mm: vmscan: Remove " Mel Gorman @ 2012-03-28 16:06 ` Mel Gorman 2012-04-06 19:34 ` [RFC PATCH 0/2] Removal of lumpy reclaim Andrew Morton 2 siblings, 0 replies; 14+ messages in thread From: Mel Gorman @ 2012-03-28 16:06 UTC (permalink / raw) To: Linux-MM, LKML Cc: Andrew Morton, Rik van Riel, Konstantin Khlebnikov, Hugh Dickins, Mel Gorman There is little motiviation for reclaim_mode_t once RECLAIM_MODE_[A]SYNC and lumpy reclaim have been removed. This patch gets rid of reclaim_mode_t as well and improves the documentation about what reclaim/compaction is and when it is triggered. Signed-off-by: Mel Gorman <mgorman@suse.de> --- include/trace/events/vmscan.h | 4 +-- mm/vmscan.c | 72 +++++++++++++---------------------------- 2 files changed, 24 insertions(+), 52 deletions(-) diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h index 6f60b33..f66cc93 100644 --- a/include/trace/events/vmscan.h +++ b/include/trace/events/vmscan.h @@ -25,12 +25,12 @@ {RECLAIM_WB_ASYNC, "RECLAIM_WB_ASYNC"} \ ) : "RECLAIM_WB_NONE" -#define trace_reclaim_flags(page, sync) ( \ +#define trace_reclaim_flags(page) ( \ (page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \ (RECLAIM_WB_ASYNC) \ ) -#define trace_shrink_flags(file, sync) ( \ +#define trace_shrink_flags(file) \ ( \ (file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \ (RECLAIM_WB_ASYNC) \ diff --git a/mm/vmscan.c b/mm/vmscan.c index 68319e4..36c6ad2 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -53,16 +53,6 @@ #define CREATE_TRACE_POINTS #include <trace/events/vmscan.h> -/* - * reclaim_mode determines how the inactive list is shrunk - * RECLAIM_MODE_SINGLE: Reclaim only order-0 pages - * RECLAIM_MODE_COMPACTION: For high-order allocations, reclaim a number of - * order-0 pages and then compact the zone - */ -typedef unsigned __bitwise__ reclaim_mode_t; -#define RECLAIM_MODE_SINGLE ((__force reclaim_mode_t)0x01u) -#define RECLAIM_MODE_COMPACTION ((__force reclaim_mode_t)0x10u) - struct scan_control { /* Incremented by the number of inactive pages that were scanned */ unsigned long nr_scanned; @@ -89,12 +79,6 @@ struct scan_control { int order; /* - * Intend to reclaim enough continuous memory rather than reclaim - * enough amount of memory. i.e, mode for high order allocation. - */ - reclaim_mode_t reclaim_mode; - - /* * The memory cgroup that hit its limit and as a result is the * primary target of this reclaim invocation. */ @@ -356,25 +340,6 @@ out: return ret; } -static void set_reclaim_mode(int priority, struct scan_control *sc) -{ - /* - * Restrict reclaim/compaction to costly allocations or when - * under memory pressure - */ - if (COMPACTION_BUILD && sc->order && - (sc->order > PAGE_ALLOC_COSTLY_ORDER || - priority < DEF_PRIORITY - 2)) - sc->reclaim_mode = RECLAIM_MODE_COMPACTION; - else - sc->reclaim_mode = RECLAIM_MODE_SINGLE; -} - -static void reset_reclaim_mode(struct scan_control *sc) -{ - sc->reclaim_mode = RECLAIM_MODE_SINGLE; -} - static inline int is_page_cache_freeable(struct page *page) { /* @@ -497,8 +462,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping, /* synchronous write or broken a_ops? 
*/ ClearPageReclaim(page); } - trace_mm_vmscan_writepage(page, - trace_reclaim_flags(page, sc->reclaim_mode)); + trace_mm_vmscan_writepage(page, trace_reclaim_flags(page)); inc_zone_page_state(page, NR_VMSCAN_WRITE); return PAGE_SUCCESS; } @@ -953,7 +917,6 @@ cull_mlocked: try_to_free_swap(page); unlock_page(page); putback_lru_page(page); - reset_reclaim_mode(sc); continue; activate_locked: @@ -1348,8 +1311,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz, return SWAP_CLUSTER_MAX; } - set_reclaim_mode(priority, sc); - lru_add_drain(); if (!sc->may_unmap) @@ -1428,7 +1389,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz, zone_idx(zone), nr_scanned, nr_reclaimed, priority, - trace_shrink_flags(file, sc->reclaim_mode)); + trace_shrink_flags(file)); return nr_reclaimed; } @@ -1507,8 +1468,6 @@ static void shrink_active_list(unsigned long nr_to_scan, lru_add_drain(); - reset_reclaim_mode(sc); - if (!sc->may_unmap) isolate_mode |= ISOLATE_UNMAPPED; if (!sc->may_writepage) @@ -1821,23 +1780,35 @@ out: } } +/* Use reclaim/compaction for costly allocs or under memory pressure */ +static bool in_reclaim_compaction(int priority, struct scan_control *sc) +{ + if (COMPACTION_BUILD && sc->order && + (sc->order > PAGE_ALLOC_COSTLY_ORDER || + priority < DEF_PRIORITY - 2)) + return true; + + return false; +} + /* - * Reclaim/compaction depends on a number of pages being freed. To avoid - * disruption to the system, a small number of order-0 pages continue to be - * rotated and reclaimed in the normal fashion. However, by the time we get - * back to the allocator and call try_to_compact_zone(), we ensure that - * there are enough free pages for it to be likely successful + * Reclaim/compaction is used for high-order allocation requests. It reclaims + * order-0 pages before compacting the zone. should_continue_reclaim() returns + * true if more pages should be reclaimed such that when the page allocator + * calls try_to_compact_zone() that it will have enough free pages to succeed. + * It will give up earlier than that if there is difficulty reclaiming pages. */ static inline bool should_continue_reclaim(struct mem_cgroup_zone *mz, unsigned long nr_reclaimed, unsigned long nr_scanned, + int priority, struct scan_control *sc) { unsigned long pages_for_compaction; unsigned long inactive_lru_pages; /* If not in reclaim/compaction mode, stop */ - if (!(sc->reclaim_mode & RECLAIM_MODE_COMPACTION)) + if (!in_reclaim_compaction(priority, sc)) return false; /* Consider stopping depending on scan and reclaim activity */ @@ -1944,7 +1915,8 @@ restart: /* reclaim/compaction might need reclaim to continue */ if (should_continue_reclaim(mz, nr_reclaimed, - sc->nr_scanned - nr_scanned, sc)) + sc->nr_scanned - nr_scanned, + priority, sc)) goto restart; throttle_vm_writeout(sc->gfp_mask); -- 1.7.9.2 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 0/2] Removal of lumpy reclaim 2012-03-28 16:06 [RFC PATCH 0/2] Removal of lumpy reclaim Mel Gorman 2012-03-28 16:06 ` [PATCH 1/2] mm: vmscan: Remove " Mel Gorman 2012-03-28 16:06 ` [PATCH 2/2] mm: vmscan: Remove reclaim_mode_t Mel Gorman @ 2012-04-06 19:34 ` Andrew Morton 2012-04-06 20:31 ` Hugh Dickins 2012-04-10 8:32 ` Mel Gorman 2 siblings, 2 replies; 14+ messages in thread From: Andrew Morton @ 2012-04-06 19:34 UTC (permalink / raw) To: Mel Gorman Cc: Linux-MM, LKML, Rik van Riel, Konstantin Khlebnikov, Hugh Dickins On Wed, 28 Mar 2012 17:06:21 +0100 Mel Gorman <mgorman@suse.de> wrote: > (cc'ing active people in the thread "[patch 68/92] mm: forbid lumpy-reclaim > in shrink_active_list()") > > In the interest of keeping my fingers from the flames at LSF/MM, I'm > releasing an RFC for lumpy reclaim removal. I grabbed them, thanks. > > ... > > MMTests Statistics: vmstat > Page Ins 5426648 2840348 2695120 > Page Outs 7206376 7854516 7860408 > Swap Ins 36799 0 0 > Swap Outs 76903 4 0 > Direct pages scanned 31981 43749 160647 > Kswapd pages scanned 26658682 1285341 1195956 > Kswapd pages reclaimed 2248583 1271621 1178420 > Direct pages reclaimed 6397 14416 94093 > Kswapd efficiency 8% 98% 98% > Kswapd velocity 22134.225 1127.205 1051.316 > Direct efficiency 20% 32% 58% > Direct velocity 26.553 38.367 141.218 > Percentage direct scans 0% 3% 11% > Page writes by reclaim 6530481 4 0 > Page writes file 6453578 0 0 > Page writes anon 76903 4 0 > Page reclaim immediate 256742 17832 61576 > Page rescued immediate 0 0 0 > Slabs scanned 1073152 971776 975872 > Direct inode steals 0 196279 205178 > Kswapd inode steals 139260 70390 64323 > Kswapd skipped wait 21711 1 0 > THP fault alloc 1 126 143 > THP collapse alloc 324 294 224 > THP splits 32 8 10 > THP fault fallback 0 0 0 > THP collapse fail 5 6 7 > Compaction stalls 364 1312 1324 > Compaction success 255 343 366 > Compaction failures 109 969 958 > Compaction pages moved 265107 3952630 4489215 > Compaction move failure 7493 26038 24739 > > ... > > Success rates are completely hosed for 3.4-rc1 which is almost certainly > due to [fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. I > expected this would happen for kswapd and impair allocation success rates > (https://lkml.org/lkml/2012/1/25/166) but I did not anticipate this much > a difference: 95% less scanning, 43% less reclaim by kswapd > > In comparison, reclaim/compaction is not aggressive and gives up easily > which is the intended behaviour. hugetlbfs uses __GFP_REPEAT and would be > much more aggressive about reclaim/compaction than THP allocations are. The > stress test above is allocating like neither THP or hugetlbfs but is much > closer to THP. We seem to be thrashing around a bit with the performance, and we aren't tracking this closely enough. What is kswapd efficiency? pages-relclaimed/pages-scanned? Why did it increase so much? Are pages which were reclaimed via prune_icache_sb() included? If so, they can make a real mess of the scanning efficiency metric. The increase in PGINODESTEAL is remarkable. It seems to largely be a transfer from kswapd inode stealing. Bad from a latency POV, at least. What would cause this change? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
* Re: [RFC PATCH 0/2] Removal of lumpy reclaim 2012-04-06 19:34 ` [RFC PATCH 0/2] Removal of lumpy reclaim Andrew Morton @ 2012-04-06 20:31 ` Hugh Dickins 2012-04-07 3:00 ` KOSAKI Motohiro 2012-04-09 18:10 ` Rik van Riel 2012-04-10 8:32 ` Mel Gorman 1 sibling, 2 replies; 14+ messages in thread From: Hugh Dickins @ 2012-04-06 20:31 UTC (permalink / raw) To: Andrew Morton Cc: Mel Gorman, Linux-MM, LKML, Rik van Riel, Konstantin Khlebnikov, Hugh Dickins On Fri, 6 Apr 2012, Andrew Morton wrote: > On Wed, 28 Mar 2012 17:06:21 +0100 > Mel Gorman <mgorman@suse.de> wrote: > > > (cc'ing active people in the thread "[patch 68/92] mm: forbid lumpy-reclaim > > in shrink_active_list()") > > > > In the interest of keeping my fingers from the flames at LSF/MM, I'm > > releasing an RFC for lumpy reclaim removal. > > I grabbed them, thanks. I do have a concern with this: I was expecting lumpy reclaim to be replaced by compaction, and indeed it is when CONFIG_COMPACTION=y. But when CONFIG_COMPACTION is not set, we're back to 2.6.22 in relying upon blind chance to provide order>0 pages. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
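On the CONFIG_COMPACTION=n case: the in_reclaim_compaction() test added in patch 2/2 is gated on COMPACTION_BUILD, which in kernels of this era was a plain compile-time constant in include/linux/kernel.h. The fragment below is reproduced from memory, so treat it as an illustration rather than a verbatim quote of the header.

#ifdef CONFIG_COMPACTION
#define COMPACTION_BUILD 1
#else
#define COMPACTION_BUILD 0
#endif

/*
 * With CONFIG_COMPACTION=n the "COMPACTION_BUILD && sc->order && ..."
 * check in in_reclaim_compaction() is constant-false and the compiler
 * discards the rest, so once lumpy reclaim is removed nothing
 * order-aware remains in reclaim for that configuration; that is the
 * gap being raised here.
 */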
* Re: [RFC PATCH 0/2] Removal of lumpy reclaim 2012-04-06 20:31 ` Hugh Dickins @ 2012-04-07 3:00 ` KOSAKI Motohiro 2012-04-09 18:10 ` Rik van Riel 1 sibling, 0 replies; 14+ messages in thread From: KOSAKI Motohiro @ 2012-04-07 3:00 UTC (permalink / raw) To: Hugh Dickins Cc: Andrew Morton, Mel Gorman, Linux-MM, LKML, Rik van Riel, Konstantin Khlebnikov, kosaki.motohiro (4/6/12 1:31 PM), Hugh Dickins wrote: > On Fri, 6 Apr 2012, Andrew Morton wrote: >> On Wed, 28 Mar 2012 17:06:21 +0100 >> Mel Gorman<mgorman@suse.de> wrote: >> >>> (cc'ing active people in the thread "[patch 68/92] mm: forbid lumpy-reclaim >>> in shrink_active_list()") >>> >>> In the interest of keeping my fingers from the flames at LSF/MM, I'm >>> releasing an RFC for lumpy reclaim removal. >> >> I grabbed them, thanks. > > I do have a concern with this: I was expecting lumpy reclaim to be > replaced by compaction, and indeed it is when CONFIG_COMPACTION=y. > But when CONFIG_COMPACTION is not set, we're back to 2.6.22 in > relying upon blind chance to provide order>0 pages. I raised the biggest objection to removing lumpy reclaim back when compaction was being merged, but I think it is fine now. Desktop and server people always run COMPACTION=y kernels, and embedded people don't use swap (so lumpy reclaim would not work for them anyway). My aim at the time was gradual development that avoided an aggressive regression, and that is what Mel did. Compaction is now completely stable, so I think we have no reason to keep lumpy reclaim. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 0/2] Removal of lumpy reclaim 2012-04-06 20:31 ` Hugh Dickins 2012-04-07 3:00 ` KOSAKI Motohiro @ 2012-04-09 18:10 ` Rik van Riel 2012-04-09 19:18 ` Hugh Dickins 1 sibling, 1 reply; 14+ messages in thread From: Rik van Riel @ 2012-04-09 18:10 UTC (permalink / raw) To: Hugh Dickins Cc: Andrew Morton, Mel Gorman, Linux-MM, LKML, Konstantin Khlebnikov On 04/06/2012 04:31 PM, Hugh Dickins wrote: > On Fri, 6 Apr 2012, Andrew Morton wrote: >> On Wed, 28 Mar 2012 17:06:21 +0100 >> Mel Gorman<mgorman@suse.de> wrote: >> >>> (cc'ing active people in the thread "[patch 68/92] mm: forbid lumpy-reclaim >>> in shrink_active_list()") >>> >>> In the interest of keeping my fingers from the flames at LSF/MM, I'm >>> releasing an RFC for lumpy reclaim removal. >> >> I grabbed them, thanks. > > I do have a concern with this: I was expecting lumpy reclaim to be > replaced by compaction, and indeed it is when CONFIG_COMPACTION=y. > But when CONFIG_COMPACTION is not set, we're back to 2.6.22 in > relying upon blind chance to provide order>0 pages. Is this an issue for any architecture? I could see NOMMU being unable to use compaction, but chances are lumpy reclaim would be sufficient for that configuration, anyway... -- All rights reversed -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 0/2] Removal of lumpy reclaim 2012-04-09 18:10 ` Rik van Riel @ 2012-04-09 19:18 ` Hugh Dickins 2012-04-09 23:40 ` Rik van Riel 0 siblings, 1 reply; 14+ messages in thread From: Hugh Dickins @ 2012-04-09 19:18 UTC (permalink / raw) To: Rik van Riel Cc: Andrew Morton, Mel Gorman, Linux-MM, LKML, Konstantin Khlebnikov On Mon, 9 Apr 2012, Rik van Riel wrote: > On 04/06/2012 04:31 PM, Hugh Dickins wrote: > > On Fri, 6 Apr 2012, Andrew Morton wrote: > > > On Wed, 28 Mar 2012 17:06:21 +0100 > > > Mel Gorman<mgorman@suse.de> wrote: > > > > > > > (cc'ing active people in the thread "[patch 68/92] mm: forbid > > > > lumpy-reclaim > > > > in shrink_active_list()") > > > > > > > > In the interest of keeping my fingers from the flames at LSF/MM, I'm > > > > releasing an RFC for lumpy reclaim removal. > > > > > > I grabbed them, thanks. > > > > I do have a concern with this: I was expecting lumpy reclaim to be > > replaced by compaction, and indeed it is when CONFIG_COMPACTION=y. > > But when CONFIG_COMPACTION is not set, we're back to 2.6.22 in > > relying upon blind chance to provide order>0 pages. > > Is this an issue for any architecture? Dunno about any architecture as a whole; but I'd expect users of SLOB or TINY config options to want to still use lumpy rather than the more efficient but weightier COMPACTION+MIGRATION. Though "size migrate.o compaction.o" on my 32-bit config does not reach 8kB, so maybe it's not a big deal after all. > > I could see NOMMU being unable to use compaction, but Yes, COMPACTION depends on MMU. > chances are lumpy reclaim would be sufficient for that > configuration, anyway... That's an argument for your patch in 3.4-rc, which uses lumpy only when !COMPACTION_BUILD. But here we're worrying about Mel's patch, which removes the lumpy code completely. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 0/2] Removal of lumpy reclaim 2012-04-09 19:18 ` Hugh Dickins @ 2012-04-09 23:40 ` Rik van Riel 0 siblings, 0 replies; 14+ messages in thread From: Rik van Riel @ 2012-04-09 23:40 UTC (permalink / raw) To: Hugh Dickins Cc: Andrew Morton, Mel Gorman, Linux-MM, LKML, Konstantin Khlebnikov On 04/09/2012 03:18 PM, Hugh Dickins wrote: > On Mon, 9 Apr 2012, Rik van Riel wrote: >> I could see NOMMU being unable to use compaction, but > > Yes, COMPACTION depends on MMU. > >> chances are lumpy reclaim would be sufficient for that >> configuration, anyway... > > That's an argument for your patch in 3.4-rc, which uses lumpy only > when !COMPACTION_BUILD. But here we're worrying about Mel's patch, > which removes the lumpy code completely. Sorry, that was a typo in my mail. I wanted to say that I expect lumpy reclaim to NOT be sufficient for NOMMU anyway, because it cannot reclaim lumps of memory large enough to fit a new process. -- All rights reversed -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 0/2] Removal of lumpy reclaim 2012-04-06 19:34 ` [RFC PATCH 0/2] Removal of lumpy reclaim Andrew Morton 2012-04-06 20:31 ` Hugh Dickins @ 2012-04-10 8:32 ` Mel Gorman 1 sibling, 0 replies; 14+ messages in thread From: Mel Gorman @ 2012-04-10 8:32 UTC (permalink / raw) To: Andrew Morton Cc: Linux-MM, LKML, Rik van Riel, Konstantin Khlebnikov, Hugh Dickins On Fri, Apr 06, 2012 at 12:34:39PM -0700, Andrew Morton wrote: > On Wed, 28 Mar 2012 17:06:21 +0100 > Mel Gorman <mgorman@suse.de> wrote: > > > (cc'ing active people in the thread "[patch 68/92] mm: forbid lumpy-reclaim > > in shrink_active_list()") > > > > In the interest of keeping my fingers from the flames at LSF/MM, I'm > > releasing an RFC for lumpy reclaim removal. > > I grabbed them, thanks. > There probably will be a V2 as Ying pointed out a problem with patch 1. > > > > ... > > > > MMTests Statistics: vmstat > > Page Ins 5426648 2840348 2695120 > > Page Outs 7206376 7854516 7860408 > > Swap Ins 36799 0 0 > > Swap Outs 76903 4 0 > > Direct pages scanned 31981 43749 160647 > > Kswapd pages scanned 26658682 1285341 1195956 > > Kswapd pages reclaimed 2248583 1271621 1178420 > > Direct pages reclaimed 6397 14416 94093 > > Kswapd efficiency 8% 98% 98% > > Kswapd velocity 22134.225 1127.205 1051.316 > > Direct efficiency 20% 32% 58% > > Direct velocity 26.553 38.367 141.218 > > Percentage direct scans 0% 3% 11% > > Page writes by reclaim 6530481 4 0 > > Page writes file 6453578 0 0 > > Page writes anon 76903 4 0 > > Page reclaim immediate 256742 17832 61576 > > Page rescued immediate 0 0 0 > > Slabs scanned 1073152 971776 975872 > > Direct inode steals 0 196279 205178 > > Kswapd inode steals 139260 70390 64323 > > Kswapd skipped wait 21711 1 0 > > THP fault alloc 1 126 143 > > THP collapse alloc 324 294 224 > > THP splits 32 8 10 > > THP fault fallback 0 0 0 > > THP collapse fail 5 6 7 > > Compaction stalls 364 1312 1324 > > Compaction success 255 343 366 > > Compaction failures 109 969 958 > > Compaction pages moved 265107 3952630 4489215 > > Compaction move failure 7493 26038 24739 > > > > ... > > > > Success rates are completely hosed for 3.4-rc1 which is almost certainly > > due to [fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. I > > expected this would happen for kswapd and impair allocation success rates > > (https://lkml.org/lkml/2012/1/25/166) but I did not anticipate this much > > a difference: 95% less scanning, 43% less reclaim by kswapd > > > > In comparison, reclaim/compaction is not aggressive and gives up easily > > which is the intended behaviour. hugetlbfs uses __GFP_REPEAT and would be > > much more aggressive about reclaim/compaction than THP allocations are. The > > stress test above is allocating like neither THP or hugetlbfs but is much > > closer to THP. > > We seem to be thrashing around a bit with the performance, and we > aren't tracking this closely enough. > Yes. > What is kswapd efficiency? pages-relclaimed/pages-scanned? pages_reclaimed*100/pages_scanned > Why did it > increase so much? Lumpy reclaim increases the number of pages scanned in isolate_lru_pages() and that is what I was attributing it to. > Are pages which were reclaimed via prune_icache_sb() > included? If so, they can make a real mess of the scanning efficiency > metric. > I don't think so. For Kswapd efficiency, I'm using "kswapd_steal" from vmstat and that is updated by shrink_inactive_list and not the slab shrinker > The increase in PGINODESTEAL is remarkable. 
It seems to largely be a > transfer from kswapd inode stealing. Bad from a latency POV, at least. > What would cause this change? I'm playing catch-up at the moment and right now, I do not have a good explanation as to why it changed like this. The most likely explanation is that we are reclaiming fewer pages leading to more slab reclaim. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
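To make the metrics in this exchange concrete: efficiency is pages reclaimed * 100 / pages scanned, as Mel states, and the velocity figures in the quoted table are consistent with pages scanned per second of elapsed test time; that velocity definition is inferred from the reported numbers rather than quoted from mmtests, so treat it as an assumption. The sketch below recomputes the 3.3.0-git column quoted above, with the elapsed time back-derived from the reported kswapd velocity.

#include <stdio.h>

static void report_reclaim_metrics(unsigned long kswapd_scanned,
				   unsigned long kswapd_reclaimed,
				   unsigned long direct_scanned,
				   unsigned long direct_reclaimed,
				   double elapsed_seconds)
{
	/* Efficiency: reclaimed * 100 / scanned; report 100% if nothing was scanned */
	unsigned long kswapd_eff = kswapd_scanned ?
			kswapd_reclaimed * 100 / kswapd_scanned : 100;
	unsigned long direct_eff = direct_scanned ?
			direct_reclaimed * 100 / direct_scanned : 100;

	printf("Kswapd efficiency  %lu%%\n", kswapd_eff);
	printf("Kswapd velocity    %.3f\n", kswapd_scanned / elapsed_seconds);
	printf("Direct efficiency  %lu%%\n", direct_eff);
	printf("Direct velocity    %.3f\n", direct_scanned / elapsed_seconds);
}

int main(void)
{
	/*
	 * Figures from the 3.3.0-git column quoted earlier in this thread.
	 * The elapsed time (~1140s) is back-derived from the reported
	 * kswapd velocity of 1127.205, so it is an estimate.
	 */
	report_reclaim_metrics(1285341UL, 1271621UL, 43749UL, 14416UL, 1140.3);
	return 0;
}

With these inputs the output reproduces the quoted 98% and 32% efficiency figures (integer truncation included) and lands within rounding of the reported velocities.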
Thread overview: 14+ messages
2012-03-28 16:06 [RFC PATCH 0/2] Removal of lumpy reclaim Mel Gorman
2012-03-28 16:06 ` [PATCH 1/2] mm: vmscan: Remove " Mel Gorman
2012-04-06 23:52 ` Ying Han
2012-04-10  8:24 ` Mel Gorman
2012-04-10  9:29 ` Mel Gorman
2012-04-10 17:25 ` Ying Han
2012-03-28 16:06 ` [PATCH 2/2] mm: vmscan: Remove reclaim_mode_t Mel Gorman
2012-04-06 19:34 ` [RFC PATCH 0/2] Removal of lumpy reclaim Andrew Morton
2012-04-06 20:31 ` Hugh Dickins
2012-04-07  3:00 ` KOSAKI Motohiro
2012-04-09 18:10 ` Rik van Riel
2012-04-09 19:18 ` Hugh Dickins
2012-04-09 23:40 ` Rik van Riel
2012-04-10  8:32 ` Mel Gorman