From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx172.postini.com [74.125.245.172]) by kanga.kvack.org (Postfix) with SMTP id 76B8D6B0033 for ; Fri, 31 May 2013 14:39:00 -0400 (EDT) Subject: [v4][PATCH 0/6] mm: vmscan: Batch page reclamation under shink_page_list From: Dave Hansen Date: Fri, 31 May 2013 11:38:55 -0700 Message-Id: <20130531183855.44DDF928@viggo.jf.intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com, Dave Hansen These are an update of Tim Chen's earlier work: http://lkml.kernel.org/r/1347293960.9977.70.camel@schen9-DESK I broke the patches up a bit more, and tried to incorporate some changes based on some feedback from Mel and Andrew. Changes for v4: * generated on top of linux-next-20130530, plus Mel's vmscan fixes: http://lkml.kernel.org/r/1369659778-6772-2-git-send-email-mgorman@suse.de * added some proper vmscan/swap: prefixes to the subjects Changes for v3: * Add batch draining before congestion_wait() * minor merge conflicts with Mel's vmscan work Changes for v2: * use page_mapping() accessor instead of direct access to page->mapping (could cause crashes when running in to swap cache pages. * group the batch function's introduction patch with its first use * rename a few functions as suggested by Mel * Ran some single-threaded tests to look for regressions caused by the batching. If there is overhead, it is only in the worst-case scenarios, and then only in hundreths of a percent of CPU time. If you're curious how effective the batching is, I have a quick and dirty patch to keep some stats: https://www.sr71.net/~dave/intel/rmb-stats-only.patch -- To do page reclamation in shrink_page_list function, there are two locks taken on a page by page basis. One is the tree lock protecting the radix tree of the page mapping and the other is the mapping->i_mmap_mutex protecting the mapped pages. This set deals only with mapping->tree_lock. Tim managed to get 14% throughput improvement when with a workload putting heavy pressure of page cache by reading many large mmaped files simultaneously on a 8 socket Westmere server. I've been testing these by running large parallel kernel compiles on systems that are under memory pressure. During development, I caught quite a few races on smaller setups, and it's being quite stable that large (160 logical CPU / 1TB) system. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx203.postini.com [74.125.245.203]) by kanga.kvack.org (Postfix) with SMTP id 7B3F36B0034 for ; Fri, 31 May 2013 14:39:00 -0400 (EDT) Subject: [v4][PATCH 1/6] mm: swap: defer clearing of page_private() for swap cache pages From: Dave Hansen Date: Fri, 31 May 2013 11:38:56 -0700 References: <20130531183855.44DDF928@viggo.jf.intel.com> In-Reply-To: <20130531183855.44DDF928@viggo.jf.intel.com> Message-Id: <20130531183856.1D7D75AD@viggo.jf.intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com, Dave Hansen From: Dave Hansen This patch defers the destruction of swapcache-specific data in 'struct page'. This simplifies the code because we do not have to keep extra copies of the data during the removal of a page from the swap cache. There are only two callers of swapcache_free() which actually pass in a non-NULL 'struct page'. Both of them (__remove_mapping and delete_from_swap_cache()) create a temporary on-stack 'swp_entry_t' and set entry.val to page_private(). They need to do this since __delete_from_swap_cache() does set_page_private(page, 0) and destroys the information. However, I'd like to batch a few of these operations on several pages together in a new version of __remove_mapping(), and I would prefer not to have to allocate temporary storage for each page. The code is pretty ugly, and it's a bit silly to create these on-stack 'swp_entry_t's when it is so easy to just keep the information around in 'struct page'. There should not be any danger in doing this since we are absolutely on the path of freeing these page. There is no turning back, and no other rerferences can be obtained after it comes out of the radix tree. Note: This patch is separate from the next one since it introduces the behavior change. I've seen issues with this patch by itself in various forms and I think having it separate like this aids bisection. Signed-off-by: Dave Hansen Acked-by: Mel Gorman --- linux.git-davehans/mm/swap_state.c | 4 ++-- linux.git-davehans/mm/vmscan.c | 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-) diff -puN mm/swap_state.c~__delete_from_swap_cache-dont-clear-page-private mm/swap_state.c --- linux.git/mm/swap_state.c~__delete_from_swap_cache-dont-clear-page-private 2013-05-30 16:07:50.630079404 -0700 +++ linux.git-davehans/mm/swap_state.c 2013-05-30 16:07:50.635079624 -0700 @@ -148,8 +148,6 @@ void __delete_from_swap_cache(struct pag entry.val = page_private(page); address_space = swap_address_space(entry); radix_tree_delete(&address_space->page_tree, page_private(page)); - set_page_private(page, 0); - ClearPageSwapCache(page); address_space->nrpages--; __dec_zone_page_state(page, NR_FILE_PAGES); INC_CACHE_INFO(del_total); @@ -226,6 +224,8 @@ void delete_from_swap_cache(struct page spin_unlock_irq(&address_space->tree_lock); swapcache_free(entry, page); + set_page_private(page, 0); + ClearPageSwapCache(page); page_cache_release(page); } diff -puN mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private mm/vmscan.c --- linux.git/mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private 2013-05-30 16:07:50.632079492 -0700 +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:50.637079712 -0700 @@ -494,6 +494,8 @@ static int __remove_mapping(struct addre __delete_from_swap_cache(page); spin_unlock_irq(&mapping->tree_lock); swapcache_free(swap, page); + set_page_private(page, 0); + ClearPageSwapCache(page); } else { void (*freepage)(struct page *); _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx123.postini.com [74.125.245.123]) by kanga.kvack.org (Postfix) with SMTP id B3AA16B0037 for ; Fri, 31 May 2013 14:39:01 -0400 (EDT) Subject: [v4][PATCH 2/6] mm: swap: make 'struct page' and swp_entry_t variants of swapcache_free(). From: Dave Hansen Date: Fri, 31 May 2013 11:38:58 -0700 References: <20130531183855.44DDF928@viggo.jf.intel.com> In-Reply-To: <20130531183855.44DDF928@viggo.jf.intel.com> Message-Id: <20130531183858.3C8C10C7@viggo.jf.intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com, Dave Hansen From: Dave Hansen swapcache_free() takes two arguments: void swapcache_free(swp_entry_t entry, struct page *page) Most of its callers (5/7) are from error handling paths haven't even instantiated a page, so they pass page=NULL. Both of the callers that call in with a 'struct page' create and pass in a temporary swp_entry_t. Now that we are deferring clearing page_private() until after swapcache_free() has been called, we can just create a variant that takes a 'struct page' and does the temporary variable in the helper. That leaves all the other callers doing swapcache_free(entry, NULL) so create another helper for them that makes it clear that they need only pass in a swp_entry_t. One downside here is that delete_from_swap_cache() now does an extra swap_address_space() call. But, those are pretty cheap (just some array index arithmetic). Signed-off-by: Dave Hansen --- linux.git-davehans/drivers/staging/zcache/zcache-main.c | 2 +- linux.git-davehans/include/linux/swap.h | 3 ++- linux.git-davehans/mm/shmem.c | 2 +- linux.git-davehans/mm/swap_state.c | 15 +++++---------- linux.git-davehans/mm/swapfile.c | 15 ++++++++++++++- linux.git-davehans/mm/vmscan.c | 5 +---- 6 files changed, 24 insertions(+), 18 deletions(-) diff -puN drivers/staging/zcache/zcache-main.c~make-page-and-swp_entry_t-variants drivers/staging/zcache/zcache-main.c --- linux.git/drivers/staging/zcache/zcache-main.c~make-page-and-swp_entry_t-variants 2013-05-30 16:07:50.898091196 -0700 +++ linux.git-davehans/drivers/staging/zcache/zcache-main.c 2013-05-30 16:07:50.910091724 -0700 @@ -961,7 +961,7 @@ static int zcache_get_swap_cache_page(in * add_to_swap_cache() doesn't return -EEXIST, so we can safely * clear SWAP_HAS_CACHE flag. */ - swapcache_free(entry, NULL); + swapcache_free_entry(entry); /* FIXME: is it possible to get here without err==-ENOMEM? * If not, we can dispense with the do loop, use goto retry */ } while (err != -ENOMEM); diff -puN include/linux/swap.h~make-page-and-swp_entry_t-variants include/linux/swap.h --- linux.git/include/linux/swap.h~make-page-and-swp_entry_t-variants 2013-05-30 16:07:50.900091284 -0700 +++ linux.git-davehans/include/linux/swap.h 2013-05-30 16:07:50.911091768 -0700 @@ -385,7 +385,8 @@ extern void swap_shmem_alloc(swp_entry_t extern int swap_duplicate(swp_entry_t); extern int swapcache_prepare(swp_entry_t); extern void swap_free(swp_entry_t); -extern void swapcache_free(swp_entry_t, struct page *page); +extern void swapcache_free_entry(swp_entry_t entry); +extern void swapcache_free_page_entry(struct page *page); extern int free_swap_and_cache(swp_entry_t); extern int swap_type_of(dev_t, sector_t, struct block_device **); extern unsigned int count_swap_pages(int, int); diff -puN mm/shmem.c~make-page-and-swp_entry_t-variants mm/shmem.c --- linux.git/mm/shmem.c~make-page-and-swp_entry_t-variants 2013-05-30 16:07:50.902091372 -0700 +++ linux.git-davehans/mm/shmem.c 2013-05-30 16:07:50.912091812 -0700 @@ -872,7 +872,7 @@ static int shmem_writepage(struct page * } mutex_unlock(&shmem_swaplist_mutex); - swapcache_free(swap, NULL); + swapcache_free_entry(swap); redirty: set_page_dirty(page); if (wbc->for_reclaim) diff -puN mm/swapfile.c~make-page-and-swp_entry_t-variants mm/swapfile.c --- linux.git/mm/swapfile.c~make-page-and-swp_entry_t-variants 2013-05-30 16:07:50.904091460 -0700 +++ linux.git-davehans/mm/swapfile.c 2013-05-30 16:07:50.913091856 -0700 @@ -637,7 +637,7 @@ void swap_free(swp_entry_t entry) /* * Called after dropping swapcache to decrease refcnt to swap entries. */ -void swapcache_free(swp_entry_t entry, struct page *page) +static void __swapcache_free(swp_entry_t entry, struct page *page) { struct swap_info_struct *p; unsigned char count; @@ -651,6 +651,19 @@ void swapcache_free(swp_entry_t entry, s } } +void swapcache_free_entry(swp_entry_t entry) +{ + __swapcache_free(entry, NULL); +} + +void swapcache_free_page_entry(struct page *page) +{ + swp_entry_t entry = { .val = page_private(page) }; + __swapcache_free(entry, page); + set_page_private(page, 0); + ClearPageSwapCache(page); +} + /* * How many references to page are currently swapped out? * This does not give an exact answer when swap count is continued, diff -puN mm/swap_state.c~make-page-and-swp_entry_t-variants mm/swap_state.c --- linux.git/mm/swap_state.c~make-page-and-swp_entry_t-variants 2013-05-30 16:07:50.905091504 -0700 +++ linux.git-davehans/mm/swap_state.c 2013-05-30 16:07:50.914091900 -0700 @@ -174,7 +174,7 @@ int add_to_swap(struct page *page, struc if (unlikely(PageTransHuge(page))) if (unlikely(split_huge_page_to_list(page, list))) { - swapcache_free(entry, NULL); + swapcache_free_entry(entry); return 0; } @@ -200,7 +200,7 @@ int add_to_swap(struct page *page, struc * add_to_swap_cache() doesn't return -EEXIST, so we can safely * clear SWAP_HAS_CACHE flag. */ - swapcache_free(entry, NULL); + swapcache_free_entry(entry); return 0; } } @@ -213,19 +213,14 @@ int add_to_swap(struct page *page, struc */ void delete_from_swap_cache(struct page *page) { - swp_entry_t entry; struct address_space *address_space; - entry.val = page_private(page); - - address_space = swap_address_space(entry); + address_space = page_mapping(page); spin_lock_irq(&address_space->tree_lock); __delete_from_swap_cache(page); spin_unlock_irq(&address_space->tree_lock); - swapcache_free(entry, page); - set_page_private(page, 0); - ClearPageSwapCache(page); + swapcache_free_page_entry(page); page_cache_release(page); } @@ -370,7 +365,7 @@ struct page *read_swap_cache_async(swp_e * add_to_swap_cache() doesn't return -EEXIST, so we can safely * clear SWAP_HAS_CACHE flag. */ - swapcache_free(entry, NULL); + swapcache_free_entry(entry); } while (err != -ENOMEM); if (new_page) diff -puN mm/vmscan.c~make-page-and-swp_entry_t-variants mm/vmscan.c --- linux.git/mm/vmscan.c~make-page-and-swp_entry_t-variants 2013-05-30 16:07:50.907091592 -0700 +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:50.915091944 -0700 @@ -490,12 +490,9 @@ static int __remove_mapping(struct addre } if (PageSwapCache(page)) { - swp_entry_t swap = { .val = page_private(page) }; __delete_from_swap_cache(page); spin_unlock_irq(&mapping->tree_lock); - swapcache_free(swap, page); - set_page_private(page, 0); - ClearPageSwapCache(page); + swapcache_free_page_entry(page); } else { void (*freepage)(struct page *); _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx116.postini.com [74.125.245.116]) by kanga.kvack.org (Postfix) with SMTP id 543A36B0038 for ; Fri, 31 May 2013 14:39:04 -0400 (EDT) Subject: [v4][PATCH 4/6] mm: vmscan: break out mapping "freepage" code From: Dave Hansen Date: Fri, 31 May 2013 11:39:01 -0700 References: <20130531183855.44DDF928@viggo.jf.intel.com> In-Reply-To: <20130531183855.44DDF928@viggo.jf.intel.com> Message-Id: <20130531183901.375FE758@viggo.jf.intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com, Dave Hansen From: Dave Hansen __remove_mapping() only deals with pages with mappings, meaning page cache and swap cache. At this point, the page has been removed from the mapping's radix tree, and we need to ensure that any fs-specific (or swap- specific) resources are freed up. We will be using this function from a second location in a following patch. Signed-off-by: Dave Hansen Acked-by: Mel Gorman --- linux.git-davehans/mm/vmscan.c | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-) diff -puN mm/vmscan.c~free_mapping_page mm/vmscan.c --- linux.git/mm/vmscan.c~free_mapping_page 2013-05-30 16:07:51.461115968 -0700 +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:51.465116144 -0700 @@ -497,6 +497,24 @@ static int __remove_mapping(struct addre return 1; } +/* + * Release any resources the mapping had tied up in + * the page. + */ +static void mapping_release_page(struct address_space *mapping, + struct page *page) +{ + if (PageSwapCache(page)) { + swapcache_free_page_entry(page); + } else { + void (*freepage)(struct page *); + freepage = mapping->a_ops->freepage; + mem_cgroup_uncharge_cache_page(page); + if (freepage != NULL) + freepage(page); + } +} + static int lock_remove_mapping(struct address_space *mapping, struct page *page) { int ret; @@ -510,15 +528,7 @@ static int lock_remove_mapping(struct ad if (!ret) return 0; - if (PageSwapCache(page)) { - swapcache_free_page_entry(page); - } else { - void (*freepage)(struct page *); - freepage = mapping->a_ops->freepage; - mem_cgroup_uncharge_cache_page(page); - if (freepage != NULL) - freepage(page); - } + mapping_release_page(mapping, page); return ret; } _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx203.postini.com [74.125.245.203]) by kanga.kvack.org (Postfix) with SMTP id 39AD36B0037 for ; Fri, 31 May 2013 14:39:04 -0400 (EDT) Subject: [v4][PATCH 3/6] mm: vmscan: break up __remove_mapping() From: Dave Hansen Date: Fri, 31 May 2013 11:38:59 -0700 References: <20130531183855.44DDF928@viggo.jf.intel.com> In-Reply-To: <20130531183855.44DDF928@viggo.jf.intel.com> Message-Id: <20130531183859.F179225E@viggo.jf.intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com, Dave Hansen From: Dave Hansen Our goal here is to eventually reduce the number of repetitive acquire/release operations on mapping->tree_lock. Logically, this patch has two steps: 1. rename __remove_mapping() to lock_remove_mapping() since "__" usually means "this us the unlocked version. 2. Recreate __remove_mapping() to _be_ the lock_remove_mapping() but without the locks. I think this actually makes the code flow around the locking _much_ more straighforward since the locking just becomes: spin_lock_irq(&mapping->tree_lock); ret = __remove_mapping(mapping, page); spin_unlock_irq(&mapping->tree_lock); One non-obvious part of this patch: the freepage = mapping->a_ops->freepage; used to happen under the mapping->tree_lock, but this patch moves it to outside of the lock. All of the other a_ops->freepage users do it outside the lock, and we only assign it when we create inodes, so that makes it safe. Signed-off-by: Dave Hansen Acked-by: Mel Gorman --- linux.git-davehans/mm/vmscan.c | 43 ++++++++++++++++++++++++----------------- 1 file changed, 26 insertions(+), 17 deletions(-) diff -puN mm/vmscan.c~make-remove-mapping-without-locks mm/vmscan.c --- linux.git/mm/vmscan.c~make-remove-mapping-without-locks 2013-05-30 16:07:51.210104924 -0700 +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:51.214105100 -0700 @@ -450,12 +450,12 @@ static pageout_t pageout(struct page *pa * Same as remove_mapping, but if the page is removed from the mapping, it * gets returned with a refcount of 0. */ -static int __remove_mapping(struct address_space *mapping, struct page *page) +static int __remove_mapping(struct address_space *mapping, + struct page *page) { BUG_ON(!PageLocked(page)); BUG_ON(mapping != page_mapping(page)); - spin_lock_irq(&mapping->tree_lock); /* * The non racy check for a busy page. * @@ -482,35 +482,44 @@ static int __remove_mapping(struct addre * and thus under tree_lock, then this ordering is not required. */ if (!page_freeze_refs(page, 2)) - goto cannot_free; + return 0; /* note: atomic_cmpxchg in page_freeze_refs provides the smp_rmb */ if (unlikely(PageDirty(page))) { page_unfreeze_refs(page, 2); - goto cannot_free; + return 0; } if (PageSwapCache(page)) { __delete_from_swap_cache(page); - spin_unlock_irq(&mapping->tree_lock); + } else { + __delete_from_page_cache(page); + } + return 1; +} + +static int lock_remove_mapping(struct address_space *mapping, struct page *page) +{ + int ret; + BUG_ON(!PageLocked(page)); + + spin_lock_irq(&mapping->tree_lock); + ret = __remove_mapping(mapping, page); + spin_unlock_irq(&mapping->tree_lock); + + /* unable to free */ + if (!ret) + return 0; + + if (PageSwapCache(page)) { swapcache_free_page_entry(page); } else { void (*freepage)(struct page *); - freepage = mapping->a_ops->freepage; - - __delete_from_page_cache(page); - spin_unlock_irq(&mapping->tree_lock); mem_cgroup_uncharge_cache_page(page); - if (freepage != NULL) freepage(page); } - - return 1; - -cannot_free: - spin_unlock_irq(&mapping->tree_lock); - return 0; + return ret; } /* @@ -521,7 +530,7 @@ cannot_free: */ int remove_mapping(struct address_space *mapping, struct page *page) { - if (__remove_mapping(mapping, page)) { + if (lock_remove_mapping(mapping, page)) { /* * Unfreezing the refcount with 1 rather than 2 effectively * drops the pagecache ref for us without requiring another _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx116.postini.com [74.125.245.116]) by kanga.kvack.org (Postfix) with SMTP id 676CD6B0039 for ; Fri, 31 May 2013 14:39:05 -0400 (EDT) Subject: [v4][PATCH 5/6] mm: vmscan: batch shrink_page_list() locking operations From: Dave Hansen Date: Fri, 31 May 2013 11:39:02 -0700 References: <20130531183855.44DDF928@viggo.jf.intel.com> In-Reply-To: <20130531183855.44DDF928@viggo.jf.intel.com> Message-Id: <20130531183902.8E84DFB4@viggo.jf.intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com, Dave Hansen From: Dave Hansen changes for v2: * remove batch_has_same_mapping() helper. A local varible makes the check cheaper and cleaner * Move batch draining later to where we already know page_mapping(). This probably fixes a truncation race anyway * rename batch_for_mapping_removal -> batch_for_mapping_rm. It caused a line over 80 chars and needed shortening anyway. * Note: we only set 'batch_mapping' when there are pages in the batch_for_mapping_rm list -- We batch like this so that several pages can be freed with a single mapping->tree_lock acquisition/release pair. This reduces the number of atomic operations and ensures that we do not bounce cachelines around. Tim Chen's earlier version of these patches just unconditionally created large batches of pages, even if they did not share a page_mapping(). This is a bit suboptimal for a few reasons: 1. if we can not consolidate lock acquisitions, it makes little sense to batch 2. The page locks are held for long periods of time, so we only want to do this when we are sure that we will gain a substantial throughput improvement because we pay a latency cost by holding the locks. This patch makes sure to only batch when all the pages on 'batch_for_mapping_rm' continue to share a page_mapping(). This only happens in practice in cases where pages in the same file are close to each other on the LRU. That seems like a reasonable assumption. In a 128MB virtual machine doing kernel compiles, the average batch size when calling __remove_mapping_batch() is around 5, so this does seem to do some good in practice. On a 160-cpu system doing kernel compiles, I still saw an average batch length of about 2.8. One promising feature: as the memory pressure went up, the average batches seem to have gotten larger. It has shown some substantial performance benefits on microbenchmarks. Signed-off-by: Dave Hansen --- linux.git-davehans/mm/vmscan.c | 95 +++++++++++++++++++++++++++++++++++++---- 1 file changed, 86 insertions(+), 9 deletions(-) diff -puN mm/vmscan.c~create-remove_mapping_batch mm/vmscan.c --- linux.git/mm/vmscan.c~create-remove_mapping_batch 2013-05-30 16:07:51.711126969 -0700 +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:51.716127189 -0700 @@ -552,6 +552,61 @@ int remove_mapping(struct address_space return 0; } +/* + * pages come in here (via remove_list) locked and leave unlocked + * (on either ret_pages or free_pages) + * + * We do this batching so that we free batches of pages with a + * single mapping->tree_lock acquisition/release. This optimization + * only makes sense when the pages on remove_list all share a + * page_mapping(). If this is violated you will BUG_ON(). + */ +static int __remove_mapping_batch(struct list_head *remove_list, + struct list_head *ret_pages, + struct list_head *free_pages) +{ + int nr_reclaimed = 0; + struct address_space *mapping; + struct page *page; + LIST_HEAD(need_free_mapping); + + if (list_empty(remove_list)) + return 0; + + mapping = page_mapping(lru_to_page(remove_list)); + spin_lock_irq(&mapping->tree_lock); + while (!list_empty(remove_list)) { + page = lru_to_page(remove_list); + BUG_ON(!PageLocked(page)); + BUG_ON(page_mapping(page) != mapping); + list_del(&page->lru); + + if (!__remove_mapping(mapping, page)) { + unlock_page(page); + list_add(&page->lru, ret_pages); + continue; + } + list_add(&page->lru, &need_free_mapping); + } + spin_unlock_irq(&mapping->tree_lock); + + while (!list_empty(&need_free_mapping)) { + page = lru_to_page(&need_free_mapping); + list_move(&page->list, free_pages); + mapping_release_page(mapping, page); + /* + * At this point, we have no other references and there is + * no way to pick any more up (removed from LRU, removed + * from pagecache). Can use non-atomic bitops now (and + * we obviously don't have to worry about waking up a process + * waiting on the page lock, because there are no references. + */ + __clear_page_locked(page); + nr_reclaimed++; + } + return nr_reclaimed; +} + /** * putback_lru_page - put previously isolated page onto appropriate LRU list * @page: page to be put back to appropriate lru list @@ -730,6 +785,8 @@ static unsigned long shrink_page_list(st { LIST_HEAD(ret_pages); LIST_HEAD(free_pages); + LIST_HEAD(batch_for_mapping_rm); + struct address_space *batch_mapping = NULL; int pgactivate = 0; unsigned long nr_unqueued_dirty = 0; unsigned long nr_dirty = 0; @@ -751,6 +808,7 @@ static unsigned long shrink_page_list(st cond_resched(); page = lru_to_page(page_list); + list_del(&page->lru); if (!trylock_page(page)) @@ -853,6 +911,10 @@ static unsigned long shrink_page_list(st /* Case 3 above */ } else { + /* + * batch_for_mapping_rm could be drained here + * if its lock_page()s hurt latency elsewhere. + */ wait_on_page_writeback(page); } } @@ -883,6 +945,18 @@ static unsigned long shrink_page_list(st } mapping = page_mapping(page); + /* + * batching only makes sense when we can save lock + * acquisitions, so drain the previously-batched + * pages when we move over to a different mapping + */ + if (batch_mapping && (batch_mapping != mapping)) { + nr_reclaimed += + __remove_mapping_batch(&batch_for_mapping_rm, + &ret_pages, + &free_pages); + batch_mapping = NULL; + } /* * The page is mapped into the page tables of one or more @@ -998,17 +1072,18 @@ static unsigned long shrink_page_list(st } } - if (!mapping || !__remove_mapping(mapping, page)) + if (!mapping) goto keep_locked; - /* - * At this point, we have no other references and there is - * no way to pick any more up (removed from LRU, removed - * from pagecache). Can use non-atomic bitops now (and - * we obviously don't have to worry about waking up a process - * waiting on the page lock, because there are no references. + * This list contains pages all in the same mapping, but + * in effectively random order and we hold lock_page() + * on *all* of them. This can potentially cause lock + * ordering issues, but the reclaim code only trylocks + * them which saves us. */ - __clear_page_locked(page); + list_add(&page->lru, &batch_for_mapping_rm); + batch_mapping = mapping; + continue; free_it: nr_reclaimed++; @@ -1039,7 +1114,9 @@ keep: list_add(&page->lru, &ret_pages); VM_BUG_ON(PageLRU(page) || PageUnevictable(page)); } - + nr_reclaimed += __remove_mapping_batch(&batch_for_mapping_rm, + &ret_pages, + &free_pages); /* * Tag a zone as congested if all the dirty pages encountered were * backed by a congested BDI. In this case, reclaimers should just _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx203.postini.com [74.125.245.203]) by kanga.kvack.org (Postfix) with SMTP id 779216B003B for ; Fri, 31 May 2013 14:39:05 -0400 (EDT) Subject: [v4][PATCH 6/6] mm: vmscan: drain batch list during long operations From: Dave Hansen Date: Fri, 31 May 2013 11:39:04 -0700 References: <20130531183855.44DDF928@viggo.jf.intel.com> In-Reply-To: <20130531183855.44DDF928@viggo.jf.intel.com> Message-Id: <20130531183904.76C2B57D@viggo.jf.intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com, Dave Hansen From: Dave Hansen This was a suggestion from Mel: http://lkml.kernel.org/r/20120914085634.GM11157@csn.ul.ie Any pages we collect on 'batch_for_mapping_removal' will have their lock_page() held during the duration of their stay on the list. If some other user is trying to get at them during this time, they might end up having to wait. This ensures that we drain the batch if we are about to perform a pageout() or congestion_wait(), either of which will take some time. We expect this to help mitigate the worst of the latency increase that the batching could cause. I added some statistics to the __remove_mapping_batch() code to track how large the lists are that we pass in to it. With this patch, the average list length drops about 10% (from about 4.1 to 3.8). The workload here was a make -j4 kernel compile on a VM with 200MB of RAM. I've still got the statistics patch around if anyone is interested. Signed-off-by: Dave Hansen --- linux.git-davehans/mm/vmscan.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff -puN mm/vmscan.c~drain-batch-list-during-long-operations mm/vmscan.c --- linux.git/mm/vmscan.c~drain-batch-list-during-long-operations 2013-05-30 16:07:51.962138013 -0700 +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:51.966138189 -0700 @@ -1003,6 +1003,16 @@ static unsigned long shrink_page_list(st if (!sc->may_writepage) goto keep_locked; + /* + * We hold a bunch of page locks on the batch. + * pageout() can take a while, so drain the + * batch before we perform pageout. + */ + nr_reclaimed += + __remove_mapping_batch(&batch_for_mapping_rm, + &ret_pages, + &free_pages); + /* Page is dirty, try to write it out here */ switch (pageout(page, mapping, sc)) { case PAGE_KEEP: _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx189.postini.com [74.125.245.189]) by kanga.kvack.org (Postfix) with SMTP id 996676B0002 for ; Mon, 3 Jun 2013 01:40:50 -0400 (EDT) Date: Mon, 3 Jun 2013 14:40:48 +0900 From: Minchan Kim Subject: Re: [v4][PATCH 1/6] mm: swap: defer clearing of page_private() for swap cache pages Message-ID: <20130603054048.GA27858@blaptop> References: <20130531183855.44DDF928@viggo.jf.intel.com> <20130531183856.1D7D75AD@viggo.jf.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130531183856.1D7D75AD@viggo.jf.intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com On Fri, May 31, 2013 at 11:38:56AM -0700, Dave Hansen wrote: > > From: Dave Hansen > > This patch defers the destruction of swapcache-specific data in > 'struct page'. This simplifies the code because we do not have > to keep extra copies of the data during the removal of a page > from the swap cache. > > There are only two callers of swapcache_free() which actually > pass in a non-NULL 'struct page'. Both of them (__remove_mapping > and delete_from_swap_cache()) create a temporary on-stack > 'swp_entry_t' and set entry.val to page_private(). > > They need to do this since __delete_from_swap_cache() does > set_page_private(page, 0) and destroys the information. > > However, I'd like to batch a few of these operations on several > pages together in a new version of __remove_mapping(), and I > would prefer not to have to allocate temporary storage for each > page. The code is pretty ugly, and it's a bit silly to create > these on-stack 'swp_entry_t's when it is so easy to just keep the > information around in 'struct page'. > > There should not be any danger in doing this since we are > absolutely on the path of freeing these page. There is no > turning back, and no other rerferences can be obtained after it > comes out of the radix tree. > > Note: This patch is separate from the next one since it > introduces the behavior change. I've seen issues with this patch > by itself in various forms and I think having it separate like > this aids bisection. > > Signed-off-by: Dave Hansen > Acked-by: Mel Gorman Reviewed-by: Minchan Kim Look at below nitpick. > --- > > linux.git-davehans/mm/swap_state.c | 4 ++-- > linux.git-davehans/mm/vmscan.c | 2 ++ > 2 files changed, 4 insertions(+), 2 deletions(-) > > diff -puN mm/swap_state.c~__delete_from_swap_cache-dont-clear-page-private mm/swap_state.c > --- linux.git/mm/swap_state.c~__delete_from_swap_cache-dont-clear-page-private 2013-05-30 16:07:50.630079404 -0700 > +++ linux.git-davehans/mm/swap_state.c 2013-05-30 16:07:50.635079624 -0700 > @@ -148,8 +148,6 @@ void __delete_from_swap_cache(struct pag > entry.val = page_private(page); > address_space = swap_address_space(entry); > radix_tree_delete(&address_space->page_tree, page_private(page)); > - set_page_private(page, 0); > - ClearPageSwapCache(page); > address_space->nrpages--; > __dec_zone_page_state(page, NR_FILE_PAGES); > INC_CACHE_INFO(del_total); > @@ -226,6 +224,8 @@ void delete_from_swap_cache(struct page > spin_unlock_irq(&address_space->tree_lock); > > swapcache_free(entry, page); > + set_page_private(page, 0); > + ClearPageSwapCache(page); > page_cache_release(page); > } > > diff -puN mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private mm/vmscan.c > --- linux.git/mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private 2013-05-30 16:07:50.632079492 -0700 > +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:50.637079712 -0700 > @@ -494,6 +494,8 @@ static int __remove_mapping(struct addre > __delete_from_swap_cache(page); > spin_unlock_irq(&mapping->tree_lock); > swapcache_free(swap, page); > + set_page_private(page, 0); > + ClearPageSwapCache(page); It it worth to support non-atomic version of ClearPageSwapCache? -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx206.postini.com [74.125.245.206]) by kanga.kvack.org (Postfix) with SMTP id 5D71B6B0002 for ; Mon, 3 Jun 2013 02:13:22 -0400 (EDT) Date: Mon, 3 Jun 2013 15:13:20 +0900 From: Minchan Kim Subject: Re: [v4][PATCH 2/6] mm: swap: make 'struct page' and swp_entry_t variants of swapcache_free(). Message-ID: <20130603061320.GA2795@blaptop> References: <20130531183855.44DDF928@viggo.jf.intel.com> <20130531183858.3C8C10C7@viggo.jf.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130531183858.3C8C10C7@viggo.jf.intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com Hello Dave, On Fri, May 31, 2013 at 11:38:58AM -0700, Dave Hansen wrote: > > From: Dave Hansen > > swapcache_free() takes two arguments: > > void swapcache_free(swp_entry_t entry, struct page *page) > > Most of its callers (5/7) are from error handling paths haven't even > instantiated a page, so they pass page=NULL. Both of the callers > that call in with a 'struct page' create and pass in a temporary > swp_entry_t. > > Now that we are deferring clearing page_private() until after > swapcache_free() has been called, we can just create a variant > that takes a 'struct page' and does the temporary variable in > the helper. > > That leaves all the other callers doing > > swapcache_free(entry, NULL) > > so create another helper for them that makes it clear that they > need only pass in a swp_entry_t. > > One downside here is that delete_from_swap_cache() now does > an extra swap_address_space() call. But, those are pretty > cheap (just some array index arithmetic). I lost from this description. Old behavior delete_from_swap_cache swap_address_space __delete_from_swap_cache swap_address_space New behavior delete_from_swap_cache __delete_from_swap_cache swap_address_space So you removes a swap_address_space, not adding a extra call. Am I missing something? > > Signed-off-by: Dave Hansen Otherwise, looks good to me Reviewed-by: Minchan Kim -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx120.postini.com [74.125.245.120]) by kanga.kvack.org (Postfix) with SMTP id 60DE96B0039 for ; Mon, 3 Jun 2013 04:28:43 -0400 (EDT) Date: Mon, 3 Jun 2013 17:28:41 +0900 From: Minchan Kim Subject: Re: [v4][PATCH 3/6] mm: vmscan: break up __remove_mapping() Message-ID: <20130603082841.GB2795@blaptop> References: <20130531183855.44DDF928@viggo.jf.intel.com> <20130531183859.F179225E@viggo.jf.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130531183859.F179225E@viggo.jf.intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com On Fri, May 31, 2013 at 11:38:59AM -0700, Dave Hansen wrote: > > From: Dave Hansen > > Our goal here is to eventually reduce the number of repetitive > acquire/release operations on mapping->tree_lock. > > Logically, this patch has two steps: > 1. rename __remove_mapping() to lock_remove_mapping() since > "__" usually means "this us the unlocked version. > 2. Recreate __remove_mapping() to _be_ the lock_remove_mapping() > but without the locks. > > I think this actually makes the code flow around the locking > _much_ more straighforward since the locking just becomes: > > spin_lock_irq(&mapping->tree_lock); > ret = __remove_mapping(mapping, page); > spin_unlock_irq(&mapping->tree_lock); > > One non-obvious part of this patch: the > > freepage = mapping->a_ops->freepage; > > used to happen under the mapping->tree_lock, but this patch > moves it to outside of the lock. All of the other > a_ops->freepage users do it outside the lock, and we only > assign it when we create inodes, so that makes it safe. > > Signed-off-by: Dave Hansen > Acked-by: Mel Gorman Reviewed-by: Minchan Kin Just a nitpick below. > > --- > > linux.git-davehans/mm/vmscan.c | 43 ++++++++++++++++++++++++----------------- > 1 file changed, 26 insertions(+), 17 deletions(-) > > diff -puN mm/vmscan.c~make-remove-mapping-without-locks mm/vmscan.c > --- linux.git/mm/vmscan.c~make-remove-mapping-without-locks 2013-05-30 16:07:51.210104924 -0700 > +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:51.214105100 -0700 > @@ -450,12 +450,12 @@ static pageout_t pageout(struct page *pa > * Same as remove_mapping, but if the page is removed from the mapping, it > * gets returned with a refcount of 0. > */ > -static int __remove_mapping(struct address_space *mapping, struct page *page) > +static int __remove_mapping(struct address_space *mapping, > + struct page *page) Unnecessary change. -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx172.postini.com [74.125.245.172]) by kanga.kvack.org (Postfix) with SMTP id 2C32F6B003B for ; Mon, 3 Jun 2013 04:35:42 -0400 (EDT) Date: Mon, 3 Jun 2013 17:35:40 +0900 From: Minchan Kim Subject: Re: [v4][PATCH 4/6] mm: vmscan: break out mapping "freepage" code Message-ID: <20130603083540.GC2795@blaptop> References: <20130531183855.44DDF928@viggo.jf.intel.com> <20130531183901.375FE758@viggo.jf.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130531183901.375FE758@viggo.jf.intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com On Fri, May 31, 2013 at 11:39:01AM -0700, Dave Hansen wrote: > > From: Dave Hansen > > __remove_mapping() only deals with pages with mappings, meaning > page cache and swap cache. > > At this point, the page has been removed from the mapping's radix > tree, and we need to ensure that any fs-specific (or swap- > specific) resources are freed up. > > We will be using this function from a second location in a > following patch. > > Signed-off-by: Dave Hansen > Acked-by: Mel Gorman Reviewed-by: Minchan Kim Again, a nitpick. Sorry. > --- > > linux.git-davehans/mm/vmscan.c | 28 +++++++++++++++++++--------- > 1 file changed, 19 insertions(+), 9 deletions(-) > > diff -puN mm/vmscan.c~free_mapping_page mm/vmscan.c > --- linux.git/mm/vmscan.c~free_mapping_page 2013-05-30 16:07:51.461115968 -0700 > +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:51.465116144 -0700 > @@ -497,6 +497,24 @@ static int __remove_mapping(struct addre > return 1; > } > > +/* > + * Release any resources the mapping had tied up in > + * the page. It could be a one line. > + */ > +static void mapping_release_page(struct address_space *mapping, > + struct page *page) > +{ > + if (PageSwapCache(page)) { > + swapcache_free_page_entry(page); > + } else { > + void (*freepage)(struct page *); > + freepage = mapping->a_ops->freepage; > + mem_cgroup_uncharge_cache_page(page); > + if (freepage != NULL) > + freepage(page); > + } > +} > + > static int lock_remove_mapping(struct address_space *mapping, struct page *page) > { > int ret; > @@ -510,15 +528,7 @@ static int lock_remove_mapping(struct ad > if (!ret) > return 0; > > - if (PageSwapCache(page)) { > - swapcache_free_page_entry(page); > - } else { > - void (*freepage)(struct page *); > - freepage = mapping->a_ops->freepage; > - mem_cgroup_uncharge_cache_page(page); > - if (freepage != NULL) > - freepage(page); > - } > + mapping_release_page(mapping, page); > return ret; > } > > _ > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx189.postini.com [74.125.245.189]) by kanga.kvack.org (Postfix) with SMTP id 0C0D56B0002 for ; Mon, 3 Jun 2013 10:53:03 -0400 (EDT) Message-ID: <51ACADCD.70904@sr71.net> Date: Mon, 03 Jun 2013 07:53:01 -0700 From: Dave Hansen MIME-Version: 1.0 Subject: Re: [v4][PATCH 1/6] mm: swap: defer clearing of page_private() for swap cache pages References: <20130531183855.44DDF928@viggo.jf.intel.com> <20130531183856.1D7D75AD@viggo.jf.intel.com> <20130603054048.GA27858@blaptop> In-Reply-To: <20130603054048.GA27858@blaptop> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com On 06/02/2013 10:40 PM, Minchan Kim wrote: >> > diff -puN mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private mm/vmscan.c >> > --- linux.git/mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private 2013-05-30 16:07:50.632079492 -0700 >> > +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:50.637079712 -0700 >> > @@ -494,6 +494,8 @@ static int __remove_mapping(struct addre >> > __delete_from_swap_cache(page); >> > spin_unlock_irq(&mapping->tree_lock); >> > swapcache_free(swap, page); >> > + set_page_private(page, 0); >> > + ClearPageSwapCache(page); > It it worth to support non-atomic version of ClearPageSwapCache? Just for this, probably not. It does look like a site where it would be theoretically safe to use non-atomic flag operations since the page is on a one-way trip to the allocator at this point and the __clear_page_locked() now happens _just_ after this code. But, personally, I'm happy to leave it as-is. The atomic vs. non-atomic flags look to me like a micro-optimization that we should use when we _know_ there will be some tangible benefit. Otherwise, they're just something extra for developers to trip over and cause very subtle bugs. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx183.postini.com [74.125.245.183]) by kanga.kvack.org (Postfix) with SMTP id 7BF356B0037 for ; Mon, 3 Jun 2013 11:55:10 -0400 (EDT) Message-ID: <51ACBC5C.9020701@sr71.net> Date: Mon, 03 Jun 2013 08:55:08 -0700 From: Dave Hansen MIME-Version: 1.0 Subject: Re: [v4][PATCH 2/6] mm: swap: make 'struct page' and swp_entry_t variants of swapcache_free(). References: <20130531183855.44DDF928@viggo.jf.intel.com> <20130531183858.3C8C10C7@viggo.jf.intel.com> <20130603061320.GA2795@blaptop> In-Reply-To: <20130603061320.GA2795@blaptop> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com On 06/02/2013 11:13 PM, Minchan Kim wrote: > I lost from this description. > > Old behavior > > delete_from_swap_cache > swap_address_space > __delete_from_swap_cache > swap_address_space > > > New behavior > > delete_from_swap_cache > __delete_from_swap_cache > swap_address_space > > So you removes a swap_address_space, not adding a extra call. > Am I missing something? I think I got the page->swp_entry_t lookup confused with the page->swap_address_space lookup when I was writing the description. The bit that you missed is that I _added_ a page_mapping() call, which calls swap_address_space() internally: Old behavior: delete_from_swap_cache swap_address_space __delete_from_swap_cache swap_address_space New behavior: delete_from_swap_cache page_mapping swap_address_space __delete_from_swap_cache swap_address_space -- New description (last paragraph changed). Andrew, I'll resend the series since there are a few of these cleanups. From: Dave Hansen swapcache_free() takes two arguments: void swapcache_free(swp_entry_t entry, struct page *page) Most of its callers (5/7) are from error handling paths haven't even instantiated a page, so they pass page=NULL. Both of the callers that call in with a 'struct page' create and pass in a temporary swp_entry_t. Now that we are deferring clearing page_private() until after swapcache_free() has been called, we can just create a variant that takes a 'struct page' and does the temporary variable in the helper. That leaves all the other callers doing swapcache_free(entry, NULL) so create another helper for them that makes it clear that they need only pass in a swp_entry_t. One downside here is that delete_from_swap_cache() now calls swap_address_space() via page_mapping() instead of calling swap_address_space() directly. In doing so, it removes one more case of the swap cache code being special-cased, which is a good thing in my book. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx141.postini.com [74.125.245.141]) by kanga.kvack.org (Postfix) with SMTP id 386056B0037 for ; Tue, 4 Jun 2013 00:41:41 -0400 (EDT) Date: Tue, 4 Jun 2013 13:41:39 +0900 From: Minchan Kim Subject: Re: [v4][PATCH 1/6] mm: swap: defer clearing of page_private() for swap cache pages Message-ID: <20130604044139.GB14719@blaptop> References: <20130531183855.44DDF928@viggo.jf.intel.com> <20130531183856.1D7D75AD@viggo.jf.intel.com> <20130603054048.GA27858@blaptop> <51ACADCD.70904@sr71.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <51ACADCD.70904@sr71.net> Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com On Mon, Jun 03, 2013 at 07:53:01AM -0700, Dave Hansen wrote: > On 06/02/2013 10:40 PM, Minchan Kim wrote: > >> > diff -puN mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private mm/vmscan.c > >> > --- linux.git/mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private 2013-05-30 16:07:50.632079492 -0700 > >> > +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:50.637079712 -0700 > >> > @@ -494,6 +494,8 @@ static int __remove_mapping(struct addre > >> > __delete_from_swap_cache(page); > >> > spin_unlock_irq(&mapping->tree_lock); > >> > swapcache_free(swap, page); > >> > + set_page_private(page, 0); > >> > + ClearPageSwapCache(page); > > It it worth to support non-atomic version of ClearPageSwapCache? > > Just for this, probably not. > > It does look like a site where it would be theoretically safe to use > non-atomic flag operations since the page is on a one-way trip to the > allocator at this point and the __clear_page_locked() now happens _just_ > after this code. True. > > But, personally, I'm happy to leave it as-is. The atomic vs. non-atomic > flags look to me like a micro-optimization that we should use when we > _know_ there will be some tangible benefit. Otherwise, they're just > something extra for developers to trip over and cause very subtle bugs. I just asked it because when I read the description of patchset, I felt you were very sensitive to the atomic operation on many CPU system with several sockets. Anyway, if you don't want it, I don't mind it, either. -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757051Ab3EaSjC (ORCPT ); Fri, 31 May 2013 14:39:02 -0400 Received: from mga01.intel.com ([192.55.52.88]:61718 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755235Ab3EaSiz (ORCPT ); Fri, 31 May 2013 14:38:55 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.87,780,1363158000"; d="scan'208";a="343159098" Subject: [v4][PATCH 0/6] mm: vmscan: Batch page reclamation under shink_page_list To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com, Dave Hansen From: Dave Hansen Date: Fri, 31 May 2013 11:38:55 -0700 Message-Id: <20130531183855.44DDF928@viggo.jf.intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org These are an update of Tim Chen's earlier work: http://lkml.kernel.org/r/1347293960.9977.70.camel@schen9-DESK I broke the patches up a bit more, and tried to incorporate some changes based on some feedback from Mel and Andrew. Changes for v4: * generated on top of linux-next-20130530, plus Mel's vmscan fixes: http://lkml.kernel.org/r/1369659778-6772-2-git-send-email-mgorman@suse.de * added some proper vmscan/swap: prefixes to the subjects Changes for v3: * Add batch draining before congestion_wait() * minor merge conflicts with Mel's vmscan work Changes for v2: * use page_mapping() accessor instead of direct access to page->mapping (could cause crashes when running in to swap cache pages. * group the batch function's introduction patch with its first use * rename a few functions as suggested by Mel * Ran some single-threaded tests to look for regressions caused by the batching. If there is overhead, it is only in the worst-case scenarios, and then only in hundreths of a percent of CPU time. If you're curious how effective the batching is, I have a quick and dirty patch to keep some stats: https://www.sr71.net/~dave/intel/rmb-stats-only.patch -- To do page reclamation in shrink_page_list function, there are two locks taken on a page by page basis. One is the tree lock protecting the radix tree of the page mapping and the other is the mapping->i_mmap_mutex protecting the mapped pages. This set deals only with mapping->tree_lock. Tim managed to get 14% throughput improvement when with a workload putting heavy pressure of page cache by reading many large mmaped files simultaneously on a 8 socket Westmere server. I've been testing these by running large parallel kernel compiles on systems that are under memory pressure. During development, I caught quite a few races on smaller setups, and it's being quite stable that large (160 logical CPU / 1TB) system. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757116Ab3EaSjN (ORCPT ); Fri, 31 May 2013 14:39:13 -0400 Received: from mga03.intel.com ([143.182.124.21]:57561 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756950Ab3EaSi6 (ORCPT ); Fri, 31 May 2013 14:38:58 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.87,780,1363158000"; d="scan'208";a="310670498" Subject: [v4][PATCH 1/6] mm: swap: defer clearing of page_private() for swap cache pages To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com, Dave Hansen From: Dave Hansen Date: Fri, 31 May 2013 11:38:56 -0700 References: <20130531183855.44DDF928@viggo.jf.intel.com> In-Reply-To: <20130531183855.44DDF928@viggo.jf.intel.com> Message-Id: <20130531183856.1D7D75AD@viggo.jf.intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Dave Hansen This patch defers the destruction of swapcache-specific data in 'struct page'. This simplifies the code because we do not have to keep extra copies of the data during the removal of a page from the swap cache. There are only two callers of swapcache_free() which actually pass in a non-NULL 'struct page'. Both of them (__remove_mapping and delete_from_swap_cache()) create a temporary on-stack 'swp_entry_t' and set entry.val to page_private(). They need to do this since __delete_from_swap_cache() does set_page_private(page, 0) and destroys the information. However, I'd like to batch a few of these operations on several pages together in a new version of __remove_mapping(), and I would prefer not to have to allocate temporary storage for each page. The code is pretty ugly, and it's a bit silly to create these on-stack 'swp_entry_t's when it is so easy to just keep the information around in 'struct page'. There should not be any danger in doing this since we are absolutely on the path of freeing these page. There is no turning back, and no other rerferences can be obtained after it comes out of the radix tree. Note: This patch is separate from the next one since it introduces the behavior change. I've seen issues with this patch by itself in various forms and I think having it separate like this aids bisection. Signed-off-by: Dave Hansen Acked-by: Mel Gorman --- linux.git-davehans/mm/swap_state.c | 4 ++-- linux.git-davehans/mm/vmscan.c | 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-) diff -puN mm/swap_state.c~__delete_from_swap_cache-dont-clear-page-private mm/swap_state.c --- linux.git/mm/swap_state.c~__delete_from_swap_cache-dont-clear-page-private 2013-05-30 16:07:50.630079404 -0700 +++ linux.git-davehans/mm/swap_state.c 2013-05-30 16:07:50.635079624 -0700 @@ -148,8 +148,6 @@ void __delete_from_swap_cache(struct pag entry.val = page_private(page); address_space = swap_address_space(entry); radix_tree_delete(&address_space->page_tree, page_private(page)); - set_page_private(page, 0); - ClearPageSwapCache(page); address_space->nrpages--; __dec_zone_page_state(page, NR_FILE_PAGES); INC_CACHE_INFO(del_total); @@ -226,6 +224,8 @@ void delete_from_swap_cache(struct page spin_unlock_irq(&address_space->tree_lock); swapcache_free(entry, page); + set_page_private(page, 0); + ClearPageSwapCache(page); page_cache_release(page); } diff -puN mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private mm/vmscan.c --- linux.git/mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private 2013-05-30 16:07:50.632079492 -0700 +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:50.637079712 -0700 @@ -494,6 +494,8 @@ static int __remove_mapping(struct addre __delete_from_swap_cache(page); spin_unlock_irq(&mapping->tree_lock); swapcache_free(swap, page); + set_page_private(page, 0); + ClearPageSwapCache(page); } else { void (*freepage)(struct page *); _ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757035Ab3EaSjX (ORCPT ); Fri, 31 May 2013 14:39:23 -0400 Received: from mga01.intel.com ([192.55.52.88]:42707 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757053Ab3EaSjC (ORCPT ); Fri, 31 May 2013 14:39:02 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.87,780,1363158000"; d="scan'208";a="343159136" Subject: [v4][PATCH 4/6] mm: vmscan: break out mapping "freepage" code To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com, Dave Hansen From: Dave Hansen Date: Fri, 31 May 2013 11:39:01 -0700 References: <20130531183855.44DDF928@viggo.jf.intel.com> In-Reply-To: <20130531183855.44DDF928@viggo.jf.intel.com> Message-Id: <20130531183901.375FE758@viggo.jf.intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Dave Hansen __remove_mapping() only deals with pages with mappings, meaning page cache and swap cache. At this point, the page has been removed from the mapping's radix tree, and we need to ensure that any fs-specific (or swap- specific) resources are freed up. We will be using this function from a second location in a following patch. Signed-off-by: Dave Hansen Acked-by: Mel Gorman --- linux.git-davehans/mm/vmscan.c | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-) diff -puN mm/vmscan.c~free_mapping_page mm/vmscan.c --- linux.git/mm/vmscan.c~free_mapping_page 2013-05-30 16:07:51.461115968 -0700 +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:51.465116144 -0700 @@ -497,6 +497,24 @@ static int __remove_mapping(struct addre return 1; } +/* + * Release any resources the mapping had tied up in + * the page. + */ +static void mapping_release_page(struct address_space *mapping, + struct page *page) +{ + if (PageSwapCache(page)) { + swapcache_free_page_entry(page); + } else { + void (*freepage)(struct page *); + freepage = mapping->a_ops->freepage; + mem_cgroup_uncharge_cache_page(page); + if (freepage != NULL) + freepage(page); + } +} + static int lock_remove_mapping(struct address_space *mapping, struct page *page) { int ret; @@ -510,15 +528,7 @@ static int lock_remove_mapping(struct ad if (!ret) return 0; - if (PageSwapCache(page)) { - swapcache_free_page_entry(page); - } else { - void (*freepage)(struct page *); - freepage = mapping->a_ops->freepage; - mem_cgroup_uncharge_cache_page(page); - if (freepage != NULL) - freepage(page); - } + mapping_release_page(mapping, page); return ret; } _ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757280Ab3EaSjq (ORCPT ); Fri, 31 May 2013 14:39:46 -0400 Received: from mga01.intel.com ([192.55.52.88]:34113 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757067Ab3EaSjD (ORCPT ); Fri, 31 May 2013 14:39:03 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.87,780,1363158000"; d="scan'208";a="343159150" Subject: [v4][PATCH 5/6] mm: vmscan: batch shrink_page_list() locking operations To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com, Dave Hansen From: Dave Hansen Date: Fri, 31 May 2013 11:39:02 -0700 References: <20130531183855.44DDF928@viggo.jf.intel.com> In-Reply-To: <20130531183855.44DDF928@viggo.jf.intel.com> Message-Id: <20130531183902.8E84DFB4@viggo.jf.intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Dave Hansen changes for v2: * remove batch_has_same_mapping() helper. A local varible makes the check cheaper and cleaner * Move batch draining later to where we already know page_mapping(). This probably fixes a truncation race anyway * rename batch_for_mapping_removal -> batch_for_mapping_rm. It caused a line over 80 chars and needed shortening anyway. * Note: we only set 'batch_mapping' when there are pages in the batch_for_mapping_rm list -- We batch like this so that several pages can be freed with a single mapping->tree_lock acquisition/release pair. This reduces the number of atomic operations and ensures that we do not bounce cachelines around. Tim Chen's earlier version of these patches just unconditionally created large batches of pages, even if they did not share a page_mapping(). This is a bit suboptimal for a few reasons: 1. if we can not consolidate lock acquisitions, it makes little sense to batch 2. The page locks are held for long periods of time, so we only want to do this when we are sure that we will gain a substantial throughput improvement because we pay a latency cost by holding the locks. This patch makes sure to only batch when all the pages on 'batch_for_mapping_rm' continue to share a page_mapping(). This only happens in practice in cases where pages in the same file are close to each other on the LRU. That seems like a reasonable assumption. In a 128MB virtual machine doing kernel compiles, the average batch size when calling __remove_mapping_batch() is around 5, so this does seem to do some good in practice. On a 160-cpu system doing kernel compiles, I still saw an average batch length of about 2.8. One promising feature: as the memory pressure went up, the average batches seem to have gotten larger. It has shown some substantial performance benefits on microbenchmarks. Signed-off-by: Dave Hansen --- linux.git-davehans/mm/vmscan.c | 95 +++++++++++++++++++++++++++++++++++++---- 1 file changed, 86 insertions(+), 9 deletions(-) diff -puN mm/vmscan.c~create-remove_mapping_batch mm/vmscan.c --- linux.git/mm/vmscan.c~create-remove_mapping_batch 2013-05-30 16:07:51.711126969 -0700 +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:51.716127189 -0700 @@ -552,6 +552,61 @@ int remove_mapping(struct address_space return 0; } +/* + * pages come in here (via remove_list) locked and leave unlocked + * (on either ret_pages or free_pages) + * + * We do this batching so that we free batches of pages with a + * single mapping->tree_lock acquisition/release. This optimization + * only makes sense when the pages on remove_list all share a + * page_mapping(). If this is violated you will BUG_ON(). + */ +static int __remove_mapping_batch(struct list_head *remove_list, + struct list_head *ret_pages, + struct list_head *free_pages) +{ + int nr_reclaimed = 0; + struct address_space *mapping; + struct page *page; + LIST_HEAD(need_free_mapping); + + if (list_empty(remove_list)) + return 0; + + mapping = page_mapping(lru_to_page(remove_list)); + spin_lock_irq(&mapping->tree_lock); + while (!list_empty(remove_list)) { + page = lru_to_page(remove_list); + BUG_ON(!PageLocked(page)); + BUG_ON(page_mapping(page) != mapping); + list_del(&page->lru); + + if (!__remove_mapping(mapping, page)) { + unlock_page(page); + list_add(&page->lru, ret_pages); + continue; + } + list_add(&page->lru, &need_free_mapping); + } + spin_unlock_irq(&mapping->tree_lock); + + while (!list_empty(&need_free_mapping)) { + page = lru_to_page(&need_free_mapping); + list_move(&page->list, free_pages); + mapping_release_page(mapping, page); + /* + * At this point, we have no other references and there is + * no way to pick any more up (removed from LRU, removed + * from pagecache). Can use non-atomic bitops now (and + * we obviously don't have to worry about waking up a process + * waiting on the page lock, because there are no references. + */ + __clear_page_locked(page); + nr_reclaimed++; + } + return nr_reclaimed; +} + /** * putback_lru_page - put previously isolated page onto appropriate LRU list * @page: page to be put back to appropriate lru list @@ -730,6 +785,8 @@ static unsigned long shrink_page_list(st { LIST_HEAD(ret_pages); LIST_HEAD(free_pages); + LIST_HEAD(batch_for_mapping_rm); + struct address_space *batch_mapping = NULL; int pgactivate = 0; unsigned long nr_unqueued_dirty = 0; unsigned long nr_dirty = 0; @@ -751,6 +808,7 @@ static unsigned long shrink_page_list(st cond_resched(); page = lru_to_page(page_list); + list_del(&page->lru); if (!trylock_page(page)) @@ -853,6 +911,10 @@ static unsigned long shrink_page_list(st /* Case 3 above */ } else { + /* + * batch_for_mapping_rm could be drained here + * if its lock_page()s hurt latency elsewhere. + */ wait_on_page_writeback(page); } } @@ -883,6 +945,18 @@ static unsigned long shrink_page_list(st } mapping = page_mapping(page); + /* + * batching only makes sense when we can save lock + * acquisitions, so drain the previously-batched + * pages when we move over to a different mapping + */ + if (batch_mapping && (batch_mapping != mapping)) { + nr_reclaimed += + __remove_mapping_batch(&batch_for_mapping_rm, + &ret_pages, + &free_pages); + batch_mapping = NULL; + } /* * The page is mapped into the page tables of one or more @@ -998,17 +1072,18 @@ static unsigned long shrink_page_list(st } } - if (!mapping || !__remove_mapping(mapping, page)) + if (!mapping) goto keep_locked; - /* - * At this point, we have no other references and there is - * no way to pick any more up (removed from LRU, removed - * from pagecache). Can use non-atomic bitops now (and - * we obviously don't have to worry about waking up a process - * waiting on the page lock, because there are no references. + * This list contains pages all in the same mapping, but + * in effectively random order and we hold lock_page() + * on *all* of them. This can potentially cause lock + * ordering issues, but the reclaim code only trylocks + * them which saves us. */ - __clear_page_locked(page); + list_add(&page->lru, &batch_for_mapping_rm); + batch_mapping = mapping; + continue; free_it: nr_reclaimed++; @@ -1039,7 +1114,9 @@ keep: list_add(&page->lru, &ret_pages); VM_BUG_ON(PageLRU(page) || PageUnevictable(page)); } - + nr_reclaimed += __remove_mapping_batch(&batch_for_mapping_rm, + &ret_pages, + &free_pages); /* * Tag a zone as congested if all the dirty pages encountered were * backed by a congested BDI. In this case, reclaimers should just _ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756907Ab3EaSj7 (ORCPT ); Fri, 31 May 2013 14:39:59 -0400 Received: from mga03.intel.com ([143.182.124.21]:57561 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757071Ab3EaSjE (ORCPT ); Fri, 31 May 2013 14:39:04 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.87,780,1363158000"; d="scan'208";a="249209176" Subject: [v4][PATCH 3/6] mm: vmscan: break up __remove_mapping() To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com, Dave Hansen From: Dave Hansen Date: Fri, 31 May 2013 11:38:59 -0700 References: <20130531183855.44DDF928@viggo.jf.intel.com> In-Reply-To: <20130531183855.44DDF928@viggo.jf.intel.com> Message-Id: <20130531183859.F179225E@viggo.jf.intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Dave Hansen Our goal here is to eventually reduce the number of repetitive acquire/release operations on mapping->tree_lock. Logically, this patch has two steps: 1. rename __remove_mapping() to lock_remove_mapping() since "__" usually means "this us the unlocked version. 2. Recreate __remove_mapping() to _be_ the lock_remove_mapping() but without the locks. I think this actually makes the code flow around the locking _much_ more straighforward since the locking just becomes: spin_lock_irq(&mapping->tree_lock); ret = __remove_mapping(mapping, page); spin_unlock_irq(&mapping->tree_lock); One non-obvious part of this patch: the freepage = mapping->a_ops->freepage; used to happen under the mapping->tree_lock, but this patch moves it to outside of the lock. All of the other a_ops->freepage users do it outside the lock, and we only assign it when we create inodes, so that makes it safe. Signed-off-by: Dave Hansen Acked-by: Mel Gorman --- linux.git-davehans/mm/vmscan.c | 43 ++++++++++++++++++++++++----------------- 1 file changed, 26 insertions(+), 17 deletions(-) diff -puN mm/vmscan.c~make-remove-mapping-without-locks mm/vmscan.c --- linux.git/mm/vmscan.c~make-remove-mapping-without-locks 2013-05-30 16:07:51.210104924 -0700 +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:51.214105100 -0700 @@ -450,12 +450,12 @@ static pageout_t pageout(struct page *pa * Same as remove_mapping, but if the page is removed from the mapping, it * gets returned with a refcount of 0. */ -static int __remove_mapping(struct address_space *mapping, struct page *page) +static int __remove_mapping(struct address_space *mapping, + struct page *page) { BUG_ON(!PageLocked(page)); BUG_ON(mapping != page_mapping(page)); - spin_lock_irq(&mapping->tree_lock); /* * The non racy check for a busy page. * @@ -482,35 +482,44 @@ static int __remove_mapping(struct addre * and thus under tree_lock, then this ordering is not required. */ if (!page_freeze_refs(page, 2)) - goto cannot_free; + return 0; /* note: atomic_cmpxchg in page_freeze_refs provides the smp_rmb */ if (unlikely(PageDirty(page))) { page_unfreeze_refs(page, 2); - goto cannot_free; + return 0; } if (PageSwapCache(page)) { __delete_from_swap_cache(page); - spin_unlock_irq(&mapping->tree_lock); + } else { + __delete_from_page_cache(page); + } + return 1; +} + +static int lock_remove_mapping(struct address_space *mapping, struct page *page) +{ + int ret; + BUG_ON(!PageLocked(page)); + + spin_lock_irq(&mapping->tree_lock); + ret = __remove_mapping(mapping, page); + spin_unlock_irq(&mapping->tree_lock); + + /* unable to free */ + if (!ret) + return 0; + + if (PageSwapCache(page)) { swapcache_free_page_entry(page); } else { void (*freepage)(struct page *); - freepage = mapping->a_ops->freepage; - - __delete_from_page_cache(page); - spin_unlock_irq(&mapping->tree_lock); mem_cgroup_uncharge_cache_page(page); - if (freepage != NULL) freepage(page); } - - return 1; - -cannot_free: - spin_unlock_irq(&mapping->tree_lock); - return 0; + return ret; } /* @@ -521,7 +530,7 @@ cannot_free: */ int remove_mapping(struct address_space *mapping, struct page *page) { - if (__remove_mapping(mapping, page)) { + if (lock_remove_mapping(mapping, page)) { /* * Unfreezing the refcount with 1 rather than 2 effectively * drops the pagecache ref for us without requiring another _ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757099Ab3EaSjf (ORCPT ); Fri, 31 May 2013 14:39:35 -0400 Received: from mga02.intel.com ([134.134.136.20]:16598 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756964Ab3EaSi7 (ORCPT ); Fri, 31 May 2013 14:38:59 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.87,780,1363158000"; d="scan'208";a="346421527" Subject: [v4][PATCH 2/6] mm: swap: make 'struct page' and swp_entry_t variants of swapcache_free(). To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com, Dave Hansen From: Dave Hansen Date: Fri, 31 May 2013 11:38:58 -0700 References: <20130531183855.44DDF928@viggo.jf.intel.com> In-Reply-To: <20130531183855.44DDF928@viggo.jf.intel.com> Message-Id: <20130531183858.3C8C10C7@viggo.jf.intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Dave Hansen swapcache_free() takes two arguments: void swapcache_free(swp_entry_t entry, struct page *page) Most of its callers (5/7) are from error handling paths haven't even instantiated a page, so they pass page=NULL. Both of the callers that call in with a 'struct page' create and pass in a temporary swp_entry_t. Now that we are deferring clearing page_private() until after swapcache_free() has been called, we can just create a variant that takes a 'struct page' and does the temporary variable in the helper. That leaves all the other callers doing swapcache_free(entry, NULL) so create another helper for them that makes it clear that they need only pass in a swp_entry_t. One downside here is that delete_from_swap_cache() now does an extra swap_address_space() call. But, those are pretty cheap (just some array index arithmetic). Signed-off-by: Dave Hansen --- linux.git-davehans/drivers/staging/zcache/zcache-main.c | 2 +- linux.git-davehans/include/linux/swap.h | 3 ++- linux.git-davehans/mm/shmem.c | 2 +- linux.git-davehans/mm/swap_state.c | 15 +++++---------- linux.git-davehans/mm/swapfile.c | 15 ++++++++++++++- linux.git-davehans/mm/vmscan.c | 5 +---- 6 files changed, 24 insertions(+), 18 deletions(-) diff -puN drivers/staging/zcache/zcache-main.c~make-page-and-swp_entry_t-variants drivers/staging/zcache/zcache-main.c --- linux.git/drivers/staging/zcache/zcache-main.c~make-page-and-swp_entry_t-variants 2013-05-30 16:07:50.898091196 -0700 +++ linux.git-davehans/drivers/staging/zcache/zcache-main.c 2013-05-30 16:07:50.910091724 -0700 @@ -961,7 +961,7 @@ static int zcache_get_swap_cache_page(in * add_to_swap_cache() doesn't return -EEXIST, so we can safely * clear SWAP_HAS_CACHE flag. */ - swapcache_free(entry, NULL); + swapcache_free_entry(entry); /* FIXME: is it possible to get here without err==-ENOMEM? * If not, we can dispense with the do loop, use goto retry */ } while (err != -ENOMEM); diff -puN include/linux/swap.h~make-page-and-swp_entry_t-variants include/linux/swap.h --- linux.git/include/linux/swap.h~make-page-and-swp_entry_t-variants 2013-05-30 16:07:50.900091284 -0700 +++ linux.git-davehans/include/linux/swap.h 2013-05-30 16:07:50.911091768 -0700 @@ -385,7 +385,8 @@ extern void swap_shmem_alloc(swp_entry_t extern int swap_duplicate(swp_entry_t); extern int swapcache_prepare(swp_entry_t); extern void swap_free(swp_entry_t); -extern void swapcache_free(swp_entry_t, struct page *page); +extern void swapcache_free_entry(swp_entry_t entry); +extern void swapcache_free_page_entry(struct page *page); extern int free_swap_and_cache(swp_entry_t); extern int swap_type_of(dev_t, sector_t, struct block_device **); extern unsigned int count_swap_pages(int, int); diff -puN mm/shmem.c~make-page-and-swp_entry_t-variants mm/shmem.c --- linux.git/mm/shmem.c~make-page-and-swp_entry_t-variants 2013-05-30 16:07:50.902091372 -0700 +++ linux.git-davehans/mm/shmem.c 2013-05-30 16:07:50.912091812 -0700 @@ -872,7 +872,7 @@ static int shmem_writepage(struct page * } mutex_unlock(&shmem_swaplist_mutex); - swapcache_free(swap, NULL); + swapcache_free_entry(swap); redirty: set_page_dirty(page); if (wbc->for_reclaim) diff -puN mm/swapfile.c~make-page-and-swp_entry_t-variants mm/swapfile.c --- linux.git/mm/swapfile.c~make-page-and-swp_entry_t-variants 2013-05-30 16:07:50.904091460 -0700 +++ linux.git-davehans/mm/swapfile.c 2013-05-30 16:07:50.913091856 -0700 @@ -637,7 +637,7 @@ void swap_free(swp_entry_t entry) /* * Called after dropping swapcache to decrease refcnt to swap entries. */ -void swapcache_free(swp_entry_t entry, struct page *page) +static void __swapcache_free(swp_entry_t entry, struct page *page) { struct swap_info_struct *p; unsigned char count; @@ -651,6 +651,19 @@ void swapcache_free(swp_entry_t entry, s } } +void swapcache_free_entry(swp_entry_t entry) +{ + __swapcache_free(entry, NULL); +} + +void swapcache_free_page_entry(struct page *page) +{ + swp_entry_t entry = { .val = page_private(page) }; + __swapcache_free(entry, page); + set_page_private(page, 0); + ClearPageSwapCache(page); +} + /* * How many references to page are currently swapped out? * This does not give an exact answer when swap count is continued, diff -puN mm/swap_state.c~make-page-and-swp_entry_t-variants mm/swap_state.c --- linux.git/mm/swap_state.c~make-page-and-swp_entry_t-variants 2013-05-30 16:07:50.905091504 -0700 +++ linux.git-davehans/mm/swap_state.c 2013-05-30 16:07:50.914091900 -0700 @@ -174,7 +174,7 @@ int add_to_swap(struct page *page, struc if (unlikely(PageTransHuge(page))) if (unlikely(split_huge_page_to_list(page, list))) { - swapcache_free(entry, NULL); + swapcache_free_entry(entry); return 0; } @@ -200,7 +200,7 @@ int add_to_swap(struct page *page, struc * add_to_swap_cache() doesn't return -EEXIST, so we can safely * clear SWAP_HAS_CACHE flag. */ - swapcache_free(entry, NULL); + swapcache_free_entry(entry); return 0; } } @@ -213,19 +213,14 @@ int add_to_swap(struct page *page, struc */ void delete_from_swap_cache(struct page *page) { - swp_entry_t entry; struct address_space *address_space; - entry.val = page_private(page); - - address_space = swap_address_space(entry); + address_space = page_mapping(page); spin_lock_irq(&address_space->tree_lock); __delete_from_swap_cache(page); spin_unlock_irq(&address_space->tree_lock); - swapcache_free(entry, page); - set_page_private(page, 0); - ClearPageSwapCache(page); + swapcache_free_page_entry(page); page_cache_release(page); } @@ -370,7 +365,7 @@ struct page *read_swap_cache_async(swp_e * add_to_swap_cache() doesn't return -EEXIST, so we can safely * clear SWAP_HAS_CACHE flag. */ - swapcache_free(entry, NULL); + swapcache_free_entry(entry); } while (err != -ENOMEM); if (new_page) diff -puN mm/vmscan.c~make-page-and-swp_entry_t-variants mm/vmscan.c --- linux.git/mm/vmscan.c~make-page-and-swp_entry_t-variants 2013-05-30 16:07:50.907091592 -0700 +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:50.915091944 -0700 @@ -490,12 +490,9 @@ static int __remove_mapping(struct addre } if (PageSwapCache(page)) { - swp_entry_t swap = { .val = page_private(page) }; __delete_from_swap_cache(page); spin_unlock_irq(&mapping->tree_lock); - swapcache_free(swap, page); - set_page_private(page, 0); - ClearPageSwapCache(page); + swapcache_free_page_entry(page); } else { void (*freepage)(struct page *); _ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757299Ab3EaSkp (ORCPT ); Fri, 31 May 2013 14:40:45 -0400 Received: from mga03.intel.com ([143.182.124.21]:20199 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757097Ab3EaSjF (ORCPT ); Fri, 31 May 2013 14:39:05 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.87,780,1363158000"; d="scan'208";a="249209191" Subject: [v4][PATCH 6/6] mm: vmscan: drain batch list during long operations To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com, Dave Hansen From: Dave Hansen Date: Fri, 31 May 2013 11:39:04 -0700 References: <20130531183855.44DDF928@viggo.jf.intel.com> In-Reply-To: <20130531183855.44DDF928@viggo.jf.intel.com> Message-Id: <20130531183904.76C2B57D@viggo.jf.intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Dave Hansen This was a suggestion from Mel: http://lkml.kernel.org/r/20120914085634.GM11157@csn.ul.ie Any pages we collect on 'batch_for_mapping_removal' will have their lock_page() held during the duration of their stay on the list. If some other user is trying to get at them during this time, they might end up having to wait. This ensures that we drain the batch if we are about to perform a pageout() or congestion_wait(), either of which will take some time. We expect this to help mitigate the worst of the latency increase that the batching could cause. I added some statistics to the __remove_mapping_batch() code to track how large the lists are that we pass in to it. With this patch, the average list length drops about 10% (from about 4.1 to 3.8). The workload here was a make -j4 kernel compile on a VM with 200MB of RAM. I've still got the statistics patch around if anyone is interested. Signed-off-by: Dave Hansen --- linux.git-davehans/mm/vmscan.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff -puN mm/vmscan.c~drain-batch-list-during-long-operations mm/vmscan.c --- linux.git/mm/vmscan.c~drain-batch-list-during-long-operations 2013-05-30 16:07:51.962138013 -0700 +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:51.966138189 -0700 @@ -1003,6 +1003,16 @@ static unsigned long shrink_page_list(st if (!sc->may_writepage) goto keep_locked; + /* + * We hold a bunch of page locks on the batch. + * pageout() can take a while, so drain the + * batch before we perform pageout. + */ + nr_reclaimed += + __remove_mapping_batch(&batch_for_mapping_rm, + &ret_pages, + &free_pages); + /* Page is dirty, try to write it out here */ switch (pageout(page, mapping, sc)) { case PAGE_KEEP: _ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751978Ab3FCFlC (ORCPT ); Mon, 3 Jun 2013 01:41:02 -0400 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:57544 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751779Ab3FCFku (ORCPT ); Mon, 3 Jun 2013 01:40:50 -0400 X-AuditID: 9c930197-b7be3ae000000eab-02-51ac2c609cfa Date: Mon, 3 Jun 2013 14:40:48 +0900 From: Minchan Kim To: Dave Hansen Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com Subject: Re: [v4][PATCH 1/6] mm: swap: defer clearing of page_private() for swap cache pages Message-ID: <20130603054048.GA27858@blaptop> References: <20130531183855.44DDF928@viggo.jf.intel.com> <20130531183856.1D7D75AD@viggo.jf.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130531183856.1D7D75AD@viggo.jf.intel.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 31, 2013 at 11:38:56AM -0700, Dave Hansen wrote: > > From: Dave Hansen > > This patch defers the destruction of swapcache-specific data in > 'struct page'. This simplifies the code because we do not have > to keep extra copies of the data during the removal of a page > from the swap cache. > > There are only two callers of swapcache_free() which actually > pass in a non-NULL 'struct page'. Both of them (__remove_mapping > and delete_from_swap_cache()) create a temporary on-stack > 'swp_entry_t' and set entry.val to page_private(). > > They need to do this since __delete_from_swap_cache() does > set_page_private(page, 0) and destroys the information. > > However, I'd like to batch a few of these operations on several > pages together in a new version of __remove_mapping(), and I > would prefer not to have to allocate temporary storage for each > page. The code is pretty ugly, and it's a bit silly to create > these on-stack 'swp_entry_t's when it is so easy to just keep the > information around in 'struct page'. > > There should not be any danger in doing this since we are > absolutely on the path of freeing these page. There is no > turning back, and no other rerferences can be obtained after it > comes out of the radix tree. > > Note: This patch is separate from the next one since it > introduces the behavior change. I've seen issues with this patch > by itself in various forms and I think having it separate like > this aids bisection. > > Signed-off-by: Dave Hansen > Acked-by: Mel Gorman Reviewed-by: Minchan Kim Look at below nitpick. > --- > > linux.git-davehans/mm/swap_state.c | 4 ++-- > linux.git-davehans/mm/vmscan.c | 2 ++ > 2 files changed, 4 insertions(+), 2 deletions(-) > > diff -puN mm/swap_state.c~__delete_from_swap_cache-dont-clear-page-private mm/swap_state.c > --- linux.git/mm/swap_state.c~__delete_from_swap_cache-dont-clear-page-private 2013-05-30 16:07:50.630079404 -0700 > +++ linux.git-davehans/mm/swap_state.c 2013-05-30 16:07:50.635079624 -0700 > @@ -148,8 +148,6 @@ void __delete_from_swap_cache(struct pag > entry.val = page_private(page); > address_space = swap_address_space(entry); > radix_tree_delete(&address_space->page_tree, page_private(page)); > - set_page_private(page, 0); > - ClearPageSwapCache(page); > address_space->nrpages--; > __dec_zone_page_state(page, NR_FILE_PAGES); > INC_CACHE_INFO(del_total); > @@ -226,6 +224,8 @@ void delete_from_swap_cache(struct page > spin_unlock_irq(&address_space->tree_lock); > > swapcache_free(entry, page); > + set_page_private(page, 0); > + ClearPageSwapCache(page); > page_cache_release(page); > } > > diff -puN mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private mm/vmscan.c > --- linux.git/mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private 2013-05-30 16:07:50.632079492 -0700 > +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:50.637079712 -0700 > @@ -494,6 +494,8 @@ static int __remove_mapping(struct addre > __delete_from_swap_cache(page); > spin_unlock_irq(&mapping->tree_lock); > swapcache_free(swap, page); > + set_page_private(page, 0); > + ClearPageSwapCache(page); It it worth to support non-atomic version of ClearPageSwapCache? -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752219Ab3FCGN3 (ORCPT ); Mon, 3 Jun 2013 02:13:29 -0400 Received: from lgeamrelo02.lge.com ([156.147.1.126]:54842 "EHLO LGEAMRELO02.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751100Ab3FCGNW (ORCPT ); Mon, 3 Jun 2013 02:13:22 -0400 X-AuditID: 9c93017e-b7c0cae0000006f6-7b-51ac34007ba6 Date: Mon, 3 Jun 2013 15:13:20 +0900 From: Minchan Kim To: Dave Hansen Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com Subject: Re: [v4][PATCH 2/6] mm: swap: make 'struct page' and swp_entry_t variants of swapcache_free(). Message-ID: <20130603061320.GA2795@blaptop> References: <20130531183855.44DDF928@viggo.jf.intel.com> <20130531183858.3C8C10C7@viggo.jf.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130531183858.3C8C10C7@viggo.jf.intel.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello Dave, On Fri, May 31, 2013 at 11:38:58AM -0700, Dave Hansen wrote: > > From: Dave Hansen > > swapcache_free() takes two arguments: > > void swapcache_free(swp_entry_t entry, struct page *page) > > Most of its callers (5/7) are from error handling paths haven't even > instantiated a page, so they pass page=NULL. Both of the callers > that call in with a 'struct page' create and pass in a temporary > swp_entry_t. > > Now that we are deferring clearing page_private() until after > swapcache_free() has been called, we can just create a variant > that takes a 'struct page' and does the temporary variable in > the helper. > > That leaves all the other callers doing > > swapcache_free(entry, NULL) > > so create another helper for them that makes it clear that they > need only pass in a swp_entry_t. > > One downside here is that delete_from_swap_cache() now does > an extra swap_address_space() call. But, those are pretty > cheap (just some array index arithmetic). I lost from this description. Old behavior delete_from_swap_cache swap_address_space __delete_from_swap_cache swap_address_space New behavior delete_from_swap_cache __delete_from_swap_cache swap_address_space So you removes a swap_address_space, not adding a extra call. Am I missing something? > > Signed-off-by: Dave Hansen Otherwise, looks good to me Reviewed-by: Minchan Kim -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752020Ab3FCI2q (ORCPT ); Mon, 3 Jun 2013 04:28:46 -0400 Received: from lgeamrelo01.lge.com ([156.147.1.125]:61959 "EHLO LGEAMRELO01.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751110Ab3FCI2n (ORCPT ); Mon, 3 Jun 2013 04:28:43 -0400 X-AuditID: 9c93017d-b7c04ae00000207d-ca-51ac53b9e11f Date: Mon, 3 Jun 2013 17:28:41 +0900 From: Minchan Kim To: Dave Hansen Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com Subject: Re: [v4][PATCH 3/6] mm: vmscan: break up __remove_mapping() Message-ID: <20130603082841.GB2795@blaptop> References: <20130531183855.44DDF928@viggo.jf.intel.com> <20130531183859.F179225E@viggo.jf.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130531183859.F179225E@viggo.jf.intel.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 31, 2013 at 11:38:59AM -0700, Dave Hansen wrote: > > From: Dave Hansen > > Our goal here is to eventually reduce the number of repetitive > acquire/release operations on mapping->tree_lock. > > Logically, this patch has two steps: > 1. rename __remove_mapping() to lock_remove_mapping() since > "__" usually means "this us the unlocked version. > 2. Recreate __remove_mapping() to _be_ the lock_remove_mapping() > but without the locks. > > I think this actually makes the code flow around the locking > _much_ more straighforward since the locking just becomes: > > spin_lock_irq(&mapping->tree_lock); > ret = __remove_mapping(mapping, page); > spin_unlock_irq(&mapping->tree_lock); > > One non-obvious part of this patch: the > > freepage = mapping->a_ops->freepage; > > used to happen under the mapping->tree_lock, but this patch > moves it to outside of the lock. All of the other > a_ops->freepage users do it outside the lock, and we only > assign it when we create inodes, so that makes it safe. > > Signed-off-by: Dave Hansen > Acked-by: Mel Gorman Reviewed-by: Minchan Kin Just a nitpick below. > > --- > > linux.git-davehans/mm/vmscan.c | 43 ++++++++++++++++++++++++----------------- > 1 file changed, 26 insertions(+), 17 deletions(-) > > diff -puN mm/vmscan.c~make-remove-mapping-without-locks mm/vmscan.c > --- linux.git/mm/vmscan.c~make-remove-mapping-without-locks 2013-05-30 16:07:51.210104924 -0700 > +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:51.214105100 -0700 > @@ -450,12 +450,12 @@ static pageout_t pageout(struct page *pa > * Same as remove_mapping, but if the page is removed from the mapping, it > * gets returned with a refcount of 0. > */ > -static int __remove_mapping(struct address_space *mapping, struct page *page) > +static int __remove_mapping(struct address_space *mapping, > + struct page *page) Unnecessary change. -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752681Ab3FCIfo (ORCPT ); Mon, 3 Jun 2013 04:35:44 -0400 Received: from lgeamrelo01.lge.com ([156.147.1.125]:47248 "EHLO LGEAMRELO01.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751312Ab3FCIfm (ORCPT ); Mon, 3 Jun 2013 04:35:42 -0400 X-AuditID: 9c93017d-b7c04ae00000207d-33-51ac555c720d Date: Mon, 3 Jun 2013 17:35:40 +0900 From: Minchan Kim To: Dave Hansen Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com Subject: Re: [v4][PATCH 4/6] mm: vmscan: break out mapping "freepage" code Message-ID: <20130603083540.GC2795@blaptop> References: <20130531183855.44DDF928@viggo.jf.intel.com> <20130531183901.375FE758@viggo.jf.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130531183901.375FE758@viggo.jf.intel.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 31, 2013 at 11:39:01AM -0700, Dave Hansen wrote: > > From: Dave Hansen > > __remove_mapping() only deals with pages with mappings, meaning > page cache and swap cache. > > At this point, the page has been removed from the mapping's radix > tree, and we need to ensure that any fs-specific (or swap- > specific) resources are freed up. > > We will be using this function from a second location in a > following patch. > > Signed-off-by: Dave Hansen > Acked-by: Mel Gorman Reviewed-by: Minchan Kim Again, a nitpick. Sorry. > --- > > linux.git-davehans/mm/vmscan.c | 28 +++++++++++++++++++--------- > 1 file changed, 19 insertions(+), 9 deletions(-) > > diff -puN mm/vmscan.c~free_mapping_page mm/vmscan.c > --- linux.git/mm/vmscan.c~free_mapping_page 2013-05-30 16:07:51.461115968 -0700 > +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:51.465116144 -0700 > @@ -497,6 +497,24 @@ static int __remove_mapping(struct addre > return 1; > } > > +/* > + * Release any resources the mapping had tied up in > + * the page. It could be a one line. > + */ > +static void mapping_release_page(struct address_space *mapping, > + struct page *page) > +{ > + if (PageSwapCache(page)) { > + swapcache_free_page_entry(page); > + } else { > + void (*freepage)(struct page *); > + freepage = mapping->a_ops->freepage; > + mem_cgroup_uncharge_cache_page(page); > + if (freepage != NULL) > + freepage(page); > + } > +} > + > static int lock_remove_mapping(struct address_space *mapping, struct page *page) > { > int ret; > @@ -510,15 +528,7 @@ static int lock_remove_mapping(struct ad > if (!ret) > return 0; > > - if (PageSwapCache(page)) { > - swapcache_free_page_entry(page); > - } else { > - void (*freepage)(struct page *); > - freepage = mapping->a_ops->freepage; > - mem_cgroup_uncharge_cache_page(page); > - if (freepage != NULL) > - freepage(page); > - } > + mapping_release_page(mapping, page); > return ret; > } > > _ > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758274Ab3FCOxF (ORCPT ); Mon, 3 Jun 2013 10:53:05 -0400 Received: from www.sr71.net ([198.145.64.142]:57588 "EHLO blackbird.sr71.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753768Ab3FCOxE (ORCPT ); Mon, 3 Jun 2013 10:53:04 -0400 Message-ID: <51ACADCD.70904@sr71.net> Date: Mon, 03 Jun 2013 07:53:01 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130510 Thunderbird/17.0.6 MIME-Version: 1.0 To: Minchan Kim CC: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com Subject: Re: [v4][PATCH 1/6] mm: swap: defer clearing of page_private() for swap cache pages References: <20130531183855.44DDF928@viggo.jf.intel.com> <20130531183856.1D7D75AD@viggo.jf.intel.com> <20130603054048.GA27858@blaptop> In-Reply-To: <20130603054048.GA27858@blaptop> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/02/2013 10:40 PM, Minchan Kim wrote: >> > diff -puN mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private mm/vmscan.c >> > --- linux.git/mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private 2013-05-30 16:07:50.632079492 -0700 >> > +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:50.637079712 -0700 >> > @@ -494,6 +494,8 @@ static int __remove_mapping(struct addre >> > __delete_from_swap_cache(page); >> > spin_unlock_irq(&mapping->tree_lock); >> > swapcache_free(swap, page); >> > + set_page_private(page, 0); >> > + ClearPageSwapCache(page); > It it worth to support non-atomic version of ClearPageSwapCache? Just for this, probably not. It does look like a site where it would be theoretically safe to use non-atomic flag operations since the page is on a one-way trip to the allocator at this point and the __clear_page_locked() now happens _just_ after this code. But, personally, I'm happy to leave it as-is. The atomic vs. non-atomic flags look to me like a micro-optimization that we should use when we _know_ there will be some tangible benefit. Otherwise, they're just something extra for developers to trip over and cause very subtle bugs. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759040Ab3FCPzM (ORCPT ); Mon, 3 Jun 2013 11:55:12 -0400 Received: from www.sr71.net ([198.145.64.142]:58313 "EHLO blackbird.sr71.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758553Ab3FCPzK (ORCPT ); Mon, 3 Jun 2013 11:55:10 -0400 Message-ID: <51ACBC5C.9020701@sr71.net> Date: Mon, 03 Jun 2013 08:55:08 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130510 Thunderbird/17.0.6 MIME-Version: 1.0 To: Minchan Kim CC: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com Subject: Re: [v4][PATCH 2/6] mm: swap: make 'struct page' and swp_entry_t variants of swapcache_free(). References: <20130531183855.44DDF928@viggo.jf.intel.com> <20130531183858.3C8C10C7@viggo.jf.intel.com> <20130603061320.GA2795@blaptop> In-Reply-To: <20130603061320.GA2795@blaptop> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/02/2013 11:13 PM, Minchan Kim wrote: > I lost from this description. > > Old behavior > > delete_from_swap_cache > swap_address_space > __delete_from_swap_cache > swap_address_space > > > New behavior > > delete_from_swap_cache > __delete_from_swap_cache > swap_address_space > > So you removes a swap_address_space, not adding a extra call. > Am I missing something? I think I got the page->swp_entry_t lookup confused with the page->swap_address_space lookup when I was writing the description. The bit that you missed is that I _added_ a page_mapping() call, which calls swap_address_space() internally: Old behavior: delete_from_swap_cache swap_address_space __delete_from_swap_cache swap_address_space New behavior: delete_from_swap_cache page_mapping swap_address_space __delete_from_swap_cache swap_address_space -- New description (last paragraph changed). Andrew, I'll resend the series since there are a few of these cleanups. From: Dave Hansen swapcache_free() takes two arguments: void swapcache_free(swp_entry_t entry, struct page *page) Most of its callers (5/7) are from error handling paths haven't even instantiated a page, so they pass page=NULL. Both of the callers that call in with a 'struct page' create and pass in a temporary swp_entry_t. Now that we are deferring clearing page_private() until after swapcache_free() has been called, we can just create a variant that takes a 'struct page' and does the temporary variable in the helper. That leaves all the other callers doing swapcache_free(entry, NULL) so create another helper for them that makes it clear that they need only pass in a swp_entry_t. One downside here is that delete_from_swap_cache() now calls swap_address_space() via page_mapping() instead of calling swap_address_space() directly. In doing so, it removes one more case of the swap cache code being special-cased, which is a good thing in my book. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753701Ab3FDEln (ORCPT ); Tue, 4 Jun 2013 00:41:43 -0400 Received: from LGEMRELSE6Q.lge.com ([156.147.1.121]:50406 "EHLO LGEMRELSE6Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751044Ab3FDElm (ORCPT ); Tue, 4 Jun 2013 00:41:42 -0400 X-AuditID: 9c930179-b7c07ae000000e33-db-51ad7003b85e Date: Tue, 4 Jun 2013 13:41:39 +0900 From: Minchan Kim To: Dave Hansen Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, mgorman@suse.de, tim.c.chen@linux.intel.com Subject: Re: [v4][PATCH 1/6] mm: swap: defer clearing of page_private() for swap cache pages Message-ID: <20130604044139.GB14719@blaptop> References: <20130531183855.44DDF928@viggo.jf.intel.com> <20130531183856.1D7D75AD@viggo.jf.intel.com> <20130603054048.GA27858@blaptop> <51ACADCD.70904@sr71.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <51ACADCD.70904@sr71.net> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jun 03, 2013 at 07:53:01AM -0700, Dave Hansen wrote: > On 06/02/2013 10:40 PM, Minchan Kim wrote: > >> > diff -puN mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private mm/vmscan.c > >> > --- linux.git/mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private 2013-05-30 16:07:50.632079492 -0700 > >> > +++ linux.git-davehans/mm/vmscan.c 2013-05-30 16:07:50.637079712 -0700 > >> > @@ -494,6 +494,8 @@ static int __remove_mapping(struct addre > >> > __delete_from_swap_cache(page); > >> > spin_unlock_irq(&mapping->tree_lock); > >> > swapcache_free(swap, page); > >> > + set_page_private(page, 0); > >> > + ClearPageSwapCache(page); > > It it worth to support non-atomic version of ClearPageSwapCache? > > Just for this, probably not. > > It does look like a site where it would be theoretically safe to use > non-atomic flag operations since the page is on a one-way trip to the > allocator at this point and the __clear_page_locked() now happens _just_ > after this code. True. > > But, personally, I'm happy to leave it as-is. The atomic vs. non-atomic > flags look to me like a micro-optimization that we should use when we > _know_ there will be some tangible benefit. Otherwise, they're just > something extra for developers to trip over and cause very subtle bugs. I just asked it because when I read the description of patchset, I felt you were very sensitive to the atomic operation on many CPU system with several sockets. Anyway, if you don't want it, I don't mind it, either. -- Kind regards, Minchan Kim