From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f198.google.com (mail-pf0-f198.google.com [209.85.192.198]) by kanga.kvack.org (Postfix) with ESMTP id D51076B0005 for ; Mon, 26 Feb 2018 08:52:49 -0500 (EST) Received: by mail-pf0-f198.google.com with SMTP id z5so8441428pfe.16 for ; Mon, 26 Feb 2018 05:52:49 -0800 (PST) Received: from mga07.intel.com (mga07.intel.com. [134.134.136.100]) by mx.google.com with ESMTPS id i127si604972pgc.100.2018.02.26.05.52.48 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2018 05:52:48 -0800 (PST) From: Aaron Lu Subject: [PATCH v3 0/3] mm: improve zone->lock scalability Date: Mon, 26 Feb 2018 21:53:43 +0800 Message-Id: <20180226135346.7208-1-aaron.lu@intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox Patch 1/3 is a small cleanup suggested by Matthew Wilcox which doesn't affect scalability or performance; Patch 2/3 moves some code in free_pcppages_bulk() outside of zone->lock and has Mel Gorman's ack; Patch 3/3 uses prefetch in free_pcppages_bulk() outside of zone->lock to speedup page merging under zone->lock but Mel Gorman has some concerns. For details, please see their changelogs. Changes from v2: Patch 1/3 is newly added; Patch 2/3 is patch 1/2 in v2 and doesn't have any change except resolving conflicts due to the newly added patch 1/3; Patch 3/3 is patch 2/2 in v2 and only has some changelog updates on concerns part. v1 and v2 was here: https://lkml.org/lkml/2018/1/23/879 Aaron Lu (3): mm/free_pcppages_bulk: update pcp->count inside mm/free_pcppages_bulk: do not hold lock when picking pages to free mm/free_pcppages_bulk: prefetch buddy while not holding lock mm/page_alloc.c | 56 +++++++++++++++++++++++++++++++++++--------------------- 1 file changed, 35 insertions(+), 21 deletions(-) -- 2.14.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl0-f71.google.com (mail-pl0-f71.google.com [209.85.160.71]) by kanga.kvack.org (Postfix) with ESMTP id 7B76A6B0006 for ; Mon, 26 Feb 2018 08:52:51 -0500 (EST) Received: by mail-pl0-f71.google.com with SMTP id 4so7669578plb.1 for ; Mon, 26 Feb 2018 05:52:51 -0800 (PST) Received: from mga07.intel.com (mga07.intel.com. [134.134.136.100]) by mx.google.com with ESMTPS id i127si604972pgc.100.2018.02.26.05.52.50 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2018 05:52:50 -0800 (PST) From: Aaron Lu Subject: [PATCH v3 1/3] mm/free_pcppages_bulk: update pcp->count inside Date: Mon, 26 Feb 2018 21:53:44 +0800 Message-Id: <20180226135346.7208-2-aaron.lu@intel.com> In-Reply-To: <20180226135346.7208-1-aaron.lu@intel.com> References: <20180226135346.7208-1-aaron.lu@intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox Matthew Wilcox found that all callers of free_pcppages_bulk() currently update pcp->count immediately after so it's natural to do it inside free_pcppages_bulk(). No functionality or performance change is expected from this patch. Suggested-by: Matthew Wilcox Signed-off-by: Aaron Lu --- mm/page_alloc.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cb416723538f..3154859cccd6 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1117,6 +1117,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, int batch_free = 0; bool isolated_pageblocks; + pcp->count -= count; spin_lock(&zone->lock); isolated_pageblocks = has_isolate_pageblock(zone); @@ -2416,10 +2417,8 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp) local_irq_save(flags); batch = READ_ONCE(pcp->batch); to_drain = min(pcp->count, batch); - if (to_drain > 0) { + if (to_drain > 0) free_pcppages_bulk(zone, to_drain, pcp); - pcp->count -= to_drain; - } local_irq_restore(flags); } #endif @@ -2441,10 +2440,8 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone) pset = per_cpu_ptr(zone->pageset, cpu); pcp = &pset->pcp; - if (pcp->count) { + if (pcp->count) free_pcppages_bulk(zone, pcp->count, pcp); - pcp->count = 0; - } local_irq_restore(flags); } @@ -2668,7 +2665,6 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn) if (pcp->count >= pcp->high) { unsigned long batch = READ_ONCE(pcp->batch); free_pcppages_bulk(zone, batch, pcp); - pcp->count -= batch; } } -- 2.14.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f197.google.com (mail-pf0-f197.google.com [209.85.192.197]) by kanga.kvack.org (Postfix) with ESMTP id 6D9AC6B0007 for ; Mon, 26 Feb 2018 08:52:54 -0500 (EST) Received: by mail-pf0-f197.google.com with SMTP id v186so8417300pfb.8 for ; Mon, 26 Feb 2018 05:52:54 -0800 (PST) Received: from mga05.intel.com (mga05.intel.com. [192.55.52.43]) by mx.google.com with ESMTPS id bf2-v6si6873537plb.257.2018.02.26.05.52.52 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2018 05:52:53 -0800 (PST) From: Aaron Lu Subject: [PATCH v3 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free Date: Mon, 26 Feb 2018 21:53:45 +0800 Message-Id: <20180226135346.7208-3-aaron.lu@intel.com> In-Reply-To: <20180226135346.7208-1-aaron.lu@intel.com> References: <20180226135346.7208-1-aaron.lu@intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox When freeing a batch of pages from Per-CPU-Pages(PCP) back to buddy, the zone->lock is held and then pages are chosen from PCP's migratetype list. While there is actually no need to do this 'choose part' under lock since it's PCP pages, the only CPU that can touch them is us and irq is also disabled. Moving this part outside could reduce lock held time and improve performance. Test with will-it-scale/page_fault1 full load: kernel Broadwell(2S) Skylake(2S) Broadwell(4S) Skylake(4S) v4.16-rc2+ 9034215 7971818 13667135 15677465 this patch 9536374 +5.6% 8314710 +4.3% 14070408 +3.0% 16675866 +6.4% What the test does is: starts $nr_cpu processes and each will repeatedly do the following for 5 minutes: 1 mmap 128M anonymouse space; 2 write access to that space; 3 munmap. The score is the aggregated iteration. https://github.com/antonblanchard/will-it-scale/blob/master/tests/page_fault1.c Acked-by: Mel Gorman Signed-off-by: Aaron Lu --- mm/page_alloc.c | 33 ++++++++++++++++++--------------- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3154859cccd6..35576da0a6c9 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1116,13 +1116,11 @@ static void free_pcppages_bulk(struct zone *zone, int count, int migratetype = 0; int batch_free = 0; bool isolated_pageblocks; + struct page *page, *tmp; + LIST_HEAD(head); pcp->count -= count; - spin_lock(&zone->lock); - isolated_pageblocks = has_isolate_pageblock(zone); - while (count) { - struct page *page; struct list_head *list; /* @@ -1144,26 +1142,31 @@ static void free_pcppages_bulk(struct zone *zone, int count, batch_free = count; do { - int mt; /* migratetype of the to-be-freed page */ - page = list_last_entry(list, struct page, lru); /* must delete as __free_one_page list manipulates */ list_del(&page->lru); - mt = get_pcppage_migratetype(page); - /* MIGRATE_ISOLATE page should not go to pcplists */ - VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); - /* Pageblock could have been isolated meanwhile */ - if (unlikely(isolated_pageblocks)) - mt = get_pageblock_migratetype(page); - if (bulkfree_pcp_prepare(page)) continue; - __free_one_page(page, page_to_pfn(page), zone, 0, mt); - trace_mm_page_pcpu_drain(page, 0, mt); + list_add_tail(&page->lru, &head); } while (--count && --batch_free && !list_empty(list)); } + + spin_lock(&zone->lock); + isolated_pageblocks = has_isolate_pageblock(zone); + + list_for_each_entry_safe(page, tmp, &head, lru) { + int mt = get_pcppage_migratetype(page); + /* MIGRATE_ISOLATE page should not go to pcplists */ + VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); + /* Pageblock could have been isolated meanwhile */ + if (unlikely(isolated_pageblocks)) + mt = get_pageblock_migratetype(page); + + __free_one_page(page, page_to_pfn(page), zone, 0, mt); + trace_mm_page_pcpu_drain(page, 0, mt); + } spin_unlock(&zone->lock); } -- 2.14.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f70.google.com (mail-pg0-f70.google.com [74.125.83.70]) by kanga.kvack.org (Postfix) with ESMTP id 6F31D6B0008 for ; Mon, 26 Feb 2018 08:52:56 -0500 (EST) Received: by mail-pg0-f70.google.com with SMTP id l1so5626031pga.1 for ; Mon, 26 Feb 2018 05:52:56 -0800 (PST) Received: from mga18.intel.com (mga18.intel.com. [134.134.136.126]) by mx.google.com with ESMTPS id n4si5394527pgn.235.2018.02.26.05.52.55 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2018 05:52:55 -0800 (PST) From: Aaron Lu Subject: [PATCH v3 3/3] mm/free_pcppages_bulk: prefetch buddy while not holding lock Date: Mon, 26 Feb 2018 21:53:46 +0800 Message-Id: <20180226135346.7208-4-aaron.lu@intel.com> In-Reply-To: <20180226135346.7208-1-aaron.lu@intel.com> References: <20180226135346.7208-1-aaron.lu@intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox When a page is freed back to the global pool, its buddy will be checked to see if it's possible to do a merge. This requires accessing buddy's page structure and that access could take a long time if it's cache cold. This patch adds a prefetch to the to-be-freed page's buddy outside of zone->lock in hope of accessing buddy's page structure later under zone->lock will be faster. Since we *always* do buddy merging and check an order-0 page's buddy to try to merge it when it goes into the main allocator, the cacheline will always come in, i.e. the prefetched data will never be unused. In the meantime, there are two concerns: 1 the prefetch could potentially evict existing cachelines, especially for L1D cache since it is not huge; 2 there is some additional instruction overhead, namely calculating buddy pfn twice. For 1, it's hard to say, this microbenchmark though shows good result but the actual benefit of this patch will be workload/CPU dependant; For 2, since the calculation is a XOR on two local variables, it's expected in many cases that cycles spent will be offset by reduced memory latency later. This is especially true for NUMA machines where multiple CPUs are contending on zone->lock and the most time consuming part under zone->lock is the wait of 'struct page' cacheline of the to-be-freed pages and their buddies. Test with will-it-scale/page_fault1 full load: kernel Broadwell(2S) Skylake(2S) Broadwell(4S) Skylake(4S) v4.16-rc2+ 9034215 7971818 13667135 15677465 patch2/3 9536374 +5.6% 8314710 +4.3% 14070408 +3.0% 16675866 +6.4% this patch 10338868 +8.4% 8544477 +2.8% 14839808 +5.5% 17155464 +2.9% Note: this patch's performance improvement percent is against patch2/3. [changelog stole from Dave Hansen and Mel Gorman's comments] https://lkml.org/lkml/2018/1/24/551 Suggested-by: Ying Huang Signed-off-by: Aaron Lu --- mm/page_alloc.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 35576da0a6c9..dc3b89894f2c 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1142,6 +1142,9 @@ static void free_pcppages_bulk(struct zone *zone, int count, batch_free = count; do { + unsigned long pfn, buddy_pfn; + struct page *buddy; + page = list_last_entry(list, struct page, lru); /* must delete as __free_one_page list manipulates */ list_del(&page->lru); @@ -1150,6 +1153,18 @@ static void free_pcppages_bulk(struct zone *zone, int count, continue; list_add_tail(&page->lru, &head); + + /* + * We are going to put the page back to the global + * pool, prefetch its buddy to speed up later access + * under zone->lock. It is believed the overhead of + * calculating buddy_pfn here can be offset by reduced + * memory latency later. + */ + pfn = page_to_pfn(page); + buddy_pfn = __find_buddy_pfn(pfn, 0); + buddy = page + (buddy_pfn - pfn); + prefetch(buddy); } while (--count && --batch_free && !list_empty(list)); } -- 2.14.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f200.google.com (mail-io0-f200.google.com [209.85.223.200]) by kanga.kvack.org (Postfix) with ESMTP id D149D6B0003 for ; Mon, 26 Feb 2018 16:48:17 -0500 (EST) Received: by mail-io0-f200.google.com with SMTP id g42so7654384ioi.3 for ; Mon, 26 Feb 2018 13:48:17 -0800 (PST) Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65]) by mx.google.com with SMTPS id c19sor5574855ioa.190.2018.02.26.13.48.16 for (Google Transport Security); Mon, 26 Feb 2018 13:48:16 -0800 (PST) Date: Mon, 26 Feb 2018 13:48:14 -0800 (PST) From: David Rientjes Subject: Re: [PATCH v3 1/3] mm/free_pcppages_bulk: update pcp->count inside In-Reply-To: <20180226135346.7208-2-aaron.lu@intel.com> Message-ID: References: <20180226135346.7208-1-aaron.lu@intel.com> <20180226135346.7208-2-aaron.lu@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Aaron Lu Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox On Mon, 26 Feb 2018, Aaron Lu wrote: > Matthew Wilcox found that all callers of free_pcppages_bulk() currently > update pcp->count immediately after so it's natural to do it inside > free_pcppages_bulk(). > > No functionality or performance change is expected from this patch. > > Suggested-by: Matthew Wilcox > Signed-off-by: Aaron Lu > --- > mm/page_alloc.c | 10 +++------- > 1 file changed, 3 insertions(+), 7 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index cb416723538f..3154859cccd6 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1117,6 +1117,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, > int batch_free = 0; > bool isolated_pageblocks; > > + pcp->count -= count; > spin_lock(&zone->lock); > isolated_pageblocks = has_isolate_pageblock(zone); > Why modify pcp->count before the pages have actually been freed? I doubt that it matters too much, but at least /proc/zoneinfo uses zone->lock. I think it should be done after the lock is dropped. Otherwise, looks good. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f199.google.com (mail-io0-f199.google.com [209.85.223.199]) by kanga.kvack.org (Postfix) with ESMTP id D4CBA6B0003 for ; Mon, 26 Feb 2018 16:53:13 -0500 (EST) Received: by mail-io0-f199.google.com with SMTP id n95so15941586ioo.2 for ; Mon, 26 Feb 2018 13:53:13 -0800 (PST) Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65]) by mx.google.com with SMTPS id a31sor5476046itj.119.2018.02.26.13.53.12 for (Google Transport Security); Mon, 26 Feb 2018 13:53:13 -0800 (PST) Date: Mon, 26 Feb 2018 13:53:10 -0800 (PST) From: David Rientjes Subject: Re: [PATCH v3 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free In-Reply-To: <20180226135346.7208-3-aaron.lu@intel.com> Message-ID: References: <20180226135346.7208-1-aaron.lu@intel.com> <20180226135346.7208-3-aaron.lu@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Aaron Lu Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox On Mon, 26 Feb 2018, Aaron Lu wrote: > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 3154859cccd6..35576da0a6c9 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1116,13 +1116,11 @@ static void free_pcppages_bulk(struct zone *zone, int count, > int migratetype = 0; > int batch_free = 0; > bool isolated_pageblocks; > + struct page *page, *tmp; > + LIST_HEAD(head); > > pcp->count -= count; > - spin_lock(&zone->lock); > - isolated_pageblocks = has_isolate_pageblock(zone); > - > while (count) { > - struct page *page; > struct list_head *list; > > /* > @@ -1144,26 +1142,31 @@ static void free_pcppages_bulk(struct zone *zone, int count, > batch_free = count; > > do { > - int mt; /* migratetype of the to-be-freed page */ > - > page = list_last_entry(list, struct page, lru); > /* must delete as __free_one_page list manipulates */ Looks good in general, but I'm not sure how I reconcile this comment with the new implementation that later links page->lru again. > list_del(&page->lru); > > - mt = get_pcppage_migratetype(page); > - /* MIGRATE_ISOLATE page should not go to pcplists */ > - VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); > - /* Pageblock could have been isolated meanwhile */ > - if (unlikely(isolated_pageblocks)) > - mt = get_pageblock_migratetype(page); > - > if (bulkfree_pcp_prepare(page)) > continue; > > - __free_one_page(page, page_to_pfn(page), zone, 0, mt); > - trace_mm_page_pcpu_drain(page, 0, mt); > + list_add_tail(&page->lru, &head); > } while (--count && --batch_free && !list_empty(list)); > } > + > + spin_lock(&zone->lock); > + isolated_pageblocks = has_isolate_pageblock(zone); > + > + list_for_each_entry_safe(page, tmp, &head, lru) { > + int mt = get_pcppage_migratetype(page); > + /* MIGRATE_ISOLATE page should not go to pcplists */ > + VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); > + /* Pageblock could have been isolated meanwhile */ > + if (unlikely(isolated_pageblocks)) > + mt = get_pageblock_migratetype(page); > + > + __free_one_page(page, page_to_pfn(page), zone, 0, mt); > + trace_mm_page_pcpu_drain(page, 0, mt); > + } > spin_unlock(&zone->lock); > } > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl0-f69.google.com (mail-pl0-f69.google.com [209.85.160.69]) by kanga.kvack.org (Postfix) with ESMTP id A5A076B0006 for ; Mon, 26 Feb 2018 20:55:17 -0500 (EST) Received: by mail-pl0-f69.google.com with SMTP id l5-v6so3114035pli.8 for ; Mon, 26 Feb 2018 17:55:17 -0800 (PST) Received: from mga11.intel.com (mga11.intel.com. [192.55.52.93]) by mx.google.com with ESMTPS id t18si7718122pfg.246.2018.02.26.17.55.15 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2018 17:55:16 -0800 (PST) Date: Tue, 27 Feb 2018 09:56:13 +0800 From: Aaron Lu Subject: Re: [PATCH v3 1/3] mm/free_pcppages_bulk: update pcp->count inside Message-ID: <20180227015613.GA9141@intel.com> References: <20180226135346.7208-1-aaron.lu@intel.com> <20180226135346.7208-2-aaron.lu@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: David Rientjes Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox On Mon, Feb 26, 2018 at 01:48:14PM -0800, David Rientjes wrote: > On Mon, 26 Feb 2018, Aaron Lu wrote: > > > Matthew Wilcox found that all callers of free_pcppages_bulk() currently > > update pcp->count immediately after so it's natural to do it inside > > free_pcppages_bulk(). > > > > No functionality or performance change is expected from this patch. > > > > Suggested-by: Matthew Wilcox > > Signed-off-by: Aaron Lu > > --- > > mm/page_alloc.c | 10 +++------- > > 1 file changed, 3 insertions(+), 7 deletions(-) > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index cb416723538f..3154859cccd6 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -1117,6 +1117,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, > > int batch_free = 0; > > bool isolated_pageblocks; > > > > + pcp->count -= count; > > spin_lock(&zone->lock); > > isolated_pageblocks = has_isolate_pageblock(zone); > > > > Why modify pcp->count before the pages have actually been freed? When count is still count and not zero after pages have actually been freed :-) > > I doubt that it matters too much, but at least /proc/zoneinfo uses > zone->lock. I think it should be done after the lock is dropped. Agree that it looks a bit weird to do it beforehand and I just want to avoid adding one more local variable here. pcp->count is not protected by zone->lock though so even we do it after dropping the lock, it could still happen that zoneinfo shows a wrong value of pcp->count while it should be zero(this isn't a problem since zoneinfo doesn't need to be precise). Anyway, I'll follow your suggestion here to avoid confusion. > Otherwise, looks good. Thanks for taking a look at this. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl0-f72.google.com (mail-pl0-f72.google.com [209.85.160.72]) by kanga.kvack.org (Postfix) with ESMTP id B79656B0009 for ; Mon, 26 Feb 2018 21:00:01 -0500 (EST) Received: by mail-pl0-f72.google.com with SMTP id 6so8479659plf.6 for ; Mon, 26 Feb 2018 18:00:01 -0800 (PST) Received: from mga06.intel.com (mga06.intel.com. [134.134.136.31]) by mx.google.com with ESMTPS id c3-v6si2298918pld.466.2018.02.26.18.00.00 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2018 18:00:00 -0800 (PST) Date: Tue, 27 Feb 2018 10:00:58 +0800 From: Aaron Lu Subject: Re: [PATCH v3 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free Message-ID: <20180227020058.GB9141@intel.com> References: <20180226135346.7208-1-aaron.lu@intel.com> <20180226135346.7208-3-aaron.lu@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: David Rientjes Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox On Mon, Feb 26, 2018 at 01:53:10PM -0800, David Rientjes wrote: > On Mon, 26 Feb 2018, Aaron Lu wrote: > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 3154859cccd6..35576da0a6c9 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -1116,13 +1116,11 @@ static void free_pcppages_bulk(struct zone *zone, int count, > > int migratetype = 0; > > int batch_free = 0; > > bool isolated_pageblocks; > > + struct page *page, *tmp; > > + LIST_HEAD(head); > > > > pcp->count -= count; > > - spin_lock(&zone->lock); > > - isolated_pageblocks = has_isolate_pageblock(zone); > > - > > while (count) { > > - struct page *page; > > struct list_head *list; > > > > /* > > @@ -1144,26 +1142,31 @@ static void free_pcppages_bulk(struct zone *zone, int count, > > batch_free = count; > > > > do { > > - int mt; /* migratetype of the to-be-freed page */ > > - > > page = list_last_entry(list, struct page, lru); > > /* must delete as __free_one_page list manipulates */ > > Looks good in general, but I'm not sure how I reconcile this comment with > the new implementation that later links page->lru again. Thanks for pointing this out. I think the comment is useless now since there is a list_add_tail right below so it's obvious we need to take the page off its original list. I'll remove the comment in an update. > > > list_del(&page->lru); > > > > - mt = get_pcppage_migratetype(page); > > - /* MIGRATE_ISOLATE page should not go to pcplists */ > > - VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); > > - /* Pageblock could have been isolated meanwhile */ > > - if (unlikely(isolated_pageblocks)) > > - mt = get_pageblock_migratetype(page); > > - > > if (bulkfree_pcp_prepare(page)) > > continue; > > > > - __free_one_page(page, page_to_pfn(page), zone, 0, mt); > > - trace_mm_page_pcpu_drain(page, 0, mt); > > + list_add_tail(&page->lru, &head); > > } while (--count && --batch_free && !list_empty(list)); > > } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f199.google.com (mail-pf0-f199.google.com [209.85.192.199]) by kanga.kvack.org (Postfix) with ESMTP id 01C686B0007 for ; Mon, 26 Feb 2018 22:11:49 -0500 (EST) Received: by mail-pf0-f199.google.com with SMTP id c83so2803441pfk.5 for ; Mon, 26 Feb 2018 19:11:48 -0800 (PST) Received: from mga01.intel.com (mga01.intel.com. [192.55.52.88]) by mx.google.com with ESMTPS id 33-v6si7735941pls.710.2018.02.26.19.11.47 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2018 19:11:47 -0800 (PST) Date: Tue, 27 Feb 2018 11:12:44 +0800 From: Aaron Lu Subject: Re: [PATCH v3 1/3] mm/free_pcppages_bulk: update pcp->count inside Message-ID: <20180227031244.GA28977@intel.com> References: <20180226135346.7208-1-aaron.lu@intel.com> <20180226135346.7208-2-aaron.lu@intel.com> <20180227015613.GA9141@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180227015613.GA9141@intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: David Rientjes Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox On Tue, Feb 27, 2018 at 09:56:13AM +0800, Aaron Lu wrote: > On Mon, Feb 26, 2018 at 01:48:14PM -0800, David Rientjes wrote: > > On Mon, 26 Feb 2018, Aaron Lu wrote: > > > > > Matthew Wilcox found that all callers of free_pcppages_bulk() currently > > > update pcp->count immediately after so it's natural to do it inside > > > free_pcppages_bulk(). > > > > > > No functionality or performance change is expected from this patch. > > > > > > Suggested-by: Matthew Wilcox > > > Signed-off-by: Aaron Lu > > > --- > > > mm/page_alloc.c | 10 +++------- > > > 1 file changed, 3 insertions(+), 7 deletions(-) > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > index cb416723538f..3154859cccd6 100644 > > > --- a/mm/page_alloc.c > > > +++ b/mm/page_alloc.c > > > @@ -1117,6 +1117,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, > > > int batch_free = 0; > > > bool isolated_pageblocks; > > > > > > + pcp->count -= count; > > > spin_lock(&zone->lock); > > > isolated_pageblocks = has_isolate_pageblock(zone); > > > > > > > Why modify pcp->count before the pages have actually been freed? > > When count is still count and not zero after pages have actually been > freed :-) > > > > > I doubt that it matters too much, but at least /proc/zoneinfo uses > > zone->lock. I think it should be done after the lock is dropped. > > Agree that it looks a bit weird to do it beforehand and I just want to > avoid adding one more local variable here. > > pcp->count is not protected by zone->lock though so even we do it after > dropping the lock, it could still happen that zoneinfo shows a wrong > value of pcp->count while it should be zero(this isn't a problem since > zoneinfo doesn't need to be precise). > > Anyway, I'll follow your suggestion here to avoid confusion. What about this: update pcp->count when page is dropped off pcp list. diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cb416723538f..faa33eac1635 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1148,6 +1148,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, page = list_last_entry(list, struct page, lru); /* must delete as __free_one_page list manipulates */ list_del(&page->lru); + pcp->count--; mt = get_pcppage_migratetype(page); /* MIGRATE_ISOLATE page should not go to pcplists */ @@ -2416,10 +2417,8 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp) local_irq_save(flags); batch = READ_ONCE(pcp->batch); to_drain = min(pcp->count, batch); - if (to_drain > 0) { + if (to_drain > 0) free_pcppages_bulk(zone, to_drain, pcp); - pcp->count -= to_drain; - } local_irq_restore(flags); } #endif @@ -2441,10 +2440,8 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone) pset = per_cpu_ptr(zone->pageset, cpu); pcp = &pset->pcp; - if (pcp->count) { + if (pcp->count) free_pcppages_bulk(zone, pcp->count, pcp); - pcp->count = 0; - } local_irq_restore(flags); } @@ -2668,7 +2665,6 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn) if (pcp->count >= pcp->high) { unsigned long batch = READ_ONCE(pcp->batch); free_pcppages_bulk(zone, batch, pcp); - pcp->count -= batch; } } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl0-f70.google.com (mail-pl0-f70.google.com [209.85.160.70]) by kanga.kvack.org (Postfix) with ESMTP id 4CED46B0007 for ; Mon, 26 Feb 2018 22:16:10 -0500 (EST) Received: by mail-pl0-f70.google.com with SMTP id t2so8597745plr.15 for ; Mon, 26 Feb 2018 19:16:10 -0800 (PST) Received: from mga05.intel.com (mga05.intel.com. [192.55.52.43]) by mx.google.com with ESMTPS id u3si6371622pgr.447.2018.02.26.19.16.09 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2018 19:16:09 -0800 (PST) Date: Tue, 27 Feb 2018 11:17:07 +0800 From: Aaron Lu Subject: Re: [PATCH v3 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free Message-ID: <20180227031707.GB28977@intel.com> References: <20180226135346.7208-1-aaron.lu@intel.com> <20180226135346.7208-3-aaron.lu@intel.com> <20180227020058.GB9141@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180227020058.GB9141@intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: David Rientjes Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox On Tue, Feb 27, 2018 at 10:00:58AM +0800, Aaron Lu wrote: > On Mon, Feb 26, 2018 at 01:53:10PM -0800, David Rientjes wrote: > > On Mon, 26 Feb 2018, Aaron Lu wrote: > > > > > @@ -1144,26 +1142,31 @@ static void free_pcppages_bulk(struct zone *zone, int count, > > > batch_free = count; > > > > > > do { > > > - int mt; /* migratetype of the to-be-freed page */ > > > - > > > page = list_last_entry(list, struct page, lru); > > > /* must delete as __free_one_page list manipulates */ > > > > Looks good in general, but I'm not sure how I reconcile this comment with > > the new implementation that later links page->lru again. > > Thanks for pointing this out. > > I think the comment is useless now since there is a list_add_tail right > below so it's obvious we need to take the page off its original list. > I'll remove the comment in an update. Thinking again, I think I'll change the comment to: - /* must delete as __free_one_page list manipulates */ + /* must delete to avoid corrupting pcp list */ list_del(&page->lru); pcp->count--; Meanwhile, I'll add one more comment about why list_for_each_entry_safe is used: + /* + * Use safe version since after __free_one_page(), + * page->lru.next will not point to original list. + */ + list_for_each_entry_safe(page, tmp, &head, lru) { + int mt = get_pcppage_migratetype(page); + /* MIGRATE_ISOLATE page should not go to pcplists */ + VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); + /* Pageblock could have been isolated meanwhile */ + if (unlikely(isolated_pageblocks)) + mt = get_pageblock_migratetype(page); + + __free_one_page(page, page_to_pfn(page), zone, 0, mt); + trace_mm_page_pcpu_drain(page, 0, mt); + } spin_unlock(&zone->lock); } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753059AbeBZNwt (ORCPT ); Mon, 26 Feb 2018 08:52:49 -0500 Received: from mga04.intel.com ([192.55.52.120]:54609 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752771AbeBZNws (ORCPT ); Mon, 26 Feb 2018 08:52:48 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.47,396,1515484800"; d="scan'208";a="29836352" From: Aaron Lu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox Subject: [PATCH v3 0/3] mm: improve zone->lock scalability Date: Mon, 26 Feb 2018 21:53:43 +0800 Message-Id: <20180226135346.7208-1-aaron.lu@intel.com> X-Mailer: git-send-email 2.14.3 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Patch 1/3 is a small cleanup suggested by Matthew Wilcox which doesn't affect scalability or performance; Patch 2/3 moves some code in free_pcppages_bulk() outside of zone->lock and has Mel Gorman's ack; Patch 3/3 uses prefetch in free_pcppages_bulk() outside of zone->lock to speedup page merging under zone->lock but Mel Gorman has some concerns. For details, please see their changelogs. Changes from v2: Patch 1/3 is newly added; Patch 2/3 is patch 1/2 in v2 and doesn't have any change except resolving conflicts due to the newly added patch 1/3; Patch 3/3 is patch 2/2 in v2 and only has some changelog updates on concerns part. v1 and v2 was here: https://lkml.org/lkml/2018/1/23/879 Aaron Lu (3): mm/free_pcppages_bulk: update pcp->count inside mm/free_pcppages_bulk: do not hold lock when picking pages to free mm/free_pcppages_bulk: prefetch buddy while not holding lock mm/page_alloc.c | 56 +++++++++++++++++++++++++++++++++++--------------------- 1 file changed, 35 insertions(+), 21 deletions(-) -- 2.14.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753191AbeBZNw4 (ORCPT ); Mon, 26 Feb 2018 08:52:56 -0500 Received: from mga04.intel.com ([192.55.52.120]:54609 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753066AbeBZNwu (ORCPT ); Mon, 26 Feb 2018 08:52:50 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.47,396,1515484800"; d="scan'208";a="29836362" From: Aaron Lu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox Subject: [PATCH v3 1/3] mm/free_pcppages_bulk: update pcp->count inside Date: Mon, 26 Feb 2018 21:53:44 +0800 Message-Id: <20180226135346.7208-2-aaron.lu@intel.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180226135346.7208-1-aaron.lu@intel.com> References: <20180226135346.7208-1-aaron.lu@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Matthew Wilcox found that all callers of free_pcppages_bulk() currently update pcp->count immediately after so it's natural to do it inside free_pcppages_bulk(). No functionality or performance change is expected from this patch. Suggested-by: Matthew Wilcox Signed-off-by: Aaron Lu --- mm/page_alloc.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cb416723538f..3154859cccd6 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1117,6 +1117,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, int batch_free = 0; bool isolated_pageblocks; + pcp->count -= count; spin_lock(&zone->lock); isolated_pageblocks = has_isolate_pageblock(zone); @@ -2416,10 +2417,8 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp) local_irq_save(flags); batch = READ_ONCE(pcp->batch); to_drain = min(pcp->count, batch); - if (to_drain > 0) { + if (to_drain > 0) free_pcppages_bulk(zone, to_drain, pcp); - pcp->count -= to_drain; - } local_irq_restore(flags); } #endif @@ -2441,10 +2440,8 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone) pset = per_cpu_ptr(zone->pageset, cpu); pcp = &pset->pcp; - if (pcp->count) { + if (pcp->count) free_pcppages_bulk(zone, pcp->count, pcp); - pcp->count = 0; - } local_irq_restore(flags); } @@ -2668,7 +2665,6 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn) if (pcp->count >= pcp->high) { unsigned long batch = READ_ONCE(pcp->batch); free_pcppages_bulk(zone, batch, pcp); - pcp->count -= batch; } } -- 2.14.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753218AbeBZNxB (ORCPT ); Mon, 26 Feb 2018 08:53:01 -0500 Received: from mga04.intel.com ([192.55.52.120]:54609 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753086AbeBZNww (ORCPT ); Mon, 26 Feb 2018 08:52:52 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.47,396,1515484800"; d="scan'208";a="29836368" From: Aaron Lu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox Subject: [PATCH v3 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free Date: Mon, 26 Feb 2018 21:53:45 +0800 Message-Id: <20180226135346.7208-3-aaron.lu@intel.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180226135346.7208-1-aaron.lu@intel.com> References: <20180226135346.7208-1-aaron.lu@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When freeing a batch of pages from Per-CPU-Pages(PCP) back to buddy, the zone->lock is held and then pages are chosen from PCP's migratetype list. While there is actually no need to do this 'choose part' under lock since it's PCP pages, the only CPU that can touch them is us and irq is also disabled. Moving this part outside could reduce lock held time and improve performance. Test with will-it-scale/page_fault1 full load: kernel Broadwell(2S) Skylake(2S) Broadwell(4S) Skylake(4S) v4.16-rc2+ 9034215 7971818 13667135 15677465 this patch 9536374 +5.6% 8314710 +4.3% 14070408 +3.0% 16675866 +6.4% What the test does is: starts $nr_cpu processes and each will repeatedly do the following for 5 minutes: 1 mmap 128M anonymouse space; 2 write access to that space; 3 munmap. The score is the aggregated iteration. https://github.com/antonblanchard/will-it-scale/blob/master/tests/page_fault1.c Acked-by: Mel Gorman Signed-off-by: Aaron Lu --- mm/page_alloc.c | 33 ++++++++++++++++++--------------- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3154859cccd6..35576da0a6c9 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1116,13 +1116,11 @@ static void free_pcppages_bulk(struct zone *zone, int count, int migratetype = 0; int batch_free = 0; bool isolated_pageblocks; + struct page *page, *tmp; + LIST_HEAD(head); pcp->count -= count; - spin_lock(&zone->lock); - isolated_pageblocks = has_isolate_pageblock(zone); - while (count) { - struct page *page; struct list_head *list; /* @@ -1144,26 +1142,31 @@ static void free_pcppages_bulk(struct zone *zone, int count, batch_free = count; do { - int mt; /* migratetype of the to-be-freed page */ - page = list_last_entry(list, struct page, lru); /* must delete as __free_one_page list manipulates */ list_del(&page->lru); - mt = get_pcppage_migratetype(page); - /* MIGRATE_ISOLATE page should not go to pcplists */ - VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); - /* Pageblock could have been isolated meanwhile */ - if (unlikely(isolated_pageblocks)) - mt = get_pageblock_migratetype(page); - if (bulkfree_pcp_prepare(page)) continue; - __free_one_page(page, page_to_pfn(page), zone, 0, mt); - trace_mm_page_pcpu_drain(page, 0, mt); + list_add_tail(&page->lru, &head); } while (--count && --batch_free && !list_empty(list)); } + + spin_lock(&zone->lock); + isolated_pageblocks = has_isolate_pageblock(zone); + + list_for_each_entry_safe(page, tmp, &head, lru) { + int mt = get_pcppage_migratetype(page); + /* MIGRATE_ISOLATE page should not go to pcplists */ + VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); + /* Pageblock could have been isolated meanwhile */ + if (unlikely(isolated_pageblocks)) + mt = get_pageblock_migratetype(page); + + __free_one_page(page, page_to_pfn(page), zone, 0, mt); + trace_mm_page_pcpu_drain(page, 0, mt); + } spin_unlock(&zone->lock); } -- 2.14.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753265AbeBZNxD (ORCPT ); Mon, 26 Feb 2018 08:53:03 -0500 Received: from mga04.intel.com ([192.55.52.120]:54609 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753122AbeBZNwy (ORCPT ); Mon, 26 Feb 2018 08:52:54 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.47,396,1515484800"; d="scan'208";a="29836380" From: Aaron Lu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox Subject: [PATCH v3 3/3] mm/free_pcppages_bulk: prefetch buddy while not holding lock Date: Mon, 26 Feb 2018 21:53:46 +0800 Message-Id: <20180226135346.7208-4-aaron.lu@intel.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180226135346.7208-1-aaron.lu@intel.com> References: <20180226135346.7208-1-aaron.lu@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When a page is freed back to the global pool, its buddy will be checked to see if it's possible to do a merge. This requires accessing buddy's page structure and that access could take a long time if it's cache cold. This patch adds a prefetch to the to-be-freed page's buddy outside of zone->lock in hope of accessing buddy's page structure later under zone->lock will be faster. Since we *always* do buddy merging and check an order-0 page's buddy to try to merge it when it goes into the main allocator, the cacheline will always come in, i.e. the prefetched data will never be unused. In the meantime, there are two concerns: 1 the prefetch could potentially evict existing cachelines, especially for L1D cache since it is not huge; 2 there is some additional instruction overhead, namely calculating buddy pfn twice. For 1, it's hard to say, this microbenchmark though shows good result but the actual benefit of this patch will be workload/CPU dependant; For 2, since the calculation is a XOR on two local variables, it's expected in many cases that cycles spent will be offset by reduced memory latency later. This is especially true for NUMA machines where multiple CPUs are contending on zone->lock and the most time consuming part under zone->lock is the wait of 'struct page' cacheline of the to-be-freed pages and their buddies. Test with will-it-scale/page_fault1 full load: kernel Broadwell(2S) Skylake(2S) Broadwell(4S) Skylake(4S) v4.16-rc2+ 9034215 7971818 13667135 15677465 patch2/3 9536374 +5.6% 8314710 +4.3% 14070408 +3.0% 16675866 +6.4% this patch 10338868 +8.4% 8544477 +2.8% 14839808 +5.5% 17155464 +2.9% Note: this patch's performance improvement percent is against patch2/3. [changelog stole from Dave Hansen and Mel Gorman's comments] https://lkml.org/lkml/2018/1/24/551 Suggested-by: Ying Huang Signed-off-by: Aaron Lu --- mm/page_alloc.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 35576da0a6c9..dc3b89894f2c 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1142,6 +1142,9 @@ static void free_pcppages_bulk(struct zone *zone, int count, batch_free = count; do { + unsigned long pfn, buddy_pfn; + struct page *buddy; + page = list_last_entry(list, struct page, lru); /* must delete as __free_one_page list manipulates */ list_del(&page->lru); @@ -1150,6 +1153,18 @@ static void free_pcppages_bulk(struct zone *zone, int count, continue; list_add_tail(&page->lru, &head); + + /* + * We are going to put the page back to the global + * pool, prefetch its buddy to speed up later access + * under zone->lock. It is believed the overhead of + * calculating buddy_pfn here can be offset by reduced + * memory latency later. + */ + pfn = page_to_pfn(page); + buddy_pfn = __find_buddy_pfn(pfn, 0); + buddy = page + (buddy_pfn - pfn); + prefetch(buddy); } while (--count && --batch_free && !list_empty(list)); } -- 2.14.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751964AbeBZVsV (ORCPT ); Mon, 26 Feb 2018 16:48:21 -0500 Received: from mail-io0-f195.google.com ([209.85.223.195]:38010 "EHLO mail-io0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751814AbeBZVsR (ORCPT ); Mon, 26 Feb 2018 16:48:17 -0500 X-Google-Smtp-Source: AG47ELvOhW4VwzKMX0AebcJI8/2p+j3EZTMtsJpVCR/blLCxh2hPPmOJqmMQI7Fyn4b0Qkj0OOfBWw== Date: Mon, 26 Feb 2018 13:48:14 -0800 (PST) From: David Rientjes X-X-Sender: rientjes@chino.kir.corp.google.com To: Aaron Lu cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox Subject: Re: [PATCH v3 1/3] mm/free_pcppages_bulk: update pcp->count inside In-Reply-To: <20180226135346.7208-2-aaron.lu@intel.com> Message-ID: References: <20180226135346.7208-1-aaron.lu@intel.com> <20180226135346.7208-2-aaron.lu@intel.com> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 26 Feb 2018, Aaron Lu wrote: > Matthew Wilcox found that all callers of free_pcppages_bulk() currently > update pcp->count immediately after so it's natural to do it inside > free_pcppages_bulk(). > > No functionality or performance change is expected from this patch. > > Suggested-by: Matthew Wilcox > Signed-off-by: Aaron Lu > --- > mm/page_alloc.c | 10 +++------- > 1 file changed, 3 insertions(+), 7 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index cb416723538f..3154859cccd6 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1117,6 +1117,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, > int batch_free = 0; > bool isolated_pageblocks; > > + pcp->count -= count; > spin_lock(&zone->lock); > isolated_pageblocks = has_isolate_pageblock(zone); > Why modify pcp->count before the pages have actually been freed? I doubt that it matters too much, but at least /proc/zoneinfo uses zone->lock. I think it should be done after the lock is dropped. Otherwise, looks good. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752013AbeBZVxQ (ORCPT ); Mon, 26 Feb 2018 16:53:16 -0500 Received: from mail-it0-f68.google.com ([209.85.214.68]:36336 "EHLO mail-it0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751514AbeBZVxN (ORCPT ); Mon, 26 Feb 2018 16:53:13 -0500 X-Google-Smtp-Source: AG47ELtEt/v3EyuhbIEKzn/c6A1ORgdfrYuAAH39Rm1YdlTjDdQ8ngIxwh6V9f+bdbJjUSJ+xKNLwg== Date: Mon, 26 Feb 2018 13:53:10 -0800 (PST) From: David Rientjes X-X-Sender: rientjes@chino.kir.corp.google.com To: Aaron Lu cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox Subject: Re: [PATCH v3 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free In-Reply-To: <20180226135346.7208-3-aaron.lu@intel.com> Message-ID: References: <20180226135346.7208-1-aaron.lu@intel.com> <20180226135346.7208-3-aaron.lu@intel.com> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 26 Feb 2018, Aaron Lu wrote: > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 3154859cccd6..35576da0a6c9 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1116,13 +1116,11 @@ static void free_pcppages_bulk(struct zone *zone, int count, > int migratetype = 0; > int batch_free = 0; > bool isolated_pageblocks; > + struct page *page, *tmp; > + LIST_HEAD(head); > > pcp->count -= count; > - spin_lock(&zone->lock); > - isolated_pageblocks = has_isolate_pageblock(zone); > - > while (count) { > - struct page *page; > struct list_head *list; > > /* > @@ -1144,26 +1142,31 @@ static void free_pcppages_bulk(struct zone *zone, int count, > batch_free = count; > > do { > - int mt; /* migratetype of the to-be-freed page */ > - > page = list_last_entry(list, struct page, lru); > /* must delete as __free_one_page list manipulates */ Looks good in general, but I'm not sure how I reconcile this comment with the new implementation that later links page->lru again. > list_del(&page->lru); > > - mt = get_pcppage_migratetype(page); > - /* MIGRATE_ISOLATE page should not go to pcplists */ > - VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); > - /* Pageblock could have been isolated meanwhile */ > - if (unlikely(isolated_pageblocks)) > - mt = get_pageblock_migratetype(page); > - > if (bulkfree_pcp_prepare(page)) > continue; > > - __free_one_page(page, page_to_pfn(page), zone, 0, mt); > - trace_mm_page_pcpu_drain(page, 0, mt); > + list_add_tail(&page->lru, &head); > } while (--count && --batch_free && !list_empty(list)); > } > + > + spin_lock(&zone->lock); > + isolated_pageblocks = has_isolate_pageblock(zone); > + > + list_for_each_entry_safe(page, tmp, &head, lru) { > + int mt = get_pcppage_migratetype(page); > + /* MIGRATE_ISOLATE page should not go to pcplists */ > + VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); > + /* Pageblock could have been isolated meanwhile */ > + if (unlikely(isolated_pageblocks)) > + mt = get_pageblock_migratetype(page); > + > + __free_one_page(page, page_to_pfn(page), zone, 0, mt); > + trace_mm_page_pcpu_drain(page, 0, mt); > + } > spin_unlock(&zone->lock); > } > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751795AbeB0BzR (ORCPT ); Mon, 26 Feb 2018 20:55:17 -0500 Received: from mga14.intel.com ([192.55.52.115]:29588 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751463AbeB0BzQ (ORCPT ); Mon, 26 Feb 2018 20:55:16 -0500 X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.47,398,1515484800"; d="scan'208";a="21287066" Date: Tue, 27 Feb 2018 09:56:13 +0800 From: Aaron Lu To: David Rientjes Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox Subject: Re: [PATCH v3 1/3] mm/free_pcppages_bulk: update pcp->count inside Message-ID: <20180227015613.GA9141@intel.com> References: <20180226135346.7208-1-aaron.lu@intel.com> <20180226135346.7208-2-aaron.lu@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Feb 26, 2018 at 01:48:14PM -0800, David Rientjes wrote: > On Mon, 26 Feb 2018, Aaron Lu wrote: > > > Matthew Wilcox found that all callers of free_pcppages_bulk() currently > > update pcp->count immediately after so it's natural to do it inside > > free_pcppages_bulk(). > > > > No functionality or performance change is expected from this patch. > > > > Suggested-by: Matthew Wilcox > > Signed-off-by: Aaron Lu > > --- > > mm/page_alloc.c | 10 +++------- > > 1 file changed, 3 insertions(+), 7 deletions(-) > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index cb416723538f..3154859cccd6 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -1117,6 +1117,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, > > int batch_free = 0; > > bool isolated_pageblocks; > > > > + pcp->count -= count; > > spin_lock(&zone->lock); > > isolated_pageblocks = has_isolate_pageblock(zone); > > > > Why modify pcp->count before the pages have actually been freed? When count is still count and not zero after pages have actually been freed :-) > > I doubt that it matters too much, but at least /proc/zoneinfo uses > zone->lock. I think it should be done after the lock is dropped. Agree that it looks a bit weird to do it beforehand and I just want to avoid adding one more local variable here. pcp->count is not protected by zone->lock though so even we do it after dropping the lock, it could still happen that zoneinfo shows a wrong value of pcp->count while it should be zero(this isn't a problem since zoneinfo doesn't need to be precise). Anyway, I'll follow your suggestion here to avoid confusion. > Otherwise, looks good. Thanks for taking a look at this. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751728AbeB0CAC (ORCPT ); Mon, 26 Feb 2018 21:00:02 -0500 Received: from mga14.intel.com ([192.55.52.115]:29726 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751634AbeB0CAB (ORCPT ); Mon, 26 Feb 2018 21:00:01 -0500 X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.47,398,1515484800"; d="scan'208";a="178307447" Date: Tue, 27 Feb 2018 10:00:58 +0800 From: Aaron Lu To: David Rientjes Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox Subject: Re: [PATCH v3 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free Message-ID: <20180227020058.GB9141@intel.com> References: <20180226135346.7208-1-aaron.lu@intel.com> <20180226135346.7208-3-aaron.lu@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Feb 26, 2018 at 01:53:10PM -0800, David Rientjes wrote: > On Mon, 26 Feb 2018, Aaron Lu wrote: > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 3154859cccd6..35576da0a6c9 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -1116,13 +1116,11 @@ static void free_pcppages_bulk(struct zone *zone, int count, > > int migratetype = 0; > > int batch_free = 0; > > bool isolated_pageblocks; > > + struct page *page, *tmp; > > + LIST_HEAD(head); > > > > pcp->count -= count; > > - spin_lock(&zone->lock); > > - isolated_pageblocks = has_isolate_pageblock(zone); > > - > > while (count) { > > - struct page *page; > > struct list_head *list; > > > > /* > > @@ -1144,26 +1142,31 @@ static void free_pcppages_bulk(struct zone *zone, int count, > > batch_free = count; > > > > do { > > - int mt; /* migratetype of the to-be-freed page */ > > - > > page = list_last_entry(list, struct page, lru); > > /* must delete as __free_one_page list manipulates */ > > Looks good in general, but I'm not sure how I reconcile this comment with > the new implementation that later links page->lru again. Thanks for pointing this out. I think the comment is useless now since there is a list_add_tail right below so it's obvious we need to take the page off its original list. I'll remove the comment in an update. > > > list_del(&page->lru); > > > > - mt = get_pcppage_migratetype(page); > > - /* MIGRATE_ISOLATE page should not go to pcplists */ > > - VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); > > - /* Pageblock could have been isolated meanwhile */ > > - if (unlikely(isolated_pageblocks)) > > - mt = get_pageblock_migratetype(page); > > - > > if (bulkfree_pcp_prepare(page)) > > continue; > > > > - __free_one_page(page, page_to_pfn(page), zone, 0, mt); > > - trace_mm_page_pcpu_drain(page, 0, mt); > > + list_add_tail(&page->lru, &head); > > } while (--count && --batch_free && !list_empty(list)); > > } From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751632AbeB0DLs (ORCPT ); Mon, 26 Feb 2018 22:11:48 -0500 Received: from mga14.intel.com ([192.55.52.115]:32670 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751487AbeB0DLr (ORCPT ); Mon, 26 Feb 2018 22:11:47 -0500 X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.47,398,1515484800"; d="scan'208";a="37702756" Date: Tue, 27 Feb 2018 11:12:44 +0800 From: Aaron Lu To: David Rientjes Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox Subject: Re: [PATCH v3 1/3] mm/free_pcppages_bulk: update pcp->count inside Message-ID: <20180227031244.GA28977@intel.com> References: <20180226135346.7208-1-aaron.lu@intel.com> <20180226135346.7208-2-aaron.lu@intel.com> <20180227015613.GA9141@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180227015613.GA9141@intel.com> User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Feb 27, 2018 at 09:56:13AM +0800, Aaron Lu wrote: > On Mon, Feb 26, 2018 at 01:48:14PM -0800, David Rientjes wrote: > > On Mon, 26 Feb 2018, Aaron Lu wrote: > > > > > Matthew Wilcox found that all callers of free_pcppages_bulk() currently > > > update pcp->count immediately after so it's natural to do it inside > > > free_pcppages_bulk(). > > > > > > No functionality or performance change is expected from this patch. > > > > > > Suggested-by: Matthew Wilcox > > > Signed-off-by: Aaron Lu > > > --- > > > mm/page_alloc.c | 10 +++------- > > > 1 file changed, 3 insertions(+), 7 deletions(-) > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > index cb416723538f..3154859cccd6 100644 > > > --- a/mm/page_alloc.c > > > +++ b/mm/page_alloc.c > > > @@ -1117,6 +1117,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, > > > int batch_free = 0; > > > bool isolated_pageblocks; > > > > > > + pcp->count -= count; > > > spin_lock(&zone->lock); > > > isolated_pageblocks = has_isolate_pageblock(zone); > > > > > > > Why modify pcp->count before the pages have actually been freed? > > When count is still count and not zero after pages have actually been > freed :-) > > > > > I doubt that it matters too much, but at least /proc/zoneinfo uses > > zone->lock. I think it should be done after the lock is dropped. > > Agree that it looks a bit weird to do it beforehand and I just want to > avoid adding one more local variable here. > > pcp->count is not protected by zone->lock though so even we do it after > dropping the lock, it could still happen that zoneinfo shows a wrong > value of pcp->count while it should be zero(this isn't a problem since > zoneinfo doesn't need to be precise). > > Anyway, I'll follow your suggestion here to avoid confusion. What about this: update pcp->count when page is dropped off pcp list. diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cb416723538f..faa33eac1635 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1148,6 +1148,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, page = list_last_entry(list, struct page, lru); /* must delete as __free_one_page list manipulates */ list_del(&page->lru); + pcp->count--; mt = get_pcppage_migratetype(page); /* MIGRATE_ISOLATE page should not go to pcplists */ @@ -2416,10 +2417,8 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp) local_irq_save(flags); batch = READ_ONCE(pcp->batch); to_drain = min(pcp->count, batch); - if (to_drain > 0) { + if (to_drain > 0) free_pcppages_bulk(zone, to_drain, pcp); - pcp->count -= to_drain; - } local_irq_restore(flags); } #endif @@ -2441,10 +2440,8 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone) pset = per_cpu_ptr(zone->pageset, cpu); pcp = &pset->pcp; - if (pcp->count) { + if (pcp->count) free_pcppages_bulk(zone, pcp->count, pcp); - pcp->count = 0; - } local_irq_restore(flags); } @@ -2668,7 +2665,6 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn) if (pcp->count >= pcp->high) { unsigned long batch = READ_ONCE(pcp->batch); free_pcppages_bulk(zone, batch, pcp); - pcp->count -= batch; } } From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751874AbeB0DQJ (ORCPT ); Mon, 26 Feb 2018 22:16:09 -0500 Received: from mga12.intel.com ([192.55.52.136]:51146 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751824AbeB0DQI (ORCPT ); Mon, 26 Feb 2018 22:16:08 -0500 X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.47,398,1515484800"; d="scan'208";a="20595919" Date: Tue, 27 Feb 2018 11:17:07 +0800 From: Aaron Lu To: David Rientjes Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox Subject: Re: [PATCH v3 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free Message-ID: <20180227031707.GB28977@intel.com> References: <20180226135346.7208-1-aaron.lu@intel.com> <20180226135346.7208-3-aaron.lu@intel.com> <20180227020058.GB9141@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180227020058.GB9141@intel.com> User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Feb 27, 2018 at 10:00:58AM +0800, Aaron Lu wrote: > On Mon, Feb 26, 2018 at 01:53:10PM -0800, David Rientjes wrote: > > On Mon, 26 Feb 2018, Aaron Lu wrote: > > > > > @@ -1144,26 +1142,31 @@ static void free_pcppages_bulk(struct zone *zone, int count, > > > batch_free = count; > > > > > > do { > > > - int mt; /* migratetype of the to-be-freed page */ > > > - > > > page = list_last_entry(list, struct page, lru); > > > /* must delete as __free_one_page list manipulates */ > > > > Looks good in general, but I'm not sure how I reconcile this comment with > > the new implementation that later links page->lru again. > > Thanks for pointing this out. > > I think the comment is useless now since there is a list_add_tail right > below so it's obvious we need to take the page off its original list. > I'll remove the comment in an update. Thinking again, I think I'll change the comment to: - /* must delete as __free_one_page list manipulates */ + /* must delete to avoid corrupting pcp list */ list_del(&page->lru); pcp->count--; Meanwhile, I'll add one more comment about why list_for_each_entry_safe is used: + /* + * Use safe version since after __free_one_page(), + * page->lru.next will not point to original list. + */ + list_for_each_entry_safe(page, tmp, &head, lru) { + int mt = get_pcppage_migratetype(page); + /* MIGRATE_ISOLATE page should not go to pcplists */ + VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); + /* Pageblock could have been isolated meanwhile */ + if (unlikely(isolated_pageblocks)) + mt = get_pageblock_migratetype(page); + + __free_one_page(page, page_to_pfn(page), zone, 0, mt); + trace_mm_page_pcpu_drain(page, 0, mt); + } spin_unlock(&zone->lock); }