From: Andi Kleen <andi@firstfloor.org>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
Pekka Enberg <penberg@cs.helsinki.fi>,
Rik van Riel <riel@redhat.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Christoph Lameter <cl@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Nick Piggin <npiggin@suse.de>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Lin Ming <ming.m.lin@intel.com>,
Zhang Yanmin <yanmin_zhang@linux.intel.com>
Subject: Re: [RFC PATCH 00/20] Cleanup and optimise the page allocator
Date: Mon, 23 Feb 2009 00:57:37 +0100
Message-ID: <87prhauiry.fsf@basil.nowhere.org>
In-Reply-To: <1235344649-18265-1-git-send-email-mel@csn.ul.ie> (Mel Gorman's message of "Sun, 22 Feb 2009 23:17:09 +0000")
Mel Gorman <mel@csn.ul.ie> writes:
> The complexity of the page allocator has been increasing for some time
> and it has now reached the point where the SLUB allocator is doing strange
> tricks to avoid the page allocator. This is obviously bad as it may encourage
> other subsystems to try avoiding the page allocator as well.
Congratulations! That was long overdue. Haven't read the patches yet though.
> Patch 15 reduces the number of times interrupts are disabled by reworking
> what free_page_mlock() does. However, I notice that the cost of calling
> TestClearPageMlocked() is still quite high and I'm guessing it's because
> it's a locked bit operation. It'd be nice if it could be established if
> it's safe to use an unlocked version here. Rik, can you comment?
What machine was that again?
> Patch 16 avoids using the zonelist cache on non-NUMA machines
My suspicion is that it can even be dropped on most (all?) small NUMA systems.
> Patch 20 gets rid of hot/cold freeing of pages because it incurs cost for
> what I believe to be very dubious gain. I'm not sure we currently gain
> anything by it but it's further discussed in the patch itself.
Yes, the hot/cold thing was always quite dubious.
> Counters are surprisingly expensive; we spent a good chunk of our time in
> functions like __dec_zone_page_state and __dec_zone_state. In a profiled
> run of kernbench, the time spent in __dec_zone_state was roughly equal to
> the combined cost of the rest of the page free path. A quick check showed
> that almost half of the time in that function is spent on line 233 alone
> which for me is:
>
> (*p)--;
>
> That's worth a separate investigation but it might be a case that
> manipulating int8_t on the machine I was using for profiling is unusually
> expensive.
What machine was that?
In general I wouldn't expect that to be so expensive, even on a system
with slow char operations. It sounds more like a cache miss or a
cache line bounce. You could possibly confirm this by using the
appropriate performance counters.
> Converting this to an int might be faster but the increased
> memory consumption and cache footprint might be a problem. Opinions?
One possibility would be to move the zone statistics to allocated
per cpu data. Or perhaps just stop counting per zone at all and
only count per cpu.
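To make the per-cpu suggestion above concrete, here is a minimal userspace
sketch in C. It is only an illustration under stated assumptions, not kernel
code and not taken from the patches: the CPU count, the 64-byte cache line
size, and every name ending in _sketch are hypothetical. Each CPU gets its
own cache-line-aligned block of counters, so an update touches only that
CPU's line, and a reader pays the cost of summing over all CPUs.

```c
#include <assert.h>

#define NR_CPUS_SKETCH        4   /* hypothetical CPU count */
#define NR_STAT_ITEMS_SKETCH  2   /* hypothetical number of counters */
#define CACHE_LINE_SKETCH     64  /* assumed cache line size in bytes */

/*
 * One block of counters per CPU. The alignment pads each block to a full
 * cache line so that two CPUs never update the same line.
 */
struct percpu_stats_sketch {
    _Alignas(CACHE_LINE_SKETCH) long count[NR_STAT_ITEMS_SKETCH];
};

static struct percpu_stats_sketch stats_sketch[NR_CPUS_SKETCH];

/* Update path: touches only the calling CPU's block, no shared global. */
static void stat_add_sketch(int cpu, int item, long delta)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    assert(item >= 0 && item < NR_STAT_ITEMS_SKETCH);
    stats_sketch[cpu].count[item] += delta;
}

/* Read path: sums the per-CPU values; this is the slower side by design. */
static long stat_read_sketch(int item)
{
    long sum = 0;

    assert(item >= 0 && item < NR_STAT_ITEMS_SKETCH);
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        sum += stats_sketch[cpu].count[item];
    return sum;
}
```

The trade-off in this sketch is that updates stay cheap and local while
reads walk every CPU's block, and the padding costs memory per CPU. In a
real kernel the caller would also have to stay on one CPU for the duration
of the update; the sketch simply takes the CPU index as a parameter.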
> The downside is that the patches do increase text size because of the
> splitting of the fast path into one inlined blob and the slow path into a
> number of other functions. On my test machine, text increased by 1.2K so
> I might revisit that again and see how much of a difference it really made.
>
> That all said, I'm seeing good results on actual benchmarks with these
> patches.
>
> o On many machines, I'm seeing a 0-2% improvement on kernbench. The dominant
Neat.
> So, by and large it's an improvement of some sort.
That seems like an understatement.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.