From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Nick Piggin <npiggin@suse.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Lin Ming <ming.m.lin@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 00/22] Cleanup and optimise the page allocator V7
Date: Mon, 27 Apr 2009 15:58:39 +0800
Message-ID: <1240819119.2567.884.camel@ymzhang>
In-Reply-To: <1240408407-21848-1-git-send-email-mel@csn.ul.ie>

On Wed, 2009-04-22 at 14:53 +0100, Mel Gorman wrote:
> Here is V7 of the cleanup and optimisation of the page allocator and
> it should be ready for wider testing. Please consider a possibility for
> merging as a Pass 1 at making the page allocator faster. Other passes will
> occur later when this one has had a bit of exercise. This patchset is based
> on mmotm-2009-04-17 and I've tested it successfully on a small number of
> machines.
We ran some performance benchmarks against the V7 patchset on top of 2.6.30-rc3.
Some kernel counters appear to be incorrect after running ffsb (a disk I/O benchmark)
and swap-cp (a simple swap test that uses cp on tmpfs): free memory is reported as
larger than total memory.

[ymzhang@lkp-st02-x8664 ~]$ uname -a
Linux lkp-st02-x8664 2.6.30-rc3-mgpage #1 SMP Thu Apr 23 16:09:43 CST 2009 x86_64 x86_64 x86_64 GNU/Linux
[ymzhang@lkp-st02-x8664 ~]$ free
             total       used       free     shared    buffers     cached
Mem:       8166564 18014398497625640   20022908          0    2364424     247224
-/+ buffers/cache: 18014398495013992   22634556
Swap:            0          0          0
[ymzhang@lkp-st02-x8664 ~]$ cat /proc/meminfo 
MemTotal:        8166564 kB
MemFree:        20022916 kB
Buffers:         2364424 kB
Cached:           247224 kB
SwapCached:            0 kB
Active:          2414520 kB
Inactive:         206168 kB
Active(anon):       4316 kB
Inactive(anon):     4932 kB
Active(file):    2410204 kB
Inactive(file):   201236 kB
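
The huge "used" values above are consistent with unsigned wraparound: used is
derived as total minus free, and the result goes negative once MemFree exceeds
MemTotal. A minimal Python sketch of that arithmetic, assuming the subtraction
effectively happens modulo 2^64 on byte counts (an assumption inferred from the
numbers, not taken from the free source); the helper name wrapped_used_kb is
hypothetical:

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit unsigned arithmetic


def wrapped_used_kb(total_kb, *subtracted_kb):
    """Return total minus the given amounts in kB, wrapping like a 64-bit
    unsigned byte count would if the true difference were negative."""
    diff_kb = total_kb - sum(subtracted_kb)
    return ((diff_kb * 1024) & MASK64) // 1024


# lkp-st02-x8664, "Mem:" row: used = total - free
print(wrapped_used_kb(8166564, 20022908))
# "-/+ buffers/cache" row: used = total - free - buffers - cached
print(wrapped_used_kb(8166564, 20022908, 2364424, 247224))
```

Both results match the free output above (18014398497625640 and
18014398495013992), so the "used" column only reflects the bad MemFree counter
and is not a separate problem.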



[ymzhang@lkp-ne01 ~]$ uname -a
Linux lkp-ne01 2.6.30-rc3-mgpage #1 SMP Thu Apr 23 15:04:27 CST 2009 x86_64 x86_64 x86_64 GNU/Linux
[ymzhang@lkp-ne01 ~]$ free
             total       used       free     shared    buffers     cached
Mem:       6116356 18014398509340432    6257908          0     609804    1053512
-/+ buffers/cache: 18014398507677116    7921224
Swap:     15631204          0   15631204
[ymzhang@lkp-ne01 ~]$ cat /proc/meminfo 
MemTotal:        6116356 kB
MemFree:         6257948 kB
Buffers:          609804 kB
Cached:          1053512 kB
SwapCached:            0 kB
Active:           723152 kB



A simple cp/rm/cp of the kernel source tree also triggers it:
[ymzhang@lkp-ne01 linux-2.6.30-rc3_melgorman]$ uname -a
Linux lkp-ne01 2.6.30-rc3-mgpage #1 SMP Thu Apr 23 15:04:27 CST 2009 x86_64 x86_64 x86_64 GNU/Linux
[ymzhang@lkp-ne01 linux-2.6.30-rc3_melgorman]$ free
             total       used       free     shared    buffers     cached
Mem:       6116356    1309940    4806416          0      82184    1259072
-/+ buffers/cache: 18014398509450668    6147672
Swap:     15631204          0   15631204
[ymzhang@lkp-ne01 linux-2.6.30-rc3_melgorman]$ cat /proc/meminfo 
MemTotal:        6116356 kB
MemFree:         4806724 kB
Buffers:           82184 kB
Cached:          1259072 kB
SwapCached:            0 kB
Active:           477396 kB
Inactive:         872388 kB
Active(anon):       8704 kB
Inactive(anon):        0 kB
Active(file):     468692 kB
Inactive(file):   872388 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      15631204 kB
SwapFree:       15631204 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          8632 kB
Mapped:             4140 kB
Slab:             174504 kB
SReclaimable:     154976 kB
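
A check like the following could be run after each benchmark to catch this kind
of accounting drift. It is a minimal Python sketch; the function names are
hypothetical, and the second rule (Buffers + Cached should not exceed
MemTotal - MemFree) is an assumed invariant, not something the kernel
guarantees at every instant:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> kB value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def find_inconsistencies(fields):
    """Return a list of human-readable problems found in the counters."""
    problems = []
    total, free = fields["MemTotal"], fields["MemFree"]
    if free > total:
        problems.append("MemFree exceeds MemTotal")
    cache = fields.get("Buffers", 0) + fields.get("Cached", 0)
    if total - free < cache:
        problems.append("Buffers + Cached exceeds MemTotal - MemFree")
    return problems


# First four fields of the cp/rm/cp run on lkp-ne01 shown above
SAMPLE = """\
MemTotal:        6116356 kB
MemFree:         4806724 kB
Buffers:           82184 kB
Cached:          1259072 kB
"""
print(find_inconsistencies(parse_meminfo(SAMPLE)))
```

For the cp/rm/cp run only the second rule fires: MemFree is still below
MemTotal, but the page cache alone already accounts for more than the memory
that is reported as in use.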


> 
> The performance improvements are in a wide range depending on the exact
> machine but the results I've seen so far are approximately:
> 
> kernbench:	0	to	 0.12% (elapsed time)
> 		0.49%	to	 3.20% (sys time)
> aim9:		-4%	to	30% (for page_test and brk_test)
> tbench:		-1%	to	 4%
> hackbench:	-2.5%	to	 3.45% (mostly within the noise though)
> netperf-udp	-1.34%  to	 4.06% (varies between machines a bit)
> netperf-tcp	-0.44%  to	 5.22% (varies between machines a bit)
> 
> I haven't sysbench figures at hand, but previously they were within the -0.5%
> to 2% range.
> 
> On netperf, the client and server were bound to opposite number CPUs to
> maximise the problems with cache line bouncing of the struct pages so I
> expect different people to report different results for netperf depending
> on their exact machine and how they ran the test (different machines, same
> cpus client/server, shared cache but two threads client/server, different
> socket client/server etc).
> 
> I also measured the vmlinux sizes for a single x86-based config with
> CONFIG_DEBUG_INFO enabled but not CONFIG_DEBUG_VM. The core of the .config
> is based on the Debian Lenny kernel config so I expect it to be reasonably
> typical.
> 
>    text	kernel
> 3355726 mmotm-20090417
> 3355718 0001-Replace-__alloc_pages_internal-with-__alloc_pages_.patch
> 3355622 0002-Do-not-sanity-check-order-in-the-fast-path.patch
> 3355574 0003-Do-not-check-NUMA-node-ID-when-the-caller-knows-the.patch
> 3355574 0004-Check-only-once-if-the-zonelist-is-suitable-for-the.patch
> 3355526 0005-Break-up-the-allocator-entry-point-into-fast-and-slo.patch
> 3355420 0006-Move-check-for-disabled-anti-fragmentation-out-of-fa.patch
> 3355452 0007-Calculate-the-preferred-zone-for-allocation-only-onc.patch
> 3355452 0008-Calculate-the-migratetype-for-allocation-only-once.patch
> 3355436 0009-Calculate-the-alloc_flags-for-allocation-only-once.patch
> 3355436 0010-Remove-a-branch-by-assuming-__GFP_HIGH-ALLOC_HIGH.patch
> 3355420 0011-Inline-__rmqueue_smallest.patch
> 3355420 0012-Inline-buffered_rmqueue.patch
> 3355420 0013-Inline-__rmqueue_fallback.patch
> 3355404 0014-Do-not-call-get_pageblock_migratetype-more-than-ne.patch
> 3355300 0015-Do-not-disable-interrupts-in-free_page_mlock.patch
> 3355300 0016-Do-not-setup-zonelist-cache-when-there-is-only-one-n.patch
> 3355188 0017-Do-not-check-for-compound-pages-during-the-page-allo.patch
> 3355161 0018-Use-allocation-flags-as-an-index-to-the-zone-waterma.patch
> 3355129 0019-Update-NR_FREE_PAGES-only-as-necessary.patch
> 3355129 0020-Get-the-pageblock-migratetype-without-disabling-inte.patch
> 3355129 0021-Use-a-pre-calculated-value-instead-of-num_online_nod.patch
> 
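
The per-patch effect on text size is easier to read as deltas. A small Python
sketch over the figures quoted above, with patch names shortened to their
numbers (the variable and function names are arbitrary):

```python
# (label, vmlinux text size in bytes), copied from the quoted table
TEXT_SIZES = [
    ("mmotm-20090417", 3355726),
    ("0001", 3355718), ("0002", 3355622), ("0003", 3355574),
    ("0004", 3355574), ("0005", 3355526), ("0006", 3355420),
    ("0007", 3355452), ("0008", 3355452), ("0009", 3355436),
    ("0010", 3355436), ("0011", 3355420), ("0012", 3355420),
    ("0013", 3355420), ("0014", 3355404), ("0015", 3355300),
    ("0016", 3355300), ("0017", 3355188), ("0018", 3355161),
    ("0019", 3355129), ("0020", 3355129), ("0021", 3355129),
]


def text_deltas(sizes):
    """Return (label, change in bytes) for each entry relative to the previous one."""
    return [(label, size - prev)
            for (_, prev), (label, size) in zip(sizes, sizes[1:])]


deltas = text_deltas(TEXT_SIZES)
total = TEXT_SIZES[-1][1] - TEXT_SIZES[0][1]
grew = [label for label, delta in deltas if delta > 0]
print(total, grew)
```

On these figures the series shrinks text by 597 bytes overall, and 0007 is the
only patch that grows it (by 32 bytes).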
> Some patches were dropped in this revision because while I believe they
> improved performance, they also increase the text size so they need to
> be revisited in isolation to show they actually help things and by how
> much. Other than that, the biggest changes were cleaning up accidental
> functional changes identified by Kosaki Motohiro. Massive credit to him for a
> very detailed review of V6, to Christoph Lameter who reviewed earlier versions
> quite heavily and Pekka who kicked through V6 in quite a lot of detail.
> 
>  arch/ia64/hp/common/sba_iommu.c   |    2 
>  arch/ia64/kernel/mca.c            |    3 
>  arch/ia64/kernel/uncached.c       |    3 
>  arch/ia64/sn/pci/pci_dma.c        |    3 
>  arch/powerpc/platforms/cell/ras.c |    2 
>  arch/x86/kvm/vmx.c                |    2 
>  drivers/misc/sgi-gru/grufile.c    |    2 
>  drivers/misc/sgi-xp/xpc_uv.c      |    2 
>  include/linux/gfp.h               |   27 -
>  include/linux/mm.h                |    1 
>  include/linux/mmzone.h            |   11 
>  include/linux/nodemask.h          |   15 -
>  kernel/profile.c                  |    8 
>  mm/filemap.c                      |    2 
>  mm/hugetlb.c                      |    8 
>  mm/internal.h                     |   11 
>  mm/mempolicy.c                    |    2 
>  mm/migrate.c                      |    2 
>  mm/page_alloc.c                   |  555 ++++++++++++++++++++++++--------------
>  mm/slab.c                         |   11 
>  mm/slob.c                         |    4 
>  mm/slub.c                         |    2 
>  net/sunrpc/svc.c                  |    2 
>  23 files changed, 424 insertions(+), 256 deletions(-)
> 
> Changes since V6
>   o Remove unintentional functional changes when splitting into fast and slow paths
>   o Drop patch 7 for zonelist filtering as it modified when zlc_setup() is called
>     for the wrong reasons. The patch that avoids calling it for non-NUMA machines is
>     still there which has the bulk of the saving. cpusets is relatively small
>   o Drop an unnecessary check for in_interrupt() in gfp_to_alloc_flags()
>   o Clarify comment on __GFP_HIGH == ALLOC_HIGH
>   o Redefine the watermark mask to be expressed in terms of ALLOC_MARK_NOWATERMARK
>   o Use BUILD_BUG_ON for checking __GFP_HIGH == ALLOC_HIGH
>   o Drop some patches that were not reducing text sizes as expected
>   o Remove numa_platform from slab
> 
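
The __GFP_HIGH == ALLOC_HIGH items in the list above rely on two flags sharing
one bit value, so a conditional can become a plain mask-and-OR. A rough sketch
of the idea, written in Python rather than kernel C purely to show the logic;
the flag values and function names are illustrative assumptions, and the
module-level assert only stands in for what BUILD_BUG_ON does at build time:

```python
# Illustrative values only (assumptions for this sketch, not taken from the patch).
GFP_HIGH = 0x20
ALLOC_HIGH = 0x20

# Stand-in for the build-time check: fail immediately if the assumption breaks.
assert GFP_HIGH == ALLOC_HIGH, "branch-free form needs identical flag values"


def alloc_flags_branchy(gfp_mask, alloc_flags=0):
    """Original shape: test the GFP bit, then set the matching ALLOC bit."""
    if gfp_mask & GFP_HIGH:
        alloc_flags |= ALLOC_HIGH
    return alloc_flags


def alloc_flags_branchless(gfp_mask, alloc_flags=0):
    """Branch-free shape: copy the bit across, valid only while the values match."""
    return alloc_flags | (gfp_mask & GFP_HIGH)


print(alloc_flags_branchless(0x20 | 0x10), alloc_flags_branchless(0x10))
```

The two functions agree for every input as long as the equality holds, which is
why the check belongs at build time and not in the fast path.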
> Change since V5
>   o Rebase to mmotm-2009-04-17
> 
> Changes since V4
>   o Drop the more controversial patches for now and focus on the "obvious win"
>     material.
>   o Add reviewed-by notes
>   o Fix changelog entry to say __rmqueue_fallback instead of __rmqueue
>   o Add unlikely() for the clearMlocked check
>   o Change where PGFREE is accounted in free_hot_cold_page() to have symmetry
>     with __free_pages_ok()
>   o Convert num_online_nodes() to use a static value so that callers do
>     not have to be individually updated
>   o Rebase to mmotm-2003-03-13
> 
> Changes since V3
>   o Drop the more controversial patches for now and focus on the "obvious win"
>     material
>   o Add reviewed-by notes
>   o Fix changelog entry to say __rmqueue_fallback instead of __rmqueue
>   o Add unlikely() for the clearMlocked check
>   o Change where PGFREE is accounted in free_hot_cold_page() to have symmetry
>     with __free_pages_ok()
> 
> Changes since V2
>   o Remove branches by treating watermark flags as array indices
>   o Remove branch by assuming __GFP_HIGH == ALLOC_HIGH
>   o Do not check for compound on every page free
>   o Remove branch by always ensuring the migratetype is known on free
>   o Simplify buffered_rmqueue further
>   o Reintroduce improved version of batched bulk free of pcp pages
>   o Use allocation flags as an index to zone watermarks
>   o Work out __GFP_COLD only once
>   o Reduce the number of times zone stats are updated
>   o Do not dump reserve pages back into the allocator. Instead treat them
>     as MOVABLE so that MIGRATE_RESERVE gets used on the max-order-overlapped
>     boundaries without causing trouble
>   o Allow pages up to PAGE_ALLOC_COSTLY_ORDER to use the per-cpu allocator.
>     order-1 allocations are frequent enough in particular to justify this
>   o Rearrange inlining such that the hot-path is inlined but not in a way
>     that increases the text size of the page allocator
>   o Make the check for needing additional zonelist filtering due to NUMA
>     or cpusets as light as possible
>   o Do not destroy compound pages going to the PCP lists
>   o Delay the merging of buddies until a high-order allocation needs them
>     or anti-fragmentation is being forced to fallback
> 
> Changes since V1
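
The watermark items in the V2 list can be pictured as follows. This is a rough
Python sketch, assuming the low bits of the allocation flags hold a watermark
index and that a mask derived from the no-watermarks flag strips all other
bits; every constant value and the helper name are illustrative, not taken from
the patches:

```python
# Illustrative constants (assumptions): watermark selectors double as indices.
WMARK_MIN, WMARK_LOW, WMARK_HIGH = 0, 1, 2
ALLOC_NO_WATERMARKS = 0x04                  # first bit above the index range
ALLOC_WMARK_MASK = ALLOC_NO_WATERMARKS - 1  # 0x03: keeps only the index bits
ALLOC_HARDER = 0x10                         # an unrelated flag bit


def zone_watermark(watermarks, alloc_flags):
    """Pick the zone watermark by indexing instead of an if/else chain."""
    return watermarks[alloc_flags & ALLOC_WMARK_MASK]


# Hypothetical per-zone watermarks in pages: [min, low, high]
zone = [1024, 1280, 1536]
print(zone_watermark(zone, WMARK_LOW | ALLOC_HARDER))
```

The lookup costs one AND and one indexed load whichever watermark is requested,
where the branchy form tests each flag in turn.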
>   o Remove the ifdef CONFIG_CPUSETS from inside get_page_from_freelist()
>   o Use non-lock bit operations for clearing the mlock flag
>   o Factor out alloc_flags calculation so it is only done once (Peter)
>   o Make gfp.h a bit prettier and clear-cut (Peter)
>   o Instead of deleting a debugging check, replace page_count() in the
>     free path with a version that does not check for compound pages (Nick)
>   o Drop the alteration for hot/cold page freeing until we know if it
>     helps or not
> 
