From: Andrea Arcangeli <andrea@suse.de>
To: "Martin J. Bligh" <mbligh@aracnet.com>
Cc: Hugh Dickins <hugh@veritas.com>, Andrew Morton <akpm@osdl.org>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] anobjrmap 1/6 objrmap
Date: Wed, 24 Mar 2004 17:21:16 +0100
Message-ID: <20040324162116.GQ2065@dualathlon.random>
In-Reply-To: <24560000.1080143798@[10.10.2.4]>
On Wed, Mar 24, 2004 at 07:56:39AM -0800, Martin J. Bligh wrote:
>
>
> --Andrea Arcangeli <andrea@suse.de> wrote (on Wednesday, March 24, 2004 07:19:57 +0100):
>
> > On Mon, Mar 22, 2004 at 07:53:02AM -0800, Martin J. Bligh wrote:
> >> Just against 2.6.5-rc1 virgin is easiest - that's what I was doing the
> >> rest of it against ...
> >
> > here it is:
> >
> > http://www.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.5-rc1/anon-vma-2.6.5-rc2-aa2.gz
> > http://www.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.5-rc1/objrmap-core-2.6.5-rc2-aa2.gz
> >
> >
>
> Yay, that works ;-) Without the rest of your tree, performance of anon_vma
> is almost exactly = anon_mm ... of course all this is under no mem pressure,
> I'll have to do some more tests on another machine without infinite ram to
> see what happens as we start to reclaim ;-)
excellent. under reclaim at least in theory you should see less cpu
utilization with anon_vma since the page links directly to the vmas.
> Kernbench: (make -j N vmlinux, where N = 2 x num_cpus)
>                       Elapsed   System     User      CPU
> 2.6.5-rc1               45.75   102.49   577.39  1486.00
> 2.6.5-rc1-partial       44.84    85.75   576.63  1476.67
> 2.6.5-rc1-hugh          44.79    83.85   576.71  1474.67
> 2.6.5-rc1-anon_vma      44.66    83.69   577.14  1479.00
anon_vma is the fastest here.
> Kernbench: (make -j N vmlinux, where N = 16 x num_cpus)
>                       Elapsed   System     User      CPU
> 2.6.5-rc1               46.99   121.95   580.82  1495.33
> 2.6.5-rc1-partial       45.09    97.16   579.59  1501.00
> 2.6.5-rc1-hugh          45.00    95.45   579.05  1498.67
> 2.6.5-rc1-anon_vma      44.90    96.17   579.60  1503.67
here again the fastest in elapsed time.
>
> Kernbench: (make -j vmlinux, maximal tasks)
>                       Elapsed   System     User      CPU
> 2.6.5-rc1               46.96   122.43   580.65  1495.00
> 2.6.5-rc1-partial       45.18    93.60   579.10  1488.33
> 2.6.5-rc1-hugh          44.89    91.04   578.49  1490.33
> 2.6.5-rc1-anon_vma      44.92    91.96   578.86  1493.33
here it's not the fastest (though a 0.03 difference should be in the
error range with an unlimited -j)
I also left a zillion BUG_ONs enabled (in cpu-bound fast paths of
page_add_rmap/page-faults/pagecache etc.); in theory those can all be
removed.
> SDET 1 (see disclaimer)
>                       Throughput    Std. Dev
> 2.6.5-rc1                 100.0%        3.0%
> 2.6.5-rc1-partial         101.4%        1.3%
> 2.6.5-rc1-hugh            100.0%        2.9%
> 2.6.5-rc1-anon_vma        101.4%        1.9%
here it's as fast as plain objrmap.
> SDET 2 (see disclaimer)
>                       Throughput    Std. Dev
> 2.6.5-rc1                 100.0%        1.3%
> 2.6.5-rc1-partial         107.7%        1.0%
> 2.6.5-rc1-hugh            108.7%        1.5%
> 2.6.5-rc1-anon_vma        109.5%        0.7%
here it's the fastest.
> SDET 4 (see disclaimer)
>                       Throughput    Std. Dev
> 2.6.5-rc1                 100.0%        0.7%
> 2.6.5-rc1-partial         110.5%        0.6%
> 2.6.5-rc1-hugh            114.6%        1.3%
> 2.6.5-rc1-anon_vma        113.3%        0.3%
here it's 1% slower, though anonmm has a 1% standard deviation (higher
than all the others).
> SDET 8 (see disclaimer)
>                       Throughput    Std. Dev
> 2.6.5-rc1                 100.0%        0.9%
> 2.6.5-rc1-partial         119.4%        0.5%
> 2.6.5-rc1-hugh            120.2%        1.1%
> 2.6.5-rc1-anon_vma        119.6%        0.0%
here it's under 1% slower than anonmm, though anonmm still has a 1% std
deviation. it's interesting I get 0 standard deviation. Is it possible I
get a lower standard deviation because you ran it fewer times? just
wondering. I'd expect SDET has a default number of passes, so I expect
the answer is no of course.
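A rough way to sanity-check "slower, but inside the deviation" is sketched below. It uses only the SDET 4 and SDET 8 rows quoted above. The rule itself is an assumption for illustration: it treats the Std. Dev column as being in the same units as the Throughput column and combines the two deviations in quadrature. The name `within_noise` is hypothetical, and this is a rule of thumb, not a significance test.

```python
import math

# (throughput %, std dev %) for anonmm ("hugh") and anon_vma, copied from
# the SDET 4 and SDET 8 tables quoted in this mail.
results = {
    4: {"hugh": (114.6, 1.3), "anon_vma": (113.3, 0.3)},
    8: {"hugh": (120.2, 1.1), "anon_vma": (119.6, 0.0)},
}


def within_noise(a, b):
    """True if two (mean, std dev) results differ by no more than their
    combined deviation.

    Assumption: both deviations are in the same units as the means and
    are combined in quadrature. Rule of thumb only.
    """
    (mean_a, sd_a), (mean_b, sd_b) = a, b
    return abs(mean_a - mean_b) <= math.hypot(sd_a, sd_b)


verdicts = {
    load: within_noise(r["hugh"], r["anon_vma"]) for load, r in results.items()
}
for load, ok in verdicts.items():
    print(f"SDET {load}: difference within combined std dev? {ok}")
```

Whether fewer runs would shrink the reported deviation cannot be read off these tables; that needs the per-run numbers.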
> SDET 16 (see disclaimer)
>                       Throughput    Std. Dev
> 2.6.5-rc1                 100.0%        0.1%
> 2.6.5-rc1-partial         118.1%        0.2%
> 2.6.5-rc1-hugh            119.8%        0.4%
> 2.6.5-rc1-anon_vma        119.9%        0.8%
here the fastest.
> SDET 32 (see disclaimer)
>                       Throughput    Std. Dev
> 2.6.5-rc1                 100.0%        0.2%
> 2.6.5-rc1-partial         119.2%        1.0%
> 2.6.5-rc1-hugh            120.4%        0.4%
> 2.6.5-rc1-anon_vma        121.8%        0.6%
more than 1% faster.
> SDET 64 (see disclaimer)
>                       Throughput    Std. Dev
> 2.6.5-rc1                 100.0%        0.3%
> 2.6.5-rc1-partial         122.1%        0.5%
> 2.6.5-rc1-hugh            123.5%        0.4%
> 2.6.5-rc1-anon_vma        123.3%        0.8%
here .2% slower.
> SDET 128 (see disclaimer)
>                       Throughput    Std. Dev
> 2.6.5-rc1                 100.0%        0.2%
> 2.6.5-rc1-partial         123.1%        0.4%
> 2.6.5-rc1-hugh            124.7%        0.7%
> 2.6.5-rc1-anon_vma        123.9%        0.3%
around 1% slower here.
overall I think for the fast path we can conclude they're at least
equally fast.
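The "equally fast" reading can be checked with a few lines of arithmetic over the SDET tables quoted above. This is a minimal sketch: the throughput figures are copied from those tables, while the helper name `relative_delta` and the choice of a plain unweighted mean across loads are assumptions made for illustration.

```python
# SDET throughput as % of the 2.6.5-rc1 baseline, copied from the quoted
# tables: load -> (2.6.5-rc1-hugh i.e. anonmm, 2.6.5-rc1-anon_vma).
sdet = {
    1: (100.0, 101.4),
    2: (108.7, 109.5),
    4: (114.6, 113.3),
    8: (120.2, 119.6),
    16: (119.8, 119.9),
    32: (120.4, 121.8),
    64: (123.5, 123.3),
    128: (124.7, 123.9),
}


def relative_delta(hugh, anon_vma):
    """Percent by which anon_vma is faster (+) or slower (-) than anonmm."""
    return (anon_vma - hugh) / hugh * 100.0


deltas = {load: relative_delta(*pair) for load, pair in sdet.items()}
for load, delta in deltas.items():
    print(f"SDET {load:>3}: anon_vma vs anonmm {delta:+.2f}%")

# Unweighted mean over the eight loads (an assumption; loads could be
# weighted differently).
mean_delta = sum(deltas.values()) / len(deltas)
print(f"mean over all loads: {mean_delta:+.2f}%")
```

With these figures every per-load difference stays below 1.5% in either direction.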
Using Christoph's technique of splitting the swapper_space checks out of
the pagecache paths I can squeeze some more cpu cycles out of anon_vma
btw (very low prio at this time, much better to keep the patch smaller
and more robust while it's out of the mainline tree).
> For interest's sake, here's the diffprofile for kernbench from
> anon_mm to the whole -aa tree ...
>
> 3808 25386.7% find_trylock_page
> 568 2704.8% pgd_alloc
> 273 74.2% dentry_open
> 125 11.2% file_move
> 106 23.1% do_page_cache_readahead
> 64 0.5% do_anonymous_page
> ...
> -64 -1.0% __copy_to_user_ll
> -66 -12.2% .text.lock.file_table
> -72 -0.8% __d_lookup
> -78 -3.9% path_lookup
> -84 -14.9% kmap_atomic
> -92 -11.0% pte_alloc_one
> -97 -13.7% generic_file_open
> -106 -11.2% kmem_cache_free
> -121 -13.2% release_pages
> -126 -12.6% page_add_rmap
> -137 -12.9% clear_page_tables
> -212 -7.2% zap_pte_range
> -235 -100.0% radix_tree_lookup
> -239 -12.5% buffered_rmqueue
> -268 -17.8% link_path_walk
> -291 -100.0% .text.lock.filemap
> -397 -20.8% page_remove_rmap
> -398 -100.0% pgd_ctor
> -461 -21.6% do_no_page
> -669 -1.4% default_idle
> -3508 -2.5% total
> -3719 -99.4% find_get_page
>
> zap_pte_range and page_remove_rmap and do_no_page are cheaper ... are we
> setting up and tearing down pages less frequently somehow? Would be
> curious to know which patch that is ...
it's probably one of the -mm patches that boosts those bits (the cost
of page_add_rmap and the page faults should be the same with both
anon_vma and anonmm). as for the regression, the pgd_alloc slowdown is
the unslabify one from andrew that saves 8 bytes per page on 32bit
archs and 16 bytes per page on 64bit archs.
My current page_t is now 36 bytes (compared to 48 bytes in 2.4) on 32bit
archs, and 56 bytes on 64bit archs (hope I counted right this time; Hugh
says I'm miscounting the page_t, methinks we were looking at different
source trees instead, but maybe I really was counting wrong ;).
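To put the page_t sizes in perspective, here is a back-of-the-envelope sketch of the per-page bookkeeping cost on a 32bit arch. The 36 and 48 byte figures are the ones stated above. The 4 KiB page size and the 1 GiB of RAM are assumptions chosen for illustration, and `mem_map_bytes` is a hypothetical helper name.

```python
PAGE_SIZE = 4096      # assumption: 4 KiB pages
RAM_BYTES = 1 << 30   # assumption: 1 GiB of RAM
MIB = 1 << 20


def mem_map_bytes(page_t_size, ram_bytes=RAM_BYTES, page_size=PAGE_SIZE):
    """Total bytes spent on page_t entries, one entry per physical page."""
    return (ram_bytes // page_size) * page_t_size


old = mem_map_bytes(48)   # 2.4 page_t on 32bit, per the mail
new = mem_map_bytes(36)   # current page_t on 32bit, per the mail
saved = old - new

print(f"2.4 page_t array:     {old / MIB:.1f} MiB per GiB of RAM")
print(f"current page_t array: {new / MIB:.1f} MiB per GiB of RAM")
print(f"saved:                {saved / MIB:.1f} MiB per GiB of RAM")
```

Under those assumptions the 12 bytes shaved off each page_t add up to 3 MiB for every GiB of RAM.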