From: Dipankar Sarma <woofwoof@hathway.com>
To: Andrew Morton <akpm@digeo.com>
Cc: dipankar@in.ibm.com, Maneesh Soni <maneesh@in.ibm.com>,
Al Viro <viro@math.psu.edu>, LKML <linux-kernel@vger.kernel.org>,
Anton Blanchard <anton@au1.ibm.com>,
Paul McKenney <paul.mckenney@us.ibm.com>
Subject: Re: dcache_rcu [performance results]
Date: Sat, 2 Nov 2002 14:43:06 +0530 [thread overview]
Message-ID: <20021102144306.A6736@dikhow> (raw)
In-Reply-To: <3DC32C03.C3910128@digeo.com>; from akpm@digeo.com on Fri, Nov 01, 2002 at 05:36:03PM -0800
On Fri, Nov 01, 2002 at 05:36:03PM -0800, Andrew Morton wrote:
> Dipankar Sarma wrote:
> > [ dcache-rcu ]
> > Anton (Blanchard) did some benchmarking with this
> > in a 24-way ppc64 box and the results showed why we need this patch.
> > Here are some performace comparisons based on a multi-user benchmark
> > that Anton ran with vanilla 2.5.40 and 2.5.40-mm.
> >
> > http://lse.sourceforge.net/locking/dcache/summary.png
> >
> simulates lots of developers doing developer things on a multiuser
> machine. Lots of compiling, groffing, etc.
>
> Why does the removal of `ps' from the test script make such a huge
> difference? That's silly, and we should fix it.
I have uploaded the profiles from Anton's benchmark runs -
http://lse.sourceforge.net/locking/dcache/results/2.5.40/200-base.html
http://lse.sourceforge.net/locking/dcache/results/2.5.40/200-base-nops.html
http://lse.sourceforge.net/locking/dcache/results/2.5.40/200-mm.html
http://lse.sourceforge.net/locking/dcache/results/2.5.40/200-mm-nops.html
A quick comparison of base and base-nops profiles show this -
base :
Hits Percentage Function
75185 100.00 total
11215 14.92 path_lookup <1.html>
8578 11.41 atomic_dec_and_lock <2.html>
5763 7.67 do_lookup <3.html>
5745 7.64 proc_pid_readlink <4.html>
4344 5.78 page_remove_rmap <5.html>
2144 2.85 page_add_rmap <6.html>
1587 2.11 link_path_walk <7.html>
1531 2.04 proc_check_root <8.html>
1461 1.94 save_remaining_regs <9.html>
1345 1.79 inode_change_ok <10.html>
1236 1.64 ext2_free_blocks <11.html>
1215 1.62 ext2_new_block <12.html>
1067 1.42 d_lookup <13.html>
base-no-ps :
Hits Percentage Function
50895 100.00 total
8222 16.15 page_remove_rmap <1.html>
3837 7.54 page_add_rmap <2.html>
2222 4.37 save_remaining_regs <3.html>
1618 3.18 release_pages <4.html>
1533 3.01 pSeries_flush_hash_range <5.html>
1446 2.84 do_page_fault <6.html>
1343 2.64 find_get_page <7.html>
1273 2.50 copy_page <8.html>
1228 2.41 copy_page_range <9.html>
1186 2.33 path_lookup <10.html>
1186 2.33 pSeries_insert_hpte <11.html>
1171 2.30 atomic_dec_and_lock <12.html>
1152 2.26 zap_pte_range <13.html>
841 1.65 do_generic_file_read <14.html>
Clearly dcache_lock is the killer when 'ps' command is used in
the benchmark. My guess (without looking at 'ps' code) is that
it has to open/close a lot of files in /proc and that increases
the number of acquisitions of dcache_lock. Increased # of acquisition
add to cache line bouncing and contention.
I should add that this is a general trend we see in all workloads
that do a lot of open/closes and so much so that performance is very
sensitive to how close to / your application's working directory
is. You would get much better system time if you compile a kernel
in /linux as compared to say /home/fs01/users/akpm/kernel/linux ;-)
> And it appears that dcache-rcu made a ~10% difference on a 24-way PPC64,
> yes? That is nice, and perhaps we should take that, but it is not a
> tremendous speedup.
Hmm.. based on Anton's graph it looked more like ~25% difference for
60 or more scripts. At 200 scripts it is ~27.6%. Without the ps
command, it seems more like ~4%.
Thanks
Dipankar
next prev parent reply other threads:[~2002-11-02 9:11 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2002-10-30 10:49 [PATCH 2.5.44] dcache_rcu Maneesh Soni
2002-10-31 10:53 ` dcache_rcu [performance results] Dipankar Sarma
2002-11-02 1:36 ` Andrew Morton
2002-11-02 9:13 ` Dipankar Sarma [this message]
2002-11-04 17:29 ` Martin J. Bligh
2002-11-05 0:00 ` jw schultz
2002-11-05 1:14 ` ps performance sucks (was Re: dcache_rcu [performance results]) Martin J. Bligh
2002-11-05 3:57 ` Werner Almesberger
2002-11-05 4:42 ` Erik Andersen
2002-11-05 5:44 ` Martin J. Bligh
2002-11-05 5:59 ` Alexander Viro
2002-11-05 6:05 ` Martin J. Bligh
2002-11-05 6:15 ` Robert Love
2002-11-05 6:13 ` Erik Andersen
2002-11-05 6:14 ` Werner Almesberger
2002-11-05 4:26 ` jw schultz
2002-11-05 5:51 ` Martin J. Bligh
2002-11-05 19:57 ` Kai Henningsen
2002-11-05 21:33 ` Erik Andersen
2002-11-05 22:09 ` Karim Yaghmour
[not found] <20021030161912.E2613@in.ibm.com.suse.lists.linux.kernel>
[not found] ` <20021031162330.B12797@in.ibm.com.suse.lists.linux.kernel>
[not found] ` <3DC32C03.C3910128@digeo.com.suse.lists.linux.kernel>
[not found] ` <20021102144306.A6736@dikhow.suse.lists.linux.kernel>
2002-11-02 10:08 ` dcache_rcu [performance results] Andi Kleen
2002-11-02 10:54 ` Dipankar Sarma
2002-11-02 11:01 ` Andi Kleen
2002-11-02 19:41 ` Linus Torvalds
2002-11-02 21:16 ` Sam Ravnborg
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20021102144306.A6736@dikhow \
--to=woofwoof@hathway.com \
--cc=akpm@digeo.com \
--cc=anton@au1.ibm.com \
--cc=dipankar@in.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=maneesh@in.ibm.com \
--cc=paul.mckenney@us.ibm.com \
--cc=viro@math.psu.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.