* Re: dcache_rcu [performance results]
2002-10-30 10:49 [PATCH 2.5.44] dcache_rcu Maneesh Soni
@ 2002-10-31 10:53 ` Dipankar Sarma
2002-11-02 1:36 ` Andrew Morton
0 siblings, 1 reply; 10+ messages in thread
From: Dipankar Sarma @ 2002-10-31 10:53 UTC (permalink / raw)
To: Maneesh Soni; +Cc: Al Viro, LKML, Andrew Morton, Anton Blanchard, Paul McKenney
On Wed, Oct 30, 2002 at 04:19:12PM +0530, Maneesh Soni wrote:
> Hello Viro,
>
> Please consider forwarding the following patch to Linus for dcache lookup
> using Read-Copy Update. The patch has been in the -mm kernel since
> 2.5.37-mm1. The patch is stable, and a couple of reported bugs have been
> fixed. It helps a great deal on higher-end SMP machines, and there is no
> performance regression on UP and lower-end SMP machines, as seen in
> Dipankar's kernbench numbers.
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=103462075416638&w=2
>
Anton (Blanchard) did some benchmarking with this on a 24-way ppc64 box,
and the results showed why we need this patch. Here are some performance
comparisons based on a multi-user benchmark that Anton ran with vanilla
2.5.40 and 2.5.40-mm.
http://lse.sourceforge.net/locking/dcache/summary.png
base = 2.5.40
base-nops = 2.5.40 but ps command in benchmark scripts commented out
mm = 2.5.40-mm
mm-nops = 2.5.40-mm but ps command in benchmark scripts commented out
Here is a profile output snippet of base and mm runs at 200 scripts -
base :
Hits Percentage Function
------------------------
75185 100.00 total
11215 14.92 path_lookup
8578 11.41 atomic_dec_and_lock
5763 7.67 do_lookup
5745 7.64 proc_pid_readlink
4344 5.78 page_remove_rmap
2144 2.85 page_add_rmap
1587 2.11 link_path_walk
1531 2.04 proc_check_root
1461 1.94 save_remaining_regs
1345 1.79 inode_change_ok
1236 1.64 ext2_free_blocks
1215 1.62 ext2_new_block
1067 1.42 d_lookup
1053 1.40 number
907 1.21 release_pages
mm :
Hits Percentage Function
62369 100.00 total
5802 9.30 page_remove_rmap
4092 6.56 atomic_dec_and_lock
3887 6.23 proc_pid_readlink
3207 5.14 follow_mount
2979 4.78 page_add_rmap
2066 3.31 save_remaining_regs
1856 2.98 d_lookup
1629 2.61 number
1235 1.98 release_pages
1168 1.87 pSeries_flush_hash_range
1154 1.85 do_page_fault
1026 1.65 copy_page
1009 1.62 path_lookup
Thanks
--
Dipankar Sarma <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: dcache_rcu [performance results]
2002-10-31 10:53 ` dcache_rcu [performance results] Dipankar Sarma
@ 2002-11-02 1:36 ` Andrew Morton
2002-11-02 9:13 ` Dipankar Sarma
0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2002-11-02 1:36 UTC (permalink / raw)
To: dipankar; +Cc: Maneesh Soni, Al Viro, LKML, Anton Blanchard, Paul McKenney
Dipankar Sarma wrote:
>
> [ dcache-rcu ]
>
> Anton (Blanchard) did some benchmarking with this
> in a 24-way ppc64 box and the results showed why we need this patch.
> Here are some performance comparisons based on a multi-user benchmark
> that Anton ran with vanilla 2.5.40 and 2.5.40-mm.
>
> http://lse.sourceforge.net/locking/dcache/summary.png
>
> base = 2.5.40
> base-nops = 2.5.40 but ps command in benchmark scripts commented out
> mm = 2.5.40-mm
> mm-nops = 2.5.40-mm but ps command in benchmark scripts commented out
>
I'm going to need some help understanding what's going on in
there. I assume the test is SDET (there, I said it), which
simulates lots of developers doing developer things on a multiuser
machine. Lots of compiling, groffing, etc.
Why does the removal of `ps' from the test script make such a huge
difference? That's silly, and we should fix it.
And it appears that dcache-rcu made a ~10% difference on a 24-way PPC64,
yes? That is nice, and perhaps we should take that, but it is not a
tremendous speedup.
Thanks.
* Re: dcache_rcu [performance results]
2002-11-02 1:36 ` Andrew Morton
@ 2002-11-02 9:13 ` Dipankar Sarma
2002-11-04 17:29 ` Martin J. Bligh
0 siblings, 1 reply; 10+ messages in thread
From: Dipankar Sarma @ 2002-11-02 9:13 UTC (permalink / raw)
To: Andrew Morton
Cc: dipankar, Maneesh Soni, Al Viro, LKML, Anton Blanchard,
Paul McKenney
On Fri, Nov 01, 2002 at 05:36:03PM -0800, Andrew Morton wrote:
> Dipankar Sarma wrote:
> > [ dcache-rcu ]
> > Anton (Blanchard) did some benchmarking with this
> > in a 24-way ppc64 box and the results showed why we need this patch.
> > Here are some performance comparisons based on a multi-user benchmark
> > that Anton ran with vanilla 2.5.40 and 2.5.40-mm.
> >
> > http://lse.sourceforge.net/locking/dcache/summary.png
> >
> simulates lots of developers doing developer things on a multiuser
> machine. Lots of compiling, groffing, etc.
>
> Why does the removal of `ps' from the test script make such a huge
> difference? That's silly, and we should fix it.
I have uploaded the profiles from Anton's benchmark runs -
http://lse.sourceforge.net/locking/dcache/results/2.5.40/200-base.html
http://lse.sourceforge.net/locking/dcache/results/2.5.40/200-base-nops.html
http://lse.sourceforge.net/locking/dcache/results/2.5.40/200-mm.html
http://lse.sourceforge.net/locking/dcache/results/2.5.40/200-mm-nops.html
A quick comparison of the base and base-nops profiles shows this -
base :
Hits Percentage Function
75185 100.00 total
11215 14.92 path_lookup <1.html>
8578 11.41 atomic_dec_and_lock <2.html>
5763 7.67 do_lookup <3.html>
5745 7.64 proc_pid_readlink <4.html>
4344 5.78 page_remove_rmap <5.html>
2144 2.85 page_add_rmap <6.html>
1587 2.11 link_path_walk <7.html>
1531 2.04 proc_check_root <8.html>
1461 1.94 save_remaining_regs <9.html>
1345 1.79 inode_change_ok <10.html>
1236 1.64 ext2_free_blocks <11.html>
1215 1.62 ext2_new_block <12.html>
1067 1.42 d_lookup <13.html>
base-nops :
Hits Percentage Function
50895 100.00 total
8222 16.15 page_remove_rmap <1.html>
3837 7.54 page_add_rmap <2.html>
2222 4.37 save_remaining_regs <3.html>
1618 3.18 release_pages <4.html>
1533 3.01 pSeries_flush_hash_range <5.html>
1446 2.84 do_page_fault <6.html>
1343 2.64 find_get_page <7.html>
1273 2.50 copy_page <8.html>
1228 2.41 copy_page_range <9.html>
1186 2.33 path_lookup <10.html>
1186 2.33 pSeries_insert_hpte <11.html>
1171 2.30 atomic_dec_and_lock <12.html>
1152 2.26 zap_pte_range <13.html>
841 1.65 do_generic_file_read <14.html>
Clearly dcache_lock is the killer when the 'ps' command is used in
the benchmark. My guess (without looking at the 'ps' code) is that
it has to open/close a lot of files in /proc, and that increases
the number of acquisitions of dcache_lock. The increased number of
acquisitions adds to cache-line bouncing and contention.
I should add that this is a general trend we see in all workloads
that do a lot of opens/closes, so much so that performance is very
sensitive to how close to / your application's working directory
is. You would get much better system time if you compile a kernel
in /linux as compared to, say, /home/fs01/users/akpm/kernel/linux ;-)
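The path-depth effect comes down to component count: every extra directory
in a path is one more dcache lookup, each of which takes the global
dcache_lock in the pre-RCU code. A rough user-space sketch (the two example
paths are made up, and the one-lookup-per-component mapping is a
simplification):

```shell
# Each '/'-separated component of a path is a separate dcache lookup,
# so a deep working directory means more dcache_lock acquisitions per
# open() or stat() when absolute paths are used.
shallow=/linux/fs/dcache.c
deep=/home/fs01/users/akpm/kernel/linux/fs/dcache.c

count_components() {
    # count the components the lookup must walk (one per '/')
    echo "$1" | tr -cd '/' | wc -c
}

echo "shallow: $(count_components $shallow) lookups per open"
echo "deep:    $(count_components $deep) lookups per open"
```

With dcache_rcu the lookups themselves go lockless, which is why the deep
path stops hurting so much.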
> And it appears that dcache-rcu made a ~10% difference on a 24-way PPC64,
> yes? That is nice, and perhaps we should take that, but it is not a
> tremendous speedup.
Hmm.. based on Anton's graph it looked more like ~25% difference for
60 or more scripts. At 200 scripts it is ~27.6%. Without the ps
command, it seems more like ~4%.
Thanks
Dipankar
* Re: dcache_rcu [performance results]
[not found] ` <20021102144306.A6736@dikhow.suse.lists.linux.kernel>
@ 2002-11-02 10:08 ` Andi Kleen
2002-11-02 10:54 ` Dipankar Sarma
0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2002-11-02 10:08 UTC (permalink / raw)
To: Dipankar Sarma; +Cc: linux-kernel
Dipankar Sarma <woofwoof@hathway.com> writes:
>
> I should add that this is a general trend we see in all workloads
> that do a lot of open/closes and so much so that performance is very
> sensitive to how close to / your application's working directory
> is. You would get much better system time if you compile a kernel
> in /linux as compared to say /home/fs01/users/akpm/kernel/linux ;-)
That's interesting. Perhaps it would make sense to have a fast path
that just does a string match of the to-be-looked-up path against a
cached copy of cwd and, if it matches, works as if cwd were the root.
You would need to be careful with chroot, where cwd could be outside
the root, and clear the cached copy in that case. Then you could avoid
all the locking overhead for directories above your cwd as long as you
stay in there.
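A user-space sketch of that prefix-match idea (all names here are invented
for illustration; the real thing would live in the kernel's path walk):

```shell
# Hypothetical fast path: if a looked-up path starts with the cached
# cwd string, strip the prefix and resolve the remainder relative to
# cwd, skipping the walk (and locking) for every component above it.
cached_cwd=/home/fs01/users/akpm/kernel/linux

fast_lookup() {
    case "$1" in
        "$cached_cwd"/*)
            # prefix matches: behave as if cwd were the root
            echo "relative lookup of: ${1#$cached_cwd/}" ;;
        *)
            # no match (or cache cleared after chroot): full walk from /
            echo "full lookup of: $1" ;;
    esac
}

fast_lookup /home/fs01/users/akpm/kernel/linux/fs/dcache.c
fast_lookup /etc/passwd
```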
-Andi
* Re: dcache_rcu [performance results]
2002-11-02 10:08 ` dcache_rcu [performance results] Andi Kleen
@ 2002-11-02 10:54 ` Dipankar Sarma
2002-11-02 11:01 ` Andi Kleen
0 siblings, 1 reply; 10+ messages in thread
From: Dipankar Sarma @ 2002-11-02 10:54 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel
On Sat, Nov 02, 2002 at 11:08:44AM +0100, Andi Kleen wrote:
> Dipankar Sarma <woofwoof@hathway.com> writes:
> >
> > I should add that this is a general trend we see in all workloads
> > that do a lot of open/closes and so much so that performance is very
> > sensitive to how close to / your application's working directory
> > is. You would get much better system time if you compile a kernel
> > in /linux as compared to say /home/fs01/users/akpm/kernel/linux ;-)
>
> That's interesting. Perhaps it would make sense to have a fast path
> that just does a string match of the to be looked up path to a cached copy
> of cwd and if it matches works as if cwd was the root. Would need to be
> careful with chroot where cwd could be outside the root and clear the
> cached copy in this case. Then you could avoid all the locking overhead
> for directories above your cwd if you stay in there.
Well, on second thoughts I can't see why the path length of pwd
would make a difference for kernel compilation - it uses relative
paths, and for path lookup, if the first character is not '/', the
lookup is done relative to current->fs->pwd. I will do some more
benchmarking and verify.
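That relative-lookup behaviour is easy to confirm from user space (a
minimal sketch; the /tmp layout is invented for the example):

```shell
# For a path not starting with '/', lookup begins at the process's
# current working directory (current->fs->pwd in the kernel), so only
# the trailing components are walked.
mkdir -p /tmp/lookup-demo/sub
echo hello > /tmp/lookup-demo/sub/file
cd /tmp/lookup-demo

cat /tmp/lookup-demo/sub/file   # absolute: walks tmp, lookup-demo, sub, file
cat sub/file                    # relative: walks sub, file only
```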
I did get input from Troy Wilson, who does SPECweb measurements,
that the path name length of the location of the served files
makes a difference. I presume his web server setup used full path names.
Thanks
Dipankar
* Re: dcache_rcu [performance results]
2002-11-02 10:54 ` Dipankar Sarma
@ 2002-11-02 11:01 ` Andi Kleen
2002-11-02 19:41 ` Linus Torvalds
0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2002-11-02 11:01 UTC (permalink / raw)
To: Dipankar Sarma; +Cc: Andi Kleen, linux-kernel
> Well, on second thoughts I can't see why the path length for pwd
> would make difference for kernel compilation - it uses relative
> path and for path lookup, if the first character is not '/', then
> lookup is done relative to current->fs->pwd. I will do some more
> benchmarking on and verify.
Kernel compilation actually uses absolute pathnames, e.g. for dependency
checking. TOPDIR is also specified absolutely, so an include access likely
uses an absolute pathname too.
-Andi
* Re: dcache_rcu [performance results]
2002-11-02 11:01 ` Andi Kleen
@ 2002-11-02 19:41 ` Linus Torvalds
2002-11-02 21:16 ` Sam Ravnborg
0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2002-11-02 19:41 UTC (permalink / raw)
To: linux-kernel
In article <20021102120155.A17591@wotan.suse.de>,
Andi Kleen <ak@suse.de> wrote:
>
>Kernel compilation actually uses absolute pathnames e.g. for dependency
>checking.
This used to be true, but it shouldn't be true any more. TOPDIR should
be gone, and everything should be relative paths (and all "make"
invocations should just be done from the top kernel directory).
But yes, it certainly _used_ to be true (and hey, maybe I've missed some
reason for why it isn't still true).
Linus
* Re: dcache_rcu [performance results]
2002-11-02 19:41 ` Linus Torvalds
@ 2002-11-02 21:16 ` Sam Ravnborg
0 siblings, 0 replies; 10+ messages in thread
From: Sam Ravnborg @ 2002-11-02 21:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel
On Sat, Nov 02, 2002 at 07:41:34PM +0000, Linus Torvalds wrote:
> >Kernel compilation actually uses absolute pathnames e.g. for dependency
> >checking.
>
> This used to be true, but it shouldn't be true any more. TOPDIR should
> be gone, and everything should be relative paths (and all "make"
> invocations should just be done from the top kernel directory).
>
> But yes, it certainly _used_ to be true (and hey, maybe I've missed some
> reason for why it isn't still true).
If there is any dependency left on absolute paths, that's a bug.
I have tested this by doing a full make and copying the tree.
When executing make again, nothing got rebuilt - so it is OK for the
general case.
But please report it if you see something in contradiction with that.
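The copy-and-rebuild check can be scripted along these lines (a
hypothetical sketch using a toy Makefile rather than the kernel tree;
the /tmp paths are arbitrary and `make` is assumed to be installed):

```shell
# Build a tiny project, copy the whole tree preserving timestamps,
# and ask make in the copy whether anything would be rebuilt.
# 'make -q' exits 0 when everything is up to date, i.e. no recorded
# dependency points back into the original tree via an absolute path.
rm -rf /tmp/relpath-test
mkdir -p /tmp/relpath-test/src
cd /tmp/relpath-test/src
printf 'out: in\n\tcp in out\n' > Makefile
echo data > in
make -s

cp -a /tmp/relpath-test/src /tmp/relpath-test/copy
cd /tmp/relpath-test/copy
if make -q; then
    echo "nothing to rebuild: no absolute-path dependencies"
else
    echo "rebuild needed"
fi
```

The toy Makefile trivially passes, of course; the point is the method,
which is what Sam ran against the real kernel tree.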
Sam
* Re: dcache_rcu [performance results]
2002-11-02 9:13 ` Dipankar Sarma
@ 2002-11-04 17:29 ` Martin J. Bligh
2002-11-05 0:00 ` jw schultz
0 siblings, 1 reply; 10+ messages in thread
From: Martin J. Bligh @ 2002-11-04 17:29 UTC (permalink / raw)
To: woofwoof, Andrew Morton
Cc: dipankar, Maneesh Soni, Al Viro, LKML, Anton Blanchard,
Paul McKenney
> Clearly dcache_lock is the killer when 'ps' command is used in
> the benchmark. My guess (without looking at 'ps' code) is that
> it has to open/close a lot of files in /proc and that increases
> the number of acquisitions of dcache_lock. The increased number of
> acquisitions adds to cache-line bouncing and contention.
Strace it - IIRC it does 5 opens per PID. Vomit.
M.
* Re: dcache_rcu [performance results]
2002-11-04 17:29 ` Martin J. Bligh
@ 2002-11-05 0:00 ` jw schultz
0 siblings, 0 replies; 10+ messages in thread
From: jw schultz @ 2002-11-05 0:00 UTC (permalink / raw)
To: LKML
On Mon, Nov 04, 2002 at 09:29:14AM -0800, Martin J. Bligh wrote:
> > Clearly dcache_lock is the killer when 'ps' command is used in
> > the benchmark. My guess (without looking at 'ps' code) is that
> > it has to open/close a lot of files in /proc and that increases
> > the number of acquisitions of dcache_lock. The increased number of
> > acquisitions adds to cache-line bouncing and contention.
>
> Strace it - IIRC it does 5 opens per PID. Vomit.
I just did, and had the same reaction. This is ugly.
It opens stat, statm, status, cmdline and environ, apparently
regardless of what will be in the output. At least environ
will fail on most pids if you aren't root, saving some of
the overhead. It compounds this by doing so for every pid
even if you have explicitly requested only one pid by
number.
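The per-PID cost is cheap to demonstrate (a sketch; the five file names
are the ones listed above, and environ is skipped when unreadable, so the
count can come out below five):

```shell
# Count how many /proc files a ps-like tool touches for a single PID,
# then scale by the task count to estimate one full 'ps' run.
pid=$$
opens=0
for f in stat statm status cmdline environ; do
    if [ -r /proc/$pid/$f ]; then
        cat /proc/$pid/$f > /dev/null 2>&1
        opens=$((opens + 1))
    fi
done
echo "opened $opens /proc files for pid $pid"

ntasks=$(ls -d /proc/[0-9]* | wc -l)
echo "a full ps would do roughly $((opens * ntasks)) open/close pairs"
```

Every one of those open/close pairs is a path lookup through /proc, and
on the pre-RCU kernel each lookup takes dcache_lock.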
Clearly ps could do with a cleanup. There is no reason to
read environ if it wasn't asked for. Deciding which files
are needed based on the command line options would be a
start.
I'm thinking that ps, top and company are good reasons to
make an exception to the one-value-per-file rule in /proc.
Clearly open+read+close of 3-5 "files", each extracting data
from the task_struct, isn't more efficient than one "file"
that generates the needed data one field per line.
Don't get me wrong. I believe in the one-value-per-file
rule, but ps & co. are the exception that proves (tests) the
rule, especially on heavily laden systems with tens of
thousands of tasks. We could do with something between
/dev/kmem and five files per pid.
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw@pegasys.ws
Remember Cernan and Schmitt
[not found] <20021030161912.E2613@in.ibm.com.suse.lists.linux.kernel>
[not found] ` <20021031162330.B12797@in.ibm.com.suse.lists.linux.kernel>
[not found] ` <3DC32C03.C3910128@digeo.com.suse.lists.linux.kernel>
[not found] ` <20021102144306.A6736@dikhow.suse.lists.linux.kernel>
2002-11-02 10:08 ` dcache_rcu [performance results] Andi Kleen
2002-11-02 10:54 ` Dipankar Sarma
2002-11-02 11:01 ` Andi Kleen
2002-11-02 19:41 ` Linus Torvalds
2002-11-02 21:16 ` Sam Ravnborg
2002-10-30 10:49 [PATCH 2.5.44] dcache_rcu Maneesh Soni
2002-10-31 10:53 ` dcache_rcu [performance results] Dipankar Sarma
2002-11-02 1:36 ` Andrew Morton
2002-11-02 9:13 ` Dipankar Sarma
2002-11-04 17:29 ` Martin J. Bligh
2002-11-05 0:00 ` jw schultz