From: Ravikiran G Thirumalai <kiran@in.ibm.com>
To: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>,
Andrew Morton <akpm@digeo.com>,
linux-kernel@vger.kernel.org,
David Mosberger-Tang <davidm@hpl.hp.com>
Subject: Re: [PATCH 4/3] Replace dynamic percpu implementation
Date: Thu, 22 May 2003 13:44:23 +0530
Message-ID: <20030522081423.GC27614@in.ibm.com>
In-Reply-To: <20030521103156.GB2861@in.ibm.com>
On Wed, May 21, 2003 at 04:01:56PM +0530, Dipankar Sarma wrote:
>...
> We will do some measurements with this but based on a large number
> of measurements that Kiran had done earlier, we can see a couple of things -
>
> 1. Even though a percpu scheme using pointer arithmetic has one less memory
> reference, the globally shared offset table is often in the cache
> and therefore pointer arithmetic offers no added advantage.
>
> 2. Increased sharing of cachelines helps by reducing associativity misses.
> We see this by comparing an interlaced allocator where only same
> sized objects share blocks vs. the current static allocator. Sharing of
> blocks by differently sized objects also allows cache lines to be
> kept warm as more subsystems in the kernel access them.
>
Here is the summary of my experiments with different per-cpu allocator methods.
The following methods were compared:
1. Static per-cpu areas
2. kmalloc_percpu with NR_CPUS pointers and one extra dereference -- the
current implementation (no interlace) (kmalloc_percpu_current)
3. kmalloc_percpu with pointer arithmetic, but no interlace
(kmalloc_percpu_new)
4. alloc_percpu using Rusty's block allocator and the shared offset table
(alloc_percpu_block)
The application was speeding up vm_enough_memory by using per-cpu counters
to reduce atomic operations. The benchmark was kernbench. Profile
ticks on vm_enough_memory were used to compare the allocator methods
(vm_acct_memory was made inline). This was on a 4-processor PIII Xeon.
To summarise:
1. Static per-cpu areas were 6.5% better than kmalloc_percpu_current
2. kmalloc_percpu_new and static per-cpu areas had similar results.
3. alloc_percpu results were similar to static per-cpu areas and
kmalloc_percpu_new
4. Extra dereferences in alloc_percpu were not significant, but alloc_percpu
was interlaced and kmalloc_percpu_new wasn't. The insn profile seemed to
indicate that the extra cost of memory dereferencing in alloc_percpu was
offset by the interlacing, i.e. objects sharing the same cacheline.
But then insn profiles are only indicative, not accurate.
todo:
I have to see how an interlaced kmalloc_percpu with pointer arithmetic
fares in these tests (once I have it working), and then the performance part
of the percpu allocators will hopefully be clear.
Thanks,
Kiran