From: Eric Dumazet <dada1@cosmosbay.com>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: David Miller <davem@davemloft.net>,
rostedt@goodmis.org, akpm@linux-foundation.org,
linux-kernel@vger.kernel.org, mathieu.desnoyers@polymtl.ca,
paulus@samba.org, benh@kernel.crashing.org,
linux-ia64@vger.kernel.org, linux-s390@vger.kernel.org
Subject: Re: local_add_return
Date: Tue, 16 Dec 2008 23:59:00 +0000 [thread overview]
Message-ID: <494840C4.50000@cosmosbay.com> (raw)
In-Reply-To: <200812170908.05423.rusty@rustcorp.com.au>
Rusty Russell a Ècrit :
> On Tuesday 16 December 2008 17:43:14 David Miller wrote:
>> Here ya go:
>
> Very interesting. There's a little noise there (that first local_inc of 243
> is wrong), but the picture is clear: trivalue is the best implementation for
> sparc64.
>
> Note: trivalue uses 3 values, so instead of hitting random values across 8MB
> it's across 24MB, and despite the resulting cache damage it's 15% faster. The
> cpu_local_inc test is a single value, so no cache effects: it shows trivalue
> to be 3 to 3.5 times faster in the cache-hot case.
>
> This sucks, because it really does mean that there's no one-size-fits-all
> implementation of local_t. There's also no platform yet where atomic_long_t
> is the right choice; and that's the default!
>
> Any chance of an IA64 or s390 run? You can normalize if you like, since
> it's only to compare the different approaches.
>
> Cheers,
> Rusty.
>
> Benchmarks for local_t variants
>
> (This patch also fixes the x86 cpu_local_* macros, which are obviously
> unused).
>
> I chose a large array (1M longs) for the inc/add/add_return tests so
> the trivalue case would show some cache pressure.
>
> The cpu_local_inc case is always cache-hot, so it's not comparable to
> the others.
Would be good to differenciate results, if data is already in cache or not...
>
> Time in ns per iteration (brackets is with CONFIG_PREEMPT=y):
>
> inc add add_return cpu_local_inc read
> x86-32: 2.13 Ghz Core Duo 2
> atomic_long 118 118 115 17 17
really strange atomic_long performs so badly here.
LOCK + data not in cache -> really really bad...
> irqsave/rest 77 78 77 23 16
> trivalue 45 45 127 3(6) 21
> local_t 36 36 36 1(5) 17
>
> x86-64: 2.6 GHz Dual-Core AMD Opteron 2218
> atomic_long 55 60 - 6 19
> irqsave/rest 54 54 - 11 19
> trivalue 47 47 - 5 28
> local_t 47 46 - 1 19
>
Running local_t variant benchmarks
atomic_long: local_inc95001846/11 local_add95000325/11 cpu_local_inc62000295/10 local_readI000040/1 local_add_return96000322/11 (total was 1728053248)
irqsave/restore: local_incI8000400/14 local_addI6000395/14 cpu_local_incH6000384/14 local_readh000054/2 local_add_returnP2000394/14 (total was 1728053248)
trivalue: local_inc\x1325001024/39 local_add\x1324001226/39 cpu_local_incÅ000080/2 local_readx6000766/23 local_add_returnA93003781/124 (total was 1728053248)
local_t: local_inci000059/2 local_addi000058/2 cpu_local_incB000035/1 local_readP000043/1 local_add_returnê000076/2 (total was 1728053248, warm_total 62914562)
Intel(R) Xeon(R) CPU E5450 @ 3.00GHz
two quadcore cpus, x86-32 kernel
It seems Core2 are really better than Core Duo 2,
or their cache is big enough to hold the array of your test...
(at least for l1 & l2, their 4Mbytes working set fits in cache)
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5450 @ 3.00GHz
stepping : 6
cpu MHz : 3000.099
cache size : 6144 KB <<<< yes, thats big :) >>>>
If I double size of working set
#define NUM_LOCAL_TEST (2*1024*1024)
then I get quite different numbers :
Running local_t variant benchmarks
atomic_long: local_incg29007264/100 local_addg27005943/100 cpu_local_incr4000569/10 local_read\x1030000784/15 local
_add_returnf23004616/98 (total was 3456106496)
irqsave/restore: local_incD58002796/66 local_addD59001998/66 cpu_local_incó1000381/14 local_read\x1060000389/15 loc
al_add_returnE28001388/67 (total was 3456106496)
trivalue: local_inc(71000855/42 local_add(67000976/42 cpu_local_inc\x162000052/2 local_read\x1747000551/26 local_add_r
eturnà29002352/131 (total was 3456106496)
local_t: local_inc"10000492/32 local_add"06000460/32 cpu_local_incÑ000017/1 local_read\x1029000203/15 local_add_ret
urn"16000415/33 (total was 3456106496, warm_total 125829124)
If now I reduce NUM_LOCAL_TEST to 256*1024 so that even trivalue l3 fits cache.
Running local_t variant benchmarks
atomic_long: local_incò984929/11 local_addò984889/11 cpu_local_incâ986248/10 local_read\x11998165/1 local_add_retur
nô003292/11 (total was 2579496960)
irqsave/restore: local_inc\x124000102/14 local_add\x124000102/14 cpu_local_inc\x121000100/14 local_read\x17000013/2 local_ad
d_return\x126000103/15 (total was 2579496960)
trivalue: local_inc!000017/2 local_add 000016/2 cpu_local_inc 000017/2 local_read%000021/2 local_add_return\x1360
00110/16 (total was 2579496960)
local_t: local_inc\x17000014/2 local_add\x17000015/2 cpu_local_inc\x11000009/1 local_read\x12000010/1 local_add_return#000
019/2 (total was 2579496960, warm_total 15728642)
About trivalues, their use in percpu_counter local storage (one trivalue for each cpu)
would make the accuracy a litle bit more lazy...
--
To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2008-12-16 23:59 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <alpine.DEB.1.10.0812150823370.18692@gandalf.stny.rr.com>
[not found] ` <200812161703.00697.rusty@rustcorp.com.au>
[not found] ` <20081215.231314.92267481.davem@davemloft.net>
2008-12-16 22:50 ` local_add_return Rusty Russell
2008-12-16 23:25 ` local_add_return Luck, Tony
2008-12-16 23:43 ` local_add_return Heiko Carstens
2008-12-16 23:59 ` Eric Dumazet [this message]
2008-12-17 0:01 ` local_add_return Mathieu Desnoyers
2008-12-18 22:53 ` local_add_return Rusty Russell
2008-12-19 3:35 ` local_add_return Mathieu Desnoyers
2008-12-19 5:54 ` local_add_return Rusty Russell
2008-12-19 17:06 ` local_add_return Mathieu Desnoyers
2008-12-20 1:45 ` local_add_return Rusty Russell
2008-12-22 18:43 ` local_add_return Mathieu Desnoyers
2008-12-24 11:54 ` local_add_return Rusty Russell
2008-12-24 18:53 ` local_add_return Mathieu Desnoyers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=494840C4.50000@cosmosbay.com \
--to=dada1@cosmosbay.com \
--cc=akpm@linux-foundation.org \
--cc=benh@kernel.crashing.org \
--cc=davem@davemloft.net \
--cc=linux-ia64@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=mathieu.desnoyers@polymtl.ca \
--cc=paulus@samba.org \
--cc=rostedt@goodmis.org \
--cc=rusty@rustcorp.com.au \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox