* L1-dcache-stores twice as big as expected
@ 2015-04-30 19:37 Patrick
2015-04-30 20:00 ` Vince Weaver
2015-04-30 20:14 ` Vince Weaver
0 siblings, 2 replies; 5+ messages in thread
From: Patrick @ 2015-04-30 19:37 UTC (permalink / raw)
To: linux-perf-users
Hello,
I have a simple piece of code that I am analyzing with perf:
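#include <cstdint>
#include <cstdlib>
#include <iostream>
using namespace std;

int main( int argc, char *argv[] ) {
    if( argc < 2 ) {
        cout << "Error: need size argument.\n";
        return 1;
    }
    uint64_t sz = strtoull( argv[1], NULL, 10 );
    uint8_t *a;
    a = new uint8_t[sz];
    // One single-byte store per element. Note that i is a 32-bit int,
    // so a size above INT_MAX would overflow the loop counter.
    for( int i = 0; i < sz; i++ ) {
        a[i] = 1;
    }
    return 0;
}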
When I run perf like this:
-> perf stat -e L1-dcache-stores:u ./copy 1048576
I get the following output:
-> Performance counter stats for './copy 1048576':
-> 2,207,859 L1-dcache-stores
-> 0.006255441 seconds time elapsed
I can't figure out why it is recording over 2 million stores to the L1
data cache. I would expect it to be around 1 million. Has anyone seen
this before?
Any help is appreciated.
Patrick
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: L1-dcache-stores twice as big as expected
2015-04-30 19:37 L1-dcache-stores twice as big as expected Patrick
@ 2015-04-30 20:00 ` Vince Weaver
2015-04-30 20:14 ` Vince Weaver
1 sibling, 0 replies; 5+ messages in thread
From: Vince Weaver @ 2015-04-30 20:00 UTC (permalink / raw)
To: Patrick; +Cc: linux-perf-users
On Thu, 30 Apr 2015, Patrick wrote:
> When I run perf like this:
>
> -> perf stat -e L1-dcache-stores:u ./copy 1048576
>
> I get the following output:
>
> -> Performance counter stats for './copy 1048576':
> -> 2,207,859 L1-dcache-stores
> -> 0.006255441 seconds time elapsed
>
What type of processor are you running this on? What compiler and
compiler options?
I'm actually surprised the compiler isn't just optimizing away your
array since nothing is using it.
Vince
* Re: L1-dcache-stores twice as big as expected
2015-04-30 19:37 L1-dcache-stores twice as big as expected Patrick
2015-04-30 20:00 ` Vince Weaver
@ 2015-04-30 20:14 ` Vince Weaver
2015-04-30 20:23 ` Patrick
1 sibling, 1 reply; 5+ messages in thread
From: Vince Weaver @ 2015-04-30 20:14 UTC (permalink / raw)
To: Patrick; +Cc: linux-perf-users
On Thu, 30 Apr 2015, Patrick wrote:
> -> 2,207,859 L1-dcache-stores
> -> 0.006255441 seconds time elapsed
>
> I can't figure out why it is recording over 2 million stores to the L1
> data cache. I
> would expect it to be around 1 million. Has anyone seen this before?
Also, it looks like you're using C++.
Quite possibly the "new" initializer is setting each element in the array
to zero and then you later set each to one, so it wouldn't be surprising
you'd get a result 2x as much as you'd expect.
Vince
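A note on the zero-initialization hypothesis above: in standard C++, `new uint8_t[sz]` default-initializes the array, which for `uint8_t` means no initialization at all; only the value-initializing form `new uint8_t[sz]()` is required to zero the elements. The sketch below contrasts the two forms. The helper names (`count_nonzero`, `nonzero_after_value_init`, `nonzero_after_default_init_and_fill`) are hypothetical and not from the thread, and the sketch says nothing about what a particular allocator or kernel does with fresh pages.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the thread): counts non-zero bytes in a buffer.
std::size_t count_nonzero(const std::uint8_t *p, std::size_t n) {
    std::size_t c = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i] != 0) ++c;
    }
    return c;
}

// Value-initialization: the trailing () zeroes every element, so the
// new-expression itself is responsible for n bytes of zeroing.
std::size_t nonzero_after_value_init(std::size_t n) {
    std::uint8_t *a = new std::uint8_t[n]();
    std::size_t c = count_nonzero(a, n);
    delete[] a;
    return c;
}

// Default-initialization, the form used in the original program. The
// contents are indeterminate, so every element is written before any read.
std::size_t nonzero_after_default_init_and_fill(std::size_t n) {
    std::uint8_t *a = new std::uint8_t[n];
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = 1;
    }
    std::size_t c = count_nonzero(a, n);
    delete[] a;
    return c;
}
```

Only the first form obliges the language runtime to write zeros, so with the program as posted the extra stores would have to come from somewhere else.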
* Re: L1-dcache-stores twice as big as expected
2015-04-30 20:14 ` Vince Weaver
@ 2015-04-30 20:23 ` Patrick
2015-04-30 20:53 ` Vince Weaver
0 siblings, 1 reply; 5+ messages in thread
From: Patrick @ 2015-04-30 20:23 UTC (permalink / raw)
To: linux-perf-users
Vince Weaver <vincent.weaver <at> maine.edu> writes:
>
> On Thu, 30 Apr 2015, Patrick wrote:
>
> > -> 2,207,859 L1-dcache-stores
> > -> 0.006255441 seconds time elapsed
> >
> > I can't figure out why it is recording over 2 million stores to the L1
> > data cache. I
> > would expect it to be around 1 million. Has anyone seen this before?
>
> also it looks like you're using C++.
>
> Quite possibly the "new" initializer is setting each element in the array
> to zero and then you later set each to one, so it wouldn't be surprising
> you'd get a result 2x as much as you'd expect.
>
> Vince
>
Vince,
Thanks for the response.
Processor: Core i7 940
Compiler: g++
Compiler options: compiled as "g++ copy.cpp -o copy"
I was also a little surprised that the compiler didn't optimize it out.
But given that it didn't, I went ahead and used the code to start
experimenting with perf.
I think I may have figured out what is going on by looking at the assembly
(shown below). It looks like it is doing two stores per loop iteration: one
to set the array value and one to update the index into the array.
I am new to perf and am seeing a lot of counter results that I'm not sure
how to make sense of. So I'm still trying to get comfortable and make sure
I know what's going on. I have a follow-on question regarding possible ways
to count memory writes on a processor that's not a Xeon. I have some ideas
on how to do this, but I wanted to check whether anyone on the list had any
thoughts on it. I'll post that as a separate topic.
Thanks,
Patrick
.L7:
        movl    -4(%rbp), %eax
        movslq  %eax, %rdx
        movq    -24(%rbp), %rax
        addq    %rdx, %rax
        movb    $1, (%rax)
        addl    $1, -4(%rbp)
.L6:
        movl    -4(%rbp), %eax
        cltq
        cmpq    -16(%rbp), %rax
        setb    %al
        testb   %al, %al
        jne     .L7
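The two stores per iteration in the listing are `movb $1, (%rax)` (the array element) and `addl $1, -4(%rbp)` (the loop counter, which lives on the stack in an unoptimized build). A minimal sketch of the resulting arithmetic follows; the function names are hypothetical, and attributing the leftover stores to process startup and runtime initialization is an assumption, not something the thread measures.

```cpp
#include <cstdint>

// Stores expected from the loop body alone.
std::uint64_t loop_stores(std::uint64_t iterations, std::uint64_t stores_per_iter) {
    return iterations * stores_per_iter;
}

// Stores in the measurement that the loop does not account for.
// Assumption: these come from startup code and library initialization.
std::int64_t unexplained_stores(std::uint64_t measured, std::uint64_t expected) {
    return static_cast<std::int64_t>(measured) - static_cast<std::int64_t>(expected);
}
```

With the numbers from the thread, `loop_stores(1048576, 2)` gives 2,097,152, and `unexplained_stores(2207859, 2097152)` leaves 110,707, roughly 5% of the measured total.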
* Re: L1-dcache-stores twice as big as expected
2015-04-30 20:23 ` Patrick
@ 2015-04-30 20:53 ` Vince Weaver
0 siblings, 0 replies; 5+ messages in thread
From: Vince Weaver @ 2015-04-30 20:53 UTC (permalink / raw)
To: Patrick; +Cc: linux-perf-users
On Thu, 30 Apr 2015, Patrick wrote:
> I think I may have figured out what is going on by looking at the assembly
> (shown below). It looks like it is doing two stores per loop iteration - one to
> set the array value and one to update the index into the array.
Yes, that's because you are compiling without optimization.
> I am new to perf and seeing a lot of results from counters of which I'm not
> sure how to make sense. So I'm still trying to get comfortable and make sure I
> know what's going on. I have a follow-on question regarding possible ways to
> count memory writes on a processor that's not a Xeon.
Good luck sorting things out. Perf counter results have many issues that
make them hard to interpret, especially with advanced processors and
especially if cache is involved.
Just be glad you aren't running your test on an AMD model 14h machine, as
seen below.
perf stat -e L1-dcache-stores:u ./copy 1048576
Performance counter stats for './copy_test 1048576':
35 L1-dcache-stores
0.014573254 seconds time elapsed
Vince
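Since the second store disappears once the loop counter can live in a register, one way to experiment with an optimized build is to make the program's result depend on the array, so the fill is less likely to be discarded as dead code. The variant below is a hypothetical sketch, not code from the thread. An optimizer remains free to merge byte stores into wider ones, replace the loop with a library call, or fold the computation entirely, so the generated assembly should still be checked before interpreting a counter value.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical variant of the test loop (not from the thread): fills the
// array, then reads it back so the stores feed an observable result.
std::uint64_t fill_and_sum(std::size_t n) {
    std::uint8_t *a = new std::uint8_t[n];
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = 1;          // the store of interest
    }
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i];       // read-back keeps the array contents relevant
    }
    delete[] a;
    return sum;
}
```

Calling `fill_and_sum` from `main` with the size argument and printing the result, then building with something like `g++ -O2`, gives a program whose store count can be compared against the unoptimized one.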
end of thread, other threads:[~2015-04-30 20:48 UTC | newest]
Thread overview: 5+ messages
2015-04-30 19:37 L1-dcache-stores twice as big as expected Patrick
2015-04-30 20:00 ` Vince Weaver
2015-04-30 20:14 ` Vince Weaver
2015-04-30 20:23   ` Patrick
2015-04-30 20:53     ` Vince Weaver