linux-perf-users.vger.kernel.org archive mirror
* L1-dcache-stores twice as big as expected
@ 2015-04-30 19:37 Patrick
  2015-04-30 20:00 ` Vince Weaver
  2015-04-30 20:14 ` Vince Weaver
  0 siblings, 2 replies; 5+ messages in thread
From: Patrick @ 2015-04-30 19:37 UTC (permalink / raw)
  To: linux-perf-users

Hello,

I have a simple piece of code that I am analyzing with perf:

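#include <cstdint>
#include <cstdlib>
#include <iostream>
using namespace std;

int main( int argc, char*argv[] ) {
    if( argc < 2 ) {
        cout<<"Error: need size argument.\n";
        return 1;
    }
    uint64_t sz = strtoull( argv[1],NULL,10);
    uint8_t *a;
    a = new uint8_t[sz];

    // Write 1 to every byte of the array.
    for(int i=0;i<sz;i++ ) {
        a[i] = 1;
    }
    return 0;
}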

When I run perf like this:

-> perf stat -e L1-dcache-stores:u ./copy 1048576

I get the following output:

-> Performance counter stats for './copy 1048576':
->         2,207,859 L1-dcache-stores
->       0.006255441 seconds time elapsed

I can't figure out why it is recording over 2 million stores to the L1
data cache. I would expect it to be around 1 million. Has anyone seen
this before?

Any help is appreciated.

Patrick


* Re: L1-dcache-stores twice as big as expected
  2015-04-30 19:37 L1-dcache-stores twice as big as expected Patrick
@ 2015-04-30 20:00 ` Vince Weaver
  2015-04-30 20:14 ` Vince Weaver
  1 sibling, 0 replies; 5+ messages in thread
From: Vince Weaver @ 2015-04-30 20:00 UTC (permalink / raw)
  To: Patrick; +Cc: linux-perf-users

On Thu, 30 Apr 2015, Patrick wrote:


> When I run perf like this:
> 
> -> perf stat -e L1-dcache-stores:u ./copy 1048576
> 
> I get the following output:
> 
> -> Performance counter stats for './copy 1048576':
> ->         2,207,859 L1-dcache-stores
> ->       0.006255441 seconds time elapsed
> 

What type of processor are you running this on?

What compiler and compiler options?

I'm actually surprised the compiler isn't just optimizing
away your array since nothing is using it.

Vince


* Re: L1-dcache-stores twice as big as expected
  2015-04-30 19:37 L1-dcache-stores twice as big as expected Patrick
  2015-04-30 20:00 ` Vince Weaver
@ 2015-04-30 20:14 ` Vince Weaver
  2015-04-30 20:23   ` Patrick
  1 sibling, 1 reply; 5+ messages in thread
From: Vince Weaver @ 2015-04-30 20:14 UTC (permalink / raw)
  To: Patrick; +Cc: linux-perf-users

On Thu, 30 Apr 2015, Patrick wrote:

> ->         2,207,859 L1-dcache-stores
> ->       0.006255441 seconds time elapsed
> 
> I can't figure out why it is recording over 2 million stores to the L1
> data cache. I would expect it to be around 1 million. Has anyone seen
> this before?

Also, it looks like you're using C++.

Quite possibly the "new" initializer is setting each element in the array
to zero and then you later set each to one, so it wouldn't be surprising
to get a result twice as large as you'd expect.

Vince
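
For reference on the zeroing hypothesis: in standard C++, `new uint8_t[sz]` default-initializes the elements and is not required to write zeros, whereas `new uint8_t[sz]()` value-initializes them to zero. The following is a minimal sketch of the zeroing form only. The helper names `make_zeroed` and `all_zero` are hypothetical and do not come from the thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: allocate n bytes that are guaranteed to be zero.
// The trailing "()" requests value-initialization; without it the
// elements are default-initialized and their values are indeterminate.
std::unique_ptr<std::uint8_t[]> make_zeroed(std::size_t n) {
    return std::unique_ptr<std::uint8_t[]>(new std::uint8_t[n]());
}

// Returns true if every byte in [p, p + n) is zero.
bool all_zero(const std::uint8_t* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i] != 0) return false;
    }
    return true;
}
```

Whether a given allocation actually issues extra stores would still need to be confirmed by measurement or by reading the generated assembly.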


* Re: L1-dcache-stores twice as big as expected
  2015-04-30 20:14 ` Vince Weaver
@ 2015-04-30 20:23   ` Patrick
  2015-04-30 20:53     ` Vince Weaver
  0 siblings, 1 reply; 5+ messages in thread
From: Patrick @ 2015-04-30 20:23 UTC (permalink / raw)
  To: linux-perf-users

Vince Weaver <vincent.weaver <at> maine.edu> writes:

> 
> On Thu, 30 Apr 2015, Patrick wrote:
> 
> > ->         2,207,859 L1-dcache-stores
> > ->       0.006255441 seconds time elapsed
> > 
> > I can't figure out why it is recording over 2 million stores to the L1
> > data cache. I would expect it to be around 1 million. Has anyone seen
> > this before?
> 
> also it looks like you're using C++.
> 
> Quite possibly the "new" initializer is setting each element in the array 
> to zero and then you later set each to one, so it wouldn't be surprising
> you'd get a result 2x as much as you'd expect.
> 
> Vince
> 

Vince,

Thanks for the response.

Processor: Core i7 940
Compiler: g++
Compiler options: compiled as "g++ copy.cpp -o copy"

I was also a little surprised that the compiler didn't optimize it out. But
given that it didn't, I went ahead and used the code to start experimenting
with perf.

I think I may have figured out what is going on by looking at the assembly
(shown below). It looks like it is doing two stores per loop iteration - one to
set the array value and one to update the index into the array.

I am new to perf and am seeing results from many counters that I'm not sure
how to interpret, so I'm still trying to get comfortable and make sure I know
what's going on. I have a follow-on question about possible ways to count
memory writes on a processor that's not a Xeon. I have some ideas on how to do
this, but I wanted to check whether anyone on the list had any thoughts. I'll
post it as a separate topic.

Thanks, Patrick

.L7:
    movl    -4(%rbp), %eax
    movslq  %eax, %rdx
    movq    -24(%rbp), %rax
    addq    %rdx, %rax
    movb    $1, (%rax)
    addl    $1, -4(%rbp)
.L6:
    movl    -4(%rbp), %eax
    cltq
    cmpq    -16(%rbp), %rax
    setb    %al
    testb   %al, %al
    jne .L7
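
The two stores per iteration in the listing (the `movb` to the array element and the `addl` on the stack-resident counter) account for most of the measured count. A small worked check follows, using the numbers reported earlier in the thread. Attributing the remainder to process startup and teardown is an assumption, not something the thread measures, and the function and constant names are hypothetical.

```cpp
#include <cstdint>

// Stores executed by the loop if every iteration performs
// `stores_per_iteration` memory writes.
constexpr std::uint64_t expected_loop_stores(std::uint64_t iterations,
                                             std::uint64_t stores_per_iteration) {
    return iterations * stores_per_iteration;
}

constexpr std::uint64_t kIterations = 1048576;  // size argument passed to ./copy
constexpr std::uint64_t kMeasured   = 2207859;  // L1-dcache-stores from perf stat

// Two stores per iteration: 2 * 1,048,576 = 2,097,152.
constexpr std::uint64_t kLoopStores = expected_loop_stores(kIterations, 2);

// Stores not explained by the loop itself: 2,207,859 - 2,097,152 = 110,707.
constexpr std::uint64_t kRemainder = kMeasured - kLoopStores;
```

With one store per iteration the loop alone would explain only 1,048,576 of the measured stores, which is why the two-store reading fits the counter much better.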


* Re: L1-dcache-stores twice as big as expected
  2015-04-30 20:23   ` Patrick
@ 2015-04-30 20:53     ` Vince Weaver
  0 siblings, 0 replies; 5+ messages in thread
From: Vince Weaver @ 2015-04-30 20:53 UTC (permalink / raw)
  To: Patrick; +Cc: linux-perf-users

On Thu, 30 Apr 2015, Patrick wrote:

> I think I may have figured out what is going on by looking at the assembly
> (shown below). It looks like it is doing two stores per loop iteration - one to
> set the array value and one to update the index into the array.

Yes, that's because you are compiling without optimization.
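
As a sketch of what removing the extra store could look like: when built with optimization (for example `g++ -O2`, which is an assumption, since the thread does not name a flag), the loop counter would normally be kept in a register, leaving roughly one store per element. Writing through a `volatile` pointer is one way to ask the compiler not to discard or merge the stores. The name `fill_ones` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: write 1 to each of the n bytes starting at p.
// The volatile-qualified pointer asks the compiler to perform every
// byte store individually instead of removing or combining them.
void fill_ones(std::uint8_t* p, std::size_t n) {
    volatile std::uint8_t* vp = p;
    for (std::size_t i = 0; i < n; ++i) {
        vp[i] = 1;
    }
}
```

A caller would allocate the array as in the original program and pass `a` and `sz`. Whether the counter then lands near 1,048,576 on a given processor would have to be measured.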

> I am new to perf and seeing a lot of results from counters of which I'm not
> sure how to make sense. So I'm still trying to get comfortable and make sure I
> know what's going on. I have a follow-on question regarding possible ways to
> count memory writes on a processor that's not a Xeon. 

Good luck sorting things out. Perf counter results have many issues that
make them hard to interpret, especially on advanced processors and
especially when caches are involved.

Just be glad you aren't running your test on an AMD model 14h machine, as
seen below.

perf stat -e L1-dcache-stores:u ./copy 1048576

 Performance counter stats for './copy_test 1048576':

                35      L1-dcache-stores                                            

       0.014573254 seconds time elapsed


Vince

