Question about LLC-load-misses event

linux-perf-users.vger.kernel.org archive mirror

* Question about LLC-load-misses event
From: Chulmin Kim @ 2012-10-15 13:07 UTC
  To: linux-perf-users

Hi, all.

I'm currently evaluating the memory performance of my machine (Intel
X5650 CPU).
The machine is NUMA-capable, but that is under control (only
local memory accesses are allowed).

While running the STREAM benchmark, I monitor LLC misses with "perf"
at the same time.
(perf command: perf stat -a -A -e LLC-loads -e LLC-load-misses -e
instructions sleep 3)

The problem is that the bandwidth reported by the STREAM benchmark does
not match the value derived from the counters.

e.g.
I got 9395 MB/s from STREAM.

"perf" shows 134,642,063 LLC-load-misses over 3 seconds.
-> BW = ((# of events)/(3 seconds)) * 64 bytes / (1024*1024) = 2739 MB/s
In this equation, the 64-byte term is the cache line size, and the
(1024*1024) term converts bytes/s to MB/s.

Why does this mismatch occur?
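
The arithmetic in the post can be checked with a short sketch. It is only an illustration: the function name `bandwidth_from_events` and its `unit_bytes` parameter are made up here, and it assumes each counted event moves exactly one 64-byte cache line.

```python
CACHE_LINE_BYTES = 64  # cache line size assumed in the post

def bandwidth_from_events(events, seconds, unit_bytes=1024 * 1024):
    """Bandwidth implied by `events` line transfers, in units of `unit_bytes` per second."""
    return events / seconds * CACHE_LINE_BYTES / unit_bytes

llc_load_misses = 134_642_063  # count reported by perf over 3 seconds
print(round(bandwidth_from_events(llc_load_misses, 3)))             # 2739
print(round(bandwidth_from_events(llc_load_misses, 3, 1_000_000)))  # 2872
```

The second call uses 10^6 bytes per MB. The reference STREAM code is believed to report MB/s in that unit, so the two figures being compared may not share a unit, although that alone would not explain a gap this large.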

* Re: Question about LLC-load-misses event
From: Chulmin Kim @ 2012-10-15 13:31 UTC
  To: linux-perf-users

On 2012-10-15 at 10:07 PM, Chulmin Kim wrote:
> Hi, all.
>
> I'm currently evaluating memory performance of my own machine (intel
> x5650 cpu).
> My machine can have NUMA configuration, but it is under control. (only
> local mem access allowed)
>
> As i run STREAM benchmark, i monitor LLC cache misses using "perf"
> simultaneously.
To be specific, I used the STREAM Copy benchmark (which simply copies one
array to another, "a[i]=b[i]", repeatedly in a for loop).
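
For reference, a minimal sketch of the Copy kernel and of the byte count usually attributed to it. The function names are hypothetical, and the 8-byte element size assumes double-precision arrays, which the post does not state.

```python
def stream_copy(dst, src):
    """One pass of the Copy kernel: dst[i] = src[i] for every element."""
    for i in range(len(src)):
        dst[i] = src[i]

def copy_bytes_counted(n_elements, element_bytes=8):
    """Bytes attributed to one Copy pass: one read plus one write per element."""
    return 2 * element_bytes * n_elements

src = [float(i) for i in range(1000)]
dst = [0.0] * len(src)
stream_copy(dst, src)
print(dst == src, copy_bytes_counted(len(src)))  # True 16000
```

Only half of the bytes counted this way are loads, so a figure derived from load events alone would be expected to cover at most the read half of the bandwidth that Copy reports.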
> (perf command : perf stat -a -A -e LLC-loads -e LLC-load-misses -e
> instructions sleep 3)
>
> The problem is,, the bandwidth from STREAM benchmark does not match with
> the monitored value.
>
> e.g.
> I got 9395MB/s from Stream.
>
> "perf" shows 134,642,063 LLC-load-misses for 3 seconds.
> -> BW = ((# of events)/(3 seconds)) * 64 bytes / (1024*1024) = 2739MB/s
> In this equation, the term (64bytes) is for cache line size, and the
> term(1024*1024) is for (MB/s).
>
> Why does this mismatch occur?
With OProfile, the value reported for an event is the number of counter
overflows, each of which occurs when the event count exceeds a
predefined threshold.
Is something similar happening here?

>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-perf-users" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

* Re: Question about LLC-load-misses event
From: Namhyung Kim @ 2012-10-23  5:39 UTC
  To: Chulmin Kim; +Cc: linux-perf-users

Hi Chulmin,

On Mon, 15 Oct 2012 22:31:34 +0900, Chulmin Kim wrote:
> On 2012-10-15 at 10:07 PM, Chulmin Kim wrote:
>> (perf command : perf stat -a -A -e LLC-loads -e LLC-load-misses -e
>> instructions sleep 3)
>>
>> The problem is,, the bandwidth from STREAM benchmark does not match with
>> the monitored value.
>>
>> e.g.
>> I got 9395MB/s from Stream.
>>
>> "perf" shows 134,642,063 LLC-load-misses for 3 seconds.
>> -> BW = ((# of events)/(3 seconds)) * 64 bytes / (1024*1024) = 2739MB/s
>> In this equation, the term (64bytes) is for cache line size, and the
>> term(1024*1024) is for (MB/s).
>>
>> Why does this mismatch occur?
> In case of Oprofile, the value for a certain event represents the number
> of the overflows which occur when the number of the event exceeds the
> predefined value.
> Is it a similar case with that?

I guess not.  And what's the result for LLC-loads?  AFAIK it counts
all cache accesses, both hits and misses.  Did you calculate the
bandwidth from the LLC-loads result?  I suspect the h/w *might*
prefetch a couple of lines when a cache miss occurs, but I'm not
sure. :)

Thanks,
Namhyung

* Re: Question about LLC-load-misses event
From: Chulmin Kim @ 2012-10-23  5:53 UTC
  To: Namhyung Kim, linux-perf-users

On 2012-10-23 at 2:39 PM, Namhyung Kim wrote:
> Hi Chulmin,
>
> On Mon, 15 Oct 2012 22:31:34 +0900, Chulmin Kim wrote:
>> On 2012-10-15 at 10:07 PM, Chulmin Kim wrote:
>>> (perf command : perf stat -a -A -e LLC-loads -e LLC-load-misses -e
>>> instructions sleep 3)
>>>
>>> The problem is,, the bandwidth from STREAM benchmark does not match with
>>> the monitored value.
>>>
>>> e.g.
>>> I got 9395MB/s from Stream.
>>>
>>> "perf" shows 134,642,063 LLC-load-misses for 3 seconds.
>>> -> BW = ((# of events)/(3 seconds)) * 64 bytes / (1024*1024) = 2739MB/s
>>> In this equation, the term (64bytes) is for cache line size, and the
>>> term(1024*1024) is for (MB/s).
>>>
>>> Why does this mismatch occur?
>> In case of Oprofile, the value for a certain event represents the number
>> of the overflows which occur when the number of the event exceeds the
>> predefined value.
>> Is it a similar case with that?
> I guess not.  And what's the result of the LLC-loads?  AFAIK it counts
> all cache accesses including hits and misses.

Sorry for the lack of information.

I used the STREAM benchmark, which generates a 100% cache miss rate.
As expected, the LLC-loads count is slightly larger than the
LLC-load-misses count (but they are almost the same).


>    Did you calculate the
> bandwidth using the result of LLC-loads?

Bandwidth results:
9395 MB/s from STREAM
2739 MB/s from LLC-loads (including both hits and misses)
I also want to add the bandwidth from memory writes (about 3000 MB/s from
LLC-stores (including both hits and misses)).

Why does this difference occur?  (9395 MB/s vs. about 5739 MB/s)
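
To make the size of the gap concrete, the figures above can be combined as in the sketch below. The helper name `coverage` is hypothetical, and the store figure is the approximate value quoted in this message.

```python
def coverage(stream_mb_s, load_mb_s, store_mb_s):
    """Return (counted total, shortfall, counted share of the STREAM figure in percent)."""
    counted = load_mb_s + store_mb_s
    shortfall = stream_mb_s - counted
    share = 100 * counted / stream_mb_s
    return counted, shortfall, round(share, 1)

# STREAM result, LLC-load estimate, approximate LLC-store estimate
print(coverage(9395, 2739, 3000))  # (5739, 3656, 61.1)
```

So the load and store events together account for roughly 61% of the bandwidth STREAM reports, leaving about 3656 MB/s unexplained by these counters.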



> I suspect the h/w *might*
> prefetches a couple of lines when cache-miss occurred, but I'm not
> sure. :)

I also suspected prefetching.

After I posted this question, I checked the prefetch events using "perf".
They also showed a 100% miss rate (but I don't fully understand what
this result means).

Do you know what this means?
Are prefetch events and LLC-load (or LLC-store) events mutually
exclusive, or correlated?

The perf tool is too hard for me. I haven't found any good documentation
or site to help users.  (If you know of one, please recommend it! :) )


Thanks for your attention!




> Thanks,
> Namhyung
>

* Re: Question about LLC-load-misses event
From: Chulmin Kim @ 2012-10-24 12:56 UTC
  To: Namhyung Kim, linux-perf-users

In the end, I changed the BIOS settings of my machine to turn off the
prefetch features
(Hardware Prefetch and Adjacent Cache Line Prefetch).

Now the bandwidth results from STREAM and from the PMU events are consistent!

I guess the issue was related to prefetching, though I couldn't
analyze it thoroughly.
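
One way to picture why prefetching could hide traffic from demand-miss counters is the toy model below. It is a sketch under strong assumptions, not a description of the X5650: it assumes a cold cache, a purely sequential stream, a prefetcher that fetches the other line of each aligned two-line pair, and that prefetched lines are not counted as demand misses. The thread does not confirm any of these points, and the function name is hypothetical.

```python
def count_demand_misses(n_lines, adjacent_prefetch):
    """Stream once over n_lines cache lines in order, starting with an empty cache.

    Returns how many lines were fetched by a demand miss. With
    adjacent_prefetch=True, each demand miss also brings in the other line
    of its aligned two-line pair, so that line never misses on demand.
    """
    cached = set()
    demand_misses = 0
    for line in range(n_lines):
        if line in cached:
            continue              # already brought in by the toy prefetcher
        demand_misses += 1
        cached.add(line)
        if adjacent_prefetch:
            cached.add(line ^ 1)  # buddy line of the same aligned pair
    return demand_misses

print(count_demand_misses(1000, adjacent_prefetch=False))  # 1000
print(count_demand_misses(1000, adjacent_prefetch=True))   # 500
```

In both runs all 1000 lines are transferred from memory, but with the toy prefetcher enabled only half of them show up as demand misses, which is the kind of undercount that disabling prefetch in the BIOS would remove.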



Thanks!

