From: Chulmin Kim
Subject: Re: Question about LLC-load-misses event
Date: Wed, 24 Oct 2012 21:56:04 +0900
To: Namhyung Kim, linux-perf-users@vger.kernel.org

On 2012-10-23 2:53 PM, Chulmin Kim wrote:
> On 2012-10-23 2:39 PM, Namhyung Kim wrote:
>> Hi Chulmin,
>>
>> On Mon, 15 Oct 2012 22:31:34 +0900, Chulmin Kim wrote:
>>> On 2012-10-15 10:07 PM, Chulmin Kim wrote:
>>>> (perf command: perf stat -a -A -e LLC-loads -e LLC-load-misses -e
>>>> instructions sleep 3)
>>>>
>>>> The problem is, the bandwidth from the STREAM benchmark does not
>>>> match the monitored value.
>>>>
>>>> e.g.
>>>> I got 9395MB/s from STREAM.
>>>>
>>>> "perf" shows 134,642,063 LLC-load-misses for 3 seconds.
>>>> -> BW = ((# of events) / (3 seconds)) * 64 bytes / (1024*1024)
>>>>       = 2739MB/s
>>>> In this equation, the 64-byte term is the cache line size, and the
>>>> (1024*1024) term converts bytes/s to MB/s.
>>>>
>>>> Why does this mismatch occur?
>>> In the case of OProfile, the value for a certain event represents the
>>> number of overflows, each of which occurs when the count of the event
>>> exceeds a predefined value.
>>> Is this a similar case?
>> I guess not. And what's the result of the LLC-loads?
>> AFAIK it counts all cache accesses, including hits and misses.
>
> Sorry for the lack of information.
>
> I used the STREAM benchmark, which generates 100% cache misses.
> Of course, LLC-loads shows a slightly larger number than
> LLC-load-misses (but they are almost the same).
>
>> Did you calculate the
>> bandwidth using the result of LLC-loads?
>
> Bandwidth results:
> 9395MB/s from STREAM
> 2739MB/s from LLC-loads (including both hits and misses)
> I should also add the BW from memory writes (about 3000MB/s from
> LLC-stores, including both hits and misses).
>
> I'm wondering why this difference happens: 9395MB/s vs. about
> 5739MB/s (2739MB/s of loads + about 3000MB/s of stores).
>
>> I suspect the h/w *might*
>> prefetch a couple of lines when a cache miss occurs, but I'm not
>> sure. :)
>
> I also suspected PREFETCH.
>
> After I posted this question, I checked the prefetch events using perf.
> I also got 100% prefetch misses (but I don't fully understand what
> this result means).
>
> Do you know what this means?
> Are prefetch events and LLC-load (or LLC-store) events mutually
> exclusive, or correlated?
>
> The perf tool is too hard for me... There is no good document or site
> to help users. (If you know one, please recommend it! :) )
>
> Thanks for your attention!

In the end, I changed the BIOS settings of my machine to turn off the
prefetch features (Hardware Prefetch and Adjacent Cache Line Prefetch).
With those disabled, the bandwidth results from STREAM and from the PMU
events are finally consistent!

I guess it was an issue related to prefetching, though I couldn't
analyze it thoroughly.

Thanks!

>> Thanks,
>> Namhyung
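The bandwidth arithmetic used in this thread can be sketched as a small Python helper. This is a minimal sketch assuming, as the thread does, that every counted event corresponds to exactly one 64-byte cache line and that "MB" means 1024*1024 bytes; the function name `events_to_mb_per_s` is hypothetical and not part of perf. It reproduces the 2739MB/s figure and shows that loads plus the approximate 3000MB/s of stores account for only about 61% of the 9395MB/s that STREAM reported with prefetching enabled.

```python
# Hypothetical helper (not part of perf): converts a raw event count into
# MB/s, ASSUMING each counted event is exactly one cache line transfer.
CACHE_LINE_BYTES = 64          # cache line size used in the thread
BYTES_PER_MB = 1024 * 1024     # the thread divides by 1024*1024 for "MB"


def events_to_mb_per_s(event_count: int, seconds: float,
                       line_bytes: int = CACHE_LINE_BYTES) -> float:
    """Estimate memory bandwidth (MB/s) from a cache-line event count."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return event_count / seconds * line_bytes / BYTES_PER_MB


if __name__ == "__main__":
    # Count reported in the thread for `perf stat ... sleep 3`.
    read_bw = events_to_mb_per_s(134_642_063, 3.0)
    print(f"reads (LLC-load-misses): {read_bw:.0f} MB/s")

    # The thread gives only an approximate ~3000 MB/s for the store side,
    # so the combined figure is approximate as well.
    combined = read_bw + 3000
    stream = 9395
    print(f"reads + writes: ~{combined:.0f} MB/s vs STREAM {stream} MB/s")
    print(f"fraction accounted for: {combined / stream:.0%}")
```

Under these assumptions, any memory traffic that the counters do not attribute to demand loads and stores (the thread points to hardware prefetching) would show up as exactly this kind of shortfall.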