From: Arnaldo Carvalho de Melo
Subject: Re: How to check perf commands are producing proper output or not?
Date: Sat, 29 Jan 2011 11:07:34 -0200
To: Vince Weaver
Cc: nelakurthi koteswararao, Han Pingtian, linux-perf-users@vger.kernel.org

On Fri, Jan 28, 2011 at 05:03:06PM -0500, Vince Weaver wrote:
> On Fri, 28 Jan 2011, Arnaldo Carvalho de Melo wrote:
> > My expectation so far, for basic tests, is not to precisely detect
> > each and every event, but to make sure that if a big number of cache
> > misses is being provoked, a big number of cache misses is being
> > detected, to test the basic working of the system.
>
> So do you want to just check for a non-zero number for the various
> counts? That can be easily done.

Yeah, to check the basic functioning of the infrastructure.

> If you want something closer, such as checking that something like
> branch counts match to within 10%, you end up needing assembly-coded
> benchmarks, as compiler variation can make it hard to tell if the
> results you are getting are right or just coincidence.
>
> > After basic tests are in place, we go on trying to make them more
> > precise as is practical.
>
> You do have to be careful in situations like this. The in-kernel AMD
> branches count was wrong for a few kernel releases. The count returned
> a roughly plausible number of branches, but it turns out it was only
> counting retired-taken, not retired-total, branches. Only a fairly
> exact assembly-language test I was working on showed this problem.

And we want to keep it from happening again for this specific case, thus
regression tests need to be in place. Total agreement here :-)

> Another problem is with cache results. This is why I wrote these tests
> to begin with. I had users of PAPI/perf_events tell me that our tool
> was "broken" because they got implausibly low results for L2 cache
> misses (on the order of maybe 20 misses for walking an array the size
> of L2 cache). It turns out that the HW prefetchers on modern machines
> are so good that unless you do random walks of the cache it will look
> like the counter is broken.

:-) (A sketch of such a random walk is below.)

> > > Even things you might think are simple, like retired-instructions,
> > > can vary wildly. There are enough differences caused by Linux and
> > > processor errata with retired-instructions that I wrote a whole
> > > 10-page paper about the differences you see.
> >
> > Cool, can I get the paper URL? I'd love to read it.
>
> Sure, it's:
> http://web.eecs.utk.edu/~vweaver1/projects/deterministic/fhpm2010_weaver.pdf

Thanks!

> > > I have started writing some validation tests, though they use the
> > > PAPI library which runs on top of perf-events. You can see that and
> > > other validation work linked to here:
> > > http://web.eecs.utk.edu/~vweaver1/projects/validation-tests/
> >
> > Will look at it, thanks for the pointer, thanks for working on it!
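
To make the prefetcher point concrete, here is a minimal, untested sketch
of such a random walk. The 64-byte line size and 8 MB buffer size are
assumptions, not tied to any particular CPU; pick a buffer comfortably
larger than the L2 on the machine under test:

#include <stdio.h>
#include <stdlib.h>

#define LINE	64			/* assumed cache line size */
#define SIZE	(8 * 1024 * 1024)	/* assumed to be larger than L2 */
#define NLINES	(SIZE / LINE)

int main(void)
{
	char *buf = malloc(SIZE);
	size_t *order = malloc(NLINES * sizeof(*order));
	size_t i, j, tmp, pos, sum = 0;

	if (!buf || !order)
		return 1;

	/* shuffle a visit order over all cache lines (Fisher-Yates) */
	for (i = 0; i < NLINES; i++)
		order[i] = i;
	for (i = NLINES - 1; i > 0; i--) {
		j = rand() % (i + 1);
		tmp = order[i];
		order[i] = order[j];
		order[j] = tmp;
	}

	/* link the lines into one random cycle: the first word of
	 * each line holds the index of the next line to visit */
	for (i = 0; i < NLINES - 1; i++)
		*(size_t *)(buf + order[i] * LINE) = order[i + 1];
	*(size_t *)(buf + order[NLINES - 1] * LINE) = order[0];

	/* the walk: every load depends on the previous one, so the
	 * prefetcher has nothing regular to latch onto */
	pos = order[0];
	for (i = 0; i < NLINES; i++) {
		pos = *(size_t *)(buf + pos * LINE);
		sum += pos;
	}

	printf("%zu\n", sum);	/* keep the loop from being optimized out */
	free(order);
	free(buf);
	return 0;
}

Running it under "perf stat -e cache-misses" should report on the order of
one miss per line visited, instead of the handful a sequential walk shows
once the prefetcher kicks in.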
> I'd be glad to help a bit with this, as I think it's an important
> topic. I should check out the current perf regression tests to see
> what's there. I've been working at higher levels with libpfm4 and PAPI
> because a lot of the events I have problems with aren't necessarily
> the ones exposed by perf (without using raw events).

The tests there are rather simple, basically testing the infrastructure
to create counters that I factored out of the tools and into
tools/perf/util/{evsel,evlist}.[ch] and that I'm exposing through a
Python binding.

They create syscall tracepoint events that we can then trigger by using
the syscalls (I used open and the pid-getting routines), make them happen
on specific CPUs (using sched_setaffinity), and then check whether the
number of syscalls generated on a particular CPU was correctly counted
(a rough sketch of the pattern is below).

No tests for hardware counters are in there now; this discussion should
provide insights to people interested in writing them :-)

- Arnaldo
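
P.S.: for illustration only, a minimal, untested sketch of that counting
pattern in plain C, going straight to perf_event_open() instead of the
Python binding. It pins itself to CPU 0 with sched_setaffinity(), counts
the syscalls:sys_enter_getpid tracepoint for the current thread (the real
tests open per-cpu events instead), and checks that 100 getpid calls are
counted as 100. The debugfs path is an assumption and needs debugfs to be
mounted:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sched.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
			    int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	struct perf_event_attr attr;
	unsigned long long count;
	long long id;
	cpu_set_t set;
	FILE *f;
	int fd, i;

	/* pin ourselves to CPU 0 so the count lands on a known CPU */
	CPU_ZERO(&set);
	CPU_SET(0, &set);
	if (sched_setaffinity(0, sizeof(set), &set))
		return 1;

	/* look up the tracepoint id; path assumed, may differ */
	f = fopen("/sys/kernel/debug/tracing/events/syscalls/sys_enter_getpid/id", "r");
	if (!f || fscanf(f, "%lld", &id) != 1)
		return 1;
	fclose(f);

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_TRACEPOINT;
	attr.size = sizeof(attr);
	attr.config = id;
	attr.disabled = 1;

	/* per-thread event: pid 0 (us), any cpu */
	fd = perf_event_open(&attr, 0, -1, -1, 0);
	if (fd < 0)
		return 1;

	/* only the region between the ioctls is counted */
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	for (i = 0; i < 100; i++)
		syscall(__NR_getpid);	/* bypass glibc's getpid caching */
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

	if (read(fd, &count, sizeof(count)) != sizeof(count))
		return 1;

	printf("expected 100, counted %llu\n", count);
	return count == 100 ? 0 : 1;
}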