* small perfctr bug or misunderstanding
  2004-07-03 10:28 [PATCH][2.6.7-mm5] perfctr low-level documentation Mikael Pettersson
@ 2004-07-03 14:08 ` bert hubert
  0 siblings, 0 replies; 3+ messages in thread
From: bert hubert @ 2004-07-03 14:08 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: linux-kernel

On Sat, Jul 03, 2004 at 12:28:09PM +0200, Mikael Pettersson wrote:

> There would be a /proc/<pid>/<tid>/perfctr/ directory
> with files representing the control data, counter
> state, general info, and auxiliary control ops.

Mikael, thanks for the low-level-api.txt documentation. Will vperfctr_* see
some documentation? Want me to whip up manpages?

So far perfctr has been very useful to me already - I now know parts of
PowerDNS that are completely memory bound, which I so far only suspected.
Are the global counters available? There is a note in the perfctr
distribution that says they aren't?

One thing - on my Pentium M I'm unable to get more than one counter going
simultaneously, I get 'Operation not permitted'. Perfex reports that
supposedly two are possible.

PerfCtr Info:
abi_version		0x06000500
driver_version		2.7.3
cpu_type		14 (Intel Pentium M)
cpu_features		0x3 (rdpmc,rdtsc)
cpu_khz			1399252
tsc_to_cpu_mult		1
cpu_nrctrs		2
cpus			[0], total: 1
cpus_forbidden		[], total: 0

PERFCTR INIT: vendor 0, family 6, model 9, stepping 5, clock 1399252 kHz
PERFCTR INIT: NITER == 64
PERFCTR INIT: loop overhead is 118 cycles
PERFCTR INIT: rdtsc cost is 48.5 cycles (3223 total)
PERFCTR INIT: rdpmc cost is 45.4 cycles (3027 total)
PERFCTR INIT: rdmsr (counter) cost is 95.4 cycles (6229 total)
PERFCTR INIT: rdmsr (evntsel) cost is 81.3 cycles (5322 total)
PERFCTR INIT: wrmsr (counter) cost is 143.7 cycles (9318 total)
PERFCTR INIT: wrmsr (evntsel) cost is 132.3 cycles (8591 total)
PERFCTR INIT: read cr4 cost is 3.0 cycles (311 total)
PERFCTR INIT: write cr4 cost is 49.8 cycles (3308 total)
perfctr: driver 2.7.3, cpu type Intel P6 at 1399252 kHz

On my Athlon, 4 are reported possible and 4 work just fine. But I might be
misunderstanding the Intel docs.

The code below works fine when the second counter is commented out:

#include <iostream>
using namespace std;
extern "C" {
#include "libperfctr.h"
}
#include <errno.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "arch.h"

class PerfCtr
{
public:
  PerfCtr()
  {
    d_self = vperfctr_open();
    if( !d_self ) {
	perror("vperfctr_open");
	exit(1);
    }

    memset(&d_control, 0, sizeof(d_control)); // zero the whole control struct, not only cpu_control
    d_control.cpu_control.tsc_on=1;
  }

  void addCounter(unsigned int v, unsigned int unit=0) 
  {
    int count=d_control.cpu_control.nractrs;

    d_control.cpu_control.evntsel[count] = v | (1 << 16) | (1 << 22) | (unit << 8); 
    d_control.cpu_control.pmc_map[count] = count;
    d_control.cpu_control.nractrs++; // no support for .nrictrs
  }

  void go()
  {
    if(vperfctr_control(d_self, &d_control) < 0) {
      perror("vperfctr_control");
      exit(1);
    }
    zero();
  }

  void zero()
  {
    memset(&d_baseline,0,sizeof(d_baseline));
    vperfctr_read_ctrs(d_self, &d_baseline);
  }

  ~PerfCtr()
  {
    vperfctr_close(d_self);
  }

  void get(long long* counters, long long& tsc)
  {
    struct perfctr_sum_ctrs now;
    memset(&now,0,sizeof(now));
    if(vperfctr_read_ctrs(d_self, &now) < 0) {
      perror("read counters");
      exit(1);
    }
    
    for(unsigned int n=0;n<d_control.cpu_control.nractrs;++n)
      counters[n]=now.pmc[n] - d_baseline.pmc[n];

    tsc=now.tsc - d_baseline.tsc;
  }

private:
  struct vperfctr *d_self;
  struct vperfctr_control d_control;
  struct perfctr_sum_ctrs d_baseline;
};


int main()
{
  PerfCtr pc;
  pc.addCounter(0x48); // DCU MISS OUTSTANDING
  pc.addCounter(0x43); // DATA_MEM_REFS

  pc.go();

  long long results[2], tsc;
  pc.get(results,tsc);

  cout<<"Cycles waiting on DCU miss:  "<<results[0]<<endl;
  cout<<"Number of memory references: "<<results[1]<<endl;
  cout<<"Cycles spent:                "<<tsc<<endl;
}



-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO


* Re: small perfctr bug or misunderstanding
@ 2004-07-03 14:58 Mikael Pettersson
  2004-07-04  1:15 ` bert hubert
  0 siblings, 1 reply; 3+ messages in thread
From: Mikael Pettersson @ 2004-07-03 14:58 UTC (permalink / raw)
  To: ahu; +Cc: linux-kernel

On Sat, 3 Jul 2004 16:08:29 +0200, bert hubert wrote:
>Mikael, thanks for the low-level-api.txt documentation. Will vperfctr_* see
>some documentation? Want me to whip up manpages?

Docs for the syscalls will appear shortly.

>So far perfctr has been very useful to me already - I now know parts of
>PowerDNS that are completely memory bound, which I so far only suspected.
>Are the global counters available? There is a note in the perfctr
>distribution that says they aren't?

Currently no; I removed them while we've been debating the
API to the (IMO more important) per-process counters.
I intend to add them back once the current stuff has been
Linus-approved.

>One thing - on my Pentium M I'm unable to get more than one counter going
>simultaneously, I get 'Operation not permitted'. Perfex reports that
>supposedly two are possible.

Classic beginner's mistake :-)

>  void addCounter(unsigned int v, unsigned int unit=0) 
>  {
>    int count=d_control.cpu_control.nractrs;
>
>    d_control.cpu_control.evntsel[count] = v | (1 << 16) | (1 << 22) | (unit << 8); 
>    d_control.cpu_control.pmc_map[count] = count;
>    d_control.cpu_control.nractrs++; // no support for .nrictrs
>  }

Quoting from Documentation/perfctr/low-level-x86.txt:

>Intel P6
>--------
>The evntsel values are mapped directly onto the counters'
>EVNTSEL control registers.
>
>The global enable bit (22) in EVNTSEL0 must be set. That bit is
>reserved in EVNTSEL1.
>...
>AMD K7/K8
>---------
>Similar to Intel P6. The main difference is that each evntsel has
>its own enable bit, which must be set.

The driver sees ENABLE set in EVNTSEL1 on your P-M,
and properly returns an error.
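
As a rough illustration of the rule quoted above, the check could look
something like the sketch below. This is not the driver's actual code:
CpuFamily, kEvntselEnable, enableBitOk and checkEvntsels are hypothetical
names, and only the ENABLE-bit rule from low-level-x86.txt is modelled.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration; this is not the perfctr driver's code.
enum class CpuFamily { IntelP6, AmdK7 };

constexpr std::uint32_t kEvntselEnable = 1u << 22;  // ENABLE bit in an EVNTSEL

// Is the ENABLE bit of EVNTSEL<index> in the state the documentation asks for?
//   P6:    set in EVNTSEL0, clear (reserved) in any other EVNTSEL
//   K7/K8: set in every EVNTSEL
constexpr bool enableBitOk(CpuFamily family, std::size_t index,
                           std::uint32_t evntsel)
{
    return ((evntsel & kEvntselEnable) != 0) ==
           (family == CpuFamily::AmdK7 || index == 0);
}

// Check a whole array of EVNTSEL values; false means "reject the request".
inline bool checkEvntsels(CpuFamily family, const std::uint32_t* evntsel,
                          std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        if (!enableBitOk(family, i, evntsel[i]))
            return false;
    return true;
}
```

With the two values the posted program builds (both with bit 22 set),
checkEvntsels() would reject them for IntelP6 and accept them for AmdK7,
which matches what was observed on the Pentium M and the Athlon.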

The proper way is for user-space to consider a set of
events (not yet added to the control struct), and to
use the current CPU type to format the control and
handle any quirks. For P6 vs K7 the differences are
minor, but to program the P4 you _really_ need helper
procedures.
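
A minimal sketch of that approach for P6 vs K7 only, assuming the
evntsel layout used in the program above (event number in the low bits,
unit mask at bit 8, USR at bit 16, ENABLE at bit 22). CpuKind, Event,
makeEvntsel and formatEvntsels are made-up names for illustration and
are not part of libperfctr; the result would still have to be copied
into cpu_control.evntsel[] by the caller.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; not part of libperfctr.
enum class CpuKind { IntelP6, AmdK7 };

struct Event {
    std::uint32_t code;  // event number, e.g. 0x43
    std::uint32_t unit;  // unit mask, placed at bits 8..15
};

// Format one EVNTSEL value for counter <index>.
// USR (bit 16) is always set, as in the program quoted in this thread.
// ENABLE (bit 22) is set for every counter on K7/K8, but only for
// counter 0 on P6, where the bit is reserved in the other EVNTSEL.
constexpr std::uint32_t makeEvntsel(CpuKind kind, std::size_t index,
                                    std::uint32_t code, std::uint32_t unit = 0)
{
    return code | (unit << 8) | (1u << 16) |
           ((kind == CpuKind::AmdK7 || index == 0) ? (1u << 22) : 0u);
}

// Format a whole set of events for the given CPU kind, in order.
inline std::vector<std::uint32_t> formatEvntsels(CpuKind kind,
                                                 const std::vector<Event>& events)
{
    std::vector<std::uint32_t> out;
    out.reserve(events.size());
    for (std::size_t i = 0; i < events.size(); ++i)
        out.push_back(makeEvntsel(kind, i, events[i].code, events[i].unit));
    return out;
}
```

Collecting the events first and formatting them in one pass keeps the
CPU-specific quirks in a single place instead of in every addCounter()
call. The P4 is deliberately left out of this sketch.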

/Mikael


* Re: small perfctr bug or misunderstanding
  2004-07-03 14:58 small perfctr bug or misunderstanding Mikael Pettersson
@ 2004-07-04  1:15 ` bert hubert
  0 siblings, 0 replies; 3+ messages in thread
From: bert hubert @ 2004-07-04  1:15 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: linux-kernel

On Sat, Jul 03, 2004 at 04:58:10PM +0200, Mikael Pettersson wrote:
> Currently no; I removed them while we've been debating the
> API to the (IMO more important) per-process counters.
> I intend to add them back once the current stuff has been
> Linus-approved.

Ok - I'd love the ability to diagnose an entire system. Furthermore, it'd be
very cool if it were possible to profile another process, like strace -p
pid.

I think this means looking at 'virtual counters' for arbitrary processes.
Would this be possible?

I currently have a client using a 2.6.7 kernel and they have performance
problems and applications I can't recompile. It'd be very good if I could
spot which of their many applications is thrashing the cache.

> The driver sees ENABLE set in EVNTSEL1 on your P-M,
> and properly returns an error.

Ahhhh, I see. With this line things work as intended:
d_control.cpu_control.evntsel[count] = v | (1 << 16) | (!count << 22) | (unit << 8); 
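
For the record, a small compile-time check of why that expression works.
p6EnableBit is a made-up helper used only to isolate the sub-expression;
it is not part of the program above.

```cpp
// Hypothetical helper, only to show how the expression parses.
// Unary ! binds tighter than <<, so this is (!count) << 22:
// 1 << 22 for counter 0, and 0 for every other counter.
constexpr unsigned int p6EnableBit(int count)
{
    return !count << 22;
}

static_assert(p6EnableBit(0) == (1u << 22), "counter 0 gets the enable bit");
static_assert(p6EnableBit(1) == 0u, "counter 1 leaves bit 22 clear");
```

So the enable bit ends up in EVNTSEL0 only, which is what the P6 wants.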

> handle any quirks. For P6 vs K7 the differences are
> minor, but to program the P4 you _really_ need helper
> procedures.

Indeed. Thanks. I'll make a P6PerfCtr and an AMDPerfCtr and a P4PerfCtr. The
Pentium 1/2 people can work it out for themselves :-)

Regards,

bert

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO

