public inbox for linux-ia64@vger.kernel.org
* [Linux-ia64] Strange performance monitoring results
@ 2002-05-10  9:20 Matt Chapman
  2002-05-16 16:31 ` Stephane Eranian
From: Matt Chapman @ 2002-05-10  9:20 UTC
  To: linux-ia64

* Linux 2.4.18-ia64-020508 (CONFIG_PERFMON, !CONFIG_DISABLE_VHPT)
* pfmon 1.0
* Uniprocessor Itanium C1-step 
* lat_ctx from LMbench 2.0p2 (ftp://ftp.bitmover.com/lmbench/)

(Though I get the same results with 2.4.16 and pfmon 0.06a.)

% pfmon -e ITLB_MISSES_FETCH,ITLB_INSERTS_HPW ./lat_ctx 5

"size=0k ovr=2.65
5 2.52
               221400 ITLB_MISSES_FETCH
                  133 ITLB_INSERTS_HPW

The ITLB_MISSES_FETCH figure seems much too big, especially given that the
number of hardware page-table walker inserts is so low. Every few runs I
also get very big figures for DTLB_MISSES, although not for DTC_MISSES (I
would have thought DTLB_MISSES should be less than DTC_MISSES?).

Am I doing something wrong?

Matt




* Re: [Linux-ia64] Strange performance monitoring results
@ 2002-05-16 16:31 ` Stephane Eranian
From: Stephane Eranian @ 2002-05-16 16:31 UTC
  To: linux-ia64

[-- Attachment #1: Type: text/plain, Size: 2006 bytes --]

Matt,

I was able to reproduce what you are seeing. I also measured the same
event with a different program: the LMbench test involves several
processes competing for the CPU, so I used a single process instead.
I checked with the people who know the hardware, and the counter is not
bogus. However, it does count more than you would expect. It counts every
detected ITLB miss, demand fetches included, but not all of those misses
result in a translation being inserted, because some of them get
cancelled. This can happen because of prefetching and branch prediction:
if a branch is mispredicted, the ITLB misses generated on the (wrong)
predicted path are cancelled, but they have already been counted.

The attached test program stresses the TLB by placing one function per
page and calling each one through an indirect branch which is (most
likely) always mispredicted. If you increase the number of iterations
while keeping the number of functions called constant (and small), the
ITLB_MISSES_FETCH count increases linearly. If you modify the assembly
code to avoid the misprediction with a hinted mov to br (mov.sptk.imp),
the ITLB_MISSES_FETCH count suddenly stays constant.

Hope this helps.

-- 
-Stephane

[-- Attachment #2: itlb_test.c --]
[-- Type: text/plain, Size: 2180 bytes --]

#include <sys/types.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>	/* getpagesize(), _exit() */

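/* On IA-64 a C function pointer refers to a function descriptor:
 * the entry point address followed by the global pointer. */
typedef struct {
	unsigned long addr;
	unsigned long gp;
} func_desc_t;

/* Build with -falign-functions=<page size> so that each function
 * starts on its own page. */
#define FUNC(n) int func_##n(void) { return n; }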

FUNC(1) FUNC(2) FUNC(3) FUNC(4) FUNC(5) FUNC(6) FUNC(7) FUNC(8) FUNC(9)
FUNC(10) FUNC(11) FUNC(12) FUNC(13) FUNC(14) FUNC(15) FUNC(16) FUNC(17) FUNC(18) FUNC(19)
FUNC(20) FUNC(21) FUNC(22) FUNC(23) FUNC(24) FUNC(25) FUNC(26) FUNC(27) FUNC(28) FUNC(29)
FUNC(30) FUNC(31) FUNC(32) FUNC(33) FUNC(34) FUNC(35) FUNC(36) FUNC(37) FUNC(38) FUNC(39)
FUNC(40) FUNC(41) FUNC(42) FUNC(43) FUNC(44) FUNC(45) FUNC(46) FUNC(47) FUNC(48) FUNC(49)
FUNC(50) FUNC(51) FUNC(52) FUNC(53) FUNC(54) FUNC(55) FUNC(56) FUNC(57) FUNC(58) FUNC(59)
FUNC(60) FUNC(61) FUNC(62) FUNC(63) FUNC(64)

static int (*tab[])(void)={
	func_1, func_2, func_3, func_4, func_5, func_6, func_7, func_8, func_9,
	func_10, func_11, func_12, func_13, func_14, func_15, func_16, func_17, func_18, func_19,
	func_20, func_21, func_22, func_23, func_24, func_25, func_26, func_27, func_28, func_29,
	func_30, func_31, func_32, func_33, func_34, func_35, func_36, func_37, func_38, func_39,
	func_40, func_41, func_42, func_43, func_44, func_45, func_46, func_47, func_48, func_49,
	func_50, func_51, func_52, func_53, func_54, func_55, func_56, func_57, func_58, func_59,
	func_60, func_61, func_62, func_63, func_64,
	NULL
};

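int
doit(unsigned long iter, unsigned int max)
{
	unsigned int sum = 0, i;
	unsigned long j;	/* same width as iter, so large counts cannot wrap */
	int (**pf)(void);

	for (j = 0; j < iter; j++) {
		/* call at most max functions, one per page, via an indirect branch */
		for (i = 0, pf = tab; i < max && *pf; i++, pf++) {
			sum += (**pf)();
		}
	}
	return sum; /* ensures the compiler does not get rid of everything */
}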


int
main(int argc, char **argv)
{
	func_desc_t *fd1, *fd2;
	int pgsz;
	unsigned long iter;
	unsigned int max;

	pgsz = getpagesize();

	/* on IA-64 a function pointer points to its descriptor, not its code */
	fd1 = (func_desc_t *)func_1;
	fd2 = (func_desc_t *)func_2;

	/* consecutive functions must start exactly one page apart */
	if ((fd2->addr - fd1->addr) != (unsigned long)pgsz) {
		printf("the program was not compiled with -falign-functions=%d\n", pgsz);
		exit(1);
	}

	iter = argc > 1 ? strtoul(argv[1], NULL, 10) : 10000;
	/* default (unsigned int)-1: call every function in tab[] */
	max  = argc > 2 ? (unsigned int)atoi(argv[2]) : (unsigned int)-1;

	doit(iter, max);
	_exit(0); /* short circuit libc exit() */
}
