* Re: [Linux-ia64] Strange performance monitoring results
2002-05-10 9:20 [Linux-ia64] Strange performance monitoring results Matt Chapman
@ 2002-05-16 16:31 ` Stephane Eranian
0 siblings, 0 replies; 2+ messages in thread
From: Stephane Eranian @ 2002-05-16 16:31 UTC (permalink / raw)
To: linux-ia64
[-- Attachment #1: Type: text/plain, Size: 2006 bytes --]
Matt,
On Fri, May 10, 2002 at 07:20:06PM +1000, Matt Chapman wrote:
> * Linux 2.4.18-ia64-020508 (CONFIG_PERFMON, !CONFIG_DISABLE_VHPT)
> * pfmon 1.0
> * Uniprocessor Itanium C1-step
> * lat_ctx from LMbench 2.0p2 (ftp://ftp.bitmover.com/lmbench/)
>
> (Though I get the same results with 2.4.16 and pfmon 0.06a.)
>
> % pfmon -e ITLB_MISSES_FETCH,ITLB_INSERTS_HPW ./lat_ctx 5
>
> "size=0k ovr=2.65
> 5 2.52
> 221400 ITLB_MISSES_FETCH
> 133 ITLB_INSERTS_HPW
>
> The ITLB misses figure seems much too big, especially given the number
> of hardware pagetable walker inserts is low. Every few times I also get
> very big figures for DTLB_MISSES, although not DTC_MISSES (I would have
> thought DTLB_MISSES should be less than DTC_MISSES?).
>
> Am I doing something wrong?
>
I was able to reproduce what you are seeing. I also measured the same
event with a different program: the LMbench test involves several
processes competing for the CPU, so I used a single process instead.
I verified with knowledgeable people that the counter is not bogus.
However, it does count more than you would expect. It counts every
detected ITLB miss (including demand fetches), but not all of them
end up with a translation being inserted, because some get cancelled. This can
happen because of prefetching and branch prediction: if a branch is
mispredicted, the ITLB misses generated by the (wrong) prediction get
cancelled, but they are still counted.
The attached test program stresses the TLB by having one function
per page, so it must be compiled with -falign-functions set to the page
size. The first argument is the iteration count (default 10000), the second
the number of functions to call (default: all 64). The program involves an
indirect branch which is (most likely) always mispredicted. If you increase
the number of iterations while keeping a constant (and small) number of
functions called, you see the ITLB_MISSES_FETCH count increase linearly.
If you modify the assembly code to avoid the misprediction with a hinted
mov to br (mov.sptk.imp), the ITLB_MISSES_FETCH count suddenly remains
constant.
Hope this helps.
--
-Stephane
[-- Attachment #2: itlb_test.c --]
[-- Type: text/plain, Size: 2180 bytes --]
#include <sys/types.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h> /* getpagesize(), _exit() */
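/* on ia64 a C function pointer points to a function descriptor */
typedef struct {
unsigned long addr; /* entry point of the function */
unsigned long gp; /* global pointer value for the function */
} func_desc_t;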
#define FUNC(n) int func_##n(void) { return n; }
FUNC(1) FUNC(2) FUNC(3) FUNC(4) FUNC(5) FUNC(6) FUNC(7) FUNC(8) FUNC(9)
FUNC(10) FUNC(11) FUNC(12) FUNC(13) FUNC(14) FUNC(15) FUNC(16) FUNC(17) FUNC(18) FUNC(19)
FUNC(20) FUNC(21) FUNC(22) FUNC(23) FUNC(24) FUNC(25) FUNC(26) FUNC(27) FUNC(28) FUNC(29)
FUNC(30) FUNC(31) FUNC(32) FUNC(33) FUNC(34) FUNC(35) FUNC(36) FUNC(37) FUNC(38) FUNC(39)
FUNC(40) FUNC(41) FUNC(42) FUNC(43) FUNC(44) FUNC(45) FUNC(46) FUNC(47) FUNC(48) FUNC(49)
FUNC(50) FUNC(51) FUNC(52) FUNC(53) FUNC(54) FUNC(55) FUNC(56) FUNC(57) FUNC(58) FUNC(59)
FUNC(60) FUNC(61) FUNC(62) FUNC(63) FUNC(64)
static int (*tab[])(void)={
func_1, func_2, func_3, func_4, func_5, func_6, func_7, func_8, func_9,
func_10, func_11, func_12, func_13, func_14, func_15, func_16, func_17, func_18, func_19,
func_20, func_21, func_22, func_23, func_24, func_25, func_26, func_27, func_28, func_29,
func_30, func_31, func_32, func_33, func_34, func_35, func_36, func_37, func_38, func_39,
func_40, func_41, func_42, func_43, func_44, func_45, func_46, func_47, func_48, func_49,
func_50, func_51, func_52, func_53, func_54, func_55, func_56, func_57, func_58, func_59,
func_60, func_61, func_62, func_63, func_64,
NULL
};
int
doit(unsigned long iter, unsigned int max)
{
unsigned int sum = 0, i, j;
int (**pf)(void);
for(j=0; j < iter; j++) {
for(i=0, pf = tab; i < max && *pf; i++, pf++) {
sum += (**pf)();
}
}
return sum; /* ensures the compiler does not get rid of everything */
}
int
main(int argc, char **argv)
{
func_desc_t *fd1, *fd2;
int pgsz;
unsigned long iter;
unsigned int max;
pgsz = getpagesize();
fd1 = (func_desc_t *)func_1;
fd2 = (func_desc_t *)func_2;
if ((fd2->addr - fd1->addr) != (unsigned long)pgsz) {
printf("the program was not compiled with -falign-functions=%d\n", pgsz);
exit(1);
}
iter = argc > 1 ? strtoul(argv[1], NULL, 10) : 10000;
/* -1 wraps to UINT_MAX, i.e. call every function in tab */
max = argc > 2 ? atoi(argv[2]) : -1;
doit(iter, max);
_exit(0); /* short circuit libc exit() */
}