* [parisc-linux] itlb miss handler optimizations!
@ 2003-07-25  7:04 Carlos O'Donell
  2003-07-25 11:46 ` Matthew Wilcox
  0 siblings, 1 reply; 23+ messages in thread
From: Carlos O'Donell @ 2003-07-25  7:04 UTC
  To: parisc-linux

[-- Attachment #1: Type: text/plain, Size: 1767 bytes --]


pa,

LaMont and I were discussing the lightweight syscall
implementations and ran across some interesting itlb optimizations.

We first looked at the itlb_miss_XX functions, where XX is either 11 or
20 depending on whether your kernel is 32 or 64-bit respectively. We saw
that there is an interlocked 'or' that nullifies a compare and branch.
This, as LaMont argued, isn't optimal.

Before:
	mfsp current space
	/* if faulting space is kernel space that's okay */
	or with nullify the current space and 0.
	/* die bad userspace die */
	cmpb if the faulting space <> current space then die.

Which can mean that branch prediction borks _all_ the time since if
userspace was constantly faulting then there wouldn't be much userspace
left.

Now:
	mfsp current space
	/* branch prediction forward is winning */
	cmpb to itlb_user_fault if faulting space <> current space.
	/* ... else life is good */


	itlb_user_fault:
	/* Was it the kernel? Oh yeah... that's okay then */
	/* branch prediction winning again! */
	cmpb if the faulting space was 0, then go back up.

The nice part seems to be the predicted branches. We still have one
interlock between the mfsp and the cmpb, but the processor has already
filled its queues with the upcoming insns from the next bit of the itlb
handler, so we keep the processor looking forward in the common case.
Maybe it's early in the morning and I'm not thinking well, but maybe it's
LaMont's ability to convince you of something you aren't sure of :)
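
To double-check that the reordered branches accept and reject exactly the
same cases, the two sequences can be modelled in a few lines of Python.
This is only a sketch of the control flow, following the register usage in
the attached patch (t0 holds the current space from %sr7, spc the faulting
space). The function names and the "lookup"/"itlb_fault" labels are made up
for illustration, and the model says nothing about timing.

```python
KERNEL_SPACE = 0  # space id 0 is the kernel, as in the "or,= %r0,t0,%r0" test


def old_check(current_space, faulting_space):
    """Original order: 'or,=' nullifies the cmpb when the current space is 0."""
    if current_space == KERNEL_SPACE:
        return "lookup"        # cmpb was nullified, carry on with the lookup
    if current_space != faulting_space:
        return "itlb_fault"    # cmpb,<> taken
    return "lookup"


def new_check(current_space, faulting_space):
    """New order: compare spaces first, test for the kernel only on mismatch."""
    if current_space != faulting_space:      # cmpb,<> to itlb_user_fault
        if current_space == KERNEL_SPACE:    # cmpb,= back to the common path
            return "lookup"
        return "itlb_fault"                  # fall through to itlb_fault
    return "lookup"


if __name__ == "__main__":
    for cur in range(4):
        for flt in range(4):
            assert old_check(cur, flt) == new_check(cur, flt)
    print("old and new checks agree on all tested space pairs")
```

In the common case (current space equals faulting space) the new order
executes a single not-taken forward branch and never reaches the second
test.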

Patch attached. We also moved a zdep up to improve the forward path,
placing it among a set of insns that weren't doing much while waiting
around for a memory read.

THE PATCH IS UNTESTED! If you want to give it a shot... please do so and
tell us if your box dies^H^H^H^H runs faster :)

Cheers,
Carlos.


[-- Attachment #2: entry.S.diff --]
[-- Type: text/plain, Size: 2027 bytes --]

Index: entry.S
===================================================================
RCS file: /var/cvs/linux/arch/parisc/kernel/entry.S,v
retrieving revision 1.98
diff -u -p -r1.98 entry.S
--- entry.S	9 Dec 2002 06:09:08 -0000	1.98
+++ entry.S	25 Jul 2003 06:37:58 -0000
@@ -1535,8 +1535,7 @@ itlb_miss_11:
 	mfctl           %cr25,ptp	/* load user pgd */
 
 	mfsp            %sr7,t0		/* Get current space */
-	or,=            %r0,t0,%r0	/* If kernel, nullify following test */
-	cmpb,<>,n       t0,spc,itlb_fault /* forward */
+	cmpb,<>,n	t0,spc,itlb_user_fault /* forward */
 
 	/* First level page table lookup */
 
@@ -1551,6 +1550,10 @@ itlb_miss_common_11:
 	sh2addl 	 t0,ptp,ptp
 	ldi		_PAGE_ACCESSED,t1
 	ldw		 0(ptp),pte
+
+	/* Running parallel, taken from below 'zdep0' */
+	zdep            spc,30,15,prot  /* create prot id from space */
+
 	bb,>=,n 	 pte,_PAGE_PRESENT_BIT,itlb_fault
 
 	/* Check whether the "accessed" bit was set, otherwise do so */
@@ -1559,7 +1562,7 @@ itlb_miss_common_11:
 	and,<>		t1,pte,%r0	/* test and nullify if already set */
 	stw		t0,0(ptp)	/* write back pte */
 
-	zdep            spc,30,15,prot  /* create prot id from space */
+	/* zdep0 moved back */
 	dep             pte,8,7,prot    /* add in prot bits from pte */
 
 	extru,=		pte,_PAGE_NO_CACHE_BIT,1,r0
@@ -1602,8 +1605,7 @@ itlb_miss_20:
 	mfctl           %cr25,ptp	/* load user pgd */
 
 	mfsp            %sr7,t0		/* Get current space */
-	or,=            %r0,t0,%r0	/* If kernel, nullify following test */
-	cmpb,<>,n       t0,spc,itlb_fault /* forward */
+	cmpb,<>,n	t0,spc,itlb_user_fault	/* forward */
 
 	/* First level page table lookup */
 
@@ -1882,6 +1884,15 @@ kernel_bad_space:
 dbit_fault:
 	b               intr_save
 	ldi             20,%r8
+
+itlb_user_fault:
+	/* User tlb missed for other than his own space. Optimization. */
+#ifdef __LP64__
+	cmpb,=		%r0,t0,itlb_miss_common20 /* backward */
+#else
+	cmpb,=		%r0,t0,itlb_miss_common11 /* backward */
+#endif
+	nop
 
 itlb_fault:
 	b               intr_save


* Re: [parisc-linux] itlb miss handler optimizations!
  2003-07-25  7:04 [parisc-linux] itlb miss handler optimizations! Carlos O'Donell
@ 2003-07-25 11:46 ` Matthew Wilcox
  2003-07-26 18:02   ` Carlos O'Donell
  2003-08-12  3:58   ` Carlos O'Donell
  0 siblings, 2 replies; 23+ messages in thread
From: Matthew Wilcox @ 2003-07-25 11:46 UTC
  To: Carlos O'Donell; +Cc: parisc-linux

On Fri, Jul 25, 2003 at 03:04:50AM -0400, Carlos O'Donell wrote:
> @@ -1882,6 +1884,15 @@ kernel_bad_space:
>  dbit_fault:
>  	b               intr_save
>  	ldi             20,%r8
> +
> +itlb_user_fault:
> +	/* User tlb missed for other than his own space. Optimization. */
> +#ifdef __LP64__
> +	cmpb,=		%r0,t0,itlb_miss_common20 /* backward */
> +#else
> +	cmpb,=		%r0,t0,itlb_miss_common11 /* backward */
> +#endif
> +	nop

can't do that.  we have three sets of routines -- itlb_miss_common_11,
itlb_miss_common_20 and itlb_miss_common_20w.  we select between _20w
or not at compile time (if it's 64-bit, it's PA 2.0 Wide), but select
between _20 and _11 at boot time (fault_vector_20 vs fault_vector_11).

shame on you, you didn't even try assembling it ;-)

-- 
"It's not Hollywood.  War is real, war is primarily not about defeat or
victory, it is about death.  I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk


* Re: [parisc-linux] itlb miss handler optimizations!
  2003-07-25 11:46 ` Matthew Wilcox
@ 2003-07-26 18:02   ` Carlos O'Donell
  2003-08-12  3:58   ` Carlos O'Donell
  1 sibling, 0 replies; 23+ messages in thread
From: Carlos O'Donell @ 2003-07-26 18:02 UTC
  To: Matthew Wilcox; +Cc: parisc-linux

> > @@ -1882,6 +1884,15 @@ kernel_bad_space:
> >  dbit_fault:
> >  	b               intr_save
> >  	ldi             20,%r8
> > +
> > +itlb_user_fault:
> > +	/* User tlb missed for other than his own space. Optimization. */
> > +#ifdef __LP64__
> > +	cmpb,=		%r0,t0,itlb_miss_common20 /* backward */
> > +#else
> > +	cmpb,=		%r0,t0,itlb_miss_common11 /* backward */
> > +#endif
> > +	nop
> 
> can't do that.  we have three sets of routines -- itlb_miss_common_11,
> itlb_miss_common_20 and itlb_miss_common_20w.  we select between _20w
> or not at compile time (if it's 64-bit, it's PA 2.0 Wide), but select
> between _20 and _11 at boot time (fault_vector_20 vs fault_vector_11).
> 
> shame on you, you didn't even try assembling it ;-)

I'll take a look at that and rewrite the patch. It was 3AM when I
finished it, and I didn't bother to compile it before I passed out on
the bed ;)

c.


* Re: [parisc-linux] itlb miss handler optimizations!
  2003-07-25 11:46 ` Matthew Wilcox
  2003-07-26 18:02   ` Carlos O'Donell
@ 2003-08-12  3:58   ` Carlos O'Donell
  2003-08-12 12:21     ` Joel Soete
  2003-08-12 16:06     ` Grant Grundler
  1 sibling, 2 replies; 23+ messages in thread
From: Carlos O'Donell @ 2003-08-12  3:58 UTC
  To: Matthew Wilcox; +Cc: parisc-linux, LaMont Jones

On Fri, Jul 25, 2003 at 12:46:15PM +0100, Matthew Wilcox wrote:
> can't do that.  we have three sets of routines -- itlb_miss_common_11,
> itlb_miss_common_20 and itlb_miss_common_20w.  we select between _20w
> or not at compile time (if it's 64-bit, it's PA 2.0 Wide), but select
> between _20 and _11 at boot time (fault_vector_20 vs fault_vector_11).
> 
> shame on you, you didn't even try assembling it ;-)

Assembles, and boots on my C3K, 32-bit kernel. Looking for any takers
who want to try it in 64-bit mode. I'm running lmbench to see if I can
tell the difference between this and the original code. 

I would be most appreciative if anyone would pipe up and say "Run X to
test if Y works better/faster/harder" :}

c.

--- arch/parisc/kernel/entry.S	9 Dec 2002 06:09:08 -0000	1.98
+++ arch/parisc/kernel/entry.S	12 Aug 2003 03:49:04 -0000
@@ -1469,8 +1469,7 @@ itlb_miss_20w:
 	mfctl           %cr25,ptp	/* load user pgd */
 
 	mfsp            %sr7,t0		/* Get current space */
-	or,*=           %r0,t0,%r0      /* If kernel, nullify following test */
-	cmpb,*<>,n      t0,spc,itlb_fault /* forward */
+	cmpb,<>,n	t0,spc,itlb_user_fault_20w /* forward */
 
 	/* First level page table lookup */
 
@@ -1535,8 +1534,7 @@ itlb_miss_11:
 	mfctl           %cr25,ptp	/* load user pgd */
 
 	mfsp            %sr7,t0		/* Get current space */
-	or,=            %r0,t0,%r0	/* If kernel, nullify following test */
-	cmpb,<>,n       t0,spc,itlb_fault /* forward */
+	cmpb,<>,n	t0,spc,itlb_user_fault_11 /* forward */
 
 	/* First level page table lookup */
 
@@ -1551,6 +1549,10 @@ itlb_miss_common_11:
 	sh2addl 	 t0,ptp,ptp
 	ldi		_PAGE_ACCESSED,t1
 	ldw		 0(ptp),pte
+
+	/* Running parallel, taken from below 'zdep0' */
+	zdep            spc,30,15,prot  /* create prot id from space */
+
 	bb,>=,n 	 pte,_PAGE_PRESENT_BIT,itlb_fault
 
 	/* Check whether the "accessed" bit was set, otherwise do so */
@@ -1559,7 +1561,7 @@ itlb_miss_common_11:
 	and,<>		t1,pte,%r0	/* test and nullify if already set */
 	stw		t0,0(ptp)	/* write back pte */
 
-	zdep            spc,30,15,prot  /* create prot id from space */
+	/* zdep0 moved back */
 	dep             pte,8,7,prot    /* add in prot bits from pte */
 
 	extru,=		pte,_PAGE_NO_CACHE_BIT,1,r0
@@ -1602,8 +1604,7 @@ itlb_miss_20:
 	mfctl           %cr25,ptp	/* load user pgd */
 
 	mfsp            %sr7,t0		/* Get current space */
-	or,=            %r0,t0,%r0	/* If kernel, nullify following test */
-	cmpb,<>,n       t0,spc,itlb_fault /* forward */
+	cmpb,<>,n	t0,spc,itlb_user_fault_20	/* forward */
 
 	/* First level page table lookup */
 
@@ -1882,6 +1883,37 @@ kernel_bad_space:
 dbit_fault:
 	b               intr_save
 	ldi             20,%r8
+
+/* The following three labels relate to an optimization in the itlb handler.
+   itlb_user_fault_20w:
+   itlb_user_fault_20:
+   itlb_user_fault_11:
+   We keep the CPU jumping fwd/bkwd in the common case, and the uncommon case
+   has the cmpb fail (no jump) and thus branch prediction failing. */
+
+#ifdef __LP64__
+itlb_user_fault_20w:
+	/* User tlb missed for other than his own space. Optimization. */
+	cmpb,=		%r0,t0,itlb_miss_common_20w /* backward */
+	nop
+#else
+itlb_user_fault_20:
+	/* User tlb missed for other than his own space. Optimization. */
+	cmpb,=		%r0,t0,itlb_miss_common_20 /* backward */
+	nop
+
+/* FALL THROUGH - We don't care if we run the test twice. If someone
+                  asks to have the "user is faulting death" path optimal
+                  then they should seek help. */
+
+itlb_user_fault_11:
+	/* User tlb missed for other than his own space. Optimization. */
+	cmpb,=		%r0,t0,itlb_miss_common_11 /* backward */
+	nop
+#endif
+
+/* FALL THROUGH - We have a real itlb_fault from one of the above three
+                  label sequences */
 
 itlb_fault:
 	b               intr_save


* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-12  3:58   ` Carlos O'Donell
@ 2003-08-12 12:21     ` Joel Soete
  2003-08-12 14:40       ` Carlos O'Donell
  2003-08-12 16:06     ` Grant Grundler
  1 sibling, 1 reply; 23+ messages in thread
From: Joel Soete @ 2003-08-12 12:21 UTC
  To: Carlos O'Donell, Matthew Wilcox; +Cc: parisc-linux, LaMont Jones

Hi Carlos,

>Assembles, and boots on my C3K, 32-bit kernel. Looking for any takers
>who want to try  it in 64-bit mode. I'm running lmbench to see if I can
>tell the difference between this and the original code. 
>
>I would be most appreciative if anyone would pipe up and say "Run X to
>test if Y works better/faster/harder" :}

I applied your patch against 2.4.21-pa9 on the N-4000 and compiled it
successfully with hppa64-linux-gcc (3.2.3). Having no clue about lmbench,
I used kernel builds as a rough comparison.

Running the original 2.4.21-pa9, here are the times:
Tue Aug 12 09:59:55 CEST 2003
[...] # Build kernel: 19'58"
Tue Aug 12 10:19:53 CEST 2003
[...] # Build modules: 7'46"
Tue Aug 12 10:27:39 CEST 2003

Running this new kernel:

Tue Aug 12 12:28:37 CEST 2003
[...] # Build kernel: 20'8"
Tue Aug 12 12:48:45 CEST 2003
[...] # Build modules: 7'47"
Tue Aug 12 12:56:32 CEST 2003

The differences are so small that a more accurate tool would be needed.
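
To put a number on "so small", the four durations above can be converted
to seconds and compared. This is just a rough Python sketch; `to_seconds`
and `percent_change` are hypothetical helper names, and only the timings
quoted in this mail are used.

```python
def to_seconds(stamp):
    """Convert a duration written as M'S" (e.g. 19'58") into seconds."""
    minutes, seconds = stamp.rstrip('"').split("'")
    return int(minutes) * 60 + int(seconds)


def percent_change(before, after):
    """Relative change from 'before' to 'after', in percent."""
    return (after - before) / before * 100.0


# (original kernel, patched kernel), copied from the timings above
timings = {
    "Build kernel": ("19'58\"", "20'8\""),
    "Build modules": ("7'46\"", "7'47\""),
}

if __name__ == "__main__":
    for step, (orig, patched) in timings.items():
        before, after = to_seconds(orig), to_seconds(patched)
        print(f"{step}: {before}s -> {after}s "
              f"({percent_change(before, after):+.2f}%)")
```

Both changes come out below one percent, and a single run per kernel gives
no idea of the run-to-run variation.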

hth anyway,
    Joel



* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-12 12:21     ` Joel Soete
@ 2003-08-12 14:40       ` Carlos O'Donell
  0 siblings, 0 replies; 23+ messages in thread
From: Carlos O'Donell @ 2003-08-12 14:40 UTC
  To: Joel Soete; +Cc: Matthew Wilcox, parisc-linux, LaMont Jones

> I applied your patch against 2.4.21-pa9 on the N-4000 and compiled it
> successfully with hppa64-linux-gcc (3.2.3). Having no clue about lmbench,
> I used kernel builds as a rough comparison.
> 
> Running the original 2.4.21-pa9, here are the times:
> Tue Aug 12 09:59:55 CEST 2003
> [...] # Build kernel: 19'58"
> Tue Aug 12 10:19:53 CEST 2003
> [...] # Build modules: 7'46"
> Tue Aug 12 10:27:39 CEST 2003
> 
> Running this new kernel:
> 
> Tue Aug 12 12:28:37 CEST 2003
> [...] # Build kernel: 20'8"
> Tue Aug 12 12:48:45 CEST 2003
> [...] # Build modules: 7'47"
> Tue Aug 12 12:56:32 CEST 2003
> 
> The differences are so small that a more accurate tool would be needed.

Thanks Joel! Yeah, I figured that perhaps the change might get lost in
the noise... I'm still running lmbench multiple times to get good
numbers.

Thanks again!

c.


* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-12  3:58   ` Carlos O'Donell
  2003-08-12 12:21     ` Joel Soete
@ 2003-08-12 16:06     ` Grant Grundler
  2003-08-12 16:32       ` Matthew Wilcox
                         ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Grant Grundler @ 2003-08-12 16:06 UTC
  To: Carlos O'Donell; +Cc: Matthew Wilcox, parisc-linux, LaMont Jones

On Mon, Aug 11, 2003 at 11:58:11PM -0400, Carlos O'Donell wrote:
> I would be most appreciative if anyone would pipe up and say "Run X to
> test if Y works better/faster/harder" :}

osdl-aim-7 benchmark probably stresses both itlb and dtlb.
(available from osdl.org - URL is in linux-ia64 archive)

SDET is another candidate.

Note that the itlb miss rate is a function of accessing lots of "random"
pages in memory and having enough memory that the odds of hitting
the same page often are low, i.e. run thousands of jobs and let the
scheduler thrash the itlb.

at least that's how I understand it.
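
A toy model of that effect, as a hedged Python sketch: a fully associative
LRU "TLB" hit with uniformly random page accesses. The replacement policy,
the 160-entry size and the access pattern are all assumptions chosen for
illustration and do not describe any particular PA-RISC CPU.

```python
import random
from collections import OrderedDict


def miss_rate(tlb_entries, working_set_pages, accesses=20000, seed=0):
    """Fraction of accesses that miss in a toy LRU TLB.

    Pages are drawn uniformly at random from working_set_pages; the seed
    keeps the run repeatable.
    """
    rng = random.Random(seed)
    tlb = OrderedDict()            # page -> True, oldest entry first
    misses = 0
    for _ in range(accesses):
        page = rng.randrange(working_set_pages)
        if page in tlb:
            tlb.move_to_end(page)  # refresh the LRU position on a hit
        else:
            misses += 1
            tlb[page] = True
            if len(tlb) > tlb_entries:
                tlb.popitem(last=False)  # evict the least recently used
    return misses / accesses


if __name__ == "__main__":
    for pages in (100, 1000, 16000):
        print(f"{pages:6d} pages: miss rate {miss_rate(160, pages):.3f}")
```

Once the set of pages being touched is much larger than the TLB, nearly
every access in this model misses, which is the regime a benchmark has to
reach before the miss handler's cost shows up.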

grant


* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-12 16:06     ` Grant Grundler
@ 2003-08-12 16:32       ` Matthew Wilcox
  2003-08-12 17:06       ` Joel Soete
  2003-08-13 14:52       ` Joel Soete
  2 siblings, 0 replies; 23+ messages in thread
From: Matthew Wilcox @ 2003-08-12 16:32 UTC
  To: Grant Grundler
  Cc: Carlos O'Donell, Matthew Wilcox, parisc-linux, LaMont Jones

On Tue, Aug 12, 2003 at 10:06:12AM -0600, Grant Grundler wrote:
> Note that the itlb miss rate is a function of accessing lots of "random"
> pages in memory and having enough memory that the odds of hitting
> the same page often are low, i.e. run thousands of jobs and let the
> scheduler thrash the itlb.
> 
> at least that's how I understand it.

since we're still using 4k page size, and we have typically 160 TLB entries,
that only covers 640k of our 1.5MB cache ... 
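
Spelled out as arithmetic (a small Python sketch using only the figures in
the sentence above; the constant names are made up):

```python
PAGE_SIZE_KIB = 4     # 4k pages
TLB_ENTRIES = 160     # "typically 160 TLB entries"
CACHE_KIB = 1536      # 1.5MB cache expressed in KiB

tlb_reach_kib = TLB_ENTRIES * PAGE_SIZE_KIB
coverage = tlb_reach_kib / CACHE_KIB

if __name__ == "__main__":
    print(f"TLB reach: {tlb_reach_kib} KiB")
    print(f"share of cache covered: {coverage:.1%}")
```

So the TLB maps well under half of what the cache can hold.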

-- 
"It's not Hollywood.  War is real, war is primarily not about defeat or
victory, it is about death.  I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk


* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-12 16:06     ` Grant Grundler
  2003-08-12 16:32       ` Matthew Wilcox
@ 2003-08-12 17:06       ` Joel Soete
  2003-08-13 15:57         ` Grant Grundler
  2003-08-13 14:52       ` Joel Soete
  2 siblings, 1 reply; 23+ messages in thread
From: Joel Soete @ 2003-08-12 17:06 UTC
  To: Grant Grundler, Carlos O'Donell
  Cc: Matthew Wilcox, parisc-linux, LaMont Jones

>
>osdl-aim-7 benchmark probably stresses both itlb and dtlb.
>(available from osdl.org - URL is in linux-ia64 archive)

OK, I finally found it as the re-aim-7 project on sf.net. I just launched
it (alltest) with kernel-64bits + C.patch. I will let it run overnight and
relaunch it tomorrow morning with the original kernel-64bits.

Thanks,
    Joel



* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-12 16:06     ` Grant Grundler
  2003-08-12 16:32       ` Matthew Wilcox
  2003-08-12 17:06       ` Joel Soete
@ 2003-08-13 14:52       ` Joel Soete
  2003-08-13 15:56         ` Carlos O'Donell
  2 siblings, 1 reply; 23+ messages in thread
From: Joel Soete @ 2003-08-13 14:52 UTC
  To: Grant Grundler, Carlos O'Donell
  Cc: Matthew Wilcox, parisc-linux, LaMont Jones

>
>osdl-aim-7 benchmark probably stresses both itlb and dtlb.
>(available from osdl.org - URL is in linux-ia64 archive)

Well, I finally found it on sf.net (via osdl.org)

and ran some benchmarks which seem to be more related to vm (?):

./reaim -x -t -f workfile.shared -r3

# with new itlb stuff I got following results

REAIM Workload
Times are in seconds - Child times from tms.cstime and tms.cutime

Num     Parent   Child   Child  Jobs per   Jobs/min/  Std_dev  Std_dev  JTI
Forked  Time     SysTime UTime   Minute     Child      Time     Percent 
1       32.72    6.17    4.42    177.87     177.87     0.00     0.00    100
2       33.16    10.19   9.13    351.03     175.51     0.30     0.90    99
3       37.01    14.71   13.94   471.76     157.25     0.04     0.12    99
4       43.87    19.44   18.61   530.66     132.66     1.46     3.43    96
5       49.55    24.26   23.08   587.29     117.46     1.79     3.76    96
6       57.60    28.08   27.99   606.25     101.04     1.33     2.36    97
7       67.86    33.32   32.46   600.35     85.76      1.82     2.73    97
8       77.51    37.77   37.58   600.70     75.09      1.01     1.32    98
9       86.11    41.69   42.05   608.29     67.59      1.21     1.42    98
10      96.33    47.01   46.67   604.17     60.42      1.59     1.67    98
Max sustained jobs reached
Max Jobs per Minute 608.29
REAIM Workload
Times are in seconds - Child times from tms.cstime and tms.cutime

Num     Parent   Child   Child  Jobs per   Jobs/min/  Std_dev  Std_dev  JTI
Forked  Time     SysTime UTime   Minute     Child      Time     Percent 
1       32.58    6.11    4.32    178.64     178.64     0.00     0.00    100
2       31.60    10.16   9.02    368.35     184.18     0.55     1.79    98
3       35.51    14.90   13.48   491.69     163.90     1.40     4.09    95
4       40.54    19.56   18.50   574.25     143.56     1.51     3.90    96
5       49.35    24.16   23.72   589.67     117.93     1.07     2.20    97
6       58.16    28.60   27.88   600.41     100.07     1.64     2.91    97
7       67.67    33.38   32.62   602.04     86.01      0.48     0.71    99
8       78.01    37.71   37.75   596.85     74.61      1.06     1.38    98
9       87.46    43.28   41.86   598.90     66.54      1.30     1.51    98
Max sustained jobs reached
Max Jobs per Minute 602.04
REAIM Workload
Times are in seconds - Child times from tms.cstime and tms.cutime

Num     Parent   Child   Child  Jobs per   Jobs/min/  Std_dev  Std_dev  JTI
Forked  Time     SysTime UTime   Minute     Child      Time     Percent 
1       32.27    5.60    4.56    180.35     180.35     0.00     0.00    100
2       33.02    10.50   9.03    352.51     176.26     0.55     1.68    98
3       33.39    15.14   13.68   522.91     174.30     0.54     1.66    98
4       39.93    19.55   18.46   583.02     145.76     0.66     1.69    98
5       50.01    24.32   23.12   581.88     116.38     1.67     3.46    96
6       58.70    28.73   28.27   594.89     99.15      0.42     0.72    99
7       66.99    32.31   32.78   608.15     86.88      1.50     2.28    97
8       76.04    37.00   37.12   612.31     76.54      0.74     0.99    99
9       86.42    42.29   41.68   606.11     67.35      0.93     1.09    98
10      96.15    45.99   46.82   605.30     60.53      1.43     1.51    98
Max sustained jobs reached
Max Jobs per Minute 612.31

# with the original itlb

REAIM Workload
Times are in seconds - Child times from tms.cstime and tms.cutime

Num     Parent   Child   Child  Jobs per   Jobs/min/  Std_dev  Std_dev  JTI
Forked  Time     SysTime UTime   Minute     Child      Time     Percent 
1       32.37    5.79    4.42    179.80     179.80     0.00     0.00    100
2       33.28    10.07   9.33    349.76     174.88     0.06     0.20    99
3       36.37    15.30   13.73   480.07     160.02     0.21     0.58    99
4       41.49    19.60   18.37   561.10     140.27     2.15     5.38    94
5       49.21    24.03   23.40   591.34     118.27     1.26     2.62    97
6       59.48    29.56   27.85   587.09     97.85      2.44     4.26    95
7       68.38    32.96   32.47   595.79     85.11      0.96     1.43    98
8       76.53    36.48   37.74   608.39     76.05      2.08     2.80    97
9       86.39    41.80   41.91   606.32     67.37      0.86     1.01    98
10      95.58    45.96   46.90   608.91     60.89      1.24     1.31    98
11      104.80   50.56   51.60   610.88     55.53      1.21     1.17    98
Max sustained jobs reached
Max Jobs per Minute 610.88
REAIM Workload
Times are in seconds - Child times from tms.cstime and tms.cutime

Num     Parent   Child   Child  Jobs per   Jobs/min/  Std_dev  Std_dev  JTI
Forked  Time     SysTime UTime   Minute     Child      Time     Percent 
1       32.05    5.66    4.47    181.59     181.59     0.00     0.00    100
2       33.90    10.52   9.13    343.36     171.68     0.45     1.35    98
3       38.53    14.55   13.84   453.15     151.05     0.44     1.16    98
4       40.38    18.64   18.49   576.52     144.13     0.54     1.35    98
5       48.82    23.39   23.26   596.07     119.21     1.71     3.62    96
6       58.57    28.88   28.01   596.21     99.37      0.63     1.08    98
7       67.80    32.98   32.77   600.88     85.84      2.61     3.98    96
8       76.49    36.85   37.50   608.71     76.09      1.14     1.51    98
9       87.04    42.31   42.12   601.79     66.87      2.82     3.31    96
Max sustained jobs reached
Max Jobs per Minute 608.71
REAIM Workload
Times are in seconds - Child times from tms.cstime and tms.cutime

Num     Parent   Child   Child  Jobs per   Jobs/min/  Std_dev  Std_dev  JTI
Forked  Time     SysTime UTime   Minute     Child      Time     Percent 
1       32.54    5.63    4.59    178.86     178.86     0.00     0.00    100
2       33.68    10.74   9.00    345.61     172.80     0.32     0.97    99
3       35.66    14.41   13.98   489.62     163.21     1.46     4.26    95
4       42.70    19.24   18.43   545.20     136.30     2.18     5.35    94
5       49.30    23.86   23.33   590.26     118.05     1.31     2.78    97
6       57.14    27.50   28.11   611.13     101.86     1.56     2.81    97
7       66.99    31.90   33.07   608.15     86.88      0.70     1.06    98
8       77.60    37.64   37.46   600.00     75.00      0.99     1.31    98
9       85.76    41.25   42.08   610.77     67.86      1.50     1.78    98
10      97.18    47.80   46.20   598.89     59.89      1.48     1.56    98
 Job rate dropping avg: 605.79 loss pct: 1.14
Max Jobs per Minute 611.13

AND

./reaim -q -t -f workfile.shared -r3

# with new itlb stuff I got following results

REAIM Workload
Times are in seconds - Child times from tms.cstime and tms.cutime

Num     Parent   Child   Child  Jobs per   Jobs/min/  Std_dev  Std_dev  JTI
Forked  Time     SysTime UTime   Minute     Child      Time     Percent 
1       32.83    6.03    4.51    177.28     177.28     0.00     0.00    100
2       33.20    10.88   8.84    350.60     175.30     0.67     2.06    97
3       35.94    15.23   13.87   485.81     161.94     1.20     3.47    96
4       43.43    19.65   18.23   536.03     134.01     1.01     2.37    97
5       49.43    24.33   23.21   588.71     117.74     0.62     1.27    98
6       58.17    28.72   27.79   600.31     100.05     1.12     1.96    98
7       66.74    32.72   32.14   610.43     87.20      1.95     2.98    97
8       77.11    37.67   37.27   603.81     75.48      1.41     1.86    98
9       86.75    42.45   41.75   603.80     67.09      0.95     1.11    98
10      95.95    46.78   46.57   606.57     60.66      1.09     1.16    98
11      105.20   50.86   51.51   608.56     55.32      1.72     1.66    98
Crossover achieved
Max Jobs per Minute 610.43
REAIM Workload
Times are in seconds - Child times from tms.cstime and tms.cutime

Num     Parent   Child   Child  Jobs per   Jobs/min/  Std_dev  Std_dev  JTI
Forked  Time     SysTime UTime   Minute     Child      Time     Percent 
1       32.73    6.28    4.47    177.82     177.82     0.00     0.00    100
2       33.55    10.97   8.96    346.94     173.47     0.24     0.74    99
3       34.55    15.24   13.70   505.35     168.45     1.20     3.58    96
4       40.15    19.14   18.33   579.83     144.96     0.88     2.22    97
5       49.94    24.48   23.26   582.70     116.54     1.00     2.05    97
6       60.80    28.86   27.50   574.34     95.72      0.81     1.35    98
7       67.28    33.01   32.08   605.53     86.50      1.47     2.24    97
8       77.15    37.59   37.02   603.50     75.44      1.37     1.80    98
9       85.40    40.94   42.18   613.35     68.15      1.75     2.09    97
10      95.15    45.89   46.72   611.67     61.17      1.11     1.19    98
11      104.86   50.49   51.54   610.53     55.50      1.49     1.44    98
Crossover achieved
Max Jobs per Minute 613.35
REAIM Workload
Times are in seconds - Child times from tms.cstime and tms.cutime

Num     Parent   Child   Child  Jobs per   Jobs/min/  Std_dev  Std_dev  JTI
Forked  Time     SysTime UTime   Minute     Child      Time     Percent 
1       33.13    6.50    4.44    175.67     175.67     0.00     0.00    100
2       32.23    10.82   8.89    361.15     180.58     0.23     0.72    99
3       35.72    15.39   13.76   488.80     162.93     2.12     6.37    93
4       42.04    20.43   18.52   553.76     138.44     3.55     8.94    91
5       49.50    24.03   23.06   587.88     117.58     2.50     5.26    94
6       57.70    27.79   28.01   605.20     100.87     1.45     2.60    97
7       67.37    32.72   32.59   604.72     86.39      0.68     1.03    98
8       76.52    36.82   37.57   608.47     76.06      0.90     1.19    98
9       86.14    42.06   41.87   608.08     67.56      0.87     1.03    98
10      96.24    47.16   46.63   604.74     60.47      1.62     1.72    98
11      105.27   51.04   51.32   608.15     55.29      1.29     1.23    98
Crossover achieved
Max Jobs per Minute 608.47

# with the original itlb


REAIM Workload
Times are in seconds - Child times from tms.cstime and tms.cutime

Num     Parent   Child   Child  Jobs per   Jobs/min/  Std_dev  Std_dev  JTI
Forked  Time     SysTime UTime   Minute     Child      Time     Percent 
1       32.35    5.94    4.50    179.91     179.91     0.00     0.00    100
2       31.31    10.80   9.27    371.77     185.88     0.40     1.29    98
3       34.63    15.23   13.94   504.19     168.06     0.25     0.74    99
4       40.22    18.84   18.58   578.82     144.70     1.07     2.74    97
5       48.83    23.89   23.31   595.95     119.19     0.84     1.76    98
6       58.67    28.74   28.11   595.19     99.20      1.96     3.46    96
7       68.28    32.71   33.19   596.66     85.24      1.02     1.52    98
8       77.41    37.61   37.58   601.47     75.18      0.76     1.00    99
9       86.21    41.55   42.28   607.59     67.51      1.44     1.71    98
10      95.30    45.91   46.78   610.70     61.07      1.46     1.56    98
11      106.35   51.52   51.38   601.97     54.72      0.82     0.78    99
Crossover achieved
Max Jobs per Minute 610.70
REAIM Workload
Times are in seconds - Child times from tms.cstime and tms.cutime

Num     Parent   Child   Child  Jobs per   Jobs/min/  Std_dev  Std_dev  JTI
Forked  Time     SysTime UTime   Minute     Child      Time     Percent 
1       32.33    5.88    4.46    180.02     180.02     0.00     0.00    100
2       31.53    10.07   9.19    369.17     184.59     0.38     1.20    98
3       35.76    14.74   14.01   488.26     162.75     1.13     3.23    96
4       41.04    19.34   18.52   567.25     141.81     1.47     3.70    96
5       49.68    24.37   23.11   585.75     117.15     1.32     2.72    97
6       58.14    27.69   28.40   600.62     100.10     0.64     1.12    98
7       68.26    33.09   33.00   596.84     85.26      0.87     1.30    98
8       77.46    37.61   37.39   601.08     75.14      0.85     1.11    98
9       86.90    42.44   42.03   602.76     66.97      1.51     1.76    98
10      95.84    46.22   46.81   607.26     60.73      1.11     1.18    98
11      106.16   51.65   51.22   603.05     54.82      1.32     1.26    98
Crossover achieved
Max Jobs per Minute 607.26
REAIM Workload
Times are in seconds - Child times from tms.cstime and tms.cutime

Num     Parent   Child   Child  Jobs per   Jobs/min/  Std_dev  Std_dev  JTI
Forked  Time     SysTime UTime   Minute     Child      Time     Percent 
1       32.64    5.95    4.62    178.31     178.31     0.00     0.00    100
2       32.83    10.56   9.20    354.55     177.28     0.27     0.83    99
3       36.69    15.87   13.83   475.88     158.63     0.15     0.41    99
4       40.89    19.04   18.71   569.33     142.33     0.21     0.51    99
5       49.90    23.61   23.39   583.17     116.63     0.60     1.22    98
6       59.07    28.39   28.51   591.16     98.53      1.77     3.09    96
7       67.94    32.63   32.75   599.65     85.66      1.05     1.58    98
8       76.17    36.23   37.74   611.26     76.41      3.08     4.15    95
9       86.15    41.37   42.21   608.01     67.56      0.97     1.14    98
10      95.27    46.00   46.57   610.90     61.09      1.29     1.37    98
11      105.61   51.56   51.21   606.19     55.11      0.79     0.76    99
Crossover achieved
Max Jobs per Minute 611.26
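
As a rough way to read all of the above, the "Max Jobs per Minute" line of
each run can be averaged per kernel. The Python sketch below does only
that; the dictionary layout and the `relative_change` helper are made up
for illustration, and three runs per case are too few to say much more
than "within noise".

```python
from statistics import mean

# "Max Jobs per Minute" values copied from the runs above
peaks = {
    "-x": {"new": [608.29, 602.04, 612.31],
           "original": [610.88, 608.71, 611.13]},
    "-q": {"new": [610.43, 613.35, 608.47],
           "original": [610.70, 607.26, 611.26]},
}


def relative_change(new, original):
    """Change of the mean peak, new kernel versus original, in percent."""
    return (mean(new) - mean(original)) / mean(original) * 100.0


if __name__ == "__main__":
    for flag, runs in peaks.items():
        print(f"reaim {flag}: new {mean(runs['new']):.2f}, "
              f"original {mean(runs['original']):.2f}, "
              f"change {relative_change(runs['new'], runs['original']):+.2f}%")
```

The mean peak moves by less than half a percent in either direction, which
is smaller than the spread between individual runs of the same kernel.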

===========================

(Carlos, I really have no clue about benchmarks, so if you find some other
test that will better meet your expectations, do not hesitate ... I will
try to do my best :) )

Joel




* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-13 14:52       ` Joel Soete
@ 2003-08-13 15:56         ` Carlos O'Donell
  2003-08-13 16:05           ` Carlos O'Donell
  0 siblings, 1 reply; 23+ messages in thread
From: Carlos O'Donell @ 2003-08-13 15:56 UTC (permalink / raw)
  To: parisc-linux

> Well, I finally found it on sf.net (via osdl.org)
> and ran some benchmarks which seem to be more related to the VM (?):
> ./reaim -x -t -f worfile.shared -r3

Thanks for that run Joel, I'll take a look at the numbers in a few
minutes. Adding to that, here are the lmbench results (URLs below) for both
the non-optimized and optimized cases of the itlb fault handler.

It seems that some things got faster, or rather more predictably fast,
within the confidence levels (e.g. given the number of tests that I ran).

I ran 10 lmbench runs for each of the two kernels and then munged them
using the stat-summary script provided with lmbench. Diff the two files to
see how the numbers change :)

Looks like we have better performance in many places.

null call, null i/o, stat, open/close, select, signal install, signal
catch, exec, shell proc, (a variety of the process spawning tests),
create, delete, mmap latency, page fault (way down! and deterministic)
-- All got better with itlb branch prediction optimization

Please give it a double check to make sure I'm not out of it this
morning.

http://www.baldric.uwo.ca/~carlos/itlb-opt.txt
http://www.baldric.uwo.ca/~carlos/no-itlb-opt.txt

c.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-12 17:06       ` Joel Soete
@ 2003-08-13 15:57         ` Grant Grundler
  2003-08-13 16:38           ` Joel Soete
  0 siblings, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2003-08-13 15:57 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

On Tue, Aug 12, 2003 at 07:06:44PM +0200, Joel Soete wrote:
> >
> >osdl-aim-7 benchmark probably stresses both itlb and dtlb.
> >(available from osdl.org - URL is in linux-ia64 archive)
> 
> OK, I finally found it as the re-aim-7 sf.net project.

I also pulled bits from sf.net for ia64 testing:
    http://umn.dl.sourceforge.net/sourceforge/re-aim-7/reaim-0.1.8.tar.gz

though I know the maintainer planned to change the name since then:
| Apologies, the naming was not good.
| I'll attempt to change it. (I like osdl-aim-7)
| In the meantime, you can also get the source from
| 
| bk://developer.osdl.org/reaim
| cliffw


> Just launched (alltest) with
> kernel-64bits + C.patch. I will let it run for the night and relaunch it
> tomorrow morning with the original kernel-64bits.

I was told to run with:
iota:/mnt# reaim -f /mnt/usr/local/share/reaim/workfile.new_dbase -s100 -e 500 -i 100

You might need to pick -s and -e values that are more appropriate for
the machine under test. This was for a dual-CPU 900MHz rx2600 with 2GB RAM.

hth,
grant

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-13 15:56         ` Carlos O'Donell
@ 2003-08-13 16:05           ` Carlos O'Donell
  2003-08-13 16:43             ` Joel Soete
  2003-08-14  6:02             ` Joel Soete
  0 siblings, 2 replies; 23+ messages in thread
From: Carlos O'Donell @ 2003-08-13 16:05 UTC (permalink / raw)
  To: parisc-linux

> Thanks for that run Joel, I'll take a look at the numbers in a few
> minutes. Adding to that, here are the lmbench results (URLs below) for both
> the non-optimized and optimized cases of the itlb fault handler.

I forgot that Lamont noticed an interlocked zdep in the PA11 common case
and we reordered the insn sequence. Perhaps this helped the numbers a
bit. I have to run the same lmbench tests on a 64-bit box first to
compare.

c.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-13 15:57         ` Grant Grundler
@ 2003-08-13 16:38           ` Joel Soete
  0 siblings, 0 replies; 23+ messages in thread
From: Joel Soete @ 2003-08-13 16:38 UTC (permalink / raw)
  To: Grant Grundler; +Cc: parisc-linux

>I also pulled bits from sf.net for ia64 testing:
>    http://umn.dl.sourceforge.net/sourceforge/re-aim-7/reaim-0.1.8.tar.gz

I will grab it too.


>I was told to run with:
>iota:/mnt# reaim -f /mnt/usr/local/share/reaim/workfile.new_dbase -s100 -e 500 -i 100

>You might need to pick -s and -e values that are more appropriate for
>the machine under test. This was for a dual-CPU 900MHz rx2600 with 2GB RAM.

That is also what I just read in a benchmark on LWN :)

I will adapt

Thanks,
    Joel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-13 16:05           ` Carlos O'Donell
@ 2003-08-13 16:43             ` Joel Soete
  2003-08-13 16:51               ` Grant Grundler
  2003-08-14  6:02             ` Joel Soete
  1 sibling, 1 reply; 23+ messages in thread
From: Joel Soete @ 2003-08-13 16:43 UTC (permalink / raw)
  To: Carlos O'Donell, parisc-linux

>I forgot that Lamont noticed an interlocked zdep in the PA11 common case

Hmm, what is an interlock? (just to complete my knowledge)

>and we reordered the insn sequence. Perhaps this helped the numbers a
>bit.
If I understand correctly, that means reversing the zdep move in the code?
(I would like to play a bit more with different benchmarks)

>I have to run the same lmbench tests on a 64-bit box first to
>compare.

Nice, I hope that will confirm the lmbench results :)

Joel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-13 16:43             ` Joel Soete
@ 2003-08-13 16:51               ` Grant Grundler
  0 siblings, 0 replies; 23+ messages in thread
From: Grant Grundler @ 2003-08-13 16:51 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

On Wed, Aug 13, 2003 at 06:43:27PM +0200, Joel Soete wrote:
> Hmm, what is an interlock? (just to complete my knowledge)

It's the logic in a CPU that stalls an instruction which is waiting
for the result of a previous instruction,
i.e. the register contents used by the second instruction are not valid
until the previous instruction that produces them actually completes.
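
To make that concrete, here is a toy Python model of a single-issue, in-order pipeline. It is only a sketch: the instruction names, registers and latencies are hypothetical and are not real PA-RISC timings. It counts the cycles an instruction spends stalled because a source register is not ready yet, and shows how placing an independent instruction between producer and consumer hides the stall:

```python
from typing import NamedTuple, Optional, Tuple


class Insn(NamedTuple):
    name: str
    dest: Optional[str]      # register written, if any
    srcs: Tuple[str, ...]    # registers read
    latency: int             # cycles until dest is usable (hypothetical)


def count_stalls(program):
    """Count interlock stall cycles in a toy in-order, single-issue pipeline."""
    ready = {}   # register -> first cycle in which its value may be read
    cycle = 0    # earliest cycle in which the next instruction could issue
    stalls = 0
    for insn in program:
        # The instruction cannot issue before all of its sources are ready.
        issue = max([cycle] + [ready.get(reg, 0) for reg in insn.srcs])
        stalls += issue - cycle
        if insn.dest is not None:
            ready[insn.dest] = issue + insn.latency
        cycle = issue + 1
    return stalls


# Consumer directly after a slow producer: it has to wait (interlock).
interlocked = [
    Insn("load", "r1", ("r9",), 2),
    Insn("use", "r2", ("r1",), 1),
    Insn("other", "r3", ("r4",), 1),
]

# Same work, with the independent instruction moved into the gap.
reordered = [
    Insn("load", "r1", ("r9",), 2),
    Insn("other", "r3", ("r4",), 1),
    Insn("use", "r2", ("r1",), 1),
]

if __name__ == "__main__":
    print("interlocked order stalls:", count_stalls(interlocked))
    print("reordered order stalls:  ", count_stalls(reordered))
```

In this model the first ordering loses one cycle to the interlock and the second loses none, which is the kind of effect the zdep reordering is aiming for.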

grant

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-13 16:05           ` Carlos O'Donell
  2003-08-13 16:43             ` Joel Soete
@ 2003-08-14  6:02             ` Joel Soete
  2003-08-14 11:46               ` Matthew Wilcox
  1 sibling, 1 reply; 23+ messages in thread
From: Joel Soete @ 2003-08-14  6:02 UTC (permalink / raw)
  To: Carlos O'Donell, parisc-linux

>
>I forgot that Lamont noticed an interlocked zdep in the PA11 common case
>and we reordered the insn sequence. Perhaps this helped the numbers a
>bit. I have to run the same lmbench tests on a 64-bit box first to
>compare.

btw, is it for that reason (interlock) that in your patch we can read:
[...]
        cmpb,=        %r0,t0,itlb_miss_...
        nop
[...]
I am always wondering why the 'nop' is there.

Thanks in advance,
    Joel



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-14  6:02             ` Joel Soete
@ 2003-08-14 11:46               ` Matthew Wilcox
  2003-08-14 13:56                 ` Joel Soete
  0 siblings, 1 reply; 23+ messages in thread
From: Matthew Wilcox @ 2003-08-14 11:46 UTC (permalink / raw)
  To: Joel Soete; +Cc: Carlos O'Donell, parisc-linux

On Thu, Aug 14, 2003 at 08:02:04AM +0200, Joel Soete wrote:
> btw is it for that reason (interlock) that in your patch we can read:
> [...]
>         cmpb,=        %r0,t0,itlb_miss_...
>         nop
> [...]
> I am always wondering why the 'nop' is there.

To fill the delayed branch slot (a silly idea, but ...)

-- 
"It's not Hollywood.  War is real, war is primarily not about defeat or
victory, it is about death.  I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-14 11:46               ` Matthew Wilcox
@ 2003-08-14 13:56                 ` Joel Soete
  2003-08-14 15:23                   ` Grant Grundler
  0 siblings, 1 reply; 23+ messages in thread
From: Joel Soete @ 2003-08-14 13:56 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Carlos O'Donell, parisc-linux

>To fill the delayed branch slot (a silly idea, but ...)

Ah, OK. Yet another concept for me ;-) I will look it up in the architecture
books.

For the moment I am studying TLB miss handling and already have a lot of
questions.

But the very first one: is TLB management done in H/W or S/W (I refer to
page 3-9 of the PA-RISC 2.0 book: Address Resolution and the TLB)?

In both cases, if a fault occurs, is an interrupt (6, 15, 16, 17 or 20) 'triggered'?

Would a printk() in the corresponding handle_interruption() case (kernel/traps.c)
help?
(If that works, how can I tell which processor caused the fault?)
(if that work, how may I know which processor causes fault?)

Thanks again,
    Joel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-14 13:56                 ` Joel Soete
@ 2003-08-14 15:23                   ` Grant Grundler
  2003-08-14 16:15                     ` Joel Soete
  0 siblings, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2003-08-14 15:23 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

On Thu, Aug 14, 2003 at 03:56:42PM +0200, Joel Soete wrote:
> In both cases, if a fault occurs, is an interrupt (6, 15, 16, 17 or 20)
> 'triggered'?

yes, but most people call them "traps" or "faults" to be more specific.
Chapter 5 of the PA 2.0 arch book differentiates nicely.

grant

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-14 15:23                   ` Grant Grundler
@ 2003-08-14 16:15                     ` Joel Soete
  0 siblings, 0 replies; 23+ messages in thread
From: Joel Soete @ 2003-08-14 16:15 UTC (permalink / raw)
  To: Grant Grundler; +Cc: parisc-linux

 
>> On Thu, Aug 14, 2003 at 03:56:42PM +0200, Joel Soete wrote:
>> In both cases, if a fault occurs, is an interrupt (6, 15, 16, 17 or 20)
>> 'triggered'?
>
>yes, but most people call them "traps" or "faults" to be more specific.
>Chapter 5 of the PA 2.0 arch book differentiates nicely.

Yes (that is where the 6, 15, ... reference came from).
Oh yes, my bad: I didn't notice that here "interrupt" is not shorthand for
"interruption"; I tend to mix the two up :(

Thanks,
    Joel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [parisc-linux] itlb miss handler optimizations!
@ 2003-08-19 12:33 Joel Soete
  2003-08-19 13:42 ` Matthew Wilcox
  0 siblings, 1 reply; 23+ messages in thread
From: Joel Soete @ 2003-08-19 12:33 UTC (permalink / raw)
  To: parisc-linux; +Cc: Matthew Wilcox

>On Thu, Aug 14, 2003 at 08:02:04AM +0200, Joel Soete wrote:
>> btw is it for that reason (interlock) that in your patch we can read:
>> [...]
>>         cmpb,=        %r0,t0,itlb_miss_...
>>         nop
>> [...]
>> I am always wondering why the 'nop' is there.
>
>To fill the delayed branch slot (a silly idea, but ...)

Would it be the same 'by setting the "nullify" bit' (i.e. cmpb,=,n ...)?

Joel




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] itlb miss handler optimizations!
  2003-08-19 12:33 Joel Soete
@ 2003-08-19 13:42 ` Matthew Wilcox
  0 siblings, 0 replies; 23+ messages in thread
From: Matthew Wilcox @ 2003-08-19 13:42 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux, Matthew Wilcox

On Tue, Aug 19, 2003 at 02:33:04PM +0200, Joel Soete wrote:
> >On Thu, Aug 14, 2003 at 08:02:04AM +0200, Joel Soete wrote:
> >> btw is it for that reason (interlock) that in your patch we can read:
> >> [...]
> >>         cmpb,=        %r0,t0,itlb_miss_...
> >>         nop
> >> [...]
> >> I am always wondering why the 'nop' is there.
> >
> >To fill the delayed branch slot (a silly idea, but ...)
> 
> Would it be the same 'by setting the "nullify" bit' (i.e. cmpb,=,n ...)?

Only if you know which direction the branch is going.  See the cmpb
description.
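
As a rough illustration of why the direction matters: as I read the conditional-branch description in the PA-RISC architecture manual, with the ,n completer the instruction in the delay slot is nullified when a forward branch is taken or when a backward branch is not taken, and without ,n it always executes. The Python sketch below models only that rule (the function name and arguments are made up for illustration), so please check it against the cmpb description before relying on it:

```python
def delay_slot_executes(nullify: bool, taken: bool, forward: bool) -> bool:
    """Return True if the instruction after a conditional branch executes.

    nullify: the branch carries the ,n completer
    taken:   the branch condition is true
    forward: the branch target lies after the branch (positive displacement)
    """
    if not nullify:
        # Plain delayed branch: the slot always runs, hence the 'nop' filler.
        return True
    if forward:
        # Forward ,n branch: slot is nullified only when the branch is taken.
        return not taken
    # Backward ,n branch: slot is nullified only when the branch is not taken.
    return taken


if __name__ == "__main__":
    for forward in (True, False):
        for taken in (True, False):
            direction = "forward" if forward else "backward"
            outcome = "taken" if taken else "not taken"
            runs = delay_slot_executes(True, taken, forward)
            print(f",n {direction:8s} {outcome:9s}: slot executes = {runs}")
```

So for a forward cmpb the ,n form skips the slot on the taken path, much like having a nop there, whereas for a backward cmpb the slot runs on the taken path and is skipped on fall-through.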

-- 
"It's not Hollywood.  War is real, war is primarily not about defeat or
victory, it is about death.  I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2003-08-19 13:42 UTC | newest]

Thread overview: 23+ messages
2003-07-25  7:04 [parisc-linux] itlb miss handler optimizations! Carlos O'Donell
2003-07-25 11:46 ` Matthew Wilcox
2003-07-26 18:02   ` Carlos O'Donell
2003-08-12  3:58   ` Carlos O'Donell
2003-08-12 12:21     ` Joel Soete
2003-08-12 14:40       ` Carlos O'Donell
2003-08-12 16:06     ` Grant Grundler
2003-08-12 16:32       ` Matthew Wilcox
2003-08-12 17:06       ` Joel Soete
2003-08-13 15:57         ` Grant Grundler
2003-08-13 16:38           ` Joel Soete
2003-08-13 14:52       ` Joel Soete
2003-08-13 15:56         ` Carlos O'Donell
2003-08-13 16:05           ` Carlos O'Donell
2003-08-13 16:43             ` Joel Soete
2003-08-13 16:51               ` Grant Grundler
2003-08-14  6:02             ` Joel Soete
2003-08-14 11:46               ` Matthew Wilcox
2003-08-14 13:56                 ` Joel Soete
2003-08-14 15:23                   ` Grant Grundler
2003-08-14 16:15                     ` Joel Soete
  -- strict thread matches above, loose matches on Subject: below --
2003-08-19 12:33 Joel Soete
2003-08-19 13:42 ` Matthew Wilcox
