linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH] [2.6.18] U4 DART improvements
@ 2006-06-02  4:04 Olof Johansson
  2006-06-02 12:44 ` Segher Boessenkool
From: Olof Johansson @ 2006-06-02  4:04 UTC (permalink / raw)
  To: paulus; +Cc: linuxppc-dev

Hi Paul,

I've given this a good beating tonight, since I finally got a PCI-E
Ethernet card for the new G5 to do loopback with. Profiles looked good;
I think we're good to go for 2.6.18 with this, especially if it gets
in early for extra testing.


-Olof

---

Implement single-entry TLB invalidations in the U4 DART.

Simple benchmarking with loopback flood pings of various sizes shows that
the previous flush-all code spends ~5% of its time in the flush, while
the new selective invalidations spend about an order of magnitude less
in the same code path.

This could mean that invalidations larger than, say, 16 entries would
be better handled in bulk, but until we have a workload that actually
shows problems or bottlenecks, let's keep doing single invalidations at
all mapping sizes.

Signed-off-by: Olof Johansson <olof@lixom.net>
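
For illustration only, here is a minimal C sketch of the size-based policy floated above, which this patch deliberately does not implement: ranges above a cutoff get one bulk flush, smaller ones are invalidated entry by entry. The cutoff of 16 comes from the text; BULK_FLUSH_THRESHOLD, invalidate_range() and the sim_* helpers are hypothetical and only count calls instead of touching hardware.

```c
#include <assert.h>

/* Hypothetical cutoff; the patch uses none and always invalidates entry
 * by entry. 16 is the figure mentioned in the commit message. */
#define BULK_FLUSH_THRESHOLD 16

/* Call counters standing in for the DART register writes. */
static unsigned long single_invals;
static unsigned long bulk_flushes;

static void sim_invalidate_one(unsigned long bus_rpn)
{
	(void)bus_rpn;		/* a real version would program DART_CNTL */
	single_invals++;
}

static void sim_invalidate_all(void)
{
	bulk_flushes++;
}

/* Invalidate TLB entries for bus pages [index, index + npages). */
static void invalidate_range(unsigned long index, long npages)
{
	assert(npages >= 0);
	if (npages > BULK_FLUSH_THRESHOLD) {
		sim_invalidate_all();	/* one flush-all for large ranges */
		return;
	}
	while (npages--)
		sim_invalidate_one(index++);
}
```

With this policy, a 4-page mapping costs four single invalidations while a 64-page mapping costs one flush-all; whether that trade-off pays off is exactly the open question the commit message leaves for a real workload.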


diff --git a/arch/powerpc/sysdev/dart.h b/arch/powerpc/sysdev/dart.h
index c2d0576..1c8817c 100644
Index: 2.6.17-rc5-git8/arch/powerpc/sysdev/dart.h
===================================================================
--- 2.6.17-rc5-git8.orig/arch/powerpc/sysdev/dart.h
+++ 2.6.17-rc5-git8/arch/powerpc/sysdev/dart.h
@@ -47,8 +47,12 @@
 /* U4 registers */
 #define DART_BASE_U4_BASE_MASK	0xffffff
 #define DART_BASE_U4_BASE_SHIFT	0
-#define DART_CNTL_U4_FLUSHTLB	0x20000000
 #define DART_CNTL_U4_ENABLE	0x80000000
+#define DART_CNTL_U4_IONE	0x40000000
+#define DART_CNTL_U4_FLUSHTLB	0x20000000
+#define DART_CNTL_U4_IDLE	0x10000000
+#define DART_CNTL_U4_PAR_EN	0x08000000
+#define DART_CNTL_U4_IONE_MASK	0x07ffffff
 #define DART_SIZE_U4_SIZE_MASK	0x1fff
 #define DART_SIZE_U4_SIZE_SHIFT	0
 
Index: 2.6.17-rc5-git8/arch/powerpc/sysdev/dart_iommu.c
===================================================================
--- 2.6.17-rc5-git8.orig/arch/powerpc/sysdev/dart_iommu.c
+++ 2.6.17-rc5-git8/arch/powerpc/sysdev/dart_iommu.c
@@ -101,8 +101,8 @@ retry:
 	if (l == (1L << limit)) {
 		if (limit < 4) {
 			limit++;
-		        reg = DART_IN(DART_CNTL);
-		        reg &= ~inv_bit;
+			reg = DART_IN(DART_CNTL);
+			reg &= ~inv_bit;
 			DART_OUT(DART_CNTL, reg);
 			goto retry;
 		} else
@@ -111,11 +111,40 @@ retry:
 	}
 }
 
+static inline void dart_tlb_invalidate_one(unsigned long bus_rpn)
+{
+	unsigned int reg;
+	unsigned int l, limit;
+
+	reg = DART_CNTL_U4_ENABLE | DART_CNTL_U4_IONE |
+		(bus_rpn & DART_CNTL_U4_IONE_MASK);
+	DART_OUT(DART_CNTL, reg);
+	mb();
+
+	limit = 0;
+wait_more:
+	l = 0;
+	while ((DART_IN(DART_CNTL) & DART_CNTL_U4_IONE) && l < (1L << limit)) {
+		rmb();
+		l++;
+	}
+
+	if (l == (1L << limit)) {
+		if (limit < 4) {
+			limit++;
+			goto wait_more;
+		} else
+			panic("DART: TLB did not flush after waiting a long "
+			      "time. Buggy U4 ?");
+	}
+}
+
 static void dart_flush(struct iommu_table *tbl)
 {
-	if (dart_dirty)
+	if (dart_dirty) {
 		dart_tlb_invalidate_all();
-	dart_dirty = 0;
+		dart_dirty = 0;
+	}
 }
 
 static void dart_build(struct iommu_table *tbl, long index,
@@ -124,6 +153,7 @@ static void dart_build(struct iommu_tabl
 {
 	unsigned int *dp;
 	unsigned int rpn;
+	long l;
 
 	DBG("dart: build at: %lx, %lx, addr: %x\n", index, npages, uaddr);
 
@@ -135,7 +165,8 @@ static void dart_build(struct iommu_tabl
 	/* On U3, all memory is contigous, so we can move this
 	 * out of the loop.
 	 */
-	while (npages--) {
+	l = npages;
+	while (l--) {
 		rpn = virt_to_abs(uaddr) >> DART_PAGE_SHIFT;
 
 		*(dp++) = DARTMAP_VALID | (rpn & DARTMAP_RPNMASK);
@@ -143,7 +174,14 @@ static void dart_build(struct iommu_tabl
 		uaddr += DART_PAGE_SIZE;
 	}
 
-	dart_dirty = 1;
+	if (dart_is_u4) {
+		rpn = index;
+		mb(); /* make sure all updates have reached memory */
+		while (npages--)
+			dart_tlb_invalidate_one(rpn++);
+	} else {
+		dart_dirty = 1;
+	}
 }
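
A self-contained C sketch of the U4 path in the patch, assuming a simulated control register in place of DART_IN()/DART_OUT(). It composes the single-entry invalidate word from the new dart.h bits, polls with a bounded budget shaped like the patch's retry rounds, and shows why dart_build() now counts down a copy (l) rather than npages: npages must still be intact for the per-entry invalidation loop. The sim_* names, the u4_* helpers and the error return (the driver panic()s instead) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* U4 DART_CNTL bits, copied from the dart.h hunk. */
#define DART_CNTL_U4_ENABLE	0x80000000u
#define DART_CNTL_U4_IONE	0x40000000u
#define DART_CNTL_U4_IONE_MASK	0x07ffffffu

/* Simulated device (hypothetical): after a write, IONE reads back clear
 * on the sim_latency-th read; sim_latency == 0 means it never clears. */
static uint32_t sim_cntl;
static unsigned int sim_latency;
static unsigned int sim_reads_left;
static unsigned int sim_writes;

static void sim_out(uint32_t val)
{
	sim_cntl = val;
	sim_reads_left = sim_latency;
	sim_writes++;
}

static uint32_t sim_in(void)
{
	if (sim_reads_left > 0 && --sim_reads_left == 0)
		sim_cntl &= ~DART_CNTL_U4_IONE;
	return sim_cntl;
}

/* Command word for a single-entry invalidate, as in the patch. */
static uint32_t u4_inval_one_word(unsigned long bus_rpn)
{
	return DART_CNTL_U4_ENABLE | DART_CNTL_U4_IONE |
	       (uint32_t)(bus_rpn & DART_CNTL_U4_IONE_MASK);
}

/* Poll until IONE clears: rounds of 1, 2, 4, 8 and 16 reads.
 * Returns 0 on completion, -1 if the budget runs out. */
static int u4_wait_inval_done(void)
{
	for (unsigned int limit = 0; limit <= 4; limit++)
		for (unsigned long l = 0; l < (1UL << limit); l++)
			if (!(sim_in() & DART_CNTL_U4_IONE))
				return 0;
	return -1;
}

/* Tail of a U4-style dart_build(): fill entries, then invalidate each. */
static int u4_build_and_invalidate(uint32_t *table, long index, long npages,
				   unsigned long first_rpn)
{
	uint32_t *dp = table + index;
	long l = npages;		/* count down a copy ... */

	assert(index >= 0 && npages >= 0);
	while (l--)
		*dp++ = (uint32_t)first_rpn++;	/* stand-in for DARTMAP_VALID | rpn */

	/* ... so npages is still intact here. The driver has its mb() at
	 * this point, before telling the DART to drop the entries. */
	while (npages--) {
		sim_out(u4_inval_one_word((unsigned long)index++));
		if (u4_wait_inval_done() != 0)
			return -1;	/* the driver panic()s instead */
	}
	return 0;
}
```

Because the rounds in this sketch have no delay between them, the total budget is simply 1 + 2 + 4 + 8 + 16 = 31 reads of the control register.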
 
 


* Re: [PATCH] [2.6.18] U4 DART improvements
  2006-06-02  4:04 [PATCH] [2.6.18] U4 DART improvements Olof Johansson
@ 2006-06-02 12:44 ` Segher Boessenkool
  2006-06-03  0:40   ` Benjamin Herrenschmidt
From: Segher Boessenkool @ 2006-06-02 12:44 UTC (permalink / raw)
  To: Olof Johansson; +Cc: linuxppc-dev, paulus

Hi Olof,

Looks good.  One request:

> +static inline void dart_tlb_invalidate_one(unsigned long bus_rpn)
> +{
> +	unsigned int reg;
> +	unsigned int l, limit;
> +
> +	reg = DART_CNTL_U4_ENABLE | DART_CNTL_U4_IONE |
> +		(bus_rpn & DART_CNTL_U4_IONE_MASK);
> +	DART_OUT(DART_CNTL, reg);
> +	mb();

Could you please comment the memory barriers, to say exactly _why_ a
certain barrier is needed?  I can't see why wmb() wouldn't work here,
for example (note I'm not saying it would -- I just don't see why it
wouldn't).

Same goes for every single memory barrier in the whole kernel source
code, but I have to start somewhere, heh.


Segher


* Re: [PATCH] [2.6.18] U4 DART improvements
  2006-06-02 12:44 ` Segher Boessenkool
@ 2006-06-03  0:40   ` Benjamin Herrenschmidt
  2006-06-03  3:28     ` Olof Johansson
From: Benjamin Herrenschmidt @ 2006-06-03  0:40 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Olof Johansson, linuxppc-dev, paulus

On Fri, 2006-06-02 at 14:44 +0200, Segher Boessenkool wrote:
> Hi Olof,
> 
> Looks good.  One request:
> 
> > +static inline void dart_tlb_invalidate_one(unsigned long bus_rpn)
> > +{
> > +	unsigned int reg;
> > +	unsigned int l, limit;
> > +
> > +	reg = DART_CNTL_U4_ENABLE | DART_CNTL_U4_IONE |
> > +		(bus_rpn & DART_CNTL_U4_IONE_MASK);
> > +	DART_OUT(DART_CNTL, reg);
> > +	mb();
> 
> Could you please comment the memory barriers, to say exactly _why_ a
> certain barrier is needed?  I can't see why wmb() wouldn't work here,
> for example (note I'm not saying it would -- I just don't see why it
> wouldn't).
> 
> Same goes for every single memory barrier in the whole kernel source
> code, but I have to start somewhere, heh.

In fact I doubt we need a barrier at all...

Ben.


* Re: [PATCH] [2.6.18] U4 DART improvements
  2006-06-03  0:40   ` Benjamin Herrenschmidt
@ 2006-06-03  3:28     ` Olof Johansson
From: Olof Johansson @ 2006-06-03  3:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, paulus

On Sat, Jun 03, 2006 at 10:40:38AM +1000, Benjamin Herrenschmidt wrote:

> In fact I doubt we need a barrier at all...

Yeah, I'm not sure what I was thinking. There is a need for a barrier
before the invalidation, to make sure that a stale entry doesn't get
re-entered into the TLB, and that one is already done before the call.
I have no good explanation for why I added this one.

I'll respin, test and repost the patch tomorrow.


-Olof
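
The ordering described above (table stores must be visible before the DART is told to refetch them, and the barrier after the command write has no stated purpose) can be illustrated by analogy with C11 release/acquire between two threads. This is only an analogy under stated assumptions: C11 atomics order memory between CPU threads and do not model MMIO or DMA, which in the kernel are the job of mb()/wmb() and the I/O accessors. The doorbell, device_thread() and publish_and_check() names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NENTRIES 4

static uint32_t table[NENTRIES];	/* plain memory, like the DART table */
static atomic_int doorbell;		/* stands in for the DART_CNTL write */

/* "Device" side: wait for the doorbell, then read the table. */
static void *device_thread(void *arg)
{
	uint32_t *seen = arg;

	assert(seen != NULL);
	while (!atomic_load_explicit(&doorbell, memory_order_acquire))
		;	/* spin until the command is posted */
	for (int i = 0; i < NENTRIES; i++)
		seen[i] = table[i];
	return NULL;
}

/* Returns 1 if the device saw every new entry, 0 if not, -1 on error. */
static int publish_and_check(uint32_t base)
{
	uint32_t seen[NENTRIES];
	pthread_t dev;

	atomic_store_explicit(&doorbell, 0, memory_order_relaxed);
	if (pthread_create(&dev, NULL, device_thread, seen) != 0)
		return -1;

	for (int i = 0; i < NENTRIES; i++)
		table[i] = base + (uint32_t)i;	/* update the entries first */

	/* The one ordering point: the release store makes every table store
	 * above visible to a thread whose acquire load sees doorbell == 1.
	 * No further fence is needed after it. */
	atomic_store_explicit(&doorbell, 1, memory_order_release);

	pthread_join(dev, NULL);
	for (int i = 0; i < NENTRIES; i++)
		if (seen[i] != base + (uint32_t)i)
			return 0;
	return 1;
}
```

The single release/acquire pair is placed between the data writes and the "go" signal, which mirrors keeping one barrier before the invalidate rather than one after it.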
