linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 1/2] powerpc: Add hcall to read 4 ptes at a time in real mode
@ 2010-05-11  6:28 Michael Neuling
  2010-05-11  6:28 ` [PATCH 2/2] powerpc,kexec: Speedup kexec hpte tear down Michael Neuling
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Neuling @ 2010-05-11  6:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, kexec, Anton Blanchard

This adds plpar_pte_read_4_raw(), which can be used to read 4 PTEs
from PHYP at a time, while in real mode.

It also creates a new hcall9 entry point which can be used in real
mode.  It's the same as plpar_hcall9 but without the hcall tracing
statistics, which may require touching variables outside the RMO.
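
For reference, this is roughly how plpar_pte_read_4_raw() is expected
to be used (a sketch only, mirroring what patch 2 does; H_READ with
the H_READ_4 flag returns four pteh/ptel doubleword pairs, hence the
8 unsigned long buffer):

	struct {
		unsigned long pteh;	/* HPTE doubleword 0 */
		unsigned long ptel;	/* HPTE doubleword 1 */
	} ptes[4];
	long rc;

	/* reads HPTEs ptex .. ptex+3 with a single hcall; patch 2
	 * always passes a 4-aligned ptex */
	rc = plpar_pte_read_4_raw(0, ptex, (void *)ptes);
	if (rc == H_SUCCESS) {
		/* ptes[0..3] now hold the four HPTEs */
	}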

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/include/asm/hvcall.h               |    1 
 arch/powerpc/platforms/pseries/hvCall.S         |   38 ++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/plpar_wrappers.h |   18 +++++++++++
 3 files changed, 57 insertions(+)

Index: linux-2.6-ozlabs/arch/powerpc/include/asm/hvcall.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/include/asm/hvcall.h
+++ linux-2.6-ozlabs/arch/powerpc/include/asm/hvcall.h
@@ -281,6 +281,7 @@ long plpar_hcall_raw(unsigned long opcod
  */
 #define PLPAR_HCALL9_BUFSIZE 9
 long plpar_hcall9(unsigned long opcode, unsigned long *retbuf, ...);
+long plpar_hcall9_raw(unsigned long opcode, unsigned long *retbuf, ...);
 
 /* For hcall instrumentation.  One structure per-hcall, per-CPU */
 struct hcall_stats {
Index: linux-2.6-ozlabs/arch/powerpc/platforms/pseries/hvCall.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/pseries/hvCall.S
+++ linux-2.6-ozlabs/arch/powerpc/platforms/pseries/hvCall.S
@@ -228,3 +228,41 @@ _GLOBAL(plpar_hcall9)
 	mtcrf	0xff,r0
 
 	blr				/* return r3 = status */
+
+/* See plpar_hcall_raw to see why this is needed */
+_GLOBAL(plpar_hcall9_raw)
+	HMT_MEDIUM
+
+	mfcr	r0
+	stw	r0,8(r1)
+
+	std     r4,STK_PARM(r4)(r1)     /* Save ret buffer */
+
+	mr	r4,r5
+	mr	r5,r6
+	mr	r6,r7
+	mr	r7,r8
+	mr	r8,r9
+	mr	r9,r10
+	ld	r10,STK_PARM(r11)(r1)	 /* put arg7 in R10 */
+	ld	r11,STK_PARM(r12)(r1)	 /* put arg8 in R11 */
+	ld	r12,STK_PARM(r13)(r1)    /* put arg9 in R12 */
+
+	HVSC				/* invoke the hypervisor */
+
+	mr	r0,r12
+	ld	r12,STK_PARM(r4)(r1)
+	std	r4,  0(r12)
+	std	r5,  8(r12)
+	std	r6, 16(r12)
+	std	r7, 24(r12)
+	std	r8, 32(r12)
+	std	r9, 40(r12)
+	std	r10,48(r12)
+	std	r11,56(r12)
+	std	r0, 64(r12)
+
+	lwz	r0,8(r1)
+	mtcrf	0xff,r0
+
+	blr				/* return r3 = status */
Index: linux-2.6-ozlabs/arch/powerpc/platforms/pseries/plpar_wrappers.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/pseries/plpar_wrappers.h
+++ linux-2.6-ozlabs/arch/powerpc/platforms/pseries/plpar_wrappers.h
@@ -191,6 +191,24 @@ static inline long plpar_pte_read_raw(un
 	return rc;
 }
 
+/*
+ * plpar_pte_read_4_raw can be called in real mode.
+ * ptes must be 8*sizeof(unsigned long)
+ */
+static inline long plpar_pte_read_4_raw(unsigned long flags, unsigned long ptex,
+					unsigned long *ptes)
+
+{
+	long rc;
+	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
+
+	rc = plpar_hcall9_raw(H_READ, retbuf, flags | H_READ_4, ptex);
+
+	memcpy(ptes, retbuf, 8*sizeof(unsigned long));
+
+	return rc;
+}
+
 static inline long plpar_pte_protect(unsigned long flags, unsigned long ptex,
 		unsigned long avpn)
 {


* [PATCH 2/2] powerpc,kexec: Speedup kexec hpte tear down
  2010-05-11  6:28 [PATCH 1/2] powerpc: Add hcall to read 4 ptes at a time in real mode Michael Neuling
@ 2010-05-11  6:28 ` Michael Neuling
  2010-05-11  7:04   ` Michael Ellerman
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Neuling @ 2010-05-11  6:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, kexec, Anton Blanchard

Currently for kexec the PTE tear down on 1TB segment systems normally
requires 3 hcalls for each PTE removal. On a machine with 32GB of
memory it can take around a minute to remove all the PTEs.

This optimises the path so that we only remove PTEs that are valid.
It also uses the new read-4-PTEs-at-once hcall.  For the common case
where a PTE is invalid in a 1TB segment, this turns the 3 hcalls per
PTE down to 1 hcall per 4 PTEs.

This gives a > 10x speedup in kexec times on PHYP, taking a 32GB
machine from around 1 minute down to a few seconds.
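
As a rough sanity check on those numbers (assuming, purely for
illustration, a 256MB hash page table for the 32GB LPAR, i.e.
ppc64_pft_size = 28):

	hpte_count = (1UL << 28) >> 4    = ~16M HPTEs
	old path:   ~3 hcalls per HPTE   = ~48M hcalls
	new path:   16M / 4 read hcalls  =  ~4M hcalls
	            + 1 remove per valid, non-VRMA HPTE (a small fraction)

i.e. roughly an order of magnitude fewer hcalls, independent of the
exact hash table size.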

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/platforms/pseries/lpar.c |   33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/platforms/pseries/lpar.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/pseries/lpar.c
+++ linux-2.6-ozlabs/arch/powerpc/platforms/pseries/lpar.c
@@ -367,21 +367,28 @@ static void pSeries_lpar_hptab_clear(voi
 {
 	unsigned long size_bytes = 1UL << ppc64_pft_size;
 	unsigned long hpte_count = size_bytes >> 4;
-	unsigned long dummy1, dummy2, dword0;
+	struct {
+		unsigned long pteh;
+		unsigned long ptel;
+	} ptes[4];
 	long lpar_rc;
-	int i;
+	int i, j;
 
-	/* TODO: Use bulk call */
-	for (i = 0; i < hpte_count; i++) {
-		/* dont remove HPTEs with VRMA mappings */
-		lpar_rc = plpar_pte_remove_raw(H_ANDCOND, i, HPTE_V_1TB_SEG,
-						&dummy1, &dummy2);
-		if (lpar_rc == H_NOT_FOUND) {
-			lpar_rc = plpar_pte_read_raw(0, i, &dword0, &dummy1);
-			if (!lpar_rc && ((dword0 & HPTE_V_VRMA_MASK)
-				!= HPTE_V_VRMA_MASK))
-				/* Can be hpte for 1TB Seg. So remove it */
-				plpar_pte_remove_raw(0, i, 0, &dummy1, &dummy2);
+	/* Read in batches of 4,
+	 * invalidate only valid entries not in the VRMA
+	 * hpte_count will be a multiple of 4
+         */
+	for (i = 0; i < hpte_count; i += 4) {
+		lpar_rc = plpar_pte_read_4_raw(0, i, (void *)ptes);
+		if (lpar_rc != H_SUCCESS)
+			continue;
+		for (j = 0; j < 4; j++){
+			if ((ptes[j].pteh & HPTE_V_VRMA_MASK) ==
+				HPTE_V_VRMA_MASK)
+				continue;
+			if (ptes[j].pteh & HPTE_V_VALID)
+				plpar_pte_remove_raw(0, i + j, 0,
+					&(ptes[j].pteh), &(ptes[j].ptel));
 		}
 	}
 }


* Re: [PATCH 2/2] powerpc,kexec: Speedup kexec hpte tear down
  2010-05-11  6:28 ` [PATCH 2/2] powerpc,kexec: Speedup kexec hpte tear down Michael Neuling
@ 2010-05-11  7:04   ` Michael Ellerman
  2010-05-11 23:29     ` Michael Neuling
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2010-05-11  7:04 UTC (permalink / raw)
  To: Michael Neuling; +Cc: kexec, Anton Blanchard, linuxppc-dev


On Tue, 2010-05-11 at 16:28 +1000, Michael Neuling wrote:
> Currently for kexec the PTE tear down on 1TB segment systems normally
> requires 3 hcalls for each PTE removal. On a machine with 32GB of
> memory it can take around a minute to remove all the PTEs.
> 
..
> -	/* TODO: Use bulk call */

...
> +	/* Read in batches of 4,
> +	 * invalidate only valid entries not in the VRMA
> +	 * hpte_count will be a multiple of 4
> +         */
> +	for (i = 0; i < hpte_count; i += 4) {
> +		lpar_rc = plpar_pte_read_4_raw(0, i, (void *)ptes);
> +		if (lpar_rc != H_SUCCESS)
> +			continue;
> +		for (j = 0; j < 4; j++){
> +			if ((ptes[j].pteh & HPTE_V_VRMA_MASK) ==
> +				HPTE_V_VRMA_MASK)
> +				continue;
> +			if (ptes[j].pteh & HPTE_V_VALID)
> +				plpar_pte_remove_raw(0, i + j, 0,
> +					&(ptes[j].pteh), &(ptes[j].ptel));
>  		}

Have you tried using the bulk remove call, if none of the HPTEs are for
the VRMA? Rumour was it was slower/the-same, but that may have been
apocryphal.
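
Something like the following is what I mean -- a rough sketch only,
completely untested.  H_BULK_REMOVE takes up to 4 two-doubleword
"translation specifiers" in the 8 argument registers; the BR_*
encoding and name below are just my reading of PAPR, so check them
before relying on this:

	#define BR_REQUEST	0x4000000000000000UL	/* assumed: "request" type */

	/* Remove HPTEs ptex .. ptex+3 with one hcall (only safe when
	 * none of them is a VRMA entry).  The first doubleword of each
	 * specifier is the type plus the PTE index; the second would be
	 * an AVPN to compare, unused here as no compare flag is set. */
	static long bulk_remove_4(unsigned long ptex)
	{
		unsigned long param[PLPAR_HCALL9_BUFSIZE] = { 0 };
		int k;

		for (k = 0; k < 4; k++)
			param[2 * k] = BR_REQUEST | (ptex + k);

		return plpar_hcall9_raw(H_BULK_REMOVE, param,
					param[0], param[1], param[2], param[3],
					param[4], param[5], param[6], param[7]);
	}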

cheers



* Re: [PATCH 2/2] powerpc,kexec: Speedup kexec hpte tear down
  2010-05-11  7:04   ` Michael Ellerman
@ 2010-05-11 23:29     ` Michael Neuling
  2010-05-12  0:36       ` Michael Ellerman
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Neuling @ 2010-05-11 23:29 UTC (permalink / raw)
  To: michael; +Cc: kexec, Anton Blanchard, linuxppc-dev



In message <1273561463.9209.138.camel@concordia> you wrote:
> 
> On Tue, 2010-05-11 at 16:28 +1000, Michael Neuling wrote:
> > Currently for kexec the PTE tear down on 1TB segment systems normally
> > requires 3 hcalls for each PTE removal. On a machine with 32GB of
> > memory it can take around a minute to remove all the PTEs.
> >
> ..
> > -	/* TODO: Use bulk call */
> 
> ...
> > +	/* Read in batches of 4,
> > +	 * invalidate only valid entries not in the VRMA
> > +	 * hpte_count will be a multiple of 4
> > +         */
> > +	for (i = 0; i < hpte_count; i += 4) {
> > +		lpar_rc = plpar_pte_read_4_raw(0, i, (void *)ptes);
> > +		if (lpar_rc != H_SUCCESS)
> > +			continue;
> > +		for (j = 0; j < 4; j++){
> > +			if ((ptes[j].pteh & HPTE_V_VRMA_MASK) ==
> > +				HPTE_V_VRMA_MASK)
> > +				continue;
> > +			if (ptes[j].pteh & HPTE_V_VALID)
> > +				plpar_pte_remove_raw(0, i + j, 0,
> > +					&(ptes[j].pteh), &(ptes[j].ptel));
> >  		}
> 
> Have you tried using the bulk remove call, if none of the HPTEs are for
> the VRMA? Rumour was it was slower/the-same, but that may have been
> apocryphal.

No, I didn't try it.

I think the real solution is to ask FW for a new call to do it all for
us.

Mikey


* Re: [PATCH 2/2] powerpc,kexec: Speedup kexec hpte tear down
  2010-05-11 23:29     ` Michael Neuling
@ 2010-05-12  0:36       ` Michael Ellerman
  2010-05-12  0:43         ` Michael Neuling
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2010-05-12  0:36 UTC (permalink / raw)
  To: Michael Neuling; +Cc: kexec, Anton Blanchard, linuxppc-dev


On Wed, 2010-05-12 at 09:29 +1000, Michael Neuling wrote:
> 
> In message <1273561463.9209.138.camel@concordia> you wrote:
> > 
> > On Tue, 2010-05-11 at 16:28 +1000, Michael Neuling wrote:
> > > Currently for kexec the PTE tear down on 1TB segment systems normally
> > > requires 3 hcalls for each PTE removal. On a machine with 32GB of
> > > memory it can take around a minute to remove all the PTEs.
> > >
> > ..
> > > -	/* TODO: Use bulk call */
> > 
> > ...
> > > +	/* Read in batches of 4,
> > > +	 * invalidate only valid entries not in the VRMA
> > > +	 * hpte_count will be a multiple of 4
> > > +         */
> > > +	for (i = 0; i < hpte_count; i += 4) {
> > > +		lpar_rc = plpar_pte_read_4_raw(0, i, (void *)ptes);
> > > +		if (lpar_rc != H_SUCCESS)
> > > +			continue;
> > > +		for (j = 0; j < 4; j++){
> > > +			if ((ptes[j].pteh & HPTE_V_VRMA_MASK) ==
> > > +				HPTE_V_VRMA_MASK)
> > > +				continue;
> > > +			if (ptes[j].pteh & HPTE_V_VALID)
> > > +				plpar_pte_remove_raw(0, i + j, 0,
> > > +					&(ptes[j].pteh), &(ptes[j].ptel));
> > >  		}
> > 
> > Have you tried using the bulk remove call, if none of the HPTEs are for
> > the VRMA? Rumour was it was slower/the-same, but that may have been
> > apocryphal.
> 
> No, I didn't try it.
> 
> I think the real solution is to ask FW for a new call to do it all for
> us.

Sure, you could theoretically still get a 4x speedup though by using the
bulk remove.

cheers



* Re: [PATCH 2/2] powerpc,kexec: Speedup kexec hpte tear down
  2010-05-12  0:36       ` Michael Ellerman
@ 2010-05-12  0:43         ` Michael Neuling
  2010-05-12  1:00           ` Paul Mackerras
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Neuling @ 2010-05-12  0:43 UTC (permalink / raw)
  To: michael; +Cc: kexec, Anton Blanchard, linuxppc-dev



In message <1273624565.5738.8.camel@concordia> you wrote:
> 
> On Wed, 2010-05-12 at 09:29 +1000, Michael Neuling wrote:
> >
> > In message <1273561463.9209.138.camel@concordia> you wrote:
> > >
> > > On Tue, 2010-05-11 at 16:28 +1000, Michael Neuling wrote:
> > > > Currently for kexec the PTE tear down on 1TB segment systems normally
> > > > requires 3 hcalls for each PTE removal. On a machine with 32GB of
> > > > memory it can take around a minute to remove all the PTEs.
> > > >
> > > ..
> > > > -	/* TODO: Use bulk call */
> > >
> > > ...
> > > > +	/* Read in batches of 4,
> > > > +	 * invalidate only valid entries not in the VRMA
> > > > +	 * hpte_count will be a multiple of 4
> > > > +         */
> > > > +	for (i = 0; i < hpte_count; i += 4) {
> > > > +		lpar_rc = plpar_pte_read_4_raw(0, i, (void *)ptes);
> > > > +		if (lpar_rc != H_SUCCESS)
> > > > +			continue;
> > > > +		for (j = 0; j < 4; j++){
> > > > +			if ((ptes[j].pteh & HPTE_V_VRMA_MASK) ==
> > > > +				HPTE_V_VRMA_MASK)
> > > > +				continue;
> > > > +			if (ptes[j].pteh & HPTE_V_VALID)
> > > > +				plpar_pte_remove_raw(0, i + j, 0,
> > > > +					&(ptes[j].pteh), &(ptes[j].ptel));
> > > >  		}
> > >
> > > Have you tried using the bulk remove call, if none of the HPTEs are for
> > > the VRMA? Rumour was it was slower/the-same, but that may have been
> > > apocryphal.
> >
> > No, I didn't try it.
> >
> > I think the real solution is to ask FW for a new call to do it all for
> > us.
> 
> Sure, you could theoretically still get a 4x speedup though by using the
> bulk remove.

We probably only do the remove on < 1% of the hptes now.  So I doubt we
would get a speedup since most of the time we aren't doing the remove
anymore.

Mikey


* Re: [PATCH 2/2] powerpc,kexec: Speedup kexec hpte tear down
  2010-05-12  0:43         ` Michael Neuling
@ 2010-05-12  1:00           ` Paul Mackerras
  2010-05-12  1:06             ` Michael Neuling
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Mackerras @ 2010-05-12  1:00 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, kexec, Anton Blanchard

On Wed, May 12, 2010 at 10:43:08AM +1000, Michael Neuling wrote:

> We probably only do the remove on < 1% of the hptes now.  So I doubt we
> would get a speedup since most of the time we aren't doing the remove
> anymore.

It would be nice to have some actual numbers.  Could you add some
counters and print the results at the end?  (Or don't you have any
way to print things at that stage?)
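
Something along these lines is all I have in mind (a sketch based on
the loop in patch 2; the function name and counters are made up, and
how the totals actually get printed is the open question):

	static void pSeries_lpar_hptab_clear_counted(void)
	{
		unsigned long size_bytes = 1UL << ppc64_pft_size;
		unsigned long hpte_count = size_bytes >> 4;
		unsigned long n_valid = 0, n_vrma = 0, n_removed = 0;
		struct {
			unsigned long pteh;
			unsigned long ptel;
		} ptes[4];
		long lpar_rc;
		int i, j;

		for (i = 0; i < hpte_count; i += 4) {
			lpar_rc = plpar_pte_read_4_raw(0, i, (void *)ptes);
			if (lpar_rc != H_SUCCESS)
				continue;
			for (j = 0; j < 4; j++) {
				if ((ptes[j].pteh & HPTE_V_VRMA_MASK) ==
						HPTE_V_VRMA_MASK) {
					n_vrma++;
					continue;
				}
				if (!(ptes[j].pteh & HPTE_V_VALID))
					continue;
				n_valid++;
				if (plpar_pte_remove_raw(0, i + j, 0,
						&(ptes[j].pteh),
						&(ptes[j].ptel)) == H_SUCCESS)
					n_removed++;
			}
		}
		/* n_valid/n_vrma/n_removed now hold the totals; getting
		 * them out (udbg? stash them somewhere for later?) is
		 * the hard part at this stage. */
	}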

Paul.


* Re: [PATCH 2/2] powerpc,kexec: Speedup kexec hpte tear down
  2010-05-12  1:00           ` Paul Mackerras
@ 2010-05-12  1:06             ` Michael Neuling
  2010-05-12  1:36               ` Michael Ellerman
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Neuling @ 2010-05-12  1:06 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, kexec, Anton Blanchard

> > We probably only do the remove on < 1% of the hptes now.  So I doubt we
> > would get a speedup since most of the time we aren't doing the remove
> > anymore.
> 
> It would be nice to have some actual numbers.  Could you add some
> counters and print the results at the end?  (Or don't you have any
> way to print things at that stage?)

Printing is hard at that point but I think we can do it.  I'll try to
when I get some time.

A heavily loaded system which kdumps will need a lot more hpte removes
than kexec, so stats for both these cases might be useful also.

Mikey


* Re: [PATCH 2/2] powerpc,kexec: Speedup kexec hpte tear down
  2010-05-12  1:06             ` Michael Neuling
@ 2010-05-12  1:36               ` Michael Ellerman
  0 siblings, 0 replies; 9+ messages in thread
From: Michael Ellerman @ 2010-05-12  1:36 UTC (permalink / raw)
  To: Michael Neuling; +Cc: kexec, Paul Mackerras, Anton Blanchard, linuxppc-dev


On Wed, 2010-05-12 at 11:06 +1000, Michael Neuling wrote:
> > > We probably only do the remove on < 1% of the hptes now.  So I doubt we
> > > would get a speedup since most of the time we aren't doing the remove
> > > anymore.
> > 
> > It would be nice to have some actual numbers.  Could you add some
> > counters and print the results at the end?  (Or don't you have any
> > way to print things at that stage?)
> 
> Printing is hard at that point but I think we can do it.  I'll try to
> when I get some time.

A version of udbg_putcLP() that uses a raw hcall should work, or there
was code added to purgatory recently to print to the HV console which
you could nick.
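
i.e. something like this (sketch only: assumes the HV console is
vterm 0 and reuses the existing plpar_hcall_raw entry point so that
nothing outside the RMO is touched):

	/* Minimal real-mode console output via H_PUT_TERM_CHAR,
	 * bypassing the traced hcall path. */
	static void putc_realmode(char c)
	{
		unsigned long retbuf[PLPAR_HCALL_BUFSIZE];

		if (c == '\n')
			putc_realmode('\r');

		/* args: termno, byte count, first 8 chars packed MSB
		 * first, next 8 chars */
		plpar_hcall_raw(H_PUT_TERM_CHAR, retbuf, 0, 1,
				(unsigned long)c << 56, 0);
	}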

> A heavily loaded system which kdumps will need a lot more hpte removes
> than kexec, so stats for both these cases might be useful also.

Yeah, or even a system that is kexec'ed while lots of stuff is still
running.

cheers






Thread overview: 9+ messages
2010-05-11  6:28 [PATCH 1/2] powerpc: Add hcall to read 4 ptes at a time in real mode Michael Neuling
2010-05-11  6:28 ` [PATCH 2/2] powerpc,kexec: Speedup kexec hpte tear down Michael Neuling
2010-05-11  7:04   ` Michael Ellerman
2010-05-11 23:29     ` Michael Neuling
2010-05-12  0:36       ` Michael Ellerman
2010-05-12  0:43         ` Michael Neuling
2010-05-12  1:00           ` Paul Mackerras
2010-05-12  1:06             ` Michael Neuling
2010-05-12  1:36               ` Michael Ellerman
