linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/13] idle and soft-irq improvements and POWER9 idle optimisation
@ 2017-08-05 17:02 Nicholas Piggin
  2017-08-05 17:02 ` [PATCH 01/13] powerpc/64s: masked interrupt avoid branch Nicholas Piggin
                   ` (12 more replies)
  0 siblings, 13 replies; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-05 17:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

The first 9 patches are a bunch of independent small cleanups
and improvements I've collected. The next 4 are improvements to
POWER9 idle state entry and exit. They significantly change how
we enter idle, now in virtual mode and without ptesync. EC=0 idle
is significantly simplified from there, and does not save anything.

This runs fine on a DD1 test machine, but the hardware has some
problem with performance all over the place, so I'm not able to get
anything meaningful there yet.

Thanks,
Nick

Nicholas Piggin (13):
  powerpc/64s: masked interrupt avoid branch
  powerpc/64s: masked interrupt avoid instruction
  powerpc/64s: masked interrupt returns to kernel so avoid r13 restore
  powerpc/64: cleanup __check_irq_replay
  powerpc/64s: irq replay merge HV and non-HV paths for doorbell replay
  powerpc/64s: irq replay external use the HV handler in HV mode on
    POWER9
  powerpc/64: remove redundant instruction in interrupt replay
  powerpc/64s: irq replay remove spurious irq reason
  powerpc/64: runlatch CTRL[RUN] set optimisation
  powerpc/64s: idle simplify KVM idle on POWER9
  powerpc/64s: idle POWER9 can execute stop without ptesync
  powerpc/64s: idle POWER9 can execute stop in virtual mode
  powerpc/64s: idle ESL=0 stop can avoid all save/restore overhead

 arch/powerpc/include/asm/kvm_book3s_asm.h |  4 +++
 arch/powerpc/kernel/entry_64.S            |  7 +---
 arch/powerpc/kernel/exceptions-64s.S      | 19 +++++-----
 arch/powerpc/kernel/idle_book3s.S         | 60 ++++++++++++++-----------------
 arch/powerpc/kernel/irq.c                 | 47 ++++++++++++------------
 arch/powerpc/kernel/process.c             | 35 +++++++++++++-----
 arch/powerpc/kvm/book3s_hv.c              | 37 ++++++++++++++++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 32 ++++++++++-------
 8 files changed, 143 insertions(+), 98 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 01/13] powerpc/64s: masked interrupt avoid branch
  2017-08-05 17:02 [PATCH 00/13] idle and soft-irq improvements and POWER9 idle optimisation Nicholas Piggin
@ 2017-08-05 17:02 ` Nicholas Piggin
  2017-08-05 17:02 ` [PATCH 02/13] powerpc/64s: masked interrupt avoid instruction Nicholas Piggin
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-05 17:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Interrupts which do not require EE to be cleared can all
be tested with a single bitwise test.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index f14f3c04ec7e..f8ad3f0eb383 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1370,10 +1370,8 @@ masked_##_H##interrupt:					\
 	ori	r10,r10,0xffff;				\
 	mtspr	SPRN_DEC,r10;				\
 	b	MASKED_DEC_HANDLER_LABEL;		\
-1:	cmpwi	r10,PACA_IRQ_DBELL;			\
-	beq	2f;					\
-	cmpwi	r10,PACA_IRQ_HMI;			\
-	beq	2f;					\
+1:	andi.	r10,r10,(PACA_IRQ_DBELL|PACA_IRQ_HMI);	\
+	bne	2f;					\
 	mfspr	r10,SPRN_##_H##SRR1;			\
 	rldicl	r10,r10,48,1; /* clear MSR_EE */	\
 	rotldi	r10,r10,16;				\
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 02/13] powerpc/64s: masked interrupt avoid instruction
  2017-08-05 17:02 [PATCH 00/13] idle and soft-irq improvements and POWER9 idle optimisation Nicholas Piggin
  2017-08-05 17:02 ` [PATCH 01/13] powerpc/64s: masked interrupt avoid branch Nicholas Piggin
@ 2017-08-05 17:02 ` Nicholas Piggin
  2017-08-05 17:02 ` [PATCH 03/13] powerpc/64s: masked interrupt returns to kernel so avoid r13 restore Nicholas Piggin
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-05 17:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

EE is always enabled in SRR1 for masked interrupts, so clearing
it can use xor.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index f8ad3f0eb383..c4f50a9e2ab5 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1373,8 +1373,7 @@ masked_##_H##interrupt:					\
 1:	andi.	r10,r10,(PACA_IRQ_DBELL|PACA_IRQ_HMI);	\
 	bne	2f;					\
 	mfspr	r10,SPRN_##_H##SRR1;			\
-	rldicl	r10,r10,48,1; /* clear MSR_EE */	\
-	rotldi	r10,r10,16;				\
+	xori	r10,r10,MSR_EE; /* clear MSR_EE */	\
 	mtspr	SPRN_##_H##SRR1,r10;			\
 2:	mtcrf	0x80,r9;				\
 	ld	r9,PACA_EXGEN+EX_R9(r13);		\
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 03/13] powerpc/64s: masked interrupt returns to kernel so avoid r13 restore
  2017-08-05 17:02 [PATCH 00/13] idle and soft-irq improvements and POWER9 idle optimisation Nicholas Piggin
  2017-08-05 17:02 ` [PATCH 01/13] powerpc/64s: masked interrupt avoid branch Nicholas Piggin
  2017-08-05 17:02 ` [PATCH 02/13] powerpc/64s: masked interrupt avoid instruction Nicholas Piggin
@ 2017-08-05 17:02 ` Nicholas Piggin
  2017-08-05 17:02 ` [PATCH 04/13] powerpc/64: cleanup __check_irq_replay Nicholas Piggin
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-05 17:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Places in the kernel where r13 is not the PACA pointer must have
maskable interrupts disabled, so r13 does not have to be restored
when returning from a soft-masked interrupt.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index c4f50a9e2ab5..67321be3122c 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1379,7 +1379,7 @@ masked_##_H##interrupt:					\
 	ld	r9,PACA_EXGEN+EX_R9(r13);		\
 	ld	r10,PACA_EXGEN+EX_R10(r13);		\
 	ld	r11,PACA_EXGEN+EX_R11(r13);		\
-	GET_SCRATCH0(r13);				\
+	/* returns to kernel where r13 must be set up, so don't restore it */ \
 	##_H##rfid;					\
 	b	.;					\
 	MASKED_DEC_HANDLER(_H)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 04/13] powerpc/64: cleanup __check_irq_replay
  2017-08-05 17:02 [PATCH 00/13] idle and soft-irq improvements and POWER9 idle optimisation Nicholas Piggin
                   ` (2 preceding siblings ...)
  2017-08-05 17:02 ` [PATCH 03/13] powerpc/64s: masked interrupt returns to kernel so avoid r13 restore Nicholas Piggin
@ 2017-08-05 17:02 ` Nicholas Piggin
  2017-08-05 17:02 ` [PATCH 05/13] powerpc/64s: irq replay merge HV and non-HV paths for doorbell replay Nicholas Piggin
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-05 17:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Move the clearing of irq_happened bits into the condition where
they were found to be set. This reduces instruction count slightly,
and reduces stores into irq_happened.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/irq.c | 45 +++++++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index f291f7826abc..7c46e0cce054 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -143,9 +143,10 @@ notrace unsigned int __check_irq_replay(void)
 	 */
 	unsigned char happened = local_paca->irq_happened;
 
-	/* Clear bit 0 which we wouldn't clear otherwise */
-	local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
 	if (happened & PACA_IRQ_HARD_DIS) {
+		/* Clear bit 0 which we wouldn't clear otherwise */
+		local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
+
 		/*
 		 * We may have missed a decrementer interrupt if hard disabled.
 		 * Check the decrementer register in case we had a rollover
@@ -173,39 +174,39 @@ notrace unsigned int __check_irq_replay(void)
 	 * This is a higher priority interrupt than the others, so
 	 * replay it first.
 	 */
-	local_paca->irq_happened &= ~PACA_IRQ_HMI;
-	if (happened & PACA_IRQ_HMI)
+	if (happened & PACA_IRQ_HMI) {
+		local_paca->irq_happened &= ~PACA_IRQ_HMI;
 		return 0xe60;
+	}
 
-	/*
-	 * We may have missed a decrementer interrupt. We check the
-	 * decrementer itself rather than the paca irq_happened field
-	 * in case we also had a rollover while hard disabled
-	 */
-	local_paca->irq_happened &= ~PACA_IRQ_DEC;
-	if (happened & PACA_IRQ_DEC)
+	if (happened & PACA_IRQ_DEC) {
+		local_paca->irq_happened &= ~PACA_IRQ_DEC;
 		return 0x900;
+	}
 
-	/* Finally check if an external interrupt happened */
-	local_paca->irq_happened &= ~PACA_IRQ_EE;
-	if (happened & PACA_IRQ_EE)
+	if (happened & PACA_IRQ_EE) {
+		local_paca->irq_happened &= ~PACA_IRQ_EE;
 		return 0x500;
+	}
 
 #ifdef CONFIG_PPC_BOOK3E
-	/* Finally check if an EPR external interrupt happened
-	 * this bit is typically set if we need to handle another
-	 * "edge" interrupt from within the MPIC "EPR" handler
+	/*
+	 * Check if an EPR external interrupt happened this bit is typically
+	 * set if we need to handle another "edge" interrupt from within the
+	 * MPIC "EPR" handler.
 	 */
-	local_paca->irq_happened &= ~PACA_IRQ_EE_EDGE;
-	if (happened & PACA_IRQ_EE_EDGE)
+	if (happened & PACA_IRQ_EE_EDGE) {
+		local_paca->irq_happened &= ~PACA_IRQ_EE_EDGE;
 		return 0x500;
+	}
 
-	local_paca->irq_happened &= ~PACA_IRQ_DBELL;
-	if (happened & PACA_IRQ_DBELL)
+	if (happened & PACA_IRQ_DBELL) {
+		local_paca->irq_happened &= ~PACA_IRQ_DBELL;
 		return 0x280;
+	}
 #else
-	local_paca->irq_happened &= ~PACA_IRQ_DBELL;
 	if (happened & PACA_IRQ_DBELL) {
+		local_paca->irq_happened &= ~PACA_IRQ_DBELL;
 		if (cpu_has_feature(CPU_FTR_HVMODE))
 			return 0xe80;
 		return 0xa00;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 05/13] powerpc/64s: irq replay merge HV and non-HV paths for doorbell replay
  2017-08-05 17:02 [PATCH 00/13] idle and soft-irq improvements and POWER9 idle optimisation Nicholas Piggin
                   ` (3 preceding siblings ...)
  2017-08-05 17:02 ` [PATCH 04/13] powerpc/64: cleanup __check_irq_replay Nicholas Piggin
@ 2017-08-05 17:02 ` Nicholas Piggin
  2017-08-05 17:02 ` [PATCH 06/13] powerpc/64s: irq replay external use the HV handler in HV mode on POWER9 Nicholas Piggin
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-05 17:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

This results in smaller code, and fewer branches.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/entry_64.S       | 6 +-----
 arch/powerpc/kernel/exceptions-64s.S | 2 +-
 arch/powerpc/kernel/irq.c            | 2 --
 3 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 49d8422767b4..ec67f67dafab 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -990,11 +990,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 #ifdef CONFIG_PPC_BOOK3E
 	cmpwi	cr0,r3,0x280
 #else
-	BEGIN_FTR_SECTION
-		cmpwi	cr0,r3,0xe80
-	FTR_SECTION_ELSE
-		cmpwi	cr0,r3,0xa00
-	ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
+	cmpwi	cr0,r3,0xa00
 #endif /* CONFIG_PPC_BOOK3E */
 	bne	1f
 	addi	r3,r1,STACK_FRAME_OVERHEAD;
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 67321be3122c..f9d0796fb2c9 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1674,7 +1674,7 @@ _GLOBAL(__replay_interrupt)
 	cmpwi	r3,0x500
 	beq	hardware_interrupt_common
 BEGIN_FTR_SECTION
-	cmpwi	r3,0xe80
+	cmpwi	r3,0xa00
 	beq	h_doorbell_common_msgclr
 	cmpwi	r3,0xea0
 	beq	h_virt_irq_common
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 7c46e0cce054..60ee6d7251b8 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -207,8 +207,6 @@ notrace unsigned int __check_irq_replay(void)
 #else
 	if (happened & PACA_IRQ_DBELL) {
 		local_paca->irq_happened &= ~PACA_IRQ_DBELL;
-		if (cpu_has_feature(CPU_FTR_HVMODE))
-			return 0xe80;
 		return 0xa00;
 	}
 #endif /* CONFIG_PPC_BOOK3E */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 06/13] powerpc/64s: irq replay external use the HV handler in HV mode on POWER9
  2017-08-05 17:02 [PATCH 00/13] idle and soft-irq improvements and POWER9 idle optimisation Nicholas Piggin
                   ` (4 preceding siblings ...)
  2017-08-05 17:02 ` [PATCH 05/13] powerpc/64s: irq replay merge HV and non-HV paths for doorbell replay Nicholas Piggin
@ 2017-08-05 17:02 ` Nicholas Piggin
  2017-08-05 17:02 ` [PATCH 07/13] powerpc/64: remove redundant instruction in interrupt replay Nicholas Piggin
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-05 17:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

POWER9 host external interrupts use the h_virt_irq_common handler, so
use that to replay them rather than using the hardware_interrupt_common
handler. Both call do_IRQ, but using the correct handler reduces i-cache
footprint.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index f9d0796fb2c9..29253cecf713 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1672,7 +1672,11 @@ _GLOBAL(__replay_interrupt)
 	cmpwi	r3,0x900
 	beq	decrementer_common
 	cmpwi	r3,0x500
+BEGIN_FTR_SECTION
+	beq	h_virt_irq_common
+FTR_SECTION_ELSE
 	beq	hardware_interrupt_common
+ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_300)
 BEGIN_FTR_SECTION
 	cmpwi	r3,0xa00
 	beq	h_doorbell_common_msgclr
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 07/13] powerpc/64: remove redundant instruction in interrupt replay
  2017-08-05 17:02 [PATCH 00/13] idle and soft-irq improvements and POWER9 idle optimisation Nicholas Piggin
                   ` (5 preceding siblings ...)
  2017-08-05 17:02 ` [PATCH 06/13] powerpc/64s: irq replay external use the HV handler in HV mode on POWER9 Nicholas Piggin
@ 2017-08-05 17:02 ` Nicholas Piggin
  2017-08-05 17:02 ` [PATCH 08/13] powerpc/64s: irq replay remove spurious irq reason Nicholas Piggin
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-05 17:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/entry_64.S | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index ec67f67dafab..3f2666d24a7e 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -995,7 +995,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	bne	1f
 	addi	r3,r1,STACK_FRAME_OVERHEAD;
 	bl	doorbell_exception
-	b	ret_from_except
 #endif /* CONFIG_PPC_DOORBELL */
 1:	b	ret_from_except /* What else to do here ? */
  
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 08/13] powerpc/64s: irq replay remove spurious irq reason
  2017-08-05 17:02 [PATCH 00/13] idle and soft-irq improvements and POWER9 idle optimisation Nicholas Piggin
                   ` (6 preceding siblings ...)
  2017-08-05 17:02 ` [PATCH 07/13] powerpc/64: remove redundant instruction in interrupt replay Nicholas Piggin
@ 2017-08-05 17:02 ` Nicholas Piggin
  2017-08-05 23:00   ` Benjamin Herrenschmidt
  2017-08-05 17:02 ` [PATCH 09/13] powerpc/64: runlatch CTRL[RUN] set optimisation Nicholas Piggin
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-05 17:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

HVI interrupts have always used 0x500, so remove the dead branch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 29253cecf713..566cf126c13b 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1680,8 +1680,6 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_300)
 BEGIN_FTR_SECTION
 	cmpwi	r3,0xa00
 	beq	h_doorbell_common_msgclr
-	cmpwi	r3,0xea0
-	beq	h_virt_irq_common
 	cmpwi	r3,0xe60
 	beq	hmi_exception_common
 FTR_SECTION_ELSE
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 09/13] powerpc/64: runlatch CTRL[RUN] set optimisation
  2017-08-05 17:02 [PATCH 00/13] idle and soft-irq improvements and POWER9 idle optimisation Nicholas Piggin
                   ` (7 preceding siblings ...)
  2017-08-05 17:02 ` [PATCH 08/13] powerpc/64s: irq replay remove spurious irq reason Nicholas Piggin
@ 2017-08-05 17:02 ` Nicholas Piggin
  2017-08-05 17:02 ` [PATCH 10/13] powerpc/64s: idle simplify KVM idle on POWER9 Nicholas Piggin
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-05 17:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

The CTRL register is read-only except bit 63 which is the run latch
control. This means it can be updated with a mtspr rather than
mfspr/mtspr.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/process.c | 35 +++++++++++++++++++++++++++--------
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 9f3e2c932dcc..75306b6e1812 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1994,11 +1994,25 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 void notrace __ppc64_runlatch_on(void)
 {
 	struct thread_info *ti = current_thread_info();
-	unsigned long ctrl;
 
-	ctrl = mfspr(SPRN_CTRLF);
-	ctrl |= CTRL_RUNLATCH;
-	mtspr(SPRN_CTRLT, ctrl);
+	if (cpu_has_feature(CPU_FTR_ARCH_206)) {
+		/*
+		 * Least significant bit (RUN) is the only writable bit of
+		 * the CTRL register, so we can avoid mfspr. 2.06 is not the
+		 * earliest ISA where this is the case, but it's convenient.
+		 */
+		mtspr(SPRN_CTRLT, CTRL_RUNLATCH);
+	} else {
+		unsigned long ctrl;
+
+		/*
+		 * Some architectures (e.g., Cell) have writable fields other
+		 * than RUN, so do the read-modify-write.
+		 */
+		ctrl = mfspr(SPRN_CTRLF);
+		ctrl |= CTRL_RUNLATCH;
+		mtspr(SPRN_CTRLT, ctrl);
+	}
 
 	ti->local_flags |= _TLF_RUNLATCH;
 }
@@ -2007,13 +2021,18 @@ void notrace __ppc64_runlatch_on(void)
 void notrace __ppc64_runlatch_off(void)
 {
 	struct thread_info *ti = current_thread_info();
-	unsigned long ctrl;
 
 	ti->local_flags &= ~_TLF_RUNLATCH;
 
-	ctrl = mfspr(SPRN_CTRLF);
-	ctrl &= ~CTRL_RUNLATCH;
-	mtspr(SPRN_CTRLT, ctrl);
+	if (cpu_has_feature(CPU_FTR_ARCH_206)) {
+		mtspr(SPRN_CTRLT, 0);
+	} else {
+		unsigned long ctrl;
+
+		ctrl = mfspr(SPRN_CTRLF);
+		ctrl &= ~CTRL_RUNLATCH;
+		mtspr(SPRN_CTRLT, ctrl);
+	}
 }
 #endif /* CONFIG_PPC64 */
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 10/13] powerpc/64s: idle simplify KVM idle on POWER9
  2017-08-05 17:02 [PATCH 00/13] idle and soft-irq improvements and POWER9 idle optimisation Nicholas Piggin
                   ` (8 preceding siblings ...)
  2017-08-05 17:02 ` [PATCH 09/13] powerpc/64: runlatch CTRL[RUN] set optimisation Nicholas Piggin
@ 2017-08-05 17:02 ` Nicholas Piggin
  2017-08-08 10:36   ` Gautham R Shenoy
  2017-08-10 13:14   ` Michael Ellerman
  2017-08-05 17:02 ` [PATCH 11/13] powerpc/64s: idle POWER9 can execute stop without ptesync Nicholas Piggin
                   ` (2 subsequent siblings)
  12 siblings, 2 replies; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-05 17:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Paul Mackerras, kvm-ppc

POWER9 CPUs have independent MMU contexts per thread so KVM
does not have to bring sibling threads into real-mode when
switching MMU mode to guest. This can simplify POWER9 sleep/wake
paths and avoids hwsyncs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |  4 ++++
 arch/powerpc/kernel/idle_book3s.S         |  8 ++-----
 arch/powerpc/kvm/book3s_hv.c              | 37 ++++++++++++++++++++++++++-----
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  8 +++++++
 4 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 7cea76f11c26..83596f32f50b 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -104,6 +104,10 @@ struct kvmppc_host_state {
 	u8 napping;
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	/*
+	 * hwthread_req/hwthread_state pair is used to pull sibling threads
+	 * out of guest on pre-ISAv3.0B CPUs where threads share MMU.
+	 */
 	u8 hwthread_req;
 	u8 hwthread_state;
 	u8 host_ipi;
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index e6252c5a57a4..3ab73f9223e4 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -243,12 +243,6 @@ enter_winkle:
  * r3 - PSSCR value corresponding to the requested stop state.
  */
 power_enter_stop:
-#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-	/* Tell KVM we're entering idle */
-	li	r4,KVM_HWTHREAD_IN_IDLE
-	/* DO THIS IN REAL MODE!  See comment above. */
-	stb	r4,HSTATE_HWTHREAD_STATE(r13)
-#endif
 /*
  * Check if we are executing the lite variant with ESL=EC=0
  */
@@ -435,6 +429,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
 	mr	r3,r12
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+BEGIN_FTR_SECTION
 	li	r0,KVM_HWTHREAD_IN_KERNEL
 	stb	r0,HSTATE_HWTHREAD_STATE(r13)
 	/* Order setting hwthread_state vs. testing hwthread_req */
@@ -444,6 +439,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
 	beq	1f
 	b	kvm_start_guest
 1:
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 #endif
 
 	/* Return SRR1 from power7_nap() */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 359c79cdf0cc..bb1ab14f963a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2111,6 +2111,16 @@ static int kvmppc_grab_hwthread(int cpu)
 	struct paca_struct *tpaca;
 	long timeout = 10000;
 
+	/*
+	 * ISA v3.0 idle routines do not set hwthread_state or test
+	 * hwthread_req, so they can not grab idle threads.
+	 */
+	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+		WARN_ON(1);
+		pr_err("KVM: can not control sibling threads\n");
+		return -EBUSY;
+	}
+
 	tpaca = &paca[cpu];
 
 	/* Ensure the thread won't go into the kernel if it wakes */
@@ -2145,12 +2155,26 @@ static void kvmppc_release_hwthread(int cpu)
 	struct paca_struct *tpaca;
 
 	tpaca = &paca[cpu];
-	tpaca->kvm_hstate.hwthread_req = 0;
 	tpaca->kvm_hstate.kvm_vcpu = NULL;
 	tpaca->kvm_hstate.kvm_vcore = NULL;
 	tpaca->kvm_hstate.kvm_split_mode = NULL;
 }
 
+static void kvmppc_release_hwthread_secondary(int cpu)
+{
+	struct paca_struct *tpaca;
+
+	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+		WARN_ON(1);
+		return;
+	}
+
+	tpaca = &paca[cpu];
+	tpaca->kvm_hstate.hwthread_req = 0;
+	kvmppc_release_hwthread(cpu);
+}
+
+
 static void radix_flush_cpu(struct kvm *kvm, int cpu, struct kvm_vcpu *vcpu)
 {
 	int i;
@@ -2274,7 +2298,7 @@ static int on_primary_thread(void)
 		if (kvmppc_grab_hwthread(cpu + thr)) {
 			/* Couldn't grab one; let the others go */
 			do {
-				kvmppc_release_hwthread(cpu + thr);
+				kvmppc_release_hwthread_secondary(cpu + thr);
 			} while (--thr > 0);
 			return 0;
 		}
@@ -2702,8 +2726,9 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 			kvmppc_vcore_preempt(pvc);
 			spin_unlock(&pvc->lock);
 		}
-		for (i = 0; i < controlled_threads; ++i)
-			kvmppc_release_hwthread(pcpu + i);
+		for (i = 1; i < controlled_threads; ++i)
+			kvmppc_release_hwthread_secondary(pcpu + i);
+		kvmppc_release_hwthread(pcpu);
 		return;
 	}
 
@@ -2858,11 +2883,13 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 
 	/* Let secondaries go back to the offline loop */
 	for (i = 0; i < controlled_threads; ++i) {
-		kvmppc_release_hwthread(pcpu + i);
 		if (sip && sip->napped[i])
 			kvmppc_ipi_thread(pcpu + i);
 		cpumask_clear_cpu(pcpu + i, &vc->kvm->arch.cpu_in_guest);
 	}
+	for (i = 1; i < controlled_threads; ++i)
+		kvmppc_release_hwthread_secondary(pcpu + i);
+	kvmppc_release_hwthread(pcpu);
 
 	spin_unlock(&vc->lock);
 
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index c52184a8efdf..3e024fd71fe8 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -149,9 +149,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	subf	r4, r4, r3
 	mtspr	SPRN_DEC, r4
 
+BEGIN_FTR_SECTION
 	/* hwthread_req may have got set by cede or no vcpu, so clear it */
 	li	r0, 0
 	stb	r0, HSTATE_HWTHREAD_REQ(r13)
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 
 	/*
 	 * For external interrupts we need to call the Linux
@@ -314,6 +316,7 @@ kvm_novcpu_exit:
  * Relocation is off and most register values are lost.
  * r13 points to the PACA.
  * r3 contains the SRR1 wakeup value, SRR1 is trashed.
+ * This is not used by ISAv3.0B processors.
  */
 	.globl	kvm_start_guest
 kvm_start_guest:
@@ -432,6 +435,9 @@ kvm_secondary_got_guest:
  * While waiting we also need to check if we get given a vcpu to run.
  */
 kvm_no_guest:
+BEGIN_FTR_SECTION
+	twi	31,0,0
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	lbz	r3, HSTATE_HWTHREAD_REQ(r13)
 	cmpwi	r3, 0
 	bne	53f
@@ -2509,8 +2515,10 @@ kvm_do_nap:
 	clrrdi	r0, r0, 1
 	mtspr	SPRN_CTRLT, r0
 
+BEGIN_FTR_SECTION
 	li	r0,1
 	stb	r0,HSTATE_HWTHREAD_REQ(r13)
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 	mfspr	r5,SPRN_LPCR
 	ori	r5,r5,LPCR_PECE0 | LPCR_PECE1
 BEGIN_FTR_SECTION
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 11/13] powerpc/64s: idle POWER9 can execute stop without ptesync
  2017-08-05 17:02 [PATCH 00/13] idle and soft-irq improvements and POWER9 idle optimisation Nicholas Piggin
                   ` (9 preceding siblings ...)
  2017-08-05 17:02 ` [PATCH 10/13] powerpc/64s: idle simplify KVM idle on POWER9 Nicholas Piggin
@ 2017-08-05 17:02 ` Nicholas Piggin
  2017-08-08 10:37   ` Gautham R Shenoy
  2017-08-10 13:15   ` Michael Ellerman
  2017-08-05 17:02 ` [PATCH 12/13] powerpc/64s: idle POWER9 can execute stop in virtual mode Nicholas Piggin
  2017-08-05 17:02 ` [PATCH 13/13] powerpc/64s: idle ESL=0 stop can avoid all save/restore overhead Nicholas Piggin
  12 siblings, 2 replies; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-05 17:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/idle_book3s.S       |  7 ++++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 24 ++++++++++++------------
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 3ab73f9223e4..75746111e2c4 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -249,7 +249,7 @@ power_enter_stop:
 	andis.   r4,r3,PSSCR_EC_ESL_MASK_SHIFTED
 	clrldi   r3,r3,60 /* r3 = Bits[60:63] = Requested Level (RL) */
 	bne	 .Lhandle_esl_ec_set
-	IDLE_STATE_ENTER_SEQ(PPC_STOP)
+	PPC_STOP
 	li	r3,0  /* Since we didn't lose state, return 0 */
 
 	/*
@@ -282,7 +282,8 @@ power_enter_stop:
 	ld	r4,ADDROFF(pnv_first_deep_stop_state)(r5)
 	cmpd	r3,r4
 	bge	.Lhandle_deep_stop
-	IDLE_STATE_ENTER_SEQ_NORET(PPC_STOP)
+	PPC_STOP	/* Does not return (system reset interrupt) */
+
 .Lhandle_deep_stop:
 /*
  * Entering deep idle state.
@@ -304,7 +305,7 @@ lwarx_loop_stop:
 
 	bl	save_sprs_to_stack
 
-	IDLE_STATE_ENTER_SEQ_NORET(PPC_STOP)
+	PPC_STOP	/* Does not return (system reset interrupt) */
 
 /*
  * Entered with MSR[EE]=0 and no soft-masked interrupts pending.
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 3e024fd71fe8..edb47738a686 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2527,7 +2527,17 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
 kvm_nap_sequence:		/* desired LPCR value in r5 */
-BEGIN_FTR_SECTION
+BEGIN_FTR_SECTION	/* nap sequence */
+	mtspr	SPRN_LPCR,r5
+	isync
+	li	r0, 0
+	std	r0, HSTATE_SCRATCH0(r13)
+	ptesync
+	ld	r0, HSTATE_SCRATCH0(r13)
+1:	cmpd	r0, r0
+	bne	1b
+	nap
+FTR_SECTION_ELSE	/* stop sequence */
 	/*
 	 * PSSCR bits:	exit criterion = 1 (wakeup based on LPCR at sreset)
 	 *		enable state loss = 1 (allow SMT mode switch)
@@ -2539,18 +2549,8 @@ BEGIN_FTR_SECTION
 	li	r4, LPCR_PECE_HVEE@higher
 	sldi	r4, r4, 32
 	or	r5, r5, r4
-END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	mtspr	SPRN_LPCR,r5
-	isync
-	li	r0, 0
-	std	r0, HSTATE_SCRATCH0(r13)
-	ptesync
-	ld	r0, HSTATE_SCRATCH0(r13)
-1:	cmpd	r0, r0
-	bne	1b
-BEGIN_FTR_SECTION
-	nap
-FTR_SECTION_ELSE
+
 	PPC_STOP
 ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
 	b	.
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 12/13] powerpc/64s: idle POWER9 can execute stop in virtual mode
  2017-08-05 17:02 [PATCH 00/13] idle and soft-irq improvements and POWER9 idle optimisation Nicholas Piggin
                   ` (10 preceding siblings ...)
  2017-08-05 17:02 ` [PATCH 11/13] powerpc/64s: idle POWER9 can execute stop without ptesync Nicholas Piggin
@ 2017-08-05 17:02 ` Nicholas Piggin
  2017-08-05 22:57   ` Benjamin Herrenschmidt
  2017-08-08 10:45   ` Gautham R Shenoy
  2017-08-05 17:02 ` [PATCH 13/13] powerpc/64s: idle ESL=0 stop can avoid all save/restore overhead Nicholas Piggin
  12 siblings, 2 replies; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-05 17:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Paul Mackerras

The hardware can execute stop in any context, and KVM does not
require real mode. This saves a switch to real-mode when going
idle.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/idle_book3s.S | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 75746111e2c4..8ac366a51bb5 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -141,7 +141,16 @@ pnv_powersave_common:
 	std	r5,_CCR(r1)
 	std	r1,PACAR1(r13)
 
+BEGIN_FTR_SECTION
+	/*
+	 * POWER9 does not require real mode to stop, and does not set
+	 * hwthread_state for KVM (threads don't share MMU context), so
+	 * we can remain in virtual mode for this.
+	 */
+	bctr
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	/*
+	 * POWER8
 	 * Go to real mode to do the nap, as required by the architecture.
 	 * Also, we need to be in real mode before setting hwthread_state,
 	 * because as soon as we do that, another thread can switch
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 13/13] powerpc/64s: idle ESL=0 stop can avoid all save/restore overhead
  2017-08-05 17:02 [PATCH 00/13] idle and soft-irq improvements and POWER9 idle optimisation Nicholas Piggin
                   ` (11 preceding siblings ...)
  2017-08-05 17:02 ` [PATCH 12/13] powerpc/64s: idle POWER9 can execute stop in virtual mode Nicholas Piggin
@ 2017-08-05 17:02 ` Nicholas Piggin
  2017-08-05 17:11   ` Nicholas Piggin
  2017-08-08 10:49   ` Gautham R Shenoy
  12 siblings, 2 replies; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-05 17:02 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy

When stop is executed with EC=ESL=0, it appears to execute like a
normal instruction (resuming from NIP when woken by interrupt).
So all the save/restore handling can be avoided completely. In
particular NV GPRs do not have to be saved, and MSR does not have
to be switched back to kernel MSR.

So move the test for "lite" sleep states out to power9_idle_stop.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/idle_book3s.S | 38 +++++++++++++-------------------------
 1 file changed, 13 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 8ac366a51bb5..9eb47c99dc39 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -251,31 +251,8 @@ enter_winkle:
 /*
  * r3 - PSSCR value corresponding to the requested stop state.
  */
-power_enter_stop:
-/*
- * Check if we are executing the lite variant with ESL=EC=0
- */
-	andis.   r4,r3,PSSCR_EC_ESL_MASK_SHIFTED
+power_enter_stop_esl:
 	clrldi   r3,r3,60 /* r3 = Bits[60:63] = Requested Level (RL) */
-	bne	 .Lhandle_esl_ec_set
-	PPC_STOP
-	li	r3,0  /* Since we didn't lose state, return 0 */
-
-	/*
-	 * pnv_wakeup_noloss() expects r12 to contain the SRR1 value so
-	 * it can determine if the wakeup reason is an HMI in
-	 * CHECK_HMI_INTERRUPT.
-	 *
-	 * However, when we wakeup with ESL=0, SRR1 will not contain the wakeup
-	 * reason, so there is no point setting r12 to SRR1.
-	 *
-	 * Further, we clear r12 here, so that we don't accidentally enter the
-	 * HMI in pnv_wakeup_noloss() if the value of r12[42:45] == WAKE_HMI.
-	 */
-	li	r12, 0
-	b 	pnv_wakeup_noloss
-
-.Lhandle_esl_ec_set:
 	/*
 	 * POWER9 DD2 can incorrectly set PMAO when waking up after a
 	 * state-loss idle. Saving and restoring MMCR0 over idle is a
@@ -348,9 +325,20 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
  * r3 contains desired PSSCR register value.
  */
 _GLOBAL(power9_idle_stop)
+	/*
+	 * Check if we are executing the lite variant with ESL=EC=0
+	 * This case resumes execution after the stop instruction without
+	 * losing any state, so nothing has to be saved.
+	 */
+	andis.	r4,r3,PSSCR_EC_ESL_MASK_SHIFTED
+	bne	1f
+	PPC_STOP
+	li	r3,0  /* Since we didn't lose state, return 0 */
+	blr
+1:	/* state-loss idle */
 	std	r3, PACA_REQ_PSSCR(r13)
 	mtspr 	SPRN_PSSCR,r3
-	LOAD_REG_ADDR(r4,power_enter_stop)
+	LOAD_REG_ADDR(r4,power_enter_stop_esl)
 	b	pnv_powersave_common
 	/* No return */
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 13/13] powerpc/64s: idle ESL=0 stop can avoid all save/restore overhead
  2017-08-05 17:02 ` [PATCH 13/13] powerpc/64s: idle ESL=0 stop can avoid all save/restore overhead Nicholas Piggin
@ 2017-08-05 17:11   ` Nicholas Piggin
  2017-08-08 10:49   ` Gautham R Shenoy
  1 sibling, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-05 17:11 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gautham R . Shenoy

On Sun,  6 Aug 2017 03:02:41 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> When stop is executed with EC=ESL=0, it appears to execute like a
> normal instruction (resuming from NIP when woken by interrupt).
> So all the save/restore handling can be avoided completely. In
> particular NV GPRs do not have to be saved, and MSR does not have
> to be switched back to kernel MSR.
> 
> So move the test for "lite" sleep states out to power9_idle_stop.

I forgot to mention this actually breaks in the simulator because
it sets MSR as though it's taken a system reset interrupt when
resuming from EC=0 stop. It works on real hardware.

This might have to be quirked away by firmware or something, but
I've reported the bug so waiting to hear back first.

Thanks,
Nick

> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/kernel/idle_book3s.S | 38 +++++++++++++-------------------------
>  1 file changed, 13 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> index 8ac366a51bb5..9eb47c99dc39 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -251,31 +251,8 @@ enter_winkle:
>  /*
>   * r3 - PSSCR value corresponding to the requested stop state.
>   */
> -power_enter_stop:
> -/*
> - * Check if we are executing the lite variant with ESL=EC=0
> - */
> -	andis.   r4,r3,PSSCR_EC_ESL_MASK_SHIFTED
> +power_enter_stop_esl:
>  	clrldi   r3,r3,60 /* r3 = Bits[60:63] = Requested Level (RL) */
> -	bne	 .Lhandle_esl_ec_set
> -	PPC_STOP
> -	li	r3,0  /* Since we didn't lose state, return 0 */
> -
> -	/*
> -	 * pnv_wakeup_noloss() expects r12 to contain the SRR1 value so
> -	 * it can determine if the wakeup reason is an HMI in
> -	 * CHECK_HMI_INTERRUPT.
> -	 *
> -	 * However, when we wakeup with ESL=0, SRR1 will not contain the wakeup
> -	 * reason, so there is no point setting r12 to SRR1.
> -	 *
> -	 * Further, we clear r12 here, so that we don't accidentally enter the
> -	 * HMI in pnv_wakeup_noloss() if the value of r12[42:45] == WAKE_HMI.
> -	 */
> -	li	r12, 0
> -	b 	pnv_wakeup_noloss
> -
> -.Lhandle_esl_ec_set:
>  	/*
>  	 * POWER9 DD2 can incorrectly set PMAO when waking up after a
>  	 * state-loss idle. Saving and restoring MMCR0 over idle is a
> @@ -348,9 +325,20 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
>   * r3 contains desired PSSCR register value.
>   */
>  _GLOBAL(power9_idle_stop)
> +	/*
> +	 * Check if we are executing the lite variant with ESL=EC=0
> +	 * This case resumes execution after the stop instruction without
> +	 * losing any state, so nothing has to be saved.
> +	 */
> +	andis.	r4,r3,PSSCR_EC_ESL_MASK_SHIFTED
> +	bne	1f
> +	PPC_STOP
> +	li	r3,0  /* Since we didn't lose state, return 0 */
> +	blr
> +1:	/* state-loss idle */
>  	std	r3, PACA_REQ_PSSCR(r13)
>  	mtspr 	SPRN_PSSCR,r3
> -	LOAD_REG_ADDR(r4,power_enter_stop)
> +	LOAD_REG_ADDR(r4,power_enter_stop_esl)
>  	b	pnv_powersave_common
>  	/* No return */
>  

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 12/13] powerpc/64s: idle POWER9 can execute stop in virtual mode
  2017-08-05 17:02 ` [PATCH 12/13] powerpc/64s: idle POWER9 can execute stop in virtual mode Nicholas Piggin
@ 2017-08-05 22:57   ` Benjamin Herrenschmidt
  2017-08-08 10:45   ` Gautham R Shenoy
  1 sibling, 0 replies; 29+ messages in thread
From: Benjamin Herrenschmidt @ 2017-08-05 22:57 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Gautham R . Shenoy, Paul Mackerras

On Sun, 2017-08-06 at 03:02 +1000, Nicholas Piggin wrote:
> The hardware can execute stop in any context, and KVM does not
> require real mode. This saves a switch to real-mode when going
> idle.

+Paulus.

> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/kernel/idle_book3s.S | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> index 75746111e2c4..8ac366a51bb5 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -141,7 +141,16 @@ pnv_powersave_common:
>  	std	r5,_CCR(r1)
>  	std	r1,PACAR1(r13)
>  
> +BEGIN_FTR_SECTION
> +	/*
> +	 * POWER9 does not require real mode to stop, and does not set
> +	 * hwthread_state for KVM (threads don't share MMU context), so
> +	 * we can remain in virtual mode for this.
> +	 */
> +	bctr
> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
>  	/*
> +	 * POWER8
>  	 * Go to real mode to do the nap, as required by the architecture.
>  	 * Also, we need to be in real mode before setting hwthread_state,
>  	 * because as soon as we do that, another thread can switch

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 08/13] powerpc/64s: irq replay remove spurious irq reason
  2017-08-05 17:02 ` [PATCH 08/13] powerpc/64s: irq replay remove spurious irq reason Nicholas Piggin
@ 2017-08-05 23:00   ` Benjamin Herrenschmidt
  2017-08-06  0:51     ` Nicholas Piggin
  0 siblings, 1 reply; 29+ messages in thread
From: Benjamin Herrenschmidt @ 2017-08-05 23:00 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev

On Sun, 2017-08-06 at 03:02 +1000, Nicholas Piggin wrote:
> HVI interrupts have always used 0x500, so remove the dead branch.

Maybe we should fix that and "catch" in incorrect entry via 0x500
which would mean the XIVE is trying to deliver guest irqs to the OS...

That can happen if some LPCR bits aren't set properly and/or KVM
doesn't pull the guest in time. I had bugs like that in my early
dev so I've been running with a b . at 0x500 for a while :-)

> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 29253cecf713..566cf126c13b 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1680,8 +1680,6 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_300)
>  BEGIN_FTR_SECTION
>  	cmpwi	r3,0xa00
>  	beq	h_doorbell_common_msgclr
> -	cmpwi	r3,0xea0
> -	beq	h_virt_irq_common
>  	cmpwi	r3,0xe60
>  	beq	hmi_exception_common
>  FTR_SECTION_ELSE

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 08/13] powerpc/64s: irq replay remove spurious irq reason
  2017-08-05 23:00   ` Benjamin Herrenschmidt
@ 2017-08-06  0:51     ` Nicholas Piggin
  2017-08-10 13:12       ` Michael Ellerman
  0 siblings, 1 reply; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-06  0:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

On Sun, 06 Aug 2017 09:00:32 +1000
Benjamin Herrenschmidt <benh@au1.ibm.com> wrote:

> On Sun, 2017-08-06 at 03:02 +1000, Nicholas Piggin wrote:
> > HVI interrupts have always used 0x500, so remove the dead branch.  
> 
> Maybe we should fix that and "catch" in incorrect entry via 0x500
> which would mean the XIVE is trying to deliver guest irqs to the OS...

I should be more clear, when I say 0x500, it is only in reference to
the constant used by the soft-irq replay. After patch 6 the replay is
sent to the 0xea0 common handler.

> That can happen if some LPCR bits aren't set properly and/or KVM
> doesn't pull the guest in time. I had bugs like that in my early
> dev so I've been running with a b . at 0x500 for a while :-)

So that's a separate issue of hardware actually doing a 0x500. I
had this

http://patchwork.ozlabs.org/patch/750033/


> 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >  arch/powerpc/kernel/exceptions-64s.S | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> > index 29253cecf713..566cf126c13b 100644
> > --- a/arch/powerpc/kernel/exceptions-64s.S
> > +++ b/arch/powerpc/kernel/exceptions-64s.S
> > @@ -1680,8 +1680,6 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_300)
> >  BEGIN_FTR_SECTION
> >  	cmpwi	r3,0xa00
> >  	beq	h_doorbell_common_msgclr
> > -	cmpwi	r3,0xea0
> > -	beq	h_virt_irq_common
> >  	cmpwi	r3,0xe60
> >  	beq	hmi_exception_common
> >  FTR_SECTION_ELSE  
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 10/13] powerpc/64s: idle simplify KVM idle on POWER9
  2017-08-05 17:02 ` [PATCH 10/13] powerpc/64s: idle simplify KVM idle on POWER9 Nicholas Piggin
@ 2017-08-08 10:36   ` Gautham R Shenoy
  2017-08-08 12:42     ` Nicholas Piggin
  2017-08-10 13:14   ` Michael Ellerman
  1 sibling, 1 reply; 29+ messages in thread
From: Gautham R Shenoy @ 2017-08-08 10:36 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Gautham R . Shenoy, Paul Mackerras, kvm-ppc

Hi Nicholas,

On Sun, Aug 06, 2017 at 03:02:38AM +1000, Nicholas Piggin wrote:
> POWER9 CPUs have independent MMU contexts per thread so KVM
> does not have to bring sibling threads into real-mode when
> switching MMU mode to guest. This can simplify POWER9 sleep/wake
> paths and avoids hwsyncs.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/include/asm/kvm_book3s_asm.h |  4 ++++
>  arch/powerpc/kernel/idle_book3s.S         |  8 ++-----
>  arch/powerpc/kvm/book3s_hv.c              | 37 ++++++++++++++++++++++++++-----
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  8 +++++++
>  4 files changed, 46 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
> index 7cea76f11c26..83596f32f50b 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_asm.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
> @@ -104,6 +104,10 @@ struct kvmppc_host_state {
>  	u8 napping;
> 
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +	/*
> +	 * hwthread_req/hwthread_state pair is used to pull sibling threads
> +	 * out of guest on pre-ISAv3.0B CPUs where threads share MMU.
> +	 */
>  	u8 hwthread_req;
>  	u8 hwthread_state;
>  	u8 host_ipi;
> diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> index e6252c5a57a4..3ab73f9223e4 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -243,12 +243,6 @@ enter_winkle:
>   * r3 - PSSCR value corresponding to the requested stop state.
>   */
>  power_enter_stop:
> -#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> -	/* Tell KVM we're entering idle */
> -	li	r4,KVM_HWTHREAD_IN_IDLE
> -	/* DO THIS IN REAL MODE!  See comment above. */
> -	stb	r4,HSTATE_HWTHREAD_STATE(r13)
> -#endif
>  /*
>   * Check if we are executing the lite variant with ESL=EC=0
>   */
> @@ -435,6 +429,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
>  	mr	r3,r12
> 
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +BEGIN_FTR_SECTION
>  	li	r0,KVM_HWTHREAD_IN_KERNEL
>  	stb	r0,HSTATE_HWTHREAD_STATE(r13)
>  	/* Order setting hwthread_state vs. testing hwthread_req */
> @@ -444,6 +439,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
>  	beq	1f
>  	b	kvm_start_guest
>  1:
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)

This would be 7 nops on power9. Should we move this to a different
function and do a bl to that?


>  #endif
> 
>  	/* Return SRR1 from power7_nap() */
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 359c79cdf0cc..bb1ab14f963a 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -2111,6 +2111,16 @@ static int kvmppc_grab_hwthread(int cpu)
>  	struct paca_struct *tpaca;
>  	long timeout = 10000;
> 
> +	/*
> +	 * ISA v3.0 idle routines do not set hwthread_state or test
> +	 * hwthread_req, so they can not grab idle threads.
> +	 */
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		WARN_ON(1);
> +		pr_err("KVM: can not control sibling threads\n");
> +		return -EBUSY;
> +	}
> +
>  	tpaca = &paca[cpu];
> 
>  	/* Ensure the thread won't go into the kernel if it wakes */
> @@ -2145,12 +2155,26 @@ static void kvmppc_release_hwthread(int cpu)
>  	struct paca_struct *tpaca;
> 
>  	tpaca = &paca[cpu];
> -	tpaca->kvm_hstate.hwthread_req = 0;
>  	tpaca->kvm_hstate.kvm_vcpu = NULL;
>  	tpaca->kvm_hstate.kvm_vcore = NULL;
>  	tpaca->kvm_hstate.kvm_split_mode = NULL;
>  }
> 
> +static void kvmppc_release_hwthread_secondary(int cpu)
> +{
> +	struct paca_struct *tpaca;
> +
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		WARN_ON(1);
> +		return;
> +	}
> +
> +	tpaca = &paca[cpu];
> +	tpaca->kvm_hstate.hwthread_req = 0;
> +	kvmppc_release_hwthread(cpu);
> +}
> +
> +

Extra blank line not needed.

>  static void radix_flush_cpu(struct kvm *kvm, int cpu, struct kvm_vcpu *vcpu)
>  {
>  	int i;
> @@ -2274,7 +2298,7 @@ static int on_primary_thread(void)
>  		if (kvmppc_grab_hwthread(cpu + thr)) {
>  			/* Couldn't grab one; let the others go */
>  			do {
> -				kvmppc_release_hwthread(cpu + thr);
> +				kvmppc_release_hwthread_secondary(cpu + thr);
>  			} while (--thr > 0);
>  			return 0;
>  		}
> @@ -2702,8 +2726,9 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
>  			kvmppc_vcore_preempt(pvc);
>  			spin_unlock(&pvc->lock);
>  		}
> -		for (i = 0; i < controlled_threads; ++i)
> -			kvmppc_release_hwthread(pcpu + i);
> +		for (i = 1; i < controlled_threads; ++i)
> +			kvmppc_release_hwthread_secondary(pcpu + i);
> +		kvmppc_release_hwthread(pcpu);
>  		return;
>  	}
> 
> @@ -2858,11 +2883,13 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
> 
>  	/* Let secondaries go back to the offline loop */
>  	for (i = 0; i < controlled_threads; ++i) {
> -		kvmppc_release_hwthread(pcpu + i);
>  		if (sip && sip->napped[i])
>  			kvmppc_ipi_thread(pcpu + i);
>  		cpumask_clear_cpu(pcpu + i, &vc->kvm->arch.cpu_in_guest);
>  	}

We are sending an IPI to the thread that has exited the guest and is
currently napping. The IPI wakes it up so that it can executes
offline loop. But we haven't released the hwthread yet, which means
that hwthread_req for this thread is still set.

The thread wakes up from nap, executes the pnv_powersave_wakeup code
where it can enter kvm_start_guest. Is this a legitimate race or am I
missing something?

> +	for (i = 1; i < controlled_threads; ++i)
> +		kvmppc_release_hwthread_secondary(pcpu + i);
> +	kvmppc_release_hwthread(pcpu);
> 
>  	spin_unlock(&vc->lock);
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index c52184a8efdf..3e024fd71fe8 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -149,9 +149,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	subf	r4, r4, r3
>  	mtspr	SPRN_DEC, r4
> 
> +BEGIN_FTR_SECTION
>  	/* hwthread_req may have got set by cede or no vcpu, so clear it */
>  	li	r0, 0
>  	stb	r0, HSTATE_HWTHREAD_REQ(r13)
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
> 
>  	/*
>  	 * For external interrupts we need to call the Linux
> @@ -314,6 +316,7 @@ kvm_novcpu_exit:
>   * Relocation is off and most register values are lost.
>   * r13 points to the PACA.
>   * r3 contains the SRR1 wakeup value, SRR1 is trashed.
> + * This is not used by ISAv3.0B processors.
>   */
>  	.globl	kvm_start_guest
>  kvm_start_guest:
> @@ -432,6 +435,9 @@ kvm_secondary_got_guest:
>   * While waiting we also need to check if we get given a vcpu to run.
>   */
>  kvm_no_guest:
> +BEGIN_FTR_SECTION
> +	twi	31,0,0
> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)


>  	lbz	r3, HSTATE_HWTHREAD_REQ(r13)
>  	cmpwi	r3, 0
>  	bne	53f
> @@ -2509,8 +2515,10 @@ kvm_do_nap:
>  	clrrdi	r0, r0, 1
>  	mtspr	SPRN_CTRLT, r0
> 
> +BEGIN_FTR_SECTION
>  	li	r0,1
>  	stb	r0,HSTATE_HWTHREAD_REQ(r13)
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
>  	mfspr	r5,SPRN_LPCR
>  	ori	r5,r5,LPCR_PECE0 | LPCR_PECE1
>  BEGIN_FTR_SECTION
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 11/13] powerpc/64s: idle POWER9 can execute stop without ptesync
  2017-08-05 17:02 ` [PATCH 11/13] powerpc/64s: idle POWER9 can execute stop without ptesync Nicholas Piggin
@ 2017-08-08 10:37   ` Gautham R Shenoy
  2017-08-10 13:15   ` Michael Ellerman
  1 sibling, 0 replies; 29+ messages in thread
From: Gautham R Shenoy @ 2017-08-08 10:37 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Gautham R . Shenoy

On Sun, Aug 06, 2017 at 03:02:39AM +1000, Nicholas Piggin wrote:
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 12/13] powerpc/64s: idle POWER9 can execute stop in virtual mode
  2017-08-05 17:02 ` [PATCH 12/13] powerpc/64s: idle POWER9 can execute stop in virtual mode Nicholas Piggin
  2017-08-05 22:57   ` Benjamin Herrenschmidt
@ 2017-08-08 10:45   ` Gautham R Shenoy
  1 sibling, 0 replies; 29+ messages in thread
From: Gautham R Shenoy @ 2017-08-08 10:45 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Gautham R . Shenoy, Paul Mackerras

On Sun, Aug 06, 2017 at 03:02:40AM +1000, Nicholas Piggin wrote:
> The hardware can execute stop in any context, and KVM does not
> require real mode. This saves a switch to real-mode when going
> idle.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

> ---
>  arch/powerpc/kernel/idle_book3s.S | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> index 75746111e2c4..8ac366a51bb5 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -141,7 +141,16 @@ pnv_powersave_common:
>  	std	r5,_CCR(r1)
>  	std	r1,PACAR1(r13)
> 
> +BEGIN_FTR_SECTION
> +	/*
> +	 * POWER9 does not require real mode to stop, and does not set
> +	 * hwthread_state for KVM (threads don't share MMU context), so
> +	 * we can remain in virtual mode for this.
> +	 */
> +	bctr
> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
>  	/*
> +	 * POWER8
>  	 * Go to real mode to do the nap, as required by the architecture.
>  	 * Also, we need to be in real mode before setting hwthread_state,
>  	 * because as soon as we do that, another thread can switch
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 13/13] powerpc/64s: idle ESL=0 stop can avoid all save/restore overhead
  2017-08-05 17:02 ` [PATCH 13/13] powerpc/64s: idle ESL=0 stop can avoid all save/restore overhead Nicholas Piggin
  2017-08-05 17:11   ` Nicholas Piggin
@ 2017-08-08 10:49   ` Gautham R Shenoy
  1 sibling, 0 replies; 29+ messages in thread
From: Gautham R Shenoy @ 2017-08-08 10:49 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Gautham R . Shenoy

On Sun, Aug 06, 2017 at 03:02:41AM +1000, Nicholas Piggin wrote:
> When stop is executed with EC=ESL=0, it appears to execute like a
> normal instruction (resuming from NIP when woken by interrupt).
> So all the save/restore handling can be avoided completely. In
> particular NV GPRs do not have to be saved, and MSR does not have
> to be switched back to kernel MSR.
> 
> So move the test for "lite" sleep states out to power9_idle_stop.

Nice optimization!

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/kernel/idle_book3s.S | 38 +++++++++++++-------------------------
>  1 file changed, 13 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> index 8ac366a51bb5..9eb47c99dc39 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -251,31 +251,8 @@ enter_winkle:
>  /*
>   * r3 - PSSCR value corresponding to the requested stop state.
>   */
> -power_enter_stop:
> -/*
> - * Check if we are executing the lite variant with ESL=EC=0
> - */
> -	andis.   r4,r3,PSSCR_EC_ESL_MASK_SHIFTED
> +power_enter_stop_esl:
>  	clrldi   r3,r3,60 /* r3 = Bits[60:63] = Requested Level (RL) */
> -	bne	 .Lhandle_esl_ec_set
> -	PPC_STOP
> -	li	r3,0  /* Since we didn't lose state, return 0 */
> -
> -	/*
> -	 * pnv_wakeup_noloss() expects r12 to contain the SRR1 value so
> -	 * it can determine if the wakeup reason is an HMI in
> -	 * CHECK_HMI_INTERRUPT.
> -	 *
> -	 * However, when we wakeup with ESL=0, SRR1 will not contain the wakeup
> -	 * reason, so there is no point setting r12 to SRR1.
> -	 *
> -	 * Further, we clear r12 here, so that we don't accidentally enter the
> -	 * HMI in pnv_wakeup_noloss() if the value of r12[42:45] == WAKE_HMI.
> -	 */
> -	li	r12, 0
> -	b 	pnv_wakeup_noloss
> -
> -.Lhandle_esl_ec_set:
>  	/*
>  	 * POWER9 DD2 can incorrectly set PMAO when waking up after a
>  	 * state-loss idle. Saving and restoring MMCR0 over idle is a
> @@ -348,9 +325,20 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
>   * r3 contains desired PSSCR register value.
>   */
>  _GLOBAL(power9_idle_stop)
> +	/*
> +	 * Check if we are executing the lite variant with ESL=EC=0
> +	 * This case resumes execution after the stop instruction without
> +	 * losing any state, so nothing has to be saved.
> +	 */
> +	andis.	r4,r3,PSSCR_EC_ESL_MASK_SHIFTED
> +	bne	1f
> +	PPC_STOP
> +	li	r3,0  /* Since we didn't lose state, return 0 */
> +	blr
> +1:	/* state-loss idle */
>  	std	r3, PACA_REQ_PSSCR(r13)
>  	mtspr 	SPRN_PSSCR,r3
> -	LOAD_REG_ADDR(r4,power_enter_stop)
> +	LOAD_REG_ADDR(r4,power_enter_stop_esl)
>  	b	pnv_powersave_common
>  	/* No return */
> 
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 10/13] powerpc/64s: idle simplify KVM idle on POWER9
  2017-08-08 10:36   ` Gautham R Shenoy
@ 2017-08-08 12:42     ` Nicholas Piggin
  2017-08-10  6:24       ` Gautham R Shenoy
  0 siblings, 1 reply; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-08 12:42 UTC (permalink / raw)
  To: Gautham R Shenoy; +Cc: linuxppc-dev, Paul Mackerras, kvm-ppc

On Tue, 8 Aug 2017 16:06:43 +0530
Gautham R Shenoy <ego@linux.vnet.ibm.com> wrote:

> Hi Nicholas,
> 
> On Sun, Aug 06, 2017 at 03:02:38AM +1000, Nicholas Piggin wrote:
> > POWER9 CPUs have independent MMU contexts per thread so KVM
> > does not have to bring sibling threads into real-mode when
> > switching MMU mode to guest. This can simplify POWER9 sleep/wake
> > paths and avoids hwsyncs.
> > 


> > @@ -444,6 +439,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
> >  	beq	1f
> >  	b	kvm_start_guest
> >  1:
> > +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)  
> 
> This would be 7 nops on power9. Should we move this to a different
> function and do a bl to that?

Yes that's a good idea.

> > +static void kvmppc_release_hwthread_secondary(int cpu)
> > +{
> > +	struct paca_struct *tpaca;
> > +
> > +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> > +		WARN_ON(1);
> > +		return;
> > +	}
> > +
> > +	tpaca = &paca[cpu];
> > +	tpaca->kvm_hstate.hwthread_req = 0;
> > +	kvmppc_release_hwthread(cpu);
> > +}
> > +
> > +  
> 
> Extra blank line not needed.

Sure.

> > @@ -2858,11 +2883,13 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
> > 
> >  	/* Let secondaries go back to the offline loop */
> >  	for (i = 0; i < controlled_threads; ++i) {
> > -		kvmppc_release_hwthread(pcpu + i);
> >  		if (sip && sip->napped[i])
> >  			kvmppc_ipi_thread(pcpu + i);
> >  		cpumask_clear_cpu(pcpu + i, &vc->kvm->arch.cpu_in_guest);
> >  	}  
> 
> We are sending an IPI to the thread that has exited the guest and is
> currently napping. The IPI wakes it up so that it can executes
> offline loop. But we haven't released the hwthread yet, which means
> that hwthread_req for this thread is still set.
> 
> The thread wakes up from nap, executes the pnv_powersave_wakeup code
> where it can enter kvm_start_guest. Is this a legitimate race or am I
> missing something?

Oh I think it's just a silly mistake in my patch, good catch.
Would moving this loop below the one below solve it? I wasn't
completely happy with uglifying these loops by making the
primary release different than secondary... maybe I will just
move the difference into kvmppc_release_hwthread and which is
less intrusive to callers.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 10/13] powerpc/64s: idle simplify KVM idle on POWER9
  2017-08-08 12:42     ` Nicholas Piggin
@ 2017-08-10  6:24       ` Gautham R Shenoy
  0 siblings, 0 replies; 29+ messages in thread
From: Gautham R Shenoy @ 2017-08-10  6:24 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Gautham R Shenoy, linuxppc-dev, Paul Mackerras, kvm-ppc

On Tue, Aug 08, 2017 at 10:42:57PM +1000, Nicholas Piggin wrote:
> On Tue, 8 Aug 2017 16:06:43 +0530
> Gautham R Shenoy <ego@linux.vnet.ibm.com> wrote:
> 
> > Hi Nicholas,
> > 
> > On Sun, Aug 06, 2017 at 03:02:38AM +1000, Nicholas Piggin wrote:
> > > POWER9 CPUs have independent MMU contexts per thread so KVM
> > > does not have to bring sibling threads into real-mode when
> > > switching MMU mode to guest. This can simplify POWER9 sleep/wake
> > > paths and avoids hwsyncs.
> > > @@ -2858,11 +2883,13 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
> > > 
> > >  	/* Let secondaries go back to the offline loop */
> > >  	for (i = 0; i < controlled_threads; ++i) {
> > > -		kvmppc_release_hwthread(pcpu + i);
> > >  		if (sip && sip->napped[i])
> > >  			kvmppc_ipi_thread(pcpu + i);
> > >  		cpumask_clear_cpu(pcpu + i, &vc->kvm->arch.cpu_in_guest);
> > >  	}  
> > 
> > We are sending an IPI to the thread that has exited the guest and is
> > currently napping. The IPI wakes it up so that it can executes
> > offline loop. But we haven't released the hwthread yet, which means
> > that hwthread_req for this thread is still set.
> > 
> > The thread wakes up from nap, executes the pnv_powersave_wakeup code
> > where it can enter kvm_start_guest. Is this a legitimate race or am I
> > missing something?
> 
> Oh I think it's just a silly mistake in my patch, good catch.

Ah,np!

> Would moving this loop below the one below solve it? I wasn't
> completely happy with uglifying these loops by making the
> primary release different than secondary... maybe I will just
> move the difference into kvmppc_release_hwthread and which is
> less intrusive to callers.

I think moving it to kvmppc_release_hwthread is a good idea.

> 
> Thanks,
> Nick
> 

--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 08/13] powerpc/64s: irq replay remove spurious irq reason
  2017-08-06  0:51     ` Nicholas Piggin
@ 2017-08-10 13:12       ` Michael Ellerman
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Ellerman @ 2017-08-10 13:12 UTC (permalink / raw)
  To: Nicholas Piggin, Benjamin Herrenschmidt; +Cc: linuxppc-dev

Nicholas Piggin <npiggin@gmail.com> writes:

> On Sun, 06 Aug 2017 09:00:32 +1000
> Benjamin Herrenschmidt <benh@au1.ibm.com> wrote:
>
>> On Sun, 2017-08-06 at 03:02 +1000, Nicholas Piggin wrote:
>> > HVI interrupts have always used 0x500, so remove the dead branch.  
>> 
>> Maybe we should fix that and "catch" in incorrect entry via 0x500
>> which would mean the XIVE is trying to deliver guest irqs to the OS...
>
> I should be more clear, when I say 0x500, it is only in reference to
> the constant used by the soft-irq replay. After patch 6 the replay is
> sent to the 0xea0 common handler.
>
>> That can happen if some LPCR bits aren't set properly and/or KVM
>> doesn't pull the guest in time. I had bugs like that in my early
>> dev so I've been running with a b . at 0x500 for a while :-)
>
> So that's a separate issue of hardware actually doing a 0x500. I
> had this
>
> http://patchwork.ozlabs.org/patch/750033/

Hmm, yeah I was going to merge that.

But I got side tracked by HVICE, ie. the "interrupts don't go to 0x500"
is only true if we set HVICE.

Which we currently always do, in __setup_cpu_power9(), but then we have
a DT CPU feature for it, so we really shouldn't be always enabling
HVICE, we should leave it up to the DT CPU feature code.

And then it becomes controlled by the device tree, which makes your
patch potentially wrong depending on the device tree CPU features.

Ugh.

cheers

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 10/13] powerpc/64s: idle simplify KVM idle on POWER9
  2017-08-05 17:02 ` [PATCH 10/13] powerpc/64s: idle simplify KVM idle on POWER9 Nicholas Piggin
  2017-08-08 10:36   ` Gautham R Shenoy
@ 2017-08-10 13:14   ` Michael Ellerman
  2017-08-10 13:53     ` Nicholas Piggin
  1 sibling, 1 reply; 29+ messages in thread
From: Michael Ellerman @ 2017-08-10 13:14 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
  Cc: Gautham R . Shenoy, kvm-ppc, Nicholas Piggin

Nicholas Piggin <npiggin@gmail.com> writes:

> POWER9 CPUs have independent MMU contexts per thread so KVM
> does not have to bring sibling threads into real-mode when
> switching MMU mode to guest. This can simplify POWER9 sleep/wake
> paths and avoids hwsyncs.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/include/asm/kvm_book3s_asm.h |  4 ++++
>  arch/powerpc/kernel/idle_book3s.S         |  8 ++-----
>  arch/powerpc/kvm/book3s_hv.c              | 37 ++++++++++++++++++++++++++-----
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  8 +++++++

This will need to go via, or at least be shared with Paul's tree.

So if it's possible, splitting it out of this series would be easier.

cheers

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 11/13] powerpc/64s: idle POWER9 can execute stop without ptesync
  2017-08-05 17:02 ` [PATCH 11/13] powerpc/64s: idle POWER9 can execute stop without ptesync Nicholas Piggin
  2017-08-08 10:37   ` Gautham R Shenoy
@ 2017-08-10 13:15   ` Michael Ellerman
  2017-08-10 14:03     ` Nicholas Piggin
  1 sibling, 1 reply; 29+ messages in thread
From: Michael Ellerman @ 2017-08-10 13:15 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Gautham R . Shenoy, Nicholas Piggin

Nicholas Piggin <npiggin@gmail.com> writes:

> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/kernel/idle_book3s.S       |  7 ++++---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 24 ++++++++++++------------

If you can split this into a KVM and non-KVM patch that would be
helpful.

cheers

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 10/13] powerpc/64s: idle simplify KVM idle on POWER9
  2017-08-10 13:14   ` Michael Ellerman
@ 2017-08-10 13:53     ` Nicholas Piggin
  0 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-10 13:53 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Gautham R . Shenoy, kvm-ppc

On Thu, 10 Aug 2017 23:14:46 +1000
Michael Ellerman <mpe@ellerman.id.au> wrote:

> Nicholas Piggin <npiggin@gmail.com> writes:
> 
> > POWER9 CPUs have independent MMU contexts per thread so KVM
> > does not have to bring sibling threads into real-mode when
> > switching MMU mode to guest. This can simplify POWER9 sleep/wake
> > paths and avoids hwsyncs.
> >
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >  arch/powerpc/include/asm/kvm_book3s_asm.h |  4 ++++
> >  arch/powerpc/kernel/idle_book3s.S         |  8 ++-----
> >  arch/powerpc/kvm/book3s_hv.c              | 37 ++++++++++++++++++++++++++-----
> >  arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  8 +++++++  
> 
> This will need to go via, or at least be shared with Paul's tree.
> 
> So if it's possible, splitting it out of this series would be easier.

I agree it's really a KVM patch, but patch 12 depends on this, 
it is a Linux patch. Not sure how you want to handle that?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 11/13] powerpc/64s: idle POWER9 can execute stop without ptesync
  2017-08-10 13:15   ` Michael Ellerman
@ 2017-08-10 14:03     ` Nicholas Piggin
  0 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2017-08-10 14:03 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Gautham R . Shenoy

On Thu, 10 Aug 2017 23:15:50 +1000
Michael Ellerman <mpe@ellerman.id.au> wrote:

> Nicholas Piggin <npiggin@gmail.com> writes:
> 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >  arch/powerpc/kernel/idle_book3s.S       |  7 ++++---
> >  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 24 ++++++++++++------------  
> 
> If you can split this into a KVM and non-KVM patch that would be
> helpful.

Yes I can do that.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-08-10 14:03 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-08-05 17:02 [PATCH 00/13] idle and soft-irq improvements and POWER9 idle optimisation Nicholas Piggin
2017-08-05 17:02 ` [PATCH 01/13] powerpc/64s: masked interrupt avoid branch Nicholas Piggin
2017-08-05 17:02 ` [PATCH 02/13] powerpc/64s: masked interrupt avoid instruction Nicholas Piggin
2017-08-05 17:02 ` [PATCH 03/13] powerpc/64s: masked interrupt returns to kernel so avoid r13 restore Nicholas Piggin
2017-08-05 17:02 ` [PATCH 04/13] powerpc/64: cleanup __check_irq_replay Nicholas Piggin
2017-08-05 17:02 ` [PATCH 05/13] powerpc/64s: irq replay merge HV and non-HV paths for doorbell replay Nicholas Piggin
2017-08-05 17:02 ` [PATCH 06/13] powerpc/64s: irq replay external use the HV handler in HV mode on POWER9 Nicholas Piggin
2017-08-05 17:02 ` [PATCH 07/13] powerpc/64: remove redundant instruction in interrupt replay Nicholas Piggin
2017-08-05 17:02 ` [PATCH 08/13] powerpc/64s: irq replay remove spurious irq reason Nicholas Piggin
2017-08-05 23:00   ` Benjamin Herrenschmidt
2017-08-06  0:51     ` Nicholas Piggin
2017-08-10 13:12       ` Michael Ellerman
2017-08-05 17:02 ` [PATCH 09/13] powerpc/64: runlatch CTRL[RUN] set optimisation Nicholas Piggin
2017-08-05 17:02 ` [PATCH 10/13] powerpc/64s: idle simplify KVM idle on POWER9 Nicholas Piggin
2017-08-08 10:36   ` Gautham R Shenoy
2017-08-08 12:42     ` Nicholas Piggin
2017-08-10  6:24       ` Gautham R Shenoy
2017-08-10 13:14   ` Michael Ellerman
2017-08-10 13:53     ` Nicholas Piggin
2017-08-05 17:02 ` [PATCH 11/13] powerpc/64s: idle POWER9 can execute stop without ptesync Nicholas Piggin
2017-08-08 10:37   ` Gautham R Shenoy
2017-08-10 13:15   ` Michael Ellerman
2017-08-10 14:03     ` Nicholas Piggin
2017-08-05 17:02 ` [PATCH 12/13] powerpc/64s: idle POWER9 can execute stop in virtual mode Nicholas Piggin
2017-08-05 22:57   ` Benjamin Herrenschmidt
2017-08-08 10:45   ` Gautham R Shenoy
2017-08-05 17:02 ` [PATCH 13/13] powerpc/64s: idle ESL=0 stop can avoid all save/restore overhead Nicholas Piggin
2017-08-05 17:11   ` Nicholas Piggin
2017-08-08 10:49   ` Gautham R Shenoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).