LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines
From: Preeti U Murthy @ 2014-01-13  4:22 UTC (permalink / raw)
  To: Deepthi Dharwar; +Cc: linuxppc-dev, paulus
In-Reply-To: <52D3640F.2060805@linux.vnet.ibm.com>

Hi Deepthi,

On 01/13/2014 09:27 AM, Deepthi Dharwar wrote:
> On 01/09/2014 10:35 AM, Preeti U Murthy wrote:
>> Commit fbd7740fdfdf9475f switched pseries cpu idle handling from complete idle
>> loops to ppc_md.powersave functions. Earlier to this switch,
>> ppc64_runlatch_off() had to be called in each of the idle routines. But after
>> the switch this call is handled in arch_cpu_idle(),just before the call
>> to ppc_md.powersave, where platform specific idle routines are called.
>>
>> As a consequence, the call to ppc64_runlatch_off() got duplicated in the
>> arch_cpu_idle() routine as well as in the some of the idle routines in
>> pseries and commit fbd7740fdfdf9475f missed to get rid of these redundant
>> calls. These calls were carried over subsequent enhancements to the pseries
>> cpuidle routines. This patch takes care of eliminating this redundancy.
>>
>> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>> ---
> 
> Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
> 
> Preeti, I will include this patch as part of the pseries cpuidle driver
> clean-ups series which I have undertaken.

Yes that would be great, thanks!

Regards
Preeti U Murthy
> 
> Regards,
> Deepthi
> 
>>  arch/powerpc/platforms/pseries/processor_idle.c |    3 ---
>>  1 file changed, 3 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
>> index a166e38..09e4f56 100644
>> --- a/arch/powerpc/platforms/pseries/processor_idle.c
>> +++ b/arch/powerpc/platforms/pseries/processor_idle.c
>> @@ -17,7 +17,6 @@
>>  #include <asm/reg.h>
>>  #include <asm/machdep.h>
>>  #include <asm/firmware.h>
>> -#include <asm/runlatch.h>
>>  #include <asm/plpar_wrappers.h>
>>
>>  struct cpuidle_driver pseries_idle_driver = {
>> @@ -63,7 +62,6 @@ static int snooze_loop(struct cpuidle_device *dev,
>>  	set_thread_flag(TIF_POLLING_NRFLAG);
>>
>>  	while ((!need_resched()) && cpu_online(cpu)) {
>> -		ppc64_runlatch_off();
>>  		HMT_low();
>>  		HMT_very_low();
>>  	}
>> @@ -103,7 +101,6 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
>>  	idle_loop_prolog(&in_purr);
>>  	get_lppaca()->donate_dedicated_cpu = 1;
>>
>> -	ppc64_runlatch_off();
>>  	HMT_medium();
>>  	check_and_cede_processor();
>>
>>
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
>>
> 

^ permalink raw reply

* Re: [PATCH] powerpc/iommu: Don't detach device without IOMMU group
From: Alexey Kardashevskiy @ 2014-01-13  4:54 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev
In-Reply-To: <1389584182-6349-1-git-send-email-shangw@linux.vnet.ibm.com>

On 01/13/2014 02:36 PM, Gavin Shan wrote:
> Some devices, for example PCI root port, don't have IOMMU table and
> group. We needn't detach them from their IOMMU group. Otherwise, it
> potentially incurs kernel crash because of referring NULL IOMMU group
> as following backtrace indicates:
> 
>   .iommu_group_remove_device+0x74/0x1b0
>   .iommu_bus_notifier+0x94/0xb4
>   .notifier_call_chain+0x78/0xe8
>   .__blocking_notifier_call_chain+0x7c/0xbc
>   .blocking_notifier_call_chain+0x38/0x48
>   .device_del+0x50/0x234
>   .pci_remove_bus_device+0x88/0x138
>   .pci_stop_and_remove_bus_device+0x2c/0x40
>   .pcibios_remove_pci_devices+0xcc/0xfc
>   .pcibios_remove_pci_devices+0x3c/0xfc
> 
> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>

Thanks! I never tested the unplug case...

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>



> ---
>  arch/powerpc/kernel/iommu.c |   11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index 572bb5b..8a49424 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -1137,6 +1137,17 @@ static int iommu_add_device(struct device *dev)
>  
>  static void iommu_del_device(struct device *dev)
>  {
> +	/*
> +	 * Some devices might not have IOMMU table and group
> +	 * and we needn't detach them from the associated
> +	 * IOMMU groups
> +	 */
> +	if (!dev->iommu_group) {
> +		pr_debug("iommu_tce: skipping device %s with no tbl\n",
> +			 dev_name(dev));
> +		return;
> +	}
> +
>  	iommu_group_remove_device(dev);
>  }
>  
> 


-- 
Alexey

^ permalink raw reply

* [PATCH 0/3] Transactional memory fixes
From: Paul Mackerras @ 2014-01-13  4:56 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

This series of patches fixes a couple of cases where bugs in the
handling of FP/VMX/VSX state around transactions can lead to
corruption of the FP/VMX/VSX state visible to user processes.  The
bugs were found basically by inspection but then verified by writing
small test programs that provoke the bugs.

Paul.

 arch/powerpc/include/asm/processor.h   |   2 +
 arch/powerpc/include/asm/thread_info.h |   9 +-
 arch/powerpc/include/asm/tm.h          |   1 +
 arch/powerpc/kernel/entry_64.S         |  12 ++-
 arch/powerpc/kernel/fpu.S              |  16 ++++
 arch/powerpc/kernel/process.c          | 146 ++++++++++++++++++++++++++++++---
 arch/powerpc/kernel/signal.c           |   3 +-
 arch/powerpc/kernel/signal_32.c        |  21 ++---
 arch/powerpc/kernel/signal_64.c        |  14 ++--
 arch/powerpc/kernel/traps.c            |  57 +++++++++----
 arch/powerpc/kernel/vector.S           |  10 +++
 11 files changed, 231 insertions(+), 60 deletions(-)

^ permalink raw reply

* [PATCH 1/3] powerpc: Reclaim two unused thread_info flag bits
From: Paul Mackerras @ 2014-01-13  4:56 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1389588990-25953-1-git-send-email-paulus@samba.org>

TIF_PERFMON_WORK and TIF_PERFMON_CTXSW are completely unused.  They
appear to be related to the old perfmon2 code, which has been
superseded by the perf_event infrastructure.  This removes their
definitions so that the bits can be used for other purposes.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/thread_info.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 9854c56..fc2bf41 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -91,8 +91,6 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_POLLING_NRFLAG	3	/* true if poll_idle() is polling
 					   TIF_NEED_RESCHED */
 #define TIF_32BIT		4	/* 32 bit binary */
-#define TIF_PERFMON_WORK	5	/* work for pfm_handle_work() */
-#define TIF_PERFMON_CTXSW	6	/* perfmon needs ctxsw calls */
 #define TIF_SYSCALL_AUDIT	7	/* syscall auditing active */
 #define TIF_SINGLESTEP		8	/* singlestepping active */
 #define TIF_NOHZ		9	/* in adaptive nohz mode */
@@ -115,8 +113,6 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_NEED_RESCHED	(1<<TIF_NEED_RESCHED)
 #define _TIF_POLLING_NRFLAG	(1<<TIF_POLLING_NRFLAG)
 #define _TIF_32BIT		(1<<TIF_32BIT)
-#define _TIF_PERFMON_WORK	(1<<TIF_PERFMON_WORK)
-#define _TIF_PERFMON_CTXSW	(1<<TIF_PERFMON_CTXSW)
 #define _TIF_SYSCALL_AUDIT	(1<<TIF_SYSCALL_AUDIT)
 #define _TIF_SINGLESTEP		(1<<TIF_SINGLESTEP)
 #define _TIF_SECCOMP		(1<<TIF_SECCOMP)
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH 2/3] powerpc: Don't corrupt transactional state when using FP/VMX in kernel
From: Paul Mackerras @ 2014-01-13  4:56 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1389588990-25953-1-git-send-email-paulus@samba.org>

Currently, when we have a process using the transactional memory
facilities on POWER8 (that is, the processor is in transactional
or suspended state), and the process enters the kernel and the
kernel then uses the floating-point or vector (VMX/Altivec) facility,
we end up corrupting the user-visible FP/VMX/VSX state.  This
happens, for example, if a page fault causes a copy-on-write
operation, because the copy_page function will use VMX to do the
copy on POWER8.  The test program below demonstrates the bug.

The bug happens because when FP/VMX state for a transactional process
is stored in the thread_struct, we store the checkpointed state in
.fp_state/.vr_state and the transactional (current) state in
.transact_fp/.transact_vr.  However, when the kernel wants to use
FP/VMX, it calls enable_kernel_fp() or enable_kernel_altivec(),
which saves the current state in .fp_state/.vr_state.  Furthermore,
when we return to the user process we return with FP/VMX/VSX
disabled.  The next time the process uses FP/VMX/VSX, we don't know
which set of state (the current register values, .fp_state/.vr_state,
or .transact_fp/.transact_vr) we should be using, since we have no
way to tell if we are still in the same transaction, and if not,
whether the previous transaction succeeded or failed.

Thus it is necessary to strictly adhere to the rule that if FP has
been enabled at any point in a transaction, we must keep FP enabled
for the user process with the current transactional state in the
FP registers, until we detect that it is no longer in a transaction.
Similarly for VMX; once enabled it must stay enabled until the
process is no longer transactional.

In order to keep this rule, we add a new thread_info flag which we
test when returning from the kernel to userspace, called TIF_RESTORE_TM.
This flag indicates that there is FP/VMX/VSX state to be restored
before entering userspace, and when it is set the .tm_orig_msr field
in the thread_struct indicates what state needs to be restored.
The restoration is done by restore_tm_state().  The TIF_RESTORE_TM
bit is set by new giveup_fpu/altivec_maybe_transactional helpers,
which are called from enable_kernel_fp/altivec, giveup_vsx, and
flush_fp/altivec_to_thread instead of giveup_fpu/altivec.

The other thing to be done is to get the transactional FP/VMX/VSX
state from .fp_state/.vr_state when doing reclaim, if that state
has been saved there by giveup_fpu/altivec_maybe_transactional.
Having done this, we set the FP/VMX bit in the thread's MSR after
reclaim to indicate that that part of the state is now valid
(having been reclaimed from the processor's checkpointed state).

Finally, in the signal handling code, we move the clearing of the
transactional state bits in the thread's MSR a bit earlier, before
calling flush_fp_to_thread(), so that we don't unnecessarily set
the TIF_RESTORE_TM bit.

This is the test program:

/* Michael Neuling 4/12/2013
 *
 * See if the altivec state is leaked out of an aborted transaction due to
 * kernel vmx copy loops.
 *
 *   gcc -m64 htm_vmxcopy.c -o htm_vmxcopy
 *
 */

/* We don't use all of these, but for reference: */

int main(int argc, char *argv[])
{
	long double vecin = 1.3;
	long double vecout;
	unsigned long pgsize = getpagesize();
	int i;
	int fd;
	int size = pgsize*16;
	char tmpfile[] = "/tmp/page_faultXXXXXX";
	char buf[pgsize];
	char *a;
	uint64_t aborted = 0;

	fd = mkstemp(tmpfile);
	assert(fd >= 0);

	memset(buf, 0, pgsize);
	for (i = 0; i < size; i += pgsize)
		assert(write(fd, buf, pgsize) == pgsize);

	unlink(tmpfile);

	a = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
	assert(a != MAP_FAILED);

	asm __volatile__(
		"lxvd2x 40,0,%[vecinptr] ; " // set 40 to initial value
		TBEGIN
		"beq	3f ;"
		TSUSPEND
		"xxlxor 40,40,40 ; " // set 40 to 0
		"std	5, 0(%[map]) ;" // cause kernel vmx copy page
		TABORT
		TRESUME
		TEND
		"li	%[res], 0 ;"
		"b	5f ;"
		"3: ;" // Abort handler
		"li	%[res], 1 ;"
		"5: ;"
		"stxvd2x 40,0,%[vecoutptr] ; "
		: [res]"=r"(aborted)
		: [vecinptr]"r"(&vecin),
		  [vecoutptr]"r"(&vecout),
		  [map]"r"(a)
		: "memory", "r0", "r3", "r4", "r5", "r6", "r7");

	if (aborted && (vecin != vecout)){
		printf("FAILED: vector state leaked on abort %f != %f\n",
		       (double)vecin, (double)vecout);
		exit(1);
	}

	munmap(a, size);

	close(fd);

	printf("PASSED!\n");
	return 0;
}

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/processor.h   |   2 +
 arch/powerpc/include/asm/thread_info.h |   5 +-
 arch/powerpc/include/asm/tm.h          |   1 +
 arch/powerpc/kernel/entry_64.S         |  12 ++-
 arch/powerpc/kernel/fpu.S              |  16 ++++
 arch/powerpc/kernel/process.c          | 146 ++++++++++++++++++++++++++++++---
 arch/powerpc/kernel/signal.c           |   3 +-
 arch/powerpc/kernel/signal_32.c        |  21 ++---
 arch/powerpc/kernel/signal_64.c        |  14 ++--
 arch/powerpc/kernel/traps.c            |  12 +--
 arch/powerpc/kernel/vector.S           |  10 +++
 11 files changed, 195 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index fc14a38..232a2fa 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -373,6 +373,8 @@ extern int set_endian(struct task_struct *tsk, unsigned int val);
 extern int get_unalign_ctl(struct task_struct *tsk, unsigned long adr);
 extern int set_unalign_ctl(struct task_struct *tsk, unsigned int val);
 
+extern void fp_enable(void);
+extern void vec_enable(void);
 extern void load_fp_state(struct thread_fp_state *fp);
 extern void store_fp_state(struct thread_fp_state *fp);
 extern void load_vr_state(struct thread_vr_state *vr);
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index fc2bf41..b034ecd 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -91,6 +91,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_POLLING_NRFLAG	3	/* true if poll_idle() is polling
 					   TIF_NEED_RESCHED */
 #define TIF_32BIT		4	/* 32 bit binary */
+#define TIF_RESTORE_TM		5	/* need to restore TM FP/VEC/VSX */
 #define TIF_SYSCALL_AUDIT	7	/* syscall auditing active */
 #define TIF_SINGLESTEP		8	/* singlestepping active */
 #define TIF_NOHZ		9	/* in adaptive nohz mode */
@@ -113,6 +114,7 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_NEED_RESCHED	(1<<TIF_NEED_RESCHED)
 #define _TIF_POLLING_NRFLAG	(1<<TIF_POLLING_NRFLAG)
 #define _TIF_32BIT		(1<<TIF_32BIT)
+#define _TIF_RESTORE_TM		(1<<TIF_RESTORE_TM)
 #define _TIF_SYSCALL_AUDIT	(1<<TIF_SYSCALL_AUDIT)
 #define _TIF_SINGLESTEP		(1<<TIF_SINGLESTEP)
 #define _TIF_SECCOMP		(1<<TIF_SECCOMP)
@@ -128,7 +130,8 @@ static inline struct thread_info *current_thread_info(void)
 				 _TIF_NOHZ)
 
 #define _TIF_USER_WORK_MASK	(_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
-				 _TIF_NOTIFY_RESUME | _TIF_UPROBE)
+				 _TIF_NOTIFY_RESUME | _TIF_UPROBE | \
+				 _TIF_RESTORE_TM)
 #define _TIF_PERSYSCALL_MASK	(_TIF_RESTOREALL|_TIF_NOERROR)
 
 /* Bits in local_flags */
diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h
index 9dfbc34..0c9f8b7 100644
--- a/arch/powerpc/include/asm/tm.h
+++ b/arch/powerpc/include/asm/tm.h
@@ -15,6 +15,7 @@ extern void do_load_up_transact_altivec(struct thread_struct *thread);
 extern void tm_enable(void);
 extern void tm_reclaim(struct thread_struct *thread,
 		       unsigned long orig_msr, uint8_t cause);
+extern void tm_reclaim_current(uint8_t cause);
 extern void tm_recheckpoint(struct thread_struct *thread,
 			    unsigned long orig_msr);
 extern void tm_abort(uint8_t cause);
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index bbfb029..662c6dd 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -664,8 +664,16 @@ _GLOBAL(ret_from_except_lite)
 	bl	.restore_interrupts
 	SCHEDULE_USER
 	b	.ret_from_except_lite
-
-2:	bl	.save_nvgprs
+2:
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+	andi.	r0,r4,_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM
+	bne	3f		/* only restore TM if nothing else to do */
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	.restore_tm_state
+	b	restore
+3:
+#endif
+	bl	.save_nvgprs
 	bl	.restore_interrupts
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	.do_notify_resume
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index f7f5b8b..9ad236e 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -81,6 +81,22 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
 /*
+ * Enable use of the FPU, and VSX if possible, for the caller.
+ */
+_GLOBAL(fp_enable)
+	mfmsr	r3
+	ori	r3,r3,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r3,r3,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
+	SYNC
+	MTMSRD(r3)
+	isync			/* (not necessary for arch 2.02 and later) */
+	blr
+
+/*
  * Load state from memory into FP registers including FPSCR.
  * Assumes the caller has enabled FP in the MSR.
  */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 4a96556..51acc39 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -74,6 +74,48 @@ struct task_struct *last_task_used_vsx = NULL;
 struct task_struct *last_task_used_spe = NULL;
 #endif
 
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+void giveup_fpu_maybe_transactional(struct task_struct *tsk)
+{
+	/*
+	 * If we are saving the current thread's registers, and the
+	 * thread is in a transactional state, set the TIF_RESTORE_TM
+	 * bit so that we know to restore the registers before
+	 * returning to userspace.
+	 */
+	if (tsk == current && tsk->thread.regs &&
+	    MSR_TM_ACTIVE(tsk->thread.regs->msr) &&
+	    !test_thread_flag(TIF_RESTORE_TM)) {
+		tsk->thread.tm_orig_msr = tsk->thread.regs->msr;
+		set_thread_flag(TIF_RESTORE_TM);
+	}
+
+	giveup_fpu(tsk);
+}
+
+void giveup_altivec_maybe_transactional(struct task_struct *tsk)
+{
+	/*
+	 * If we are saving the current thread's registers, and the
+	 * thread is in a transactional state, set the TIF_RESTORE_TM
+	 * bit so that we know to restore the registers before
+	 * returning to userspace.
+	 */
+	if (tsk == current && tsk->thread.regs &&
+	    MSR_TM_ACTIVE(tsk->thread.regs->msr) &&
+	    !test_thread_flag(TIF_RESTORE_TM)) {
+		tsk->thread.tm_orig_msr = tsk->thread.regs->msr;
+		set_thread_flag(TIF_RESTORE_TM);
+	}
+
+	giveup_altivec(tsk);
+}
+
+#else
+#define giveup_fpu_maybe_transactional(tsk)	giveup_fpu(tsk)
+#define giveup_altivec_maybe_transactional(tsk)	giveup_altivec(tsk)
+#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+
 #ifdef CONFIG_PPC_FPU
 /*
  * Make sure the floating-point register state in the
@@ -102,13 +144,13 @@ void flush_fp_to_thread(struct task_struct *tsk)
 			 */
 			BUG_ON(tsk != current);
 #endif
-			giveup_fpu(tsk);
+			giveup_fpu_maybe_transactional(tsk);
 		}
 		preempt_enable();
 	}
 }
 EXPORT_SYMBOL_GPL(flush_fp_to_thread);
-#endif
+#endif /* CONFIG_PPC_FPU */
 
 void enable_kernel_fp(void)
 {
@@ -116,11 +158,11 @@ void enable_kernel_fp(void)
 
 #ifdef CONFIG_SMP
 	if (current->thread.regs && (current->thread.regs->msr & MSR_FP))
-		giveup_fpu(current);
+		giveup_fpu_maybe_transactional(current);
 	else
 		giveup_fpu(NULL);	/* just enables FP for kernel */
 #else
-	giveup_fpu(last_task_used_math);
+	giveup_fpu_maybe_transactional(last_task_used_math);
 #endif /* CONFIG_SMP */
 }
 EXPORT_SYMBOL(enable_kernel_fp);
@@ -132,11 +174,11 @@ void enable_kernel_altivec(void)
 
 #ifdef CONFIG_SMP
 	if (current->thread.regs && (current->thread.regs->msr & MSR_VEC))
-		giveup_altivec(current);
+		giveup_altivec_maybe_transactional(current);
 	else
 		giveup_altivec_notask();
 #else
-	giveup_altivec(last_task_used_altivec);
+	giveup_altivec_maybe_transactional(last_task_used_altivec);
 #endif /* CONFIG_SMP */
 }
 EXPORT_SYMBOL(enable_kernel_altivec);
@@ -153,7 +195,7 @@ void flush_altivec_to_thread(struct task_struct *tsk)
 #ifdef CONFIG_SMP
 			BUG_ON(tsk != current);
 #endif
-			giveup_altivec(tsk);
+			giveup_altivec_maybe_transactional(tsk);
 		}
 		preempt_enable();
 	}
@@ -182,8 +224,8 @@ EXPORT_SYMBOL(enable_kernel_vsx);
 
 void giveup_vsx(struct task_struct *tsk)
 {
-	giveup_fpu(tsk);
-	giveup_altivec(tsk);
+	giveup_fpu_maybe_transactional(tsk);
+	giveup_altivec_maybe_transactional(tsk);
 	__giveup_vsx(tsk);
 }
 
@@ -479,7 +521,48 @@ static inline bool hw_brk_match(struct arch_hw_breakpoint *a,
 		return false;
 	return true;
 }
+
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+static void tm_reclaim_thread(struct thread_struct *thr,
+			      struct thread_info *ti, uint8_t cause)
+{
+	unsigned long msr_diff = 0;
+
+	/*
+	 * If FP/VSX registers have been already saved to the
+	 * thread_struct, move them to the transact_fp array.
+	 * We clear the TIF_RESTORE_TM bit since after the reclaim
+	 * the thread will no longer be transactional.
+	 */
+	if (test_ti_thread_flag(ti, TIF_RESTORE_TM)) {
+		msr_diff = thr->tm_orig_msr & ~thr->regs->msr;
+		if (msr_diff & MSR_FP)
+			memcpy(&thr->transact_fp, &thr->fp_state,
+			       sizeof(struct thread_fp_state));
+		if (msr_diff & MSR_VEC)
+			memcpy(&thr->transact_vr, &thr->vr_state,
+			       sizeof(struct thread_vr_state));
+		clear_ti_thread_flag(ti, TIF_RESTORE_TM);
+		msr_diff &= MSR_FP | MSR_VEC | MSR_VSX | MSR_FE0 | MSR_FE1;
+	}
+
+	tm_reclaim(thr, thr->regs->msr, cause);
+
+	/* Having done the reclaim, we now have the checkpointed
+	 * FP/VSX values in the registers.  These might be valid
+	 * even if we have previously called enable_kernel_fp() or
+	 * flush_fp_to_thread(), so update thr->regs->msr to
+	 * indicate their current validity.
+	 */
+	thr->regs->msr |= msr_diff;
+}
+
+void tm_reclaim_current(uint8_t cause)
+{
+	tm_enable();
+	tm_reclaim_thread(&current->thread, current_thread_info(), cause);
+}
+
 static inline void tm_reclaim_task(struct task_struct *tsk)
 {
 	/* We have to work out if we're switching from/to a task that's in the
@@ -502,9 +585,11 @@ static inline void tm_reclaim_task(struct task_struct *tsk)
 
 	/* Stash the original thread MSR, as giveup_fpu et al will
 	 * modify it.  We hold onto it to see whether the task used
-	 * FP & vector regs.
+	 * FP & vector regs.  If the TIF_RESTORE_TM flag is set,
+	 * tm_orig_msr is already set.
 	 */
-	thr->tm_orig_msr = thr->regs->msr;
+	if (!test_ti_thread_flag(task_thread_info(tsk), TIF_RESTORE_TM))
+		thr->tm_orig_msr = thr->regs->msr;
 
 	TM_DEBUG("--- tm_reclaim on pid %d (NIP=%lx, "
 		 "ccr=%lx, msr=%lx, trap=%lx)\n",
@@ -512,7 +597,7 @@ static inline void tm_reclaim_task(struct task_struct *tsk)
 		 thr->regs->ccr, thr->regs->msr,
 		 thr->regs->trap);
 
-	tm_reclaim(thr, thr->regs->msr, TM_CAUSE_RESCHED);
+	tm_reclaim_thread(thr, task_thread_info(tsk), TM_CAUSE_RESCHED);
 
 	TM_DEBUG("--- tm_reclaim on pid %d complete\n",
 		 tsk->pid);
@@ -588,6 +673,43 @@ static inline void __switch_to_tm(struct task_struct *prev)
 		tm_reclaim_task(prev);
 	}
 }
+
+/*
+ * This is called if we are on the way out to userspace and the
+ * TIF_RESTORE_TM flag is set.  It checks if we need to reload
+ * FP and/or vector state and does so if necessary.
+ * If userspace is inside a transaction (whether active or
+ * suspended) and FP/VMX/VSX instructions have ever been enabled
+ * inside that transaction, then we have to keep them enabled
+ * and keep the FP/VMX/VSX state loaded while ever the transaction
+ * continues.  The reason is that if we didn't, and subsequently
+ * got a FP/VMX/VSX unavailable interrupt inside a transaction,
+ * we don't know whether it's the same transaction, and thus we
+ * don't know which of the checkpointed state and the transactional
+ * state to use.
+ */
+void restore_tm_state(struct pt_regs *regs)
+{
+	unsigned long msr_diff;
+
+	clear_thread_flag(TIF_RESTORE_TM);
+	if (!MSR_TM_ACTIVE(regs->msr))
+		return;
+
+	msr_diff = current->thread.tm_orig_msr & ~regs->msr;
+	msr_diff &= MSR_FP | MSR_VEC | MSR_VSX;
+	if (msr_diff & MSR_FP) {
+		fp_enable();
+		load_fp_state(&current->thread.fp_state);
+		regs->msr |= current->thread.fpexc_mode;
+	}
+	if (msr_diff & MSR_VEC) {
+		vec_enable();
+		load_vr_state(&current->thread.vr_state);
+	}
+	regs->msr |= msr_diff;
+}
+
 #else
 #define tm_recheckpoint_new_task(new)
 #define __switch_to_tm(prev)
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index 457e97a..8fc4177 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -203,8 +203,7 @@ unsigned long get_tm_stackpointer(struct pt_regs *regs)
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	if (MSR_TM_ACTIVE(regs->msr)) {
-		tm_enable();
-		tm_reclaim(&current->thread, regs->msr, TM_CAUSE_SIGNAL);
+		tm_reclaim_current(TM_CAUSE_SIGNAL);
 		if (MSR_TM_TRANSACTIONAL(regs->msr))
 			return current->thread.ckpt_regs.gpr[1];
 	}
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 68027bf..6ce69e6 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -519,6 +519,13 @@ static int save_tm_user_regs(struct pt_regs *regs,
 {
 	unsigned long msr = regs->msr;
 
+	/* Remove TM bits from thread's MSR.  The MSR in the sigcontext
+	 * just indicates to userland that we were doing a transaction, but we
+	 * don't want to return in transactional state.  This also ensures
+	 * that flush_fp_to_thread won't set TIF_RESTORE_TM again.
+	 */
+	regs->msr &= ~MSR_TS_MASK;
+
 	/* Make sure floating point registers are stored in regs */
 	flush_fp_to_thread(current);
 
@@ -1056,13 +1063,6 @@ int handle_rt_signal32(unsigned long sig, struct k_sigaction *ka,
 	/* enter the signal handler in native-endian mode */
 	regs->msr &= ~MSR_LE;
 	regs->msr |= (MSR_KERNEL & MSR_LE);
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-	/* Remove TM bits from thread's MSR.  The MSR in the sigcontext
-	 * just indicates to userland that we were doing a transaction, but we
-	 * don't want to return in transactional state:
-	 */
-	regs->msr &= ~MSR_TS_MASK;
-#endif
 	return 1;
 
 badframe:
@@ -1484,13 +1484,6 @@ int handle_signal32(unsigned long sig, struct k_sigaction *ka,
 	regs->nip = (unsigned long) ka->sa.sa_handler;
 	/* enter the signal handler in big-endian mode */
 	regs->msr &= ~MSR_LE;
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-	/* Remove TM bits from thread's MSR.  The MSR in the sigcontext
-	 * just indicates to userland that we were doing a transaction, but we
-	 * don't want to return in transactional state:
-	 */
-	regs->msr &= ~MSR_TS_MASK;
-#endif
 	return 1;
 
 badframe:
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 4299104..e35bf77 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -192,6 +192,13 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc,
 
 	BUG_ON(!MSR_TM_ACTIVE(regs->msr));
 
+	/* Remove TM bits from thread's MSR.  The MSR in the sigcontext
+	 * just indicates to userland that we were doing a transaction, but we
+	 * don't want to return in transactional state.  This also ensures
+	 * that flush_fp_to_thread won't set TIF_RESTORE_TM again.
+	 */
+	regs->msr &= ~MSR_TS_MASK;
+
 	flush_fp_to_thread(current);
 
 #ifdef CONFIG_ALTIVEC
@@ -749,13 +756,6 @@ int handle_rt_signal64(int signr, struct k_sigaction *ka, siginfo_t *info,
 
 	/* Make sure signal handler doesn't get spurious FP exceptions */
 	current->thread.fp_state.fpscr = 0;
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-	/* Remove TM bits from thread's MSR.  The MSR in the sigcontext
-	 * just indicates to userland that we were doing a transaction, but we
-	 * don't want to return in transactional state:
-	 */
-	regs->msr &= ~MSR_TS_MASK;
-#endif
 
 	/* Set up to return from userspace. */
 	if (vdso64_rt_sigtramp && current->mm->context.vdso_base) {
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 907a472..b543587 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1384,7 +1384,6 @@ void fp_unavailable_tm(struct pt_regs *regs)
 
 	TM_DEBUG("FP Unavailable trap whilst transactional at 0x%lx, MSR=%lx\n",
 		 regs->nip, regs->msr);
-	tm_enable();
 
         /* We can only have got here if the task started using FP after
          * beginning the transaction.  So, the transactional regs are just a
@@ -1393,8 +1392,7 @@ void fp_unavailable_tm(struct pt_regs *regs)
          * transaction, and probably retry but now with FP enabled.  So the
          * checkpointed FP registers need to be loaded.
 	 */
-	tm_reclaim(&current->thread, current->thread.regs->msr,
-		   TM_CAUSE_FAC_UNAV);
+	tm_reclaim_current(TM_CAUSE_FAC_UNAV);
 	/* Reclaim didn't save out any FPRs to transact_fprs. */
 
 	/* Enable FP for the task: */
@@ -1417,9 +1415,7 @@ void altivec_unavailable_tm(struct pt_regs *regs)
 	TM_DEBUG("Vector Unavailable trap whilst transactional at 0x%lx,"
 		 "MSR=%lx\n",
 		 regs->nip, regs->msr);
-	tm_enable();
-	tm_reclaim(&current->thread, current->thread.regs->msr,
-		   TM_CAUSE_FAC_UNAV);
+	tm_reclaim_current(TM_CAUSE_FAC_UNAV);
 	regs->msr |= MSR_VEC;
 	tm_recheckpoint(&current->thread, regs->msr);
 	current->thread.used_vr = 1;
@@ -1440,10 +1436,8 @@ void vsx_unavailable_tm(struct pt_regs *regs)
 		 "MSR=%lx\n",
 		 regs->nip, regs->msr);
 
-	tm_enable();
 	/* This reclaims FP and/or VR regs if they're already enabled */
-	tm_reclaim(&current->thread, current->thread.regs->msr,
-		   TM_CAUSE_FAC_UNAV);
+	tm_reclaim_current(TM_CAUSE_FAC_UNAV);
 
 	regs->msr |= MSR_VEC | MSR_FP | current->thread.fpexc_mode |
 		MSR_VSX;
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 0458a9a..74f8050 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -37,6 +37,16 @@ _GLOBAL(do_load_up_transact_altivec)
 #endif
 
 /*
+ * Enable use of VMX/Altivec for the caller.
+ */
+_GLOBAL(vec_enable)
+	mfmsr	r3
+	oris	r3,r3,MSR_VEC@h
+	MTMSRD(r3)
+	isync
+	blr
+
+/*
  * Load state from memory into VMX registers including VSCR.
  * Assumes the caller has enabled VMX in the MSR.
  */
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH 3/3] powerpc: Fix transactional FP/VMX/VSX unavailable handlers
From: Paul Mackerras @ 2014-01-13  4:56 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1389588990-25953-1-git-send-email-paulus@samba.org>

Currently, if a process starts a transaction and then takes an
exception because the FPU, VMX or VSX unit is unavailable to it,
we end up corrupting any FP/VMX/VSX state that was valid before
the interrupt.  For example, if the process starts a transaction
with the FPU available to it but VMX unavailable, and then does
a VMX instruction inside the transaction, the FP state gets
corrupted.

Loading up the desired state generally involves doing a reclaim
and a recheckpoint.  To avoid corrupting already-valid state, we have
to be careful not to reload that state from the thread_struct
between the reclaim and the recheckpoint (since the thread_struct
values are stale by now), and we have to reload that state from
the transact_fp/vr arrays after the recheckpoint to get back the
current transactional values saved there by the reclaim.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kernel/traps.c | 45 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index b543587..50a7ec3 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1401,11 +1401,19 @@ void fp_unavailable_tm(struct pt_regs *regs)
 	/* This loads and recheckpoints the FP registers from
 	 * thread.fpr[].  They will remain in registers after the
 	 * checkpoint so we don't need to reload them after.
+	 * If VMX is in use, the VRs now hold checkpointed values,
+	 * so we don't want to load the VRs from the thread_struct.
 	 */
-	tm_recheckpoint(&current->thread, regs->msr);
+	tm_recheckpoint(&current->thread, MSR_FP);
+
+	/* If VMX is in use, get the transactional values back */
+	if (regs->msr & MSR_VEC) {
+		do_load_up_transact_altivec(&current->thread);
+		/* At this point all the VSX state is loaded, so enable it */
+		regs->msr |= MSR_VSX;
+	}
 }
 
-#ifdef CONFIG_ALTIVEC
 void altivec_unavailable_tm(struct pt_regs *regs)
 {
 	/* See the comments in fp_unavailable_tm().  This function operates
@@ -1417,14 +1425,19 @@ void altivec_unavailable_tm(struct pt_regs *regs)
 		 regs->nip, regs->msr);
 	tm_reclaim_current(TM_CAUSE_FAC_UNAV);
 	regs->msr |= MSR_VEC;
-	tm_recheckpoint(&current->thread, regs->msr);
+	tm_recheckpoint(&current->thread, MSR_VEC);
 	current->thread.used_vr = 1;
+
+	if (regs->msr & MSR_FP) {
+		do_load_up_transact_fpu(&current->thread);
+		regs->msr |= MSR_VSX;
+	}
 }
-#endif
 
-#ifdef CONFIG_VSX
 void vsx_unavailable_tm(struct pt_regs *regs)
 {
+	unsigned long orig_msr = regs->msr;
+
 	/* See the comments in fp_unavailable_tm().  This works similarly,
 	 * though we're loading both FP and VEC registers in here.
 	 *
@@ -1436,16 +1449,30 @@ void vsx_unavailable_tm(struct pt_regs *regs)
 		 "MSR=%lx\n",
 		 regs->nip, regs->msr);
 
+	current->thread.used_vsr = 1;
+
+	/* If FP and VMX are already loaded, we have all the state we need */
+	if ((orig_msr & (MSR_FP | MSR_VEC)) == (MSR_FP | MSR_VEC)) {
+		regs->msr |= MSR_VSX;
+		return;
+	}
+
 	/* This reclaims FP and/or VR regs if they're already enabled */
 	tm_reclaim_current(TM_CAUSE_FAC_UNAV);
 
 	regs->msr |= MSR_VEC | MSR_FP | current->thread.fpexc_mode |
 		MSR_VSX;
-	/* This loads & recheckpoints FP and VRs. */
-	tm_recheckpoint(&current->thread, regs->msr);
-	current->thread.used_vsr = 1;
+
+	/* This loads & recheckpoints FP and VRs; but we have
+	 * to be sure not to overwrite previously-valid state.
+	 */
+	tm_recheckpoint(&current->thread, regs->msr & ~orig_msr);
+
+	if (orig_msr & MSR_FP)
+		do_load_up_transact_fpu(&current->thread);
+	if (orig_msr & MSR_VEC)
+		do_load_up_transact_altivec(&current->thread);
 }
-#endif
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
 void performance_monitor_exception(struct pt_regs *regs)
-- 
1.8.4.2

^ permalink raw reply related

* Re: [PATCH] pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines
From: Michael Ellerman @ 2014-01-13  5:38 UTC (permalink / raw)
  To: Preeti U Murthy; +Cc: linuxppc-dev, paulus, deepthi
In-Reply-To: <20140109050519.11532.6044.stgit@preeti.in.ibm.com>

On Thu, 2014-01-09 at 10:35 +0530, Preeti U Murthy wrote:
> Commit fbd7740fdfdf9475f switched pseries cpu idle handling from complete idle
> loops to ppc_md.powersave functions. Earlier to this switch,
> ppc64_runlatch_off() had to be called in each of the idle routines. But after
> the switch this call is handled in arch_cpu_idle(),just before the call
> to ppc_md.powersave, where platform specific idle routines are called.
> 
> As a consequence, the call to ppc64_runlatch_off() got duplicated in the
> arch_cpu_idle() routine as well as in the some of the idle routines in
> pseries and commit fbd7740fdfdf9475f missed to get rid of these redundant
> calls. These calls were carried over subsequent enhancements to the pseries
> cpuidle routines. This patch takes care of eliminating this redundancy.

It's "obvious" that turning the runlatch off multiple times is harmless,
although it adds extra overhead, but please spell that out in the changelog.

cheers

^ permalink raw reply

* [PATCH V4] powerpc: thp: Fix crash on mremap
From: Aneesh Kumar K.V @ 2014-01-13  6:04 UTC (permalink / raw)
  To: benh, paulus, aarcange, kirill.shutemov
  Cc: linux-mm, linuxppc-dev, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This patch fix the below crash

NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
LR [c0000000000439ac] .hash_page+0x18c/0x5e0
...
Call Trace:
[c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
[437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
[437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58

On ppc64 we use the pgtable for storing the hpte slot information and
store address to the pgtable at a constant offset (PTRS_PER_PMD) from
pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
from new pmd.

We also want to move the withdraw and deposit before the set_pmd so
that, when page fault find the pmd as trans huge we can be sure that
pgtable can be located at the offset.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
Changes from V3:
* Drop "powerpc: mm: Move ppc64 page table range definitions to separate header"" patch

 arch/powerpc/include/asm/pgtable-ppc64.h | 14 ++++++++++++++
 include/asm-generic/pgtable.h            | 12 ++++++++++++
 mm/huge_memory.c                         | 14 +++++---------
 3 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 4a191c472867..d27960c89a71 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -558,5 +558,19 @@ extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
 #define __HAVE_ARCH_PMDP_INVALIDATE
 extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp);
+
+#define pmd_move_must_withdraw pmd_move_must_withdraw
+typedef struct spinlock spinlock_t;
+static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
+					 spinlock_t *old_pmd_ptl)
+{
+	/*
+	 * Archs like ppc64 use pgtable to store per pmd
+	 * specific information. So when we switch the pmd,
+	 * we should also withdraw and deposit the pgtable
+	 */
+	return true;
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_PGTABLE_PPC64_H_ */
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index db0923458940..8e4f41d9af4d 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -558,6 +558,18 @@ static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
 }
 #endif
 
+#ifndef pmd_move_must_withdraw
+static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
+					 spinlock_t *old_pmd_ptl)
+{
+	/*
+	 * With split pmd lock we also need to move preallocated
+	 * PTE page table if new_pmd is on different PMD page table.
+	 */
+	return new_pmd_ptl != old_pmd_ptl;
+}
+#endif
+
 /*
  * This function is meant to be used by sites walking pagetables with
  * the mmap_sem hold in read mode to protect against MADV_DONTNEED and
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 95d1acb0f3d2..5d80c53b87cb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1502,19 +1502,15 @@ int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
 			spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
 		pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
 		VM_BUG_ON(!pmd_none(*new_pmd));
-		set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
-		if (new_ptl != old_ptl) {
-			pgtable_t pgtable;
 
-			/*
-			 * Move preallocated PTE page table if new_pmd is on
-			 * different PMD page table.
-			 */
+		if (pmd_move_must_withdraw(new_ptl, old_ptl)) {
+			pgtable_t pgtable;
 			pgtable = pgtable_trans_huge_withdraw(mm, old_pmd);
 			pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
-
-			spin_unlock(new_ptl);
 		}
+		set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
+		if (new_ptl != old_ptl)
+			spin_unlock(new_ptl);
 		spin_unlock(old_ptl);
 	}
 out:
-- 
1.8.3.2

^ permalink raw reply related

* Re: [PATCH] pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines
From: Preeti U Murthy @ 2014-01-13  6:34 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, paulus, deepthi
In-Reply-To: <1389591507.29912.6.camel@concordia>

Hi Mikey

I have the patch with the changelog according to your suggestion below.

Thanks

On 01/13/2014 11:08 AM, Michael Ellerman wrote:
> On Thu, 2014-01-09 at 10:35 +0530, Preeti U Murthy wrote:
>> Commit fbd7740fdfdf9475f switched pseries cpu idle handling from complete idle
>> loops to ppc_md.powersave functions. Earlier to this switch,
>> ppc64_runlatch_off() had to be called in each of the idle routines. But after
>> the switch this call is handled in arch_cpu_idle(),just before the call
>> to ppc_md.powersave, where platform specific idle routines are called.
>>
>> As a consequence, the call to ppc64_runlatch_off() got duplicated in the
>> arch_cpu_idle() routine as well as in the some of the idle routines in
>> pseries and commit fbd7740fdfdf9475f missed to get rid of these redundant
>> calls. These calls were carried over subsequent enhancements to the pseries
>> cpuidle routines. This patch takes care of eliminating this redundancy.
> 
> It's "obvious" that turning the runlatch off multiple times is harmless,
> although it adds extra overhead, but please spell that out in the changelog.
> 
> cheers
> 
> 

pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines

From: Preeti U Murthy <preeti@linux.vnet.ibm.com>

Commit fbd7740fdfdf9475f(powerpc: Simplify pSeries idle loop) switched pseries cpu
idle handling from complete idle loops to ppc_md.powersave functions. Earlier to
this switch, ppc64_runlatch_off() had to be called in each of the idle routines.
But after the switch, this call is handled in arch_cpu_idle(),just before the call
to ppc_md.powersave, where platform specific idle routines are called.

As a consequence, the call to ppc64_runlatch_off() got duplicated in the
arch_cpu_idle() routine as well as in the some of the idle routines in
pseries and commit fbd7740fdfdf9475f missed to get rid of these redundant
calls. These calls were carried over subsequent enhancements to the pseries
cpuidle routines.

Although multiple calls to ppc64_runlatch_off() is harmless, there is still some
overhead due to it. Besides that, these calls could also make way for a
misunderstanding that it is *necessary* to call ppc64_runlatch_off() multiple
times, when that is not the case. Hence this patch takes care of eliminating
this redundancy.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/processor_idle.c |    3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
index a166e38..09e4f56 100644
--- a/arch/powerpc/platforms/pseries/processor_idle.c
+++ b/arch/powerpc/platforms/pseries/processor_idle.c
@@ -17,7 +17,6 @@
 #include <asm/reg.h>
 #include <asm/machdep.h>
 #include <asm/firmware.h>
-#include <asm/runlatch.h>
 #include <asm/plpar_wrappers.h>
 
 struct cpuidle_driver pseries_idle_driver = {
@@ -63,7 +62,6 @@ static int snooze_loop(struct cpuidle_device *dev,
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	while ((!need_resched()) && cpu_online(cpu)) {
-		ppc64_runlatch_off();
 		HMT_low();
 		HMT_very_low();
 	}
@@ -103,7 +101,6 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
 	idle_loop_prolog(&in_purr);
 	get_lppaca()->donate_dedicated_cpu = 1;
 
-	ppc64_runlatch_off();
 	HMT_medium();
 	check_and_cede_processor();


Regards
Preeti U Murthy 

^ permalink raw reply related

* Re: [PATCH V4] powerpc: thp: Fix crash on mremap
From: Benjamin Herrenschmidt @ 2014-01-13  7:36 UTC (permalink / raw)
  To: aarcange
  Cc: aarcange, linux-mm, paulus, Aneesh Kumar K.V, linuxppc-dev,
	kirill.shutemov
In-Reply-To: <1389593064-32664-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Mon, 2014-01-13 at 11:34 +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This patch fix the below crash

Andrea, can you ack the generic bit please ?

Thanks !

Cheers,
Ben.

> NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
> LR [c0000000000439ac] .hash_page+0x18c/0x5e0
> ...
> Call Trace:
> [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
> [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
> [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
> 
> On ppc64 we use the pgtable for storing the hpte slot information and
> store address to the pgtable at a constant offset (PTRS_PER_PMD) from
> pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
> the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
> from new pmd.
> 
> We also want to move the withdraw and deposit before the set_pmd so
> that, when page fault find the pmd as trans huge we can be sure that
> pgtable can be located at the offset.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
> Changes from V3:
> * Drop "powerpc: mm: Move ppc64 page table range definitions to separate header"" patch
> 
>  arch/powerpc/include/asm/pgtable-ppc64.h | 14 ++++++++++++++
>  include/asm-generic/pgtable.h            | 12 ++++++++++++
>  mm/huge_memory.c                         | 14 +++++---------
>  3 files changed, 31 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
> index 4a191c472867..d27960c89a71 100644
> --- a/arch/powerpc/include/asm/pgtable-ppc64.h
> +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
> @@ -558,5 +558,19 @@ extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
>  #define __HAVE_ARCH_PMDP_INVALIDATE
>  extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>  			    pmd_t *pmdp);
> +
> +#define pmd_move_must_withdraw pmd_move_must_withdraw
> +typedef struct spinlock spinlock_t;
> +static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
> +					 spinlock_t *old_pmd_ptl)
> +{
> +	/*
> +	 * Archs like ppc64 use pgtable to store per pmd
> +	 * specific information. So when we switch the pmd,
> +	 * we should also withdraw and deposit the pgtable
> +	 */
> +	return true;
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* _ASM_POWERPC_PGTABLE_PPC64_H_ */
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index db0923458940..8e4f41d9af4d 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -558,6 +558,18 @@ static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
>  }
>  #endif
>  
> +#ifndef pmd_move_must_withdraw
> +static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
> +					 spinlock_t *old_pmd_ptl)
> +{
> +	/*
> +	 * With split pmd lock we also need to move preallocated
> +	 * PTE page table if new_pmd is on different PMD page table.
> +	 */
> +	return new_pmd_ptl != old_pmd_ptl;
> +}
> +#endif
> +
>  /*
>   * This function is meant to be used by sites walking pagetables with
>   * the mmap_sem hold in read mode to protect against MADV_DONTNEED and
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 95d1acb0f3d2..5d80c53b87cb 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1502,19 +1502,15 @@ int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
>  			spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
>  		pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
>  		VM_BUG_ON(!pmd_none(*new_pmd));
> -		set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
> -		if (new_ptl != old_ptl) {
> -			pgtable_t pgtable;
>  
> -			/*
> -			 * Move preallocated PTE page table if new_pmd is on
> -			 * different PMD page table.
> -			 */
> +		if (pmd_move_must_withdraw(new_ptl, old_ptl)) {
> +			pgtable_t pgtable;
>  			pgtable = pgtable_trans_huge_withdraw(mm, old_pmd);
>  			pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
> -
> -			spin_unlock(new_ptl);
>  		}
> +		set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
> +		if (new_ptl != old_ptl)
> +			spin_unlock(new_ptl);
>  		spin_unlock(old_ptl);
>  	}
>  out:

^ permalink raw reply

* Re: [PATCH RFC v6 4/5] dma: mpc512x: register for device tree channel lookup
From: Alexander Popov @ 2014-01-13  8:17 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Lars-Peter Clausen, Arnd Bergmann, Gerhard Sittig,
	Alexander Popov, dmaengine, Dan Williams, Anatolij Gustschin,
	linuxppc-dev
In-Reply-To: <20140109111957.GE16227@intel.com>

Thanks for your replies, Gerhard and Vinod.

2014/1/9 Vinod Koul <vinod.koul@intel.com>:
> On Wed, Jan 08, 2014 at 05:47:19PM +0100, Gerhard Sittig wrote:
>> [ what is the semantics of DMA_PRIVATE capability flag?
>>   is documentation available beyond the initial commit message?
>>   need individual channels be handled instead of controllers? ]
>
> The DMA_PRIVATE means that your channels are not to be used for global memcpy,
> as one can do in async cases (this is hwere DMAengine came into existence)
>
> If the device has the capablity of doing genric memcpy then it should not set
> this. For slave dma usage the dam channel can transfer data to a specfic
> slave device(s), hence we should use this is geric fashion so setting
> DMA_PRIVATE makes sense in those cases.

Each DMA channel of MPC512x DMA controller can do _both_
mem-to-mem transfers and transfers between mem and some slave peripheral
(only one DMA channel is fully dedicated to DDR).
All DMA channels of MPC512x DMA controller belong to one dma_device.
So we _don't_ need setting DMA_PRIVATE flag for this dma_device at all, do we?

>> On Sat, Jan 04, 2014 at 00:54 +0400, Alexander Popov wrote:
>> > I've involved DMA_PRIVATE flag because new of_dma_xlate_by_chan_id()
>> > uses dma_get_slave_channel() instead of dma_request_channel()
>> > (PATCH RFC v6 3/5). This flag is implicitly set in dma_request_channel(),
>> > but is not set in dma_get_slave_channel().
> Which makes me thing you are targetting slave usages. Do you intend to use for
> mempcy too on all controllers you support. in that case you should set it
> selectively.

Vinod, please correct me if I'm wrong.
As I could understand from your comments and the code,
DMA_PRIVATE flag is needed for dma_devices with DMA channels
which _can_ work with slave peripheral but _can't_ do mem-to-mem transfers.
If DMA_PRIVATE flag is set for some dma_device before
dma_async_device_register()
then its DMA channels are not published in tables for kernel slab allocator
(because these channels are simply useless for memcpy).

>> Still I see a difference in the lookup approaches:  Yours applies
>> DMA_PRIVATE globally and in advance, preventing _any_ use of DMA
>> for memory transfers.  While the __dma_request_channel() routine
>> only applies it _temporarily_ around a dma_chan_get() operation.
>> Allowing for use of DMA channels by both individual peripherals
>> as well as memory transfers.
>>
> No it doesnt prevent. You can still use it for memcpy once you have the channel.

Excuse me, I don't completely understand why dma_request_channel()
needs to set DMA_PRIVATE flag.
If dma_request_channel() for some dma_device without DMA_PRIVATE
is called before the first dmaengine_get()
then no DMA channels of this dma_device will become available for memcpy
by slab allocator.
Could you give me a clue?

>> > > Consider the fact that this driver
>> > > handles both MPC5121 as well as MPC8308 hardware.
>> >
>> > Ah, yes, sorry. I should certainly fix this, if setting of DMA_PRIVATE flag
>> > is needed at all.
>>
>> What I meant here is that implications for all affected platforms
>> should be considered.  There is one driver source, but the driver
>> applies to more than one platform (another issue of the driver is
>> that this is not apparent from the doc nor the compat strings).

I'll add a comment with information about the supported platforms to
mpc512x_dma.c
in RFC PATCH 1/5. Ok?

>> So blocking memory transfers in mpc512x_dma.c is a total breakage
>> for MPC8308 (removes the only previous feature and adds nothing),
>> and is a regression for MPC512x (removes the previously supported
>> memory transfers, while it may add peripheral supports with very
>> few users).

Yes, I see. MPC512x and MPC8308 should be treated differently.

Thanks!
Alexander

^ permalink raw reply

* [PATCH v9] clk: corenet: Adds the clock binding
From: Tang Yuantian @ 2014-01-13  8:16 UTC (permalink / raw)
  To: b07421; +Cc: mark.rutland, devicetree, Tang Yuantian, linuxppc-dev

From: Tang Yuantian <yuantian.tang@freescale.com>

Adds the clock bindings for Freescale PowerPC CoreNet platforms

Signed-off-by: Tang Yuantian <Yuantian.Tang@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
---
v9:
	- refined some properties' description
v8:
	- added clock-frequency property description
	- fixed whitespace and tab mixing issue
v7:
	- refined some properties' definitions
v6:
	- splited the previous patch into 2 parts, one is for binding(this one),
	  the other is for DTS modification(will submit once this gets accepted)
	- fixed typo
	- refined #clock-cells and clock-output-names properties
	- removed fixed-clock compatible string
v5:
	- refine the binding document
	- update the compatible string
v4:
	- add binding document
	- update compatible string
	- update the reg property
v3:
	- fix typo
v2:
	- add t4240, b4420, b4860 support
	- remove pll/4 clock from p2041, p3041 and p5020 board

 .../devicetree/bindings/clock/corenet-clock.txt    | 134 +++++++++++++++++++++
 1 file changed, 134 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/corenet-clock.txt

diff --git a/Documentation/devicetree/bindings/clock/corenet-clock.txt b/Documentation/devicetree/bindings/clock/corenet-clock.txt
new file mode 100644
index 0000000..8394bc7
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/corenet-clock.txt
@@ -0,0 +1,134 @@
+* Clock Block on Freescale CoreNet Platforms
+
+Freescale CoreNet chips take primary clocking input from the external
+SYSCLK signal. The SYSCLK input (frequency) is multiplied using
+multiple phase locked loops (PLL) to create a variety of frequencies
+which can then be passed to a variety of internal logic, including
+cores and peripheral IP blocks.
+Please refer to the Reference Manual for details.
+
+1. Clock Block Binding
+
+Required properties:
+- compatible: Should contain a specific clock block compatible string
+	and a single chassis clock compatible string.
+	Clock block strings include, but not limited to, one of the:
+	* "fsl,p2041-clockgen"
+	* "fsl,p3041-clockgen"
+	* "fsl,p4080-clockgen"
+	* "fsl,p5020-clockgen"
+	* "fsl,p5040-clockgen"
+	* "fsl,t4240-clockgen"
+	* "fsl,b4420-clockgen"
+	* "fsl,b4860-clockgen"
+	Chassis clock strings include:
+	* "fsl,qoriq-clockgen-1.0": for chassis 1.0 clocks
+	* "fsl,qoriq-clockgen-2.0": for chassis 2.0 clocks
+- reg: Describes the address of the device's resources within the
+	address space defined by its parent bus, and resource zero
+	represents the clock register set
+- clock-frequency: Input system clock frequency
+
+Recommended properties:
+- ranges: Allows valid translation between child's address space and
+	parent's. Must be present if the device has sub-nodes.
+- #address-cells: Specifies the number of cells used to represent
+	physical base addresses.  Must be present if the device has
+	sub-nodes and set to 1 if present
+- #size-cells: Specifies the number of cells used to represent
+	the size of an address. Must be present if the device has
+	sub-nodes and set to 1 if present
+
+2. Clock Provider/Consumer Binding
+
+Most of the bindings are from the common clock binding[1].
+ [1] Documentation/devicetree/bindings/clock/clock-bindings.txt
+
+Required properties:
+- compatible : Should include one of the following:
+	* "fsl,qoriq-core-pll-1.0" for core PLL clocks (v1.0)
+	* "fsl,qoriq-core-pll-2.0" for core PLL clocks (v2.0)
+	* "fsl,qoriq-core-mux-1.0" for core mux clocks (v1.0)
+	* "fsl,qoriq-core-mux-2.0" for core mux clocks (v2.0)
+	* "fsl,qoriq-sysclk-1.0": for input system clock (v1.0).
+		It takes parent's clock-frequency as its clock.
+	* "fsl,qoriq-sysclk-2.0": for input system clock (v2.0).
+		It takes parent's clock-frequency as its clock.
+- #clock-cells: From common clock binding. The number of cells in a
+	clock-specifier. Should be <0> for "fsl,qoriq-sysclk-[1,2].0"
+	clocks, or <1> for "fsl,qoriq-core-pll-[1,2].0" clocks.
+	For "fsl,qoriq-core-pll-[1,2].0" clocks, the single
+	clock-specifier cell may take the following values:
+	* 0 - equal to the PLL frequency
+	* 1 - equal to the PLL frequency divided by 2
+	* 2 - equal to the PLL frequency divided by 4
+
+Recommended properties:
+- clocks: Should be the phandle of input parent clock
+- clock-names: From common clock binding, indicates the clock name
+- clock-output-names: From common clock binding, indicates the names of
+	output clocks
+- reg: Should be the offset and length of clock block base address.
+	The length should be 4.
+
+Example for clock block and clock provider:
+/ {
+	clockgen: global-utilities@e1000 {
+		compatible = "fsl,p5020-clockgen", "fsl,qoriq-clockgen-1.0";
+		ranges = <0x0 0xe1000 0x1000>;
+		clock-frequency = 133333;
+		reg = <0xe1000 0x1000>;
+		#address-cells = <1>;
+		#size-cells = <1>;
+
+		sysclk: sysclk {
+			#clock-cells = <0>;
+			compatible = "fsl,qoriq-sysclk-1.0";
+			clock-output-names = "sysclk";
+		}
+
+		pll0: pll0@800 {
+			#clock-cells = <1>;
+			reg = <0x800 0x4>;
+			compatible = "fsl,qoriq-core-pll-1.0";
+			clocks = <&sysclk>;
+			clock-output-names = "pll0", "pll0-div2";
+		};
+
+		pll1: pll1@820 {
+			#clock-cells = <1>;
+			reg = <0x820 0x4>;
+			compatible = "fsl,qoriq-core-pll-1.0";
+			clocks = <&sysclk>;
+			clock-output-names = "pll1", "pll1-div2";
+		};
+
+		mux0: mux0@0 {
+			#clock-cells = <0>;
+			reg = <0x0 0x4>;
+			compatible = "fsl,qoriq-core-mux-1.0";
+			clocks = <&pll0 0>, <&pll0 1>, <&pll1 0>, <&pll1 1>;
+			clock-names = "pll0", "pll0-div2", "pll1", "pll1-div2";
+			clock-output-names = "cmux0";
+		};
+
+		mux1: mux1@20 {
+			#clock-cells = <0>;
+			reg = <0x20 0x4>;
+			compatible = "fsl,qoriq-core-mux-1.0";
+			clocks = <&pll0 0>, <&pll0 1>, <&pll1 0>, <&pll1 1>;
+			clock-names = "pll0", "pll0-div2", "pll1", "pll1-div2";
+			clock-output-names = "cmux1";
+		};
+	};
+  }
+
+Example for clock consumer:
+
+/ {
+	cpu0: PowerPC,e5500@0 {
+		...
+		clocks = <&mux0>;
+		...
+	};
+  }
-- 
1.8.0

^ permalink raw reply related

* Re: [PATCH 02/13] ppc/cell: use get_unused_fd_flags(0) instead of get_unused_fd()
From: Yann Droneaud @ 2014-01-13  9:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: cbe-oss-dev, linuxppc-dev, linux-kernel
In-Reply-To: <1389567961.4672.111.camel@pasglop>

Hi Benjamin,

Le lundi 13 janvier 2014 à 10:06 +1100, Benjamin Herrenschmidt a écrit :
> On Tue, 2013-07-02 at 18:39 +0200, Yann Droneaud wrote:
> > Macro get_unused_fd() is used to allocate a file descriptor with
> > default flags. Those default flags (0) can be "unsafe":
> > O_CLOEXEC must be used by default to not leak file descriptor
> > across exec().
> > 
> > Instead of macro get_unused_fd(), functions anon_inode_getfd()
> > or get_unused_fd_flags() should be used with flags given by userspace.
> > If not possible, flags should be set to O_CLOEXEC to provide userspace
> > with a default safe behavor.
> > 
> > In a further patch, get_unused_fd() will be removed so that
> > new code start using anon_inode_getfd() or get_unused_fd_flags()
> > with correct flags.
> > 
> > This patch replaces calls to get_unused_fd() with equivalent call to
> > get_unused_fd_flags(0) to preserve current behavor for existing code.
> > 
> > The hard coded flag value (0) should be reviewed on a per-subsystem basis,
> > and, if possible, set to O_CLOEXEC.
> > 
> > Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
> 
> Should I merge this (v5 on patchwork) or let Al do it ?
> 

Please merge it directly: patches from the previous patchsets were
picked individually by each subsystem maintainer after proper review
regarding setting close on exec flag by default.

> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> 

Thanks a lot.

> > ---
> >  arch/powerpc/platforms/cell/spufs/inode.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c
> > index f390042..88df441 100644
> > --- a/arch/powerpc/platforms/cell/spufs/inode.c
> > +++ b/arch/powerpc/platforms/cell/spufs/inode.c
> > @@ -301,7 +301,7 @@ static int spufs_context_open(struct path *path)
> >  	int ret;
> >  	struct file *filp;
> >  
> > -	ret = get_unused_fd();
> > +	ret = get_unused_fd_flags(0);
> >  	if (ret < 0)
> >  		return ret;
> >  
> > @@ -518,7 +518,7 @@ static int spufs_gang_open(struct path *path)
> >  	int ret;
> >  	struct file *filp;
> >  
> > -	ret = get_unused_fd();
> > +	ret = get_unused_fd_flags(0);
> >  	if (ret < 0)
> >  		return ret;
> >  
> 
> 

Note:
latest patch (from v5 patchset) is at
http://lkml.kernel.org/r/fe27abcfab5563d36a3e5e58ff36e5500c39be6a.1388952061.git.ydroneaud@opteya.com
v5 patchset is at
http://lkml.kernel.org/r/cover.1388952061.git.ydroneaud@opteya.com

Regards.

-- 
Yann Droneaud
OPTEYA

^ permalink raw reply

* Re: [PATCH V4] powerpc: thp: Fix crash on mremap
From: Aneesh Kumar K.V @ 2014-01-13  9:46 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, aarcange
  Cc: aarcange, linuxppc-dev, paulus, kirill.shutemov, linux-mm
In-Reply-To: <1389598587.4672.121.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Mon, 2014-01-13 at 11:34 +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> This patch fix the below crash
>
> Andrea, can you ack the generic bit please ?
>
> Thanks !

Kirill A. Shutemov did ack an earlier version

http://article.gmane.org/gmane.linux.kernel.mm/111368

-aneesh

^ permalink raw reply

* Re: [PATCH V4] powerpc: thp: Fix crash on mremap
From: Benjamin Herrenschmidt @ 2014-01-13 13:37 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: aarcange, linuxppc-dev, paulus, kirill.shutemov, linux-mm
In-Reply-To: <87wqi42p0f.fsf@linux.vnet.ibm.com>

On Mon, 2014-01-13 at 15:16 +0530, Aneesh Kumar K.V wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> 
> > On Mon, 2014-01-13 at 11:34 +0530, Aneesh Kumar K.V wrote:
> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >> 
> >> This patch fix the below crash
> >
> > Andrea, can you ack the generic bit please ?
> >
> > Thanks !
> 
> Kirill A. Shutemov did ack an earlier version
> 
> http://article.gmane.org/gmane.linux.kernel.mm/111368

Doesn't help. If I'm going to send Linus a patch with a generic change
like that, I need an ack of that exact version of the change by a senior
mm person such as Andrea.

Cheers,
Ben.

^ permalink raw reply

* [PATCH 0/4] remap non-modular uses of module_init properly
From: Paul Gortmaker @ 2014-01-13 16:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras; +Cc: Paul Gortmaker, linuxppc-dev

The goal is to move module_init/module_exit from init.h and into
module.h -- however in doing so, we uncover several instances in
powerpc code where module_init is used somewhat incorrectly by
non modular code, and a file that needs module.h but isn't sourcing
it.  We need to make these fixups 1st before changing the headers
so that we don't cause build failures.

The changes are largely inert, however we do cause a largely trivial
change in the initcall ordering -- that happens because module_init
is really device_initcall; and yet we shouldn't be using device_initcall
where clearly arch_initcall or subsys_initcall are more appropriate.

Boot tested on sbc8548 on powerpc next branch of today.

Paul Gortmaker (4):
  powerpc: use device_initcall for registering rtc devices
  powerpc: book3s kvm can be modular so it should use module.h
  powerpc: use subsys_initcall for Freescale Local Bus
  powerpc: don't use module_init for non-modular core hugetlb code

 arch/powerpc/kernel/time.c        | 2 +-
 arch/powerpc/kvm/book3s.c         | 2 +-
 arch/powerpc/mm/hugetlbpage.c     | 2 +-
 arch/powerpc/platforms/ps3/time.c | 3 +--
 arch/powerpc/sysdev/fsl_lbc.c     | 2 +-
 5 files changed, 5 insertions(+), 6 deletions(-)

-- 
1.8.5.2

^ permalink raw reply

* [PATCH 1/4] powerpc: use device_initcall for registering rtc devices
From: Paul Gortmaker @ 2014-01-13 16:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras; +Cc: Paul Gortmaker, linuxppc-dev
In-Reply-To: <1389630113-7919-1-git-send-email-paul.gortmaker@windriver.com>

Currently these two RTC devices are in core platform code
where it is not possible for them to be modular.  It will
never be modular, so using module_init as an alias for
__initcall can be somewhat misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of device_initcall
directly in this change means that the runtime impact is
zero -- they will remain at level 6 in initcall ordering.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 arch/powerpc/kernel/time.c        | 2 +-
 arch/powerpc/platforms/ps3/time.c | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index afb1b56ef4fa..bee2bb2bbc75 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -1053,4 +1053,4 @@ static int __init rtc_init(void)
 	return PTR_ERR_OR_ZERO(pdev);
 }

-module_init(rtc_init);
+device_initcall(rtc_init);
diff --git a/arch/powerpc/platforms/ps3/time.c b/arch/powerpc/platforms/ps3/time.c
index ce73ce865613..791c6142c4a7 100644
--- a/arch/powerpc/platforms/ps3/time.c
+++ b/arch/powerpc/platforms/ps3/time.c
@@ -92,5 +92,4 @@ static int __init ps3_rtc_init(void)

 	return PTR_ERR_OR_ZERO(pdev);
 }
-
-module_init(ps3_rtc_init);
+device_initcall(ps3_rtc_init);
-- 
1.8.5.2

^ permalink raw reply related

* [PATCH 4/4] powerpc: don't use module_init for non-modular core hugetlb code
From: Paul Gortmaker @ 2014-01-13 16:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras; +Cc: Paul Gortmaker, linuxppc-dev
In-Reply-To: <1389630113-7919-1-git-send-email-paul.gortmaker@windriver.com>

The hugetlbpage.o is obj-y (always built in).  It will never
be modular, so using module_init as an alias for __initcall is
somewhat misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of arch_initcall (which
makes sense for arch code) will thus change this registration
from level 6-device to level 3-arch (i.e. slightly earlier).
However no observable impact of that small difference has
been observed during testing, or is expected.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 arch/powerpc/mm/hugetlbpage.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 90bb6d9409bf..d25c202420da 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -911,7 +911,7 @@ static int __init hugetlbpage_init(void)
 	return 0;
 }
 #endif
-module_init(hugetlbpage_init);
+arch_initcall(hugetlbpage_init);

 void flush_dcache_icache_hugepage(struct page *page)
 {
-- 
1.8.5.2

^ permalink raw reply related

* [PATCH 3/4] powerpc: use subsys_initcall for Freescale Local Bus
From: Paul Gortmaker @ 2014-01-13 16:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras; +Cc: Paul Gortmaker, linuxppc-dev
In-Reply-To: <1389630113-7919-1-git-send-email-paul.gortmaker@windriver.com>

The FSL_SOC option is bool, and hence this code is either
present or absent.  It will never be modular, so using
module_init as an alias for __initcall is rather misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of subsys_initcall (which
makes sense for bus code) will thus change this registration
from level 6-device to level 4-subsys (i.e. slightly earlier).
However no observable impact of that small difference has
been observed during testing, or is expected.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 arch/powerpc/sysdev/fsl_lbc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/fsl_lbc.c b/arch/powerpc/sysdev/fsl_lbc.c
index 6bc5a546d49f..9f00e5f84abe 100644
--- a/arch/powerpc/sysdev/fsl_lbc.c
+++ b/arch/powerpc/sysdev/fsl_lbc.c
@@ -388,4 +388,4 @@ static int __init fsl_lbc_init(void)
 {
 	return platform_driver_register(&fsl_lbc_ctrl_driver);
 }
-module_init(fsl_lbc_init);
+subsys_initcall(fsl_lbc_init);
-- 
1.8.5.2

^ permalink raw reply related

* [PATCH 2/4] powerpc: book3s kvm can be modular so it should use module.h
From: Paul Gortmaker @ 2014-01-13 16:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras; +Cc: Paul Gortmaker, linuxppc-dev
In-Reply-To: <1389630113-7919-1-git-send-email-paul.gortmaker@windriver.com>

KVM support is tristate, so this file should be including
module.h instead of export.h -- it only works currently because
module_init is currently (mis)placed in init.h -- but we are
intending to clean that up and relocate it to module.h

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 arch/powerpc/kvm/book3s.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 8912608b7e1b..279459e8a072 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -16,7 +16,7 @@
 
 #include <linux/kvm_host.h>
 #include <linux/err.h>
-#include <linux/export.h>
+#include <linux/module.h>
 #include <linux/slab.h>
 
 #include <asm/reg.h>
-- 
1.8.5.2

^ permalink raw reply related

* Re: [PATCH V4] powerpc: thp: Fix crash on mremap
From: Kirill A. Shutemov @ 2014-01-13 16:32 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: aarcange, linux-mm, paulus, linuxppc-dev, kirill.shutemov
In-Reply-To: <1389593064-32664-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Mon, Jan 13, 2014 at 11:34:24AM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This patch fix the below crash
> 
> NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
> LR [c0000000000439ac] .hash_page+0x18c/0x5e0
> ...
> Call Trace:
> [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
> [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
> [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
> 
> On ppc64 we use the pgtable for storing the hpte slot information and
> store address to the pgtable at a constant offset (PTRS_PER_PMD) from
> pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
> the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
> from new pmd.
> 
> We also want to move the withdraw and deposit before the set_pmd so
> that, when page fault find the pmd as trans huge we can be sure that
> pgtable can be located at the offset.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>


-- 
 Kirill A. Shutemov

^ permalink raw reply

* [PATCH] powerpc/relocate fix relocate processing in LE mode
From: Laurent Dufour @ 2014-01-13 16:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, paulus, linuxppc-dev

Relocation's code is not working in little endian mode because the r_info
field, which is a 64 bits value, should be read from the right offset.

The current code is optimized to read the r_info field as a 32 bits value
starting at the middle of the double word (offset 12). When running in LE
mode, the read value is not correct since only the MSB is read.

This patch removes this optimization which consist to deal with a 32 bits
value instead of a 64 bits one. This way it works in big and little endian
mode.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/reloc_64.S |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/reloc_64.S b/arch/powerpc/kernel/reloc_64.S
index b47a0e1..1482327 100644
--- a/arch/powerpc/kernel/reloc_64.S
+++ b/arch/powerpc/kernel/reloc_64.S
@@ -69,8 +69,8 @@ _GLOBAL(relocate)
 	 * R_PPC64_RELATIVE ones.
 	 */
 	mtctr	r8
-5:	lwz	r0,12(9)	/* ELF64_R_TYPE(reloc->r_info) */
-	cmpwi	r0,R_PPC64_RELATIVE
+5:	ld	r0,8(9)		/* ELF64_R_TYPE(reloc->r_info) */
+	cmpdi	r0,R_PPC64_RELATIVE
 	bne	6f
 	ld	r6,0(r9)	/* reloc->r_offset */
 	ld	r0,16(r9)	/* reloc->r_addend */

^ permalink raw reply related

* Re: [PATCH v2 0/9] cpuidle: rework device state count handling
From: Rafael J. Wysocki @ 2014-01-13 21:20 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: linux-samsung-soc, linux-pm, daniel.lezcano, linux-kernel,
	kyungmin.park, linuxppc-dev, lenb
In-Reply-To: <2079155.EyEBRDoJjP@vostro.rjw.lan>

On Saturday, January 11, 2014 01:37:29 AM Rafael J. Wysocki wrote:
> On Friday, December 20, 2013 07:47:22 PM Bartlomiej Zolnierkiewicz wrote:
> > Hi,
> > 
> > Some cpuidle drivers assume that cpuidle core will handle cases where
> > device->state_count is smaller than driver->state_count, unfortunately
> > currently this is untrue (device->state_count is used only for handling
> > cpuidle state sysfs entries and driver->state_count is used for all
> > other cases) and will not be fixed in the future as device->state_count
> > is planned to be removed [1].
> > 
> > This patchset fixes such drivers (ARM EXYNOS cpuidle driver and ACPI
> > cpuidle driver), removes superflous device->state_count initialization
> > from drivers for which device->state_count equals driver->state_count
> > (POWERPC pseries cpuidle driver and intel_idle driver) and finally
> > removes state_count field from struct cpuidle_device.
> > 
> > Additionaly (while at it) this patchset fixes C1E promotion disable
> > quirk handling (in intel_idle driver) and converts cpuidle drivers code
> > to use the common cpuidle_[un]register() routines (in POWERPC pseries
> > cpuidle driver and intel_idle driver).
> > 
> > [1] http://permalink.gmane.org/gmane.linux.power-management.general/36908
> > 
> > Reference to v1:
> > 	http://comments.gmane.org/gmane.linux.power-management.general/37390
> > 
> > Changes since v1:
> > - synced patch series with next-20131220
> > - added ACKs from Daniel Lezcano
> 
> This series breaks boot on one of my test machines with intel_idle, so I'm
> not sure how well it has been tested.
> 
> I've dropped it entirely for now.  If I have the time, I will try to identify
> the root cause of the failure, but that may not happen before the merge window.
> Sorry about that.

The breakage was introduced by patch [8/9], so I've re-applied patches [1-7/9]
from this series.  Please refer to Fengguang's report [1] for the breakage
details.

Thanks!

[1] http://marc.info/?l=linux-kernel&m=138964167909907&w=2

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [PATCH] powerpc: thp: Fix crash on mremap
From: Andrew Morton @ 2014-01-13 22:17 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: aarcange, linux-mm, paulus, Aneesh Kumar K.V, linuxppc-dev,
	kirill.shutemov
In-Reply-To: <20140102021951.GA26369@node.dhcp.inet.fi>

On Thu, 2 Jan 2014 04:19:51 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Wed, Jan 01, 2014 at 09:29:05PM +1100, Benjamin Herrenschmidt wrote:
> > On Wed, 2014-01-01 at 15:23 +0530, Aneesh Kumar K.V wrote:
> > > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > > 
> > > This patch fix the below crash
> > > 
> > > NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
> > > LR [c0000000000439ac] .hash_page+0x18c/0x5e0
> > > ...
> > > Call Trace:
> > > [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
> > > [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
> > > [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
> > > 
> > > On ppc64 we use the pgtable for storing the hpte slot information and
> > > store address to the pgtable at a constant offset (PTRS_PER_PMD) from
> > > pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
> > > the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
> > > from new pmd.
> > > 
> > > We also want to move the withdraw and deposit before the set_pmd so
> > > that, when page fault find the pmd as trans huge we can be sure that
> > > pgtable can be located at the offset.
> > > 

Did this get fixed?

^ permalink raw reply

* Re: [PATCH] powerpc: thp: Fix crash on mremap
From: Kirill A. Shutemov @ 2014-01-13 22:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: aarcange, linux-mm, paulus, Aneesh Kumar K.V, linuxppc-dev,
	kirill.shutemov
In-Reply-To: <20140113141748.0b851e1573e41bf26de7c0ae@linux-foundation.org>

On Mon, Jan 13, 2014 at 02:17:48PM -0800, Andrew Morton wrote:
> On Thu, 2 Jan 2014 04:19:51 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > On Wed, Jan 01, 2014 at 09:29:05PM +1100, Benjamin Herrenschmidt wrote:
> > > On Wed, 2014-01-01 at 15:23 +0530, Aneesh Kumar K.V wrote:
> > > > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > > > 
> > > > This patch fix the below crash
> > > > 
> > > > NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
> > > > LR [c0000000000439ac] .hash_page+0x18c/0x5e0
> > > > ...
> > > > Call Trace:
> > > > [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
> > > > [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
> > > > [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
> > > > 
> > > > On ppc64 we use the pgtable for storing the hpte slot information and
> > > > store address to the pgtable at a constant offset (PTRS_PER_PMD) from
> > > > pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
> > > > the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
> > > > from new pmd.
> > > > 
> > > > We also want to move the withdraw and deposit before the set_pmd so
> > > > that, when page fault find the pmd as trans huge we can be sure that
> > > > pgtable can be located at the offset.
> > > > 
> 
> Did this get fixed?

New version: http://thread.gmane.org/gmane.linux.kernel.mm/111809

-- 
 Kirill A. Shutemov

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox