LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 1/2] ARM: kirkwood: proper retain MAC address workaround on DT ethernet
From: Sebastian Hesselbarth @ 2013-05-26 20:06 UTC (permalink / raw)
  To: David Miller
  Cc: andrew, jason, netdev, linux-kernel, buytenh, Grant Likely,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <20130525.210441.818472895703230779.davem@davemloft.net>

On 05/26/2013 06:04 AM, David Miller wrote:
> From: Sebastian Hesselbarth<sebastian.hesselbarth@gmail.com>
> Date: Wed, 22 May 2013 22:04:01 +0200
>
>> +			memcpy((void *)p->value, reg, 6);
>
> This cast is completely unnecessary, non-void to void pointer casts
> are automatic.
>
> If it is necessary, because p->value is const, then you are trying
> to change something behind the OF layer's back and need to use
> the appropriate interface to change the property contents.

David,

good you mention it. I added Grant on Cc and will give a short
sum-up why I casted the const from property->value away here.

Maybe I overlooked the API for modifying the DT property but as
far as I've seen - there is no API for modifying it. And yes,
you are right, it is kind of an abuse of DT here.

As Kirkwoods loose their MAC address on clock gating, I was looking
for a place to store it early. (a) DT property "local-mac-address"
looked as a good place as it will allow the driver to find it without
any extra code. Of course, I am doing severaly sanity checks if it is
safe to overwrite it, i.e. no other MAC set, property is there, long
enough.

If Grant also NACKs modifying the DT we basically have two more options
left for Kirkwood: (b) have MAC stored early in two global arrays in
board init and reference that from mv643xx_eth or (c) leave the clock
ungated unconditionally on all Kirkwoods.

I can live with all three, just name it and I prepare a final patch set.

Sebastian

^ permalink raw reply

* Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
From: Alexey Kardashevskiy @ 2013-05-27  2:44 UTC (permalink / raw)
  To: Scott Wood
  Cc: kvm, Alexander Graf, kvm-ppc, linux-kernel, Paul Mackerras,
	linuxppc-dev, David Gibson
In-Reply-To: <20130525024524.GA6112@boomeroo.fritz.box>

On 05/25/2013 12:45 PM, David Gibson wrote:
> On Wed, May 22, 2013 at 04:06:57PM -0500, Scott Wood wrote:
>> On 05/20/2013 10:06:46 PM, Alexey Kardashevskiy wrote:
>>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>>> index 8465c2a..da6bf61 100644
>>> --- a/arch/powerpc/kvm/powerpc.c
>>> @@ -396,6 +396,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>>> +++ b/arch/powerpc/kvm/powerpc.c
>>> 		break;
>>> #endif
>>> 	case KVM_CAP_SPAPR_MULTITCE:
>>> +	case KVM_CAP_SPAPR_TCE_IOMMU:
>>> 		r = 1;
>>> 		break;
>>> 	default:
>>
>> Don't advertise SPAPR capabilities if it's not book3s -- and
>> probably there's some additional limitation that would be
>> appropriate.
> 
> So, in the case of MULTITCE, that's not quite right.  PR KVM can
> emulate a PAPR system on a BookE machine, and there's no reason not to
> allow TCE acceleration as well.  We can't make it dependent on PAPR
> mode being selected, because that's enabled per-vcpu, whereas these
> capabilities are queried on the VM before the vcpus are created.
> 
> CAP_SPAPR_TCE_IOMMU should be dependent on the presence of suitable
> host side hardware (i.e. a PAPR style IOMMU), though.


The capability says that the ioctl is supported. If there is no IOMMU group
registered, than it will fail with a reasonable error and nobody gets hurt.
What is the problem?


>>
>>> @@ -1025,6 +1026,17 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>> 		r = kvm_vm_ioctl_create_spapr_tce(kvm, &create_tce);
>>> 		goto out;
>>> 	}
>>> +	case KVM_CREATE_SPAPR_TCE_IOMMU: {
>>> +		struct kvm_create_spapr_tce_iommu create_tce_iommu;
>>> +		struct kvm *kvm = filp->private_data;
>>> +
>>> +		r = -EFAULT;
>>> +		if (copy_from_user(&create_tce_iommu, argp,
>>> +				sizeof(create_tce_iommu)))
>>> +			goto out;
>>> +		r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm,
>>> &create_tce_iommu);
>>> +		goto out;
>>> +	}
>>> #endif /* CONFIG_PPC_BOOK3S_64 */
>>>
>>> #ifdef CONFIG_KVM_BOOK3S_64_HV
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index 5a2afda..450c82a 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info {
>>> #define KVM_CAP_PPC_RTAS 91
>>> #define KVM_CAP_IRQ_XICS 92
>>> #define KVM_CAP_SPAPR_MULTITCE (0x110000 + 89)
>>> +#define KVM_CAP_SPAPR_TCE_IOMMU (0x110000 + 90)
>>
>> Hmm...
> 
> Ah, yeah, that needs to be fixed.  Those were interim numbers so that
> we didn't have to keep changing our internal trees as new upstream
> ioctls got added to the list.  We need to get a proper number for the
> merge, though.
>
>>> @@ -939,6 +940,9 @@ struct kvm_s390_ucas_mapping {
>>> #define KVM_GET_DEVICE_ATTR	  _IOW(KVMIO,  0xe2, struct
>>> kvm_device_attr)
>>> #define KVM_HAS_DEVICE_ATTR	  _IOW(KVMIO,  0xe3, struct
>>> kvm_device_attr)
>>>
>>> +/* ioctl for SPAPR TCE IOMMU */
>>> +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xe4, struct
>>> kvm_create_spapr_tce_iommu)
>>
>> Shouldn't this go under the vm ioctl section?


The KVM_CREATE_SPAPR_TCE_IOMMU ioctl (the version for emulated devices) is
in this section so I decided to keep them together. Wrong?


-- 
Alexey

^ permalink raw reply

* [PATCH 0/5] powerpc/tm: Various transactional memory fixes
From: Michael Neuling @ 2013-05-27  4:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Matt Evans

Bunch of fixes for transactional memory.

Michael Neuling (5):
  powerpc/tm: Make room for hypervisor in abort cause codes
  powerpc/tm: Update cause codes documentation
  powerpc/tm: Abort on emulation and alignment faults
  powerpc/tm: Move TM abort cause codes to uapi
  powerpc/tm: Fix userspace stack corruption on signal delivery for
    active transactions

 Documentation/powerpc/transactional_memory.txt |   27 ++++++++++++++--
 arch/powerpc/include/asm/processor.h           |   13 +++-----
 arch/powerpc/include/asm/reg.h                 |   11 -------
 arch/powerpc/include/asm/signal.h              |    3 ++
 arch/powerpc/include/asm/tm.h                  |    2 ++
 arch/powerpc/include/uapi/asm/tm.h             |   18 +++++++++++
 arch/powerpc/kernel/signal.c                   |   40 ++++++++++++++++++++++--
 arch/powerpc/kernel/signal.h                   |    2 +-
 arch/powerpc/kernel/signal_32.c                |   10 ++----
 arch/powerpc/kernel/signal_64.c                |   23 +++++---------
 arch/powerpc/kernel/traps.c                    |   29 +++++++++++++++++
 11 files changed, 129 insertions(+), 49 deletions(-)
 create mode 100644 arch/powerpc/include/uapi/asm/tm.h

-- 
1.7.10.4

^ permalink raw reply

* [PATCH 1/5] powerpc/tm: Make room for hypervisor in abort cause codes
From: Michael Neuling @ 2013-05-27  4:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Matt Evans
In-Reply-To: <1369627781-31088-1-git-send-email-mikey@neuling.org>

PAPR carves out 0xff-0xe0 for hypervisor use of transactional memory software
abort cause codes.  Unfortunately we don't respect this currently.

Below fixes this to move our cause codes to below this region.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # 3.9 only
---
 arch/powerpc/include/asm/reg.h |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index a613651..8f6a94b 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -113,14 +113,15 @@
 
 /* Reason codes describing kernel causes for transaction aborts.  By
    convention, bit0 is copied to TEXASR[56] (IBM bit 7) which is set if
-   the failure is persistent.
+   the failure is persistent.  PAPR saves 0xff-0xe0 for the hypervisor.
 */
-#define TM_CAUSE_RESCHED	0xfe
-#define TM_CAUSE_TLBI		0xfc
-#define TM_CAUSE_FAC_UNAV	0xfa
-#define TM_CAUSE_SYSCALL	0xf9 /* Persistent */
-#define TM_CAUSE_MISC		0xf6
-#define TM_CAUSE_SIGNAL		0xf4
+#define TM_CAUSE_PERSISTENT	0x01
+#define TM_CAUSE_RESCHED	0xde
+#define TM_CAUSE_TLBI		0xdc
+#define TM_CAUSE_FAC_UNAV	0xda
+#define TM_CAUSE_SYSCALL	0xd8  /* future use */
+#define TM_CAUSE_MISC		0xd6  /* future use */
+#define TM_CAUSE_SIGNAL		0xd4
 
 #if defined(CONFIG_PPC_BOOK3S_64)
 #define MSR_64BIT	MSR_SF
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 2/5] powerpc/tm: Update cause codes documentation
From: Michael Neuling @ 2013-05-27  4:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Matt Evans
In-Reply-To: <1369627781-31088-1-git-send-email-mikey@neuling.org>

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # 3.9 only
---
 Documentation/powerpc/transactional_memory.txt |    1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt
index c907be4..84e04a0 100644
--- a/Documentation/powerpc/transactional_memory.txt
+++ b/Documentation/powerpc/transactional_memory.txt
@@ -155,6 +155,7 @@ These are defined in <asm/reg.h>, and distinguish different reasons why the
 kernel aborted a transaction:
 
  TM_CAUSE_RESCHED       Thread was rescheduled.
+ TM_CAUSE_TLBI          Software TLB invalide.
  TM_CAUSE_FAC_UNAV      FP/VEC/VSX unavailable trap.
  TM_CAUSE_SYSCALL       Currently unused; future syscalls that must abort
                         transactions for consistency will use this.
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 3/5] powerpc/tm: Abort on emulation and alignment faults
From: Michael Neuling @ 2013-05-27  4:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Matt Evans
In-Reply-To: <1369627781-31088-1-git-send-email-mikey@neuling.org>

If we are emulating an instruction inside an active user transaction that
touches memory, the kernel can't emulate it as it operates in transactional
suspend context.  We need to abort these transactions and send them back to
userspace for the hardware to rollback.

We can service these if the user transaction is in suspend mode, since the
kernel will operate in the same suspend context.

This adds a check to all alignment faults and to specific instruction
emulations (only string instructions for now).  If the user process is in an
active (non-suspended) transaction, we abort the transaction go back to
userspace allowing the HW to roll back the transaction and tell the user of the
failure.  This also adds new tm abort cause codes to report the reason of the
persistent error to the user.

Crappy test case here http://neuling.org/devel/junkcode/aligntm.c

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
---
 Documentation/powerpc/transactional_memory.txt |    7 ++++--
 arch/powerpc/include/asm/reg.h                 |    2 ++
 arch/powerpc/kernel/traps.c                    |   29 ++++++++++++++++++++++++
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt
index 84e04a0..c54bf31 100644
--- a/Documentation/powerpc/transactional_memory.txt
+++ b/Documentation/powerpc/transactional_memory.txt
@@ -161,9 +161,12 @@ kernel aborted a transaction:
                         transactions for consistency will use this.
  TM_CAUSE_SIGNAL        Signal delivered.
  TM_CAUSE_MISC          Currently unused.
+ TM_CAUSE_ALIGNMENT     Alignment fault.
+ TM_CAUSE_EMULATE       Emulation that touched memory.
 
-These can be checked by the user program's abort handler as TEXASR[0:7].
-
+These can be checked by the user program's abort handler as TEXASR[0:7].  If
+bit 7 is set, it indicates that the error is consider persistent.  For example
+a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not.q
 
 GDB
 ===
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 8f6a94b..d0528e0 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -122,6 +122,8 @@
 #define TM_CAUSE_SYSCALL	0xd8  /* future use */
 #define TM_CAUSE_MISC		0xd6  /* future use */
 #define TM_CAUSE_SIGNAL		0xd4
+#define TM_CAUSE_ALIGNMENT	0xd2
+#define TM_CAUSE_EMULATE	0xd0
 
 #if defined(CONFIG_PPC_BOOK3S_64)
 #define MSR_64BIT	MSR_SF
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index a7a648f..f18c79c 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -53,6 +53,7 @@
 #ifdef CONFIG_PPC64
 #include <asm/firmware.h>
 #include <asm/processor.h>
+#include <asm/tm.h>
 #endif
 #include <asm/kexec.h>
 #include <asm/ppc-opcode.h>
@@ -932,6 +933,28 @@ static int emulate_isel(struct pt_regs *regs, u32 instword)
 	return 0;
 }
 
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+static inline bool tm_abort_check(struct pt_regs *regs, int cause)
+{
+        /* If we're emulating a load/store in an active transaction, we cannot
+         * emulate it as the kernel operates in transaction suspended context.
+         * We need to abort the transaction.  This creates a persistent TM
+         * abort so tell the user what caused it with a new code.
+	 */
+	if (MSR_TM_TRANSACTIONAL(regs->msr)) {
+		tm_enable();
+		tm_abort(cause);
+		return true;
+	}
+	return false;
+}
+#else
+static inline bool tm_abort_check(struct pt_regs *regs, int reason)
+{
+	return false;
+}
+#endif
+
 static int emulate_instruction(struct pt_regs *regs)
 {
 	u32 instword;
@@ -971,6 +994,9 @@ static int emulate_instruction(struct pt_regs *regs)
 
 	/* Emulate load/store string insn. */
 	if ((instword & PPC_INST_STRING_GEN_MASK) == PPC_INST_STRING) {
+		if (tm_abort_check(regs,
+				   TM_CAUSE_EMULATE | TM_CAUSE_PERSISTENT))
+			return -EINVAL;
 		PPC_WARN_EMULATED(string, regs);
 		return emulate_string_inst(regs, instword);
 	}
@@ -1148,6 +1174,9 @@ void alignment_exception(struct pt_regs *regs)
 	if (!arch_irq_disabled_regs(regs))
 		local_irq_enable();
 
+	if (tm_abort_check(regs, TM_CAUSE_ALIGNMENT | TM_CAUSE_PERSISTENT))
+		goto bail;
+
 	/* we don't implement logging of alignment exceptions */
 	if (!(current->thread.align_ctl & PR_UNALIGN_SIGBUS))
 		fixed = fix_alignment(regs);
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 4/5] powerpc/tm: Move TM abort cause codes to uapi
From: Michael Neuling @ 2013-05-27  4:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Matt Evans
In-Reply-To: <1369627781-31088-1-git-send-email-mikey@neuling.org>

These cause codes are usable by userspace, so let's export to uapi.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
---
 arch/powerpc/include/asm/reg.h     |   14 --------------
 arch/powerpc/include/asm/tm.h      |    2 ++
 arch/powerpc/include/uapi/asm/tm.h |   18 ++++++++++++++++++
 3 files changed, 20 insertions(+), 14 deletions(-)
 create mode 100644 arch/powerpc/include/uapi/asm/tm.h

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index d0528e0..4a9e408 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -111,20 +111,6 @@
 #define MSR_TM_TRANSACTIONAL(x)	(((x) & MSR_TS_MASK) == MSR_TS_T)
 #define MSR_TM_SUSPENDED(x)	(((x) & MSR_TS_MASK) == MSR_TS_S)
 
-/* Reason codes describing kernel causes for transaction aborts.  By
-   convention, bit0 is copied to TEXASR[56] (IBM bit 7) which is set if
-   the failure is persistent.  PAPR saves 0xff-0xe0 for the hypervisor.
-*/
-#define TM_CAUSE_PERSISTENT	0x01
-#define TM_CAUSE_RESCHED	0xde
-#define TM_CAUSE_TLBI		0xdc
-#define TM_CAUSE_FAC_UNAV	0xda
-#define TM_CAUSE_SYSCALL	0xd8  /* future use */
-#define TM_CAUSE_MISC		0xd6  /* future use */
-#define TM_CAUSE_SIGNAL		0xd4
-#define TM_CAUSE_ALIGNMENT	0xd2
-#define TM_CAUSE_EMULATE	0xd0
-
 #if defined(CONFIG_PPC_BOOK3S_64)
 #define MSR_64BIT	MSR_SF
 
diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h
index 4b4449a..9dfbc34 100644
--- a/arch/powerpc/include/asm/tm.h
+++ b/arch/powerpc/include/asm/tm.h
@@ -5,6 +5,8 @@
  * Copyright 2012 Matt Evans & Michael Neuling, IBM Corporation.
  */
 
+#include <uapi/asm/tm.h>
+
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 extern void do_load_up_transact_fpu(struct thread_struct *thread);
 extern void do_load_up_transact_altivec(struct thread_struct *thread);
diff --git a/arch/powerpc/include/uapi/asm/tm.h b/arch/powerpc/include/uapi/asm/tm.h
new file mode 100644
index 0000000..85059a0
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/tm.h
@@ -0,0 +1,18 @@
+#ifndef _ASM_POWERPC_TM_H
+#define _ASM_POWERPC_TM_H
+
+/* Reason codes describing kernel causes for transaction aborts.  By
+ * convention, bit0 is copied to TEXASR[56] (IBM bit 7) which is set if
+ * the failure is persistent.  PAPR saves 0xff-0xe0 for the hypervisor.
+ */
+#define TM_CAUSE_PERSISTENT	0x01
+#define TM_CAUSE_RESCHED	0xde
+#define TM_CAUSE_TLBI		0xdc
+#define TM_CAUSE_FAC_UNAV	0xda
+#define TM_CAUSE_SYSCALL	0xd8  /* future use */
+#define TM_CAUSE_MISC		0xd6  /* future use */
+#define TM_CAUSE_SIGNAL		0xd4
+#define TM_CAUSE_ALIGNMENT	0xd2
+#define TM_CAUSE_EMULATE	0xd0
+
+#endif
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 5/5] powerpc/tm: Fix userspace stack corruption on signal delivery for active transactions
From: Michael Neuling @ 2013-05-27  4:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Matt Evans
In-Reply-To: <1369627781-31088-1-git-send-email-mikey@neuling.org>

When in an active transaction that takes a signal, we need to be careful with
the stack.  It's possible that the stack has moved back up after the tbegin.
The obvious case here is when the tbegin is called inside a function that
returns before a tend.  In this case, the stack is part of the checkpointed
transactional memory state.  If we write over this non transactionally or in
suspend, we are in trouble because if we get a tm abort, the program counter
and stack pointer will be back at the tbegin but our in memory stack won't be
valid anymore.

To avoid this, when taking a signal in an active transaction, we need to use
the stack pointer from the checkpointed state, rather than the speculated
state.  This ensures that the signal context (written tm suspended) will be
written below the stack required for the rollback.  The transaction is aborted
becuase of the treclaim, so any memory written between the tbegin and the
signal will be rolled back anyway.

For signals taken in non-TM or suspended mode, we use the
normal/non-checkpointed stack pointer.

Tested with 64 and 32 bit signals

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
---
 Documentation/powerpc/transactional_memory.txt |   19 +++++++++++
 arch/powerpc/include/asm/processor.h           |   13 +++-----
 arch/powerpc/include/asm/signal.h              |    3 ++
 arch/powerpc/kernel/signal.c                   |   40 ++++++++++++++++++++++--
 arch/powerpc/kernel/signal.h                   |    2 +-
 arch/powerpc/kernel/signal_32.c                |   10 ++----
 arch/powerpc/kernel/signal_64.c                |   23 +++++---------
 7 files changed, 74 insertions(+), 36 deletions(-)

diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt
index c54bf31..dc23e58 100644
--- a/Documentation/powerpc/transactional_memory.txt
+++ b/Documentation/powerpc/transactional_memory.txt
@@ -147,6 +147,25 @@ Example signal handler:
       fix_the_problem(ucp->dar);
     }
 
+When in an active transaction that takes a signal, we need to be careful with
+the stack.  It's possible that the stack has moved back up after the tbegin.
+The obvious case here is when the tbegin is called inside a function that
+returns before a tend.  In this case, the stack is part of the checkpointed
+transactional memory state.  If we write over this non transactionally or in
+suspend, we are in trouble because if we get a tm abort, the program counter and
+stack pointer will be back at the tbegin but our in memory stack won't be valid
+anymore.
+
+To avoid this, when taking a signal in an active transaction, we need to use
+the stack pointer from the checkpointed state, rather than the speculated
+state.  This ensures that the signal context (written tm suspended) will be
+written below the stack required for the rollback.  The transaction is aborted
+becuase of the treclaim, so any memory written between the tbegin and the
+signal will be rolled back anyway.
+
+For signals taken in non-TM or suspended mode, we use the
+normal/non-checkpointed stack pointer.
+
 
 Failure cause codes used by kernel
 ==================================
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 594db6b..14a6583 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -409,21 +409,16 @@ static inline void prefetchw(const void *x)
 #endif
 
 #ifdef CONFIG_PPC64
-static inline unsigned long get_clean_sp(struct pt_regs *regs, int is_32)
+static inline unsigned long get_clean_sp(unsigned long sp, int is_32)
 {
-	unsigned long sp;
-
 	if (is_32)
-		sp = regs->gpr[1] & 0x0ffffffffUL;
-	else
-		sp = regs->gpr[1];
-
+		return sp & 0x0ffffffffUL;
 	return sp;
 }
 #else
-static inline unsigned long get_clean_sp(struct pt_regs *regs, int is_32)
+static inline unsigned long get_clean_sp(unsigned long sp, int is_32)
 {
-	return regs->gpr[1];
+	return sp;
 }
 #endif
 
diff --git a/arch/powerpc/include/asm/signal.h b/arch/powerpc/include/asm/signal.h
index fbe66c4..9322c28 100644
--- a/arch/powerpc/include/asm/signal.h
+++ b/arch/powerpc/include/asm/signal.h
@@ -3,5 +3,8 @@
 
 #define __ARCH_HAS_SA_RESTORER
 #include <uapi/asm/signal.h>
+#include <uapi/asm/ptrace.h>
+
+extern unsigned long get_tm_stackpointer(struct pt_regs *regs);
 
 #endif /* _ASM_POWERPC_SIGNAL_H */
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index 577a8aa..457e97a 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -18,6 +18,7 @@
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 #include <asm/debug.h>
+#include <asm/tm.h>
 
 #include "signal.h"
 
@@ -30,13 +31,13 @@ int show_unhandled_signals = 1;
 /*
  * Allocate space for the signal frame
  */
-void __user * get_sigframe(struct k_sigaction *ka, struct pt_regs *regs,
+void __user * get_sigframe(struct k_sigaction *ka, unsigned long sp,
 			   size_t frame_size, int is_32)
 {
         unsigned long oldsp, newsp;
 
         /* Default to using normal stack */
-        oldsp = get_clean_sp(regs, is_32);
+        oldsp = get_clean_sp(sp, is_32);
 
 	/* Check for alt stack */
 	if ((ka->sa.sa_flags & SA_ONSTACK) &&
@@ -175,3 +176,38 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
 
 	user_enter();
 }
+
+unsigned long get_tm_stackpointer(struct pt_regs *regs)
+{
+	/* When in an active transaction that takes a signal, we need to be
+	 * careful with the stack.  It's possible that the stack has moved back
+	 * up after the tbegin.  The obvious case here is when the tbegin is
+	 * called inside a function that returns before a tend.  In this case,
+	 * the stack is part of the checkpointed transactional memory state.
+	 * If we write over this non transactionally or in suspend, we are in
+	 * trouble because if we get a tm abort, the program counter and stack
+	 * pointer will be back at the tbegin but our in memory stack won't be
+	 * valid anymore.
+	 *
+	 * To avoid this, when taking a signal in an active transaction, we
+	 * need to use the stack pointer from the checkpointed state, rather
+	 * than the speculated state.  This ensures that the signal context
+	 * (written tm suspended) will be written below the stack required for
+	 * the rollback.  The transaction is aborted becuase of the treclaim,
+	 * so any memory written between the tbegin and the signal will be
+	 * rolled back anyway.
+	 *
+	 * For signals taken in non-TM or suspended mode, we use the
+	 * normal/non-checkpointed stack pointer.
+	 */
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+	if (MSR_TM_ACTIVE(regs->msr)) {
+		tm_enable();
+		tm_reclaim(&current->thread, regs->msr, TM_CAUSE_SIGNAL);
+		if (MSR_TM_TRANSACTIONAL(regs->msr))
+			return current->thread.ckpt_regs.gpr[1];
+	}
+#endif
+	return regs->gpr[1];
+}
diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
index ec84c90..c69b9ae 100644
--- a/arch/powerpc/kernel/signal.h
+++ b/arch/powerpc/kernel/signal.h
@@ -12,7 +12,7 @@
 
 extern void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags);
 
-extern void __user * get_sigframe(struct k_sigaction *ka, struct pt_regs *regs,
+extern void __user * get_sigframe(struct k_sigaction *ka, unsigned long sp,
 				  size_t frame_size, int is_32);
 
 extern int handle_signal32(unsigned long sig, struct k_sigaction *ka,
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 95068bf..201385c 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -503,12 +503,6 @@ static int save_tm_user_regs(struct pt_regs *regs,
 {
 	unsigned long msr = regs->msr;
 
-	/* tm_reclaim rolls back all reg states, updating thread.ckpt_regs,
-	 * thread.transact_fpr[], thread.transact_vr[], etc.
-	 */
-	tm_enable();
-	tm_reclaim(&current->thread, msr, TM_CAUSE_SIGNAL);
-
 	/* Make sure floating point registers are stored in regs */
 	flush_fp_to_thread(current);
 
@@ -965,7 +959,7 @@ int handle_rt_signal32(unsigned long sig, struct k_sigaction *ka,
 
 	/* Set up Signal Frame */
 	/* Put a Real Time Context onto stack */
-	rt_sf = get_sigframe(ka, regs, sizeof(*rt_sf), 1);
+	rt_sf = get_sigframe(ka, get_tm_stackpointer(regs), sizeof(*rt_sf), 1);
 	addr = rt_sf;
 	if (unlikely(rt_sf == NULL))
 		goto badframe;
@@ -1403,7 +1397,7 @@ int handle_signal32(unsigned long sig, struct k_sigaction *ka,
 	unsigned long tramp;
 
 	/* Set up Signal Frame */
-	frame = get_sigframe(ka, regs, sizeof(*frame), 1);
+	frame = get_sigframe(ka, get_tm_stackpointer(regs), sizeof(*frame), 1);
 	if (unlikely(frame == NULL))
 		goto badframe;
 	sc = (struct sigcontext __user *) &frame->sctx;
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index c179428..3459473 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -154,11 +154,12 @@ static long setup_sigcontext(struct sigcontext __user *sc, struct pt_regs *regs,
  * As above, but Transactional Memory is in use, so deliver sigcontexts
  * containing checkpointed and transactional register states.
  *
- * To do this, we treclaim to gather both sets of registers and set up the
- * 'normal' sigcontext registers with rolled-back register values such that a
- * simple signal handler sees a correct checkpointed register state.
- * If interested, a TM-aware sighandler can examine the transactional registers
- * in the 2nd sigcontext to determine the real origin of the signal.
+ * To do this, we treclaim (done before entering here) to gather both sets of
+ * registers and set up the 'normal' sigcontext registers with rolled-back
+ * register values such that a simple signal handler sees a correct
+ * checkpointed register state.  If interested, a TM-aware sighandler can
+ * examine the transactional registers in the 2nd sigcontext to determine the
+ * real origin of the signal.
  */
 static long setup_tm_sigcontexts(struct sigcontext __user *sc,
 				 struct sigcontext __user *tm_sc,
@@ -184,16 +185,6 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc,
 
 	BUG_ON(!MSR_TM_ACTIVE(regs->msr));
 
-	/* tm_reclaim rolls back all reg states, saving checkpointed (older)
-	 * GPRs to thread.ckpt_regs and (if used) FPRs to (newer)
-	 * thread.transact_fp and/or VRs to (newer) thread.transact_vr.
-	 * THEN we save out FP/VRs, if necessary, to the checkpointed (older)
-	 * thread.fr[]/vr[]s.  The transactional (newer) GPRs are on the
-	 * stack, in *regs.
-	 */
-	tm_enable();
-	tm_reclaim(&current->thread, msr, TM_CAUSE_SIGNAL);
-
 	flush_fp_to_thread(current);
 
 #ifdef CONFIG_ALTIVEC
@@ -711,7 +702,7 @@ int handle_rt_signal64(int signr, struct k_sigaction *ka, siginfo_t *info,
 	unsigned long newsp = 0;
 	long err = 0;
 
-	frame = get_sigframe(ka, regs, sizeof(*frame), 0);
+	frame = get_sigframe(ka, get_tm_stackpointer(regs), sizeof(*frame), 0);
 	if (unlikely(frame == NULL))
 		goto badframe;
 
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH v2 4/5] powerpc/tm: Move TM abort cause codes to uapi
From: Michael Neuling @ 2013-05-27  4:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Matt Evans
In-Reply-To: <1369627781-31088-5-git-send-email-mikey@neuling.org>

These cause codes are usable by userspace, so let's export to uapi.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
--
v2:
  add uapi/tm.h to Kbuild so it gets exported correctly

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index d0528e0..4a9e408 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -111,20 +111,6 @@
 #define MSR_TM_TRANSACTIONAL(x)	(((x) & MSR_TS_MASK) == MSR_TS_T)
 #define MSR_TM_SUSPENDED(x)	(((x) & MSR_TS_MASK) == MSR_TS_S)
 
-/* Reason codes describing kernel causes for transaction aborts.  By
-   convention, bit0 is copied to TEXASR[56] (IBM bit 7) which is set if
-   the failure is persistent.  PAPR saves 0xff-0xe0 for the hypervisor.
-*/
-#define TM_CAUSE_PERSISTENT	0x01
-#define TM_CAUSE_RESCHED	0xde
-#define TM_CAUSE_TLBI		0xdc
-#define TM_CAUSE_FAC_UNAV	0xda
-#define TM_CAUSE_SYSCALL	0xd8  /* future use */
-#define TM_CAUSE_MISC		0xd6  /* future use */
-#define TM_CAUSE_SIGNAL		0xd4
-#define TM_CAUSE_ALIGNMENT	0xd2
-#define TM_CAUSE_EMULATE	0xd0
-
 #if defined(CONFIG_PPC_BOOK3S_64)
 #define MSR_64BIT	MSR_SF
 
diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h
index 4b4449a..9dfbc34 100644
--- a/arch/powerpc/include/asm/tm.h
+++ b/arch/powerpc/include/asm/tm.h
@@ -5,6 +5,8 @@
  * Copyright 2012 Matt Evans & Michael Neuling, IBM Corporation.
  */
 
+#include <uapi/asm/tm.h>
+
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 extern void do_load_up_transact_fpu(struct thread_struct *thread);
 extern void do_load_up_transact_altivec(struct thread_struct *thread);
diff --git a/arch/powerpc/include/uapi/asm/Kbuild b/arch/powerpc/include/uapi/asm/Kbuild
index f7bca63..5182c86 100644
--- a/arch/powerpc/include/uapi/asm/Kbuild
+++ b/arch/powerpc/include/uapi/asm/Kbuild
@@ -40,6 +40,7 @@ header-y += statfs.h
 header-y += swab.h
 header-y += termbits.h
 header-y += termios.h
+header-y += tm.h
 header-y += types.h
 header-y += ucontext.h
 header-y += unistd.h
diff --git a/arch/powerpc/include/uapi/asm/tm.h b/arch/powerpc/include/uapi/asm/tm.h
new file mode 100644
index 0000000..85059a0
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/tm.h
@@ -0,0 +1,18 @@
+#ifndef _ASM_POWERPC_TM_H
+#define _ASM_POWERPC_TM_H
+
+/* Reason codes describing kernel causes for transaction aborts.  By
+ * convention, bit0 is copied to TEXASR[56] (IBM bit 7) which is set if
+ * the failure is persistent.  PAPR saves 0xff-0xe0 for the hypervisor.
+ */
+#define TM_CAUSE_PERSISTENT	0x01
+#define TM_CAUSE_RESCHED	0xde
+#define TM_CAUSE_TLBI		0xdc
+#define TM_CAUSE_FAC_UNAV	0xda
+#define TM_CAUSE_SYSCALL	0xd8  /* future use */
+#define TM_CAUSE_MISC		0xd6  /* future use */
+#define TM_CAUSE_SIGNAL		0xd4
+#define TM_CAUSE_ALIGNMENT	0xd2
+#define TM_CAUSE_EMULATE	0xd0
+
+#endif

^ permalink raw reply related

* [PATCH] powerpc/32bit,PREEMPT:Load TI_FLAGS to check NEED_RESCHED
From: Priyanka Jain @ 2013-05-27  6:27 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: scottwood, Priyanka Jain

Add instruction to load TI_FLAGS in r8
    
While returning from exception handling in case of PREEMPT enabled,
_TIF_NEED_RESCHED bit is checked in TI_FLAGS (thread_info flag) of current
task. Only if this bit is set, it should continue with the process of
calling preempt_schedule_irq() to schedule highest priority task if
available.
    
Current code assumes that r8 contains TI_FLAGS and check this for
_TIF_NEED_RESCHED, but as r8 is modified in the code which executes before
this check, r8 no longer contains the expected TI_FLAGS information.
    
As a result check for comparison with _TIF_NEED_RESCHED was failing even if
NEED_RESCHED bit is set in the current thread_info flag. Due to this,
preempt_schedule_irq() and in turn scheduler was not getting called even if
highest priority task is ready for execution.
    

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
---
 arch/powerpc/kernel/entry_32.S |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index d22e73e..0239c7f 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -887,6 +887,7 @@ resume_kernel:
 #ifdef CONFIG_PREEMPT
 	/* check current_thread_info->preempt_count */
 	lwz	r0,TI_PREEMPT(r9)
+	lwz	r8,TI_FLAGS(r9)
 	cmpwi	0,r0,0		/* if non-zero, just restore regs and return */
 	bne	restore
 	andi.	r8,r8,_TIF_NEED_RESCHED
-- 
1.7.4.1

^ permalink raw reply related

* Re: [PATCH] powerpc/32bit, PREEMPT:Load TI_FLAGS to check NEED_RESCHED
From: tiejun.chen @ 2013-05-27  6:45 UTC (permalink / raw)
  To: Priyanka Jain; +Cc: scottwood, linuxppc-dev
In-Reply-To: <1369636076-2644-1-git-send-email-Priyanka.Jain@freescale.com>

On 05/27/2013 02:27 PM, Priyanka Jain wrote:
> Add instruction to load TI_FLAGS in r8
>
> While returning from exception handling in case of PREEMPT enabled,
> _TIF_NEED_RESCHED bit is checked in TI_FLAGS (thread_info flag) of current
> task. Only if this bit is set, it should continue with the process of
> calling preempt_schedule_irq() to schedule highest priority task if
> available.
>
> Current code assumes that r8 contains TI_FLAGS and check this for
> _TIF_NEED_RESCHED, but as r8 is modified in the code which executes before

Could you elaborate this scenario? As I take a look at this path:

	...
         /* Clear _TIF_EMULATE_STACK_STORE flag */
         lis     r11,_TIF_EMULATE_STACK_STORE@h
         addi    r5,r9,TI_FLAGS
0:      lwarx   r8,0,r5
         andc    r8,r8,r11
#ifdef CONFIG_IBM405_ERR77
         dcbt    0,r5
#endif
         stwcx.  r8,0,r5
         bne-    0b
1:

#ifdef CONFIG_PREEMPT
         /* check current_thread_info->preempt_count */
         lwz     r0,TI_PREEMPT(r9)
         cmpwi   0,r0,0          /* if non-zero, just restore regs and return */
         bne     restore
         andi.   r8,r8,_TIF_NEED_RESCHED
	...

Where is R8 clobbered?

Tiejun

> this check, r8 no longer contains the expected TI_FLAGS information.
>
> As a result check for comparison with _TIF_NEED_RESCHED was failing even if
> NEED_RESCHED bit is set in the current thread_info flag. Due to this,
> preempt_schedule_irq() and in turn scheduler was not getting called even if
> highest priority task is ready for execution.
>
>
> Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
> ---
>   arch/powerpc/kernel/entry_32.S |    1 +
>   1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index d22e73e..0239c7f 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -887,6 +887,7 @@ resume_kernel:
>   #ifdef CONFIG_PREEMPT
>   	/* check current_thread_info->preempt_count */
>   	lwz	r0,TI_PREEMPT(r9)
> +	lwz	r8,TI_FLAGS(r9)
>   	cmpwi	0,r0,0		/* if non-zero, just restore regs and return */
>   	bne	restore
>   	andi.	r8,r8,_TIF_NEED_RESCHED
>

^ permalink raw reply

* RE: [PATCH] powerpc/32bit,PREEMPT:Load TI_FLAGS to check NEED_RESCHED
From: Jain Priyanka-B32167 @ 2013-05-27  6:55 UTC (permalink / raw)
  To: tiejun.chen; +Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <51A300F4.80103@windriver.com>

DQpJZiB3ZSBnbyBzb21lIG1vcmUgbGluZXMgdXAgaW4gdGhlIHNhbWUgZmlsZSwgdGhlIGNvZGUg
aXMgDQoNCnJlc3VtZV9rZXJuZWw6DQogICAgICAgIC8qIGNoZWNrIGN1cnJlbnRfdGhyZWFkX2lu
Zm8sIF9USUZfRU1VTEFURV9TVEFDS19TVE9SRSAqLw0KICAgICAgICBDVVJSRU5UX1RIUkVBRF9J
TkZPKHI5LCByMSkNCiAgICAgICAgbHd6ICAgICByOCxUSV9GTEFHUyhyOSkNCiAgICAgICAgYW5k
aXMuICByOCxyOCxfVElGX0VNVUxBVEVfU1RBQ0tfU1RPUkVAaA0KICAgICAgICBiZXErICAgIDFm
DQotLS0NCg0KQWZ0ZXIgZXhlY3V0aW9uIG9mIGFuZGlzLiBJbnN0cnVjdGlvbiwgcjggY29udGFp
bnMgbG9naWNhbCBBTkQgcmVzdWx0IG9mIHI4IGFuZCBfVElGX0VNVUxBVEVfU1RBQ0tfU1RPUkUg
YW5kIGl0IGFsc28gc2V0cyB0aGUgY29uZGl0aW9uIHJlZ2lzdGVyIGZsYWcuDQpJZiBiZXErIGlz
IHRydWUsIGl0IHdpbGwganVtcCB0byBsYWJlbCAnMScgd2hpY2ggcG9pbnRzIHRvIHRoZSBjb2Rl
IGNvbnRhaW5lZCBpbiB0aGUgcGF0Y2guIEluIHRoaXMgcGFydGljdWxhciBzY2VuYXJpbywgcjgg
ZG9lcyBub3QgY29udGFpbnMgdGhlIGFwcHJvcHJpYXRlIHZhbHVlLg0KDQpSZWdhcmRzDQpQcml5
YW5rYQ0KDQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogdGllanVuLmNo
ZW4gW21haWx0bzp0aWVqdW4uY2hlbkB3aW5kcml2ZXIuY29tXQ0KPiBTZW50OiBNb25kYXksIE1h
eSAyNywgMjAxMyAxMjoxNSBQTQ0KPiBUbzogSmFpbiBQcml5YW5rYS1CMzIxNjcNCj4gQ2M6IGxp
bnV4cHBjLWRldkBsaXN0cy5vemxhYnMub3JnOyBXb29kIFNjb3R0LUIwNzQyMQ0KPiBTdWJqZWN0
OiBSZTogW1BBVENIXSBwb3dlcnBjLzMyYml0LFBSRUVNUFQ6TG9hZCBUSV9GTEFHUyB0byBjaGVj
aw0KPiBORUVEX1JFU0NIRUQNCj4gDQo+IE9uIDA1LzI3LzIwMTMgMDI6MjcgUE0sIFByaXlhbmth
IEphaW4gd3JvdGU6DQo+ID4gQWRkIGluc3RydWN0aW9uIHRvIGxvYWQgVElfRkxBR1MgaW4gcjgN
Cj4gPg0KPiA+IFdoaWxlIHJldHVybmluZyBmcm9tIGV4Y2VwdGlvbiBoYW5kbGluZyBpbiBjYXNl
IG9mIFBSRUVNUFQgZW5hYmxlZCwNCj4gPiBfVElGX05FRURfUkVTQ0hFRCBiaXQgaXMgY2hlY2tl
ZCBpbiBUSV9GTEFHUyAodGhyZWFkX2luZm8gZmxhZykgb2YNCj4gPiBjdXJyZW50IHRhc2suIE9u
bHkgaWYgdGhpcyBiaXQgaXMgc2V0LCBpdCBzaG91bGQgY29udGludWUgd2l0aCB0aGUNCj4gPiBw
cm9jZXNzIG9mIGNhbGxpbmcgcHJlZW1wdF9zY2hlZHVsZV9pcnEoKSB0byBzY2hlZHVsZSBoaWdo
ZXN0IHByaW9yaXR5DQo+ID4gdGFzayBpZiBhdmFpbGFibGUuDQo+ID4NCj4gPiBDdXJyZW50IGNv
ZGUgYXNzdW1lcyB0aGF0IHI4IGNvbnRhaW5zIFRJX0ZMQUdTIGFuZCBjaGVjayB0aGlzIGZvcg0K
PiA+IF9USUZfTkVFRF9SRVNDSEVELCBidXQgYXMgcjggaXMgbW9kaWZpZWQgaW4gdGhlIGNvZGUg
d2hpY2ggZXhlY3V0ZXMNCj4gPiBiZWZvcmUNCj4gDQo+IENvdWxkIHlvdSBlbGFib3JhdGUgdGhp
cyBzY2VuYXJpbz8gQXMgSSB0YWtlIGEgbG9vayBhdCB0aGlzIHBhdGg6DQo+IA0KPiAJLi4uDQo+
ICAgICAgICAgIC8qIENsZWFyIF9USUZfRU1VTEFURV9TVEFDS19TVE9SRSBmbGFnICovDQo+ICAg
ICAgICAgIGxpcyAgICAgcjExLF9USUZfRU1VTEFURV9TVEFDS19TVE9SRUBoDQo+ICAgICAgICAg
IGFkZGkgICAgcjUscjksVElfRkxBR1MNCj4gMDogICAgICBsd2FyeCAgIHI4LDAscjUNCj4gICAg
ICAgICAgYW5kYyAgICByOCxyOCxyMTENCj4gI2lmZGVmIENPTkZJR19JQk00MDVfRVJSNzcNCj4g
ICAgICAgICAgZGNidCAgICAwLHI1DQo+ICNlbmRpZg0KPiAgICAgICAgICBzdHdjeC4gIHI4LDAs
cjUNCj4gICAgICAgICAgYm5lLSAgICAwYg0KPiAxOg0KPiANCj4gI2lmZGVmIENPTkZJR19QUkVF
TVBUDQo+ICAgICAgICAgIC8qIGNoZWNrIGN1cnJlbnRfdGhyZWFkX2luZm8tPnByZWVtcHRfY291
bnQgKi8NCj4gICAgICAgICAgbHd6ICAgICByMCxUSV9QUkVFTVBUKHI5KQ0KPiAgICAgICAgICBj
bXB3aSAgIDAscjAsMCAgICAgICAgICAvKiBpZiBub24temVybywganVzdCByZXN0b3JlIHJlZ3Mg
YW5kDQo+IHJldHVybiAqLw0KPiAgICAgICAgICBibmUgICAgIHJlc3RvcmUNCj4gICAgICAgICAg
YW5kaS4gICByOCxyOCxfVElGX05FRURfUkVTQ0hFRA0KPiAJLi4uDQo+IA0KPiBXaGVyZSBpcyBS
OCBjbG9iYmVyZWQ/DQo+IA0KPiBUaWVqdW4NCj4gDQo+ID4gdGhpcyBjaGVjaywgcjggbm8gbG9u
Z2VyIGNvbnRhaW5zIHRoZSBleHBlY3RlZCBUSV9GTEFHUyBpbmZvcm1hdGlvbi4NCj4gPg0KPiA+
IEFzIGEgcmVzdWx0IGNoZWNrIGZvciBjb21wYXJpc29uIHdpdGggX1RJRl9ORUVEX1JFU0NIRUQg
d2FzIGZhaWxpbmcNCj4gPiBldmVuIGlmIE5FRURfUkVTQ0hFRCBiaXQgaXMgc2V0IGluIHRoZSBj
dXJyZW50IHRocmVhZF9pbmZvIGZsYWcuIER1ZQ0KPiA+IHRvIHRoaXMsDQo+ID4gcHJlZW1wdF9z
Y2hlZHVsZV9pcnEoKSBhbmQgaW4gdHVybiBzY2hlZHVsZXIgd2FzIG5vdCBnZXR0aW5nIGNhbGxl
ZA0KPiA+IGV2ZW4gaWYgaGlnaGVzdCBwcmlvcml0eSB0YXNrIGlzIHJlYWR5IGZvciBleGVjdXRp
b24uDQo+ID4NCj4gPg0KPiA+IFNpZ25lZC1vZmYtYnk6IFByaXlhbmthIEphaW4gPFByaXlhbmth
LkphaW5AZnJlZXNjYWxlLmNvbT4NCj4gPiAtLS0NCj4gPiAgIGFyY2gvcG93ZXJwYy9rZXJuZWwv
ZW50cnlfMzIuUyB8ICAgIDEgKw0KPiA+ICAgMSBmaWxlcyBjaGFuZ2VkLCAxIGluc2VydGlvbnMo
KyksIDAgZGVsZXRpb25zKC0pDQo+ID4NCj4gPiBkaWZmIC0tZ2l0IGEvYXJjaC9wb3dlcnBjL2tl
cm5lbC9lbnRyeV8zMi5TDQo+ID4gYi9hcmNoL3Bvd2VycGMva2VybmVsL2VudHJ5XzMyLlMgaW5k
ZXggZDIyZTczZS4uMDIzOWM3ZiAxMDA2NDQNCj4gPiAtLS0gYS9hcmNoL3Bvd2VycGMva2VybmVs
L2VudHJ5XzMyLlMNCj4gPiArKysgYi9hcmNoL3Bvd2VycGMva2VybmVsL2VudHJ5XzMyLlMNCj4g
PiBAQCAtODg3LDYgKzg4Nyw3IEBAIHJlc3VtZV9rZXJuZWw6DQo+ID4gICAjaWZkZWYgQ09ORklH
X1BSRUVNUFQNCj4gPiAgIAkvKiBjaGVjayBjdXJyZW50X3RocmVhZF9pbmZvLT5wcmVlbXB0X2Nv
dW50ICovDQo+ID4gICAJbHd6CXIwLFRJX1BSRUVNUFQocjkpDQo+ID4gKwlsd3oJcjgsVElfRkxB
R1MocjkpDQo+ID4gICAJY21wd2kJMCxyMCwwCQkvKiBpZiBub24temVybywganVzdCByZXN0b3Jl
IHJlZ3MgYW5kDQo+IHJldHVybiAqLw0KPiA+ICAgCWJuZQlyZXN0b3JlDQo+ID4gICAJYW5kaS4J
cjgscjgsX1RJRl9ORUVEX1JFU0NIRUQNCj4gPg0KPiANCg0K

^ permalink raw reply

* Re: [PATCH] powerpc/32bit,PREEMPT:Load TI_FLAGS to check NEED_RESCHED
From: Benjamin Herrenschmidt @ 2013-05-27  7:12 UTC (permalink / raw)
  To: Priyanka Jain; +Cc: scottwood, tiejun.chen, linuxppc-dev
In-Reply-To: <1369636076-2644-1-git-send-email-Priyanka.Jain@freescale.com>

On Mon, 2013-05-27 at 11:57 +0530, Priyanka Jain wrote:
> Add instruction to load TI_FLAGS in r8
>     
> While returning from exception handling in case of PREEMPT enabled,
> _TIF_NEED_RESCHED bit is checked in TI_FLAGS (thread_info flag) of
> current
> task. Only if this bit is set, it should continue with the process of
> calling preempt_schedule_irq() to schedule highest priority task if
> available.
>
> Current code assumes that r8 contains TI_FLAGS and check this for
> _TIF_NEED_RESCHED, but as r8 is modified in the code which executes
> before
> this check, r8 no longer contains the expected TI_FLAGS information.

Hrm, the code was supposed to still have TI_FLAGS in r8 and mostly does
until the 

	andis.	r8,r8,_TIF_EMULATE_STACK_STORE@h

Was added which clobbers it.

Can't we just fix the above to use a different destination register ?
r0 looks fair game at this point...
 
> As a result check for comparison with _TIF_NEED_RESCHED was failing
> even if
> NEED_RESCHED bit is set in the current thread_info flag. Due to this,
> preempt_schedule_irq() and in turn scheduler was not getting called
> even if
> highest priority task is ready for execution.

Ben.

^ permalink raw reply

* Re: [PATCH] powerpc/32bit, PREEMPT:Load TI_FLAGS to check NEED_RESCHED
From: tiejun.chen @ 2013-05-27  7:18 UTC (permalink / raw)
  To: Jain Priyanka-B32167; +Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <AC83832D6324604BB08FAE5A33553F190B3B7A4C@039-SN2MPN1-012.039d.mgd.msft.net>

On 05/27/2013 02:55 PM, Jain Priyanka-B32167 wrote:
>
> If we go some more lines up in the same file, the code is
>
> resume_kernel:
>          /* check current_thread_info, _TIF_EMULATE_STACK_STORE */
>          CURRENT_THREAD_INFO(r9, r1)
>          lwz     r8,TI_FLAGS(r9)
>          andis.  r8,r8,_TIF_EMULATE_STACK_STORE@h

Okay, but could you fix this directly like:

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index e514de5..4498467 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -851,7 +851,7 @@ resume_kernel:
         /* check current_thread_info, _TIF_EMULATE_STACK_STORE */
         CURRENT_THREAD_INFO(r9, r1)
         lwz     r8,TI_FLAGS(r9)
-       andis.  r8,r8,_TIF_EMULATE_STACK_STORE@h
+       andis.  r0,r8,_TIF_EMULATE_STACK_STORE@h
         beq+    1f

         addi    r8,r1,INT_FRAME_SIZE    /* Get the kprobed function entry */

Tiejun

^ permalink raw reply related

* RE: [PATCH] powerpc/32bit,PREEMPT:Load TI_FLAGS to check NEED_RESCHED
From: Jain Priyanka-B32167 @ 2013-05-27  7:21 UTC (permalink / raw)
  To: tiejun.chen, Benjamin Herrenschmidt
  Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <51A308D2.6040101@windriver.com>

WWVzLA0KQmVuIGlzIGFsc28gcG9pbnRpbmcgdG8gc2FtZSB0aGluZy4NCkkgd2lsbCBzZW5kIGFu
b3RoZXIgcGF0Y2ggd2l0aCBjaGFuZ2VzIHN1Z2dlc3RlZCAod2l0aCBib3RoIHN1YmplY3QgYW5k
IGRlc2NyaXB0aW9uIG1vZGlmaWVkKSB3aGljaCB3aWxsIHN1cGVyc2VkZSB0aGlzIHBhdGNoLg0K
DQpSZWdhcmRzDQpQcml5YW5rYQ0KDQo+IC0tLS0tT3JpZ2luYWwgTWVzc2FnZS0tLS0tDQo+IEZy
b206IHRpZWp1bi5jaGVuIFttYWlsdG86dGllanVuLmNoZW5Ad2luZHJpdmVyLmNvbV0NCj4gU2Vu
dDogTW9uZGF5LCBNYXkgMjcsIDIwMTMgMTI6NDkgUE0NCj4gVG86IEphaW4gUHJpeWFua2EtQjMy
MTY3DQo+IENjOiBsaW51eHBwYy1kZXZAbGlzdHMub3psYWJzLm9yZzsgV29vZCBTY290dC1CMDc0
MjENCj4gU3ViamVjdDogUmU6IFtQQVRDSF0gcG93ZXJwYy8zMmJpdCxQUkVFTVBUOkxvYWQgVElf
RkxBR1MgdG8gY2hlY2sNCj4gTkVFRF9SRVNDSEVEDQo+IA0KPiBPbiAwNS8yNy8yMDEzIDAyOjU1
IFBNLCBKYWluIFByaXlhbmthLUIzMjE2NyB3cm90ZToNCj4gPg0KPiA+IElmIHdlIGdvIHNvbWUg
bW9yZSBsaW5lcyB1cCBpbiB0aGUgc2FtZSBmaWxlLCB0aGUgY29kZSBpcw0KPiA+DQo+ID4gcmVz
dW1lX2tlcm5lbDoNCj4gPiAgICAgICAgICAvKiBjaGVjayBjdXJyZW50X3RocmVhZF9pbmZvLCBf
VElGX0VNVUxBVEVfU1RBQ0tfU1RPUkUgKi8NCj4gPiAgICAgICAgICBDVVJSRU5UX1RIUkVBRF9J
TkZPKHI5LCByMSkNCj4gPiAgICAgICAgICBsd3ogICAgIHI4LFRJX0ZMQUdTKHI5KQ0KPiA+ICAg
ICAgICAgIGFuZGlzLiAgcjgscjgsX1RJRl9FTVVMQVRFX1NUQUNLX1NUT1JFQGgNCj4gDQo+IE9r
YXksIGJ1dCBjb3VsZCB5b3UgZml4IHRoaXMgZGlyZWN0bHkgbGlrZToNCj4gDQo+IGRpZmYgLS1n
aXQgYS9hcmNoL3Bvd2VycGMva2VybmVsL2VudHJ5XzMyLlMNCj4gYi9hcmNoL3Bvd2VycGMva2Vy
bmVsL2VudHJ5XzMyLlMgaW5kZXggZTUxNGRlNS4uNDQ5ODQ2NyAxMDA2NDQNCj4gLS0tIGEvYXJj
aC9wb3dlcnBjL2tlcm5lbC9lbnRyeV8zMi5TDQo+ICsrKyBiL2FyY2gvcG93ZXJwYy9rZXJuZWwv
ZW50cnlfMzIuUw0KPiBAQCAtODUxLDcgKzg1MSw3IEBAIHJlc3VtZV9rZXJuZWw6DQo+ICAgICAg
ICAgIC8qIGNoZWNrIGN1cnJlbnRfdGhyZWFkX2luZm8sIF9USUZfRU1VTEFURV9TVEFDS19TVE9S
RSAqLw0KPiAgICAgICAgICBDVVJSRU5UX1RIUkVBRF9JTkZPKHI5LCByMSkNCj4gICAgICAgICAg
bHd6ICAgICByOCxUSV9GTEFHUyhyOSkNCj4gLSAgICAgICBhbmRpcy4gIHI4LHI4LF9USUZfRU1V
TEFURV9TVEFDS19TVE9SRUBoDQo+ICsgICAgICAgYW5kaXMuICByMCxyOCxfVElGX0VNVUxBVEVf
U1RBQ0tfU1RPUkVAaA0KPiAgICAgICAgICBiZXErICAgIDFmDQo+IA0KPiAgICAgICAgICBhZGRp
ICAgIHI4LHIxLElOVF9GUkFNRV9TSVpFICAgIC8qIEdldCB0aGUga3Byb2JlZCBmdW5jdGlvbg0K
PiBlbnRyeSAqLw0KPiANCj4gVGllanVuDQoNCg==

^ permalink raw reply

* [PATCH] powerpc/32bit:Store temporary result in r0 instead of r8
From: Priyanka Jain @ 2013-05-27  7:30 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: scottwood, Priyanka Jain, tiejun.chen

While returning from exception handling in case of PREEMPT enabled,
_TIF_NEED_RESCHED bit is checked in TI_FLAGS (thread_info flag) of current
task. Only if this bit is set, it should continue with the process of
calling preempt_schedule_irq() to schedule highest priority task if
available.
    
Current code assumes that r8 contains TI_FLAGS and check this for
_TIF_NEED_RESCHED, but as r8 is modified in the code which executes before
this check, r8 no longer contains the expected TI_FLAGS information.
    
As a result check for comparison with _TIF_NEED_RESCHED was failing even if
NEED_RESCHED bit is set in the current thread_info flag. Due to this,
preempt_schedule_irq() and in turn scheduler was not getting called even if
highest priority task is ready for execution.
    
So, store temporary results in r0 instead of r8 to prevent r8 from getting
modified as subsequent code is dependent on its value.

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
---
 Supersedes patch:
	powerpc/32bit,PREEMPT:Load TI_FLAGS to check NEED_RESCHED to
 incorporate review comments from Ben and Tiejun

 arch/powerpc/kernel/entry_32.S |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index d22e73e..22b45a4 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -849,7 +849,7 @@ resume_kernel:
 	/* check current_thread_info, _TIF_EMULATE_STACK_STORE */
 	CURRENT_THREAD_INFO(r9, r1)
 	lwz	r8,TI_FLAGS(r9)
-	andis.	r8,r8,_TIF_EMULATE_STACK_STORE@h
+	andis.	r0,r8,_TIF_EMULATE_STACK_STORE@h
 	beq+	1f
 
 	addi	r8,r1,INT_FRAME_SIZE	/* Get the kprobed function entry */
-- 
1.7.4.1

^ permalink raw reply related

* RE: SATA hang on 8315E triggered by heavy flash write?
From: Xie Shaohui-B21989 @ 2013-05-27  7:50 UTC (permalink / raw)
  To: Anthony Foiani; +Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <g4ndt3f5q.fsf@dworkin.scrye.com>

> > So it seems the NOR write break the signal Integrity of SATA.
> > I don't have schematic and board right now, could you please measure
> > signals related to NOR write to see if anything abnormal? Is the board
> > use FPGA or CPLD to control signal?
>=20
> I'll have to pass these questions on to my hardware vendor; I'm not
> equipped to do this level of hardware debugging (neither hardware nor
> knowledge!).
>=20
> > If stop NOR write, could the SATA recover and work?
>=20
> Earlier in my development, I was seeing this error and it would
> recover:
>=20
>   ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
>   ata2: PHY RDY changed
>   ata2: hard resetting link
>   ata2: Signature Update detected @ 0 msecs
>   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>   ata2.00: configured for UDMA/133
>   ata2: EH complete
>=20
> At the current time, however, it seems that it does not recover.
>=20
> I don't know whether this is due to the speed limiting code, or if it's
> because we are doing more disk accesses (when the actual product is up
> and running).
[S.H] it seems it's not due to speed limiting code, 1.5Gbps is still used t=
o recover link.

>=20
> I can re-do the tests with the speed limit disabled, but I won't be able
> to get to that for a few hours yet.  You can read about the speed limit
> issues in this thread:
>=20
>   http://article.gmane.org/gmane.linux.ports.ppc.embedded/50652
>=20
> And my final patch (yes, a year later):
>=20
>   http://article.gmane.org/gmane.linux.ports.ppc.embedded/58969
[S.H] for the speed limit issue, I checked 3.4.rc7 kernel, there seems a pl=
ace can be used to limit the speed for 8315:
	if (!of_device_is_compatible(ofdev->dev.of_node, "fsl,mpc8315-sata")) {
		temp =3D ioread32(csr_base + TRANSCFG);
		temp =3D temp & 0xffffffe0;
		iowrite32(temp | TRANSCFG_RX_WATER_MARK, csr_base + TRANSCFG);
	} else {
		/* the speed limitation code for 8315 may can be put here.
		* just move the original code which wrapped by "#ifdef CONFIG_MPC8315_DS"=
 here.
		* please let me know if you will give a try. */
	}


Best Regards,=20
Shaohui Xie

^ permalink raw reply

* Re: [PATCH 1/2] ARM: kirkwood: proper retain MAC address workaround on DT ethernet
From: David Miller @ 2013-05-27  9:23 UTC (permalink / raw)
  To: sebastian.hesselbarth
  Cc: andrew, jason, netdev, linux-kernel, buytenh, grant.likely,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <51A26B62.4050009@gmail.com>

From: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Date: Sun, 26 May 2013 22:06:58 +0200

> good you mention it. I added Grant on Cc and will give a short
> sum-up why I casted the const from property->value away here.
> 
> Maybe I overlooked the API for modifying the DT property but as
> far as I've seen - there is no API for modifying it. And yes,
> you are right, it is kind of an abuse of DT here.

Sparc has an of_set_property(), it needs to become generic.

^ permalink raw reply

* Re: [PATCH v3-resend 07/11] powerpc: uaccess s/might_sleep/might_fault/
From: Benjamin Herrenschmidt @ 2013-05-27  9:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Arnd Bergmann, Peter Zijlstra, linux-kernel, Ingo Molnar,
	Paul Mackerras, linuxppc-dev
In-Reply-To: <1369577426-26721-7-git-send-email-mst@redhat.com>

On Sun, 2013-05-26 at 17:31 +0300, Michael S. Tsirkin wrote:
> The only reason uaccess routines might sleep
> is if they fault. Make this explicit.
> 
> Arnd Bergmann suggested that the following code
> 	if (!is_kernel_addr((unsigned long)__pu_addr))
> 		might_fault();
> can be further simplified by adding a version of might_fault
> that includes the kernel addr check.
> 
> Will be considered as a further optimization in future.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

As long as Peter is happy with the general semantics of might_fault()
and you have at least build-tested it, then I'm happy.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

> ---
>  arch/powerpc/include/asm/uaccess.h | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
> index 4db4959..9485b43 100644
> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -178,7 +178,7 @@ do {								\
>  	long __pu_err;						\
>  	__typeof__(*(ptr)) __user *__pu_addr = (ptr);		\
>  	if (!is_kernel_addr((unsigned long)__pu_addr))		\
> -		might_sleep();					\
> +		might_fault();					\
>  	__chk_user_ptr(ptr);					\
>  	__put_user_size((x), __pu_addr, (size), __pu_err);	\
>  	__pu_err;						\
> @@ -188,7 +188,7 @@ do {								\
>  ({									\
>  	long __pu_err = -EFAULT;					\
>  	__typeof__(*(ptr)) __user *__pu_addr = (ptr);			\
> -	might_sleep();							\
> +	might_fault();							\
>  	if (access_ok(VERIFY_WRITE, __pu_addr, size))			\
>  		__put_user_size((x), __pu_addr, (size), __pu_err);	\
>  	__pu_err;							\
> @@ -268,7 +268,7 @@ do {								\
>  	const __typeof__(*(ptr)) __user *__gu_addr = (ptr);	\
>  	__chk_user_ptr(ptr);					\
>  	if (!is_kernel_addr((unsigned long)__gu_addr))		\
> -		might_sleep();					\
> +		might_fault();					\
>  	__get_user_size(__gu_val, __gu_addr, (size), __gu_err);	\
>  	(x) = (__typeof__(*(ptr)))__gu_val;			\
>  	__gu_err;						\
> @@ -282,7 +282,7 @@ do {								\
>  	const __typeof__(*(ptr)) __user *__gu_addr = (ptr);	\
>  	__chk_user_ptr(ptr);					\
>  	if (!is_kernel_addr((unsigned long)__gu_addr))		\
> -		might_sleep();					\
> +		might_fault();					\
>  	__get_user_size(__gu_val, __gu_addr, (size), __gu_err);	\
>  	(x) = (__typeof__(*(ptr)))__gu_val;			\
>  	__gu_err;						\
> @@ -294,7 +294,7 @@ do {								\
>  	long __gu_err = -EFAULT;					\
>  	unsigned long  __gu_val = 0;					\
>  	const __typeof__(*(ptr)) __user *__gu_addr = (ptr);		\
> -	might_sleep();							\
> +	might_fault();							\
>  	if (access_ok(VERIFY_READ, __gu_addr, (size)))			\
>  		__get_user_size(__gu_val, __gu_addr, (size), __gu_err);	\
>  	(x) = (__typeof__(*(ptr)))__gu_val;				\
> @@ -419,14 +419,14 @@ static inline unsigned long __copy_to_user_inatomic(void __user *to,
>  static inline unsigned long __copy_from_user(void *to,
>  		const void __user *from, unsigned long size)
>  {
> -	might_sleep();
> +	might_fault();
>  	return __copy_from_user_inatomic(to, from, size);
>  }
>  
>  static inline unsigned long __copy_to_user(void __user *to,
>  		const void *from, unsigned long size)
>  {
> -	might_sleep();
> +	might_fault();
>  	return __copy_to_user_inatomic(to, from, size);
>  }
>  
> @@ -434,7 +434,7 @@ extern unsigned long __clear_user(void __user *addr, unsigned long size);
>  
>  static inline unsigned long clear_user(void __user *addr, unsigned long size)
>  {
> -	might_sleep();
> +	might_fault();
>  	if (likely(access_ok(VERIFY_WRITE, addr, size)))
>  		return __clear_user(addr, size);
>  	if ((unsigned long)addr < TASK_SIZE) {

^ permalink raw reply

* Re: [PATCH 1/2] ARM: kirkwood: proper retain MAC address workaround on DT ethernet
From: Benjamin Herrenschmidt @ 2013-05-27  9:38 UTC (permalink / raw)
  To: Sebastian Hesselbarth
  Cc: andrew, jason, netdev, linux-kernel, buytenh, Grant Likely,
	linuxppc-dev, David Miller, linux-arm-kernel
In-Reply-To: <51A26B62.4050009@gmail.com>

On Sun, 2013-05-26 at 22:06 +0200, Sebastian Hesselbarth wrote:

> good you mention it. I added Grant on Cc and will give a short
> sum-up why I casted the const from property->value away here.
> 
> Maybe I overlooked the API for modifying the DT property but as
> far as I've seen - there is no API for modifying it. And yes,
> you are right, it is kind of an abuse of DT here.

of_update_property(). That also makes sure that any notifiers
is called and proc/device-tree is updated in the case where the property
is new.

> As Kirkwoods loose their MAC address on clock gating, I was looking
> for a place to store it early. (a) DT property "local-mac-address"
> looked as a good place as it will allow the driver to find it without
> any extra code. Of course, I am doing severaly sanity checks if it is
> safe to overwrite it, i.e. no other MAC set, property is there, long
> enough.
> 
> If Grant also NACKs modifying the DT we basically have two more options
> left for Kirkwood: (b) have MAC stored early in two global arrays in
> board init and reference that from mv643xx_eth or (c) leave the clock
> ungated unconditionally on all Kirkwoods.
> 
> I can live with all three, just name it and I prepare a final patch set.

No, putting it in the DT makes sense, just use the right accessor.

Cheers,
Ben.

> Sebastian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply

* Re: [PATCH 1/2] ARM: kirkwood: proper retain MAC address workaround on DT ethernet
From: Benjamin Herrenschmidt @ 2013-05-27  9:39 UTC (permalink / raw)
  To: David Miller
  Cc: andrew, jason, netdev, linux-kernel, buytenh, grant.likely,
	linuxppc-dev, linux-arm-kernel, sebastian.hesselbarth
In-Reply-To: <20130527.022339.791886107453761860.davem@davemloft.net>

On Mon, 2013-05-27 at 02:23 -0700, David Miller wrote:
> Sparc has an of_set_property(), it needs to become generic.

There is an of_update_property(), we could change the name though, yours
is nicer :-)

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH v2] PowerPC: kernel: need return the related error code when failure occurs
From: Chen Gang @ 2013-05-27 10:01 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Linux-Arch, Arnd Bergmann, linux-kernel@vger.kernel.org,
	zhangyanfei, Jiri Kosina, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <519B3C72.9060405@asianux.com>

Hello Maintainers:

Please help check this patch whether is OK, when you have time.

Thanks.

On 05/21/2013 05:20 PM, Chen Gang wrote:
> 
> When error occurs, need return the related error code to let upper
> caller know about it.
> 
> ppc_md.nvram_size() can return the error code (e.g. core99_nvram_size()
> in 'arch/powerpc/platforms/powermac/nvram.c').
> 
> Also set ret value when only need it, so can save structions for normal
> cases.
> 
> The original related patch: "f9ce299 [PATCH] powerpc: fix large nvram
> access".
> 
> 
> Signed-off-by: Chen Gang <gang.chen@asianux.com>
> ---
>  arch/powerpc/kernel/nvram_64.c |   20 ++++++++++++++------
>  1 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c
> index 48fbc2b..8213ee1 100644
> --- a/arch/powerpc/kernel/nvram_64.c
> +++ b/arch/powerpc/kernel/nvram_64.c
> @@ -84,22 +84,30 @@ static ssize_t dev_nvram_read(struct file *file, char __user *buf,
>  	char *tmp = NULL;
>  	ssize_t size;
>  
> -	ret = -ENODEV;
> -	if (!ppc_md.nvram_size)
> +	if (!ppc_md.nvram_size) {
> +		ret = -ENODEV;
>  		goto out;
> +	}
>  
> -	ret = 0;
>  	size = ppc_md.nvram_size();
> -	if (*ppos >= size || size < 0)
> +	if (size < 0) {
> +		ret = size;
> +		goto out;
> +	}
> +
> +	if (*ppos >= size) {
> +		ret = 0;
>  		goto out;
> +	}
>  
>  	count = min_t(size_t, count, size - *ppos);
>  	count = min(count, PAGE_SIZE);
>  
> -	ret = -ENOMEM;
>  	tmp = kmalloc(count, GFP_KERNEL);
> -	if (!tmp)
> +	if (!tmp) {
> +		ret = -ENOMEM;
>  		goto out;
> +	}
>  
>  	ret = ppc_md.nvram_read(tmp, count, ppos);
>  	if (ret <= 0)
> 


-- 
Chen Gang

Asianux Corporation

^ permalink raw reply

* Re: [PATCH 1/2] powerpc, perf: Ignore separate BHRB privilege state filter request
From: Anshuman Khandual @ 2013-05-27 10:02 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, mikey, linux-kernel
In-Reply-To: <519C88D9.30009@linux.vnet.ibm.com>

On 05/22/2013 02:29 PM, Anshuman Khandual wrote:
>>
>> Your description from patch 0 should be here.
> Does it sound better ?
> 
>>
>>> -	if ((br_privilege != 7) && (br_privilege != 0))
>>> -		return -1;
>>> +
>>> +	if (br_privilege)
>>> +		pr_info("BHRB privilege state filter request %llx ignored\n",
>>> +								br_privilege);
>>
>> Don't do that. Ignoring the br_privilege is either the right thing to do
>> in which case we do it and print nothing,
> 
> 
> I thought the informational print would at least make the user aware
> of the fact that the separate filter request for BHRB went ignored.
> Can we add this some where in the documentation ?

So, what we decide here ? We will just ignore any separate BHRB
privilege state filter request without printing any informational
event or warning ?

^ permalink raw reply

* Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls
From: Paolo Bonzini @ 2013-05-27 10:08 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, linux-kernel, kvm-ppc, Alexander Graf, Paul Mackerras,
	linuxppc-dev, David Gibson
In-Reply-To: <1369105607-20957-2-git-send-email-aik@ozlabs.ru>

Il 21/05/2013 05:06, Alexey Kardashevskiy ha scritto:
> This adds real mode handlers for the H_PUT_TCE_INDIRECT and
> H_STUFF_TCE hypercalls for QEMU emulated devices such as virtio
> devices or emulated PCI.

Do you mean vio?  virtio (without getting into whether that's good or
bad) will bypass the iommu.

Paolo

> These calls allow adding multiple entries
> (up to 512) into the TCE table in one call which saves time on
> transition to/from real mode.
> 
> This adds a tce_tmp cache to kvm_vcpu_arch to save valid TCEs
> (copied from user and verified) before writing the whole list into
> the TCE table. This cache will be utilized more in the upcoming
> VFIO/IOMMU support to continue TCE list processing in the virtual
> mode in the case if the real mode handler failed for some reason.
> 
> This adds a guest physical to host real address converter
> and calls the existing H_PUT_TCE handler. The converting function
> is going to be fully utilized by upcoming VFIO supporting patches.
> 
> This also implements the KVM_CAP_PPC_MULTITCE capability,
> so in order to support the functionality of this patch, QEMU
> needs to query for this capability and set the "hcall-multi-tce"
> hypertas property only if the capability is present, otherwise
> there will be serious performance degradation.
> 
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> 
> ---
> Changelog:
> * added kvm_vcpu_arch::tce_tmp
> * removed cleanup if put_indirect failed, instead we do not even start
> writing to TCE table if we cannot get TCEs from the user and they are
> invalid
> * kvmppc_emulated_h_put_tce is split to kvmppc_emulated_put_tce
> and kvmppc_emulated_validate_tce (for the previous item)
> * fixed bug with failthrough for H_IPI
> * removed all get_user() from real mode handlers
> * kvmppc_lookup_pte() added (instead of making lookup_linux_pte public)
> ---
>  Documentation/virtual/kvm/api.txt       |   14 ++
>  arch/powerpc/include/asm/kvm_host.h     |    2 +
>  arch/powerpc/include/asm/kvm_ppc.h      |   16 +-
>  arch/powerpc/kvm/book3s_64_vio.c        |  118 ++++++++++++++
>  arch/powerpc/kvm/book3s_64_vio_hv.c     |  266 +++++++++++++++++++++++++++----
>  arch/powerpc/kvm/book3s_hv.c            |   39 +++++
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |    6 +
>  arch/powerpc/kvm/book3s_pr_papr.c       |   37 ++++-
>  arch/powerpc/kvm/powerpc.c              |    3 +
>  include/uapi/linux/kvm.h                |    1 +
>  10 files changed, 470 insertions(+), 32 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 5f91eda..3c7c7ea 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2780,3 +2780,17 @@ Parameters: args[0] is the XICS device fd
>              args[1] is the XICS CPU number (server ID) for this vcpu
>  
>  This capability connects the vcpu to an in-kernel XICS device.
> +
> +6.8 KVM_CAP_PPC_MULTITCE
> +
> +Architectures: ppc
> +Parameters: none
> +Returns: 0 on success; -1 on error
> +
> +This capability enables the guest to put/remove multiple TCE entries
> +per hypercall which significanly accelerates DMA operations for PPC KVM
> +guests.
> +
> +When this capability is enabled, H_PUT_TCE_INDIRECT and H_STUFF_TCE are
> +expected to occur rather than H_PUT_TCE which supports only one TCE entry
> +per call.
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index af326cd..85d8f26 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -609,6 +609,8 @@ struct kvm_vcpu_arch {
>  	spinlock_t tbacct_lock;
>  	u64 busy_stolen;
>  	u64 busy_preempt;
> +
> +	unsigned long *tce_tmp;    /* TCE cache for TCE_PUT_INDIRECT hall */
>  #endif
>  };
>  
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index a5287fe..e852921b 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -133,8 +133,20 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
>  
>  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  				struct kvm_create_spapr_tce *args);
> -extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> -			     unsigned long ioba, unsigned long tce);
> +extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(
> +		struct kvm_vcpu *vcpu, unsigned long liobn);
> +extern long kvmppc_emulated_validate_tce(unsigned long tce);
> +extern void kvmppc_emulated_put_tce(struct kvmppc_spapr_tce_table *tt,
> +		unsigned long ioba, unsigned long tce);
> +extern long kvmppc_virtmode_h_put_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce);
> +extern long kvmppc_virtmode_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_list, unsigned long npages);
> +extern long kvmppc_virtmode_h_stuff_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_value, unsigned long npages);
>  extern long kvm_vm_ioctl_allocate_rma(struct kvm *kvm,
>  				struct kvm_allocate_rma *rma);
>  extern struct kvmppc_linear_info *kvm_alloc_rma(void);
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index b2d3f3b..06b7b20 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -14,6 +14,7 @@
>   *
>   * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>   * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
> + * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>   */
>  
>  #include <linux/types.h>
> @@ -36,8 +37,11 @@
>  #include <asm/ppc-opcode.h>
>  #include <asm/kvm_host.h>
>  #include <asm/udbg.h>
> +#include <asm/iommu.h>
> +#include <asm/tce.h>
>  
>  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
> +#define ERROR_ADDR      ((void *)~(unsigned long)0x0)
>  
>  static long kvmppc_stt_npages(unsigned long window_size)
>  {
> @@ -148,3 +152,117 @@ fail:
>  	}
>  	return ret;
>  }
> +
> +/* Converts guest physical address into host virtual */
> +static void __user *kvmppc_virtmode_gpa_to_hva(struct kvm_vcpu *vcpu,
> +		unsigned long gpa)
> +{
> +	unsigned long hva, gfn = gpa >> PAGE_SHIFT;
> +	struct kvm_memory_slot *memslot;
> +
> +	memslot = search_memslots(kvm_memslots(vcpu->kvm), gfn);
> +	if (!memslot)
> +		return ERROR_ADDR;
> +
> +	hva = __gfn_to_hva_memslot(memslot, gfn) + (gpa & ~PAGE_MASK);
> +	return (void *) hva;
> +}
> +
> +long kvmppc_virtmode_h_put_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce)
> +{
> +	long ret;
> +	struct kvmppc_spapr_tce_table *tt;
> +
> +	tt = kvmppc_find_tce_table(vcpu, liobn);
> +	/* Didn't find the liobn, put it to userspace */
> +	if (!tt)
> +		return H_TOO_HARD;
> +
> +	/* Emulated IO */
> +	if (ioba >= tt->window_size)
> +		return H_PARAMETER;
> +
> +	ret = kvmppc_emulated_validate_tce(tce);
> +	if (ret)
> +		return ret;
> +
> +	kvmppc_emulated_put_tce(tt, ioba, tce);
> +
> +	return H_SUCCESS;
> +}
> +
> +long kvmppc_virtmode_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_list, unsigned long npages)
> +{
> +	struct kvmppc_spapr_tce_table *tt;
> +	long i, ret;
> +	unsigned long __user *tces;
> +
> +	tt = kvmppc_find_tce_table(vcpu, liobn);
> +	/* Didn't find the liobn, put it to userspace */
> +	if (!tt)
> +		return H_TOO_HARD;
> +
> +	/*
> +	 * The spec says that the maximum size of the list is 512 TCEs so
> +	 * so the whole table addressed resides in 4K page
> +	 */
> +	if (npages > 512)
> +		return H_PARAMETER;
> +
> +	if (tce_list & ~IOMMU_PAGE_MASK)
> +		return H_PARAMETER;
> +
> +	tces = kvmppc_virtmode_gpa_to_hva(vcpu, tce_list);
> +	if (tces == ERROR_ADDR)
> +		return H_TOO_HARD;
> +
> +	/* Emulated IO */
> +	if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
> +		return H_PARAMETER;
> +
> +	for (i = 0; i < npages; ++i) {
> +		if (get_user(vcpu->arch.tce_tmp[i], tces + i))
> +			return H_PARAMETER;
> +
> +		ret = kvmppc_emulated_validate_tce(vcpu->arch.tce_tmp[i]);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	for (i = 0; i < npages; ++i)
> +		kvmppc_emulated_put_tce(tt,
> +				ioba + (i << IOMMU_PAGE_SHIFT),
> +				vcpu->arch.tce_tmp[i]);
> +
> +	return H_SUCCESS;
> +}
> +
> +long kvmppc_virtmode_h_stuff_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_value, unsigned long npages)
> +{
> +	struct kvmppc_spapr_tce_table *tt;
> +	long i, ret;
> +
> +	tt = kvmppc_find_tce_table(vcpu, liobn);
> +	/* Didn't find the liobn, put it to userspace */
> +	if (!tt)
> +		return H_TOO_HARD;
> +
> +	/* Emulated IO */
> +	if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
> +		return H_PARAMETER;
> +
> +	ret = kvmppc_emulated_validate_tce(tce_value);
> +	if (ret || (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ)))
> +		return H_PARAMETER;
> +
> +	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE)
> +		kvmppc_emulated_put_tce(tt, ioba, tce_value);
> +
> +	return H_SUCCESS;
> +}
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 30c2f3b..c68d538 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -14,6 +14,7 @@
>   *
>   * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>   * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
> + * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>   */
>  
>  #include <linux/types.h>
> @@ -35,42 +36,249 @@
>  #include <asm/ppc-opcode.h>
>  #include <asm/kvm_host.h>
>  #include <asm/udbg.h>
> +#include <asm/iommu.h>
> +#include <asm/tce.h>
>  
>  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
> +#define ERROR_ADDR      (~(unsigned long)0x0)
>  
> -/* WARNING: This will be called in real-mode on HV KVM and virtual
> - *          mode on PR KVM
> +/* Finds a TCE table descriptor by LIOBN */
> +struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(struct kvm_vcpu *vcpu,
> +		unsigned long liobn)
> +{
> +	struct kvmppc_spapr_tce_table *tt;
> +
> +	list_for_each_entry(tt, &vcpu->kvm->arch.spapr_tce_tables, list) {
> +		if (tt->liobn == liobn)
> +			return tt;
> +	}
> +
> +	return NULL;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_find_tce_table);
> +
> +/*
> + * Validate TCE address.
> + * At the moment only flags are validated
> + * as other check will significantly slow down
> + * or can make it even impossible to handle TCE requests
> + * in real mode.
> + */
> +long kvmppc_emulated_validate_tce(unsigned long tce)
> +{
> +	if (tce & ~(IOMMU_PAGE_MASK | TCE_PCI_WRITE | TCE_PCI_READ))
> +		return H_PARAMETER;
> +
> +	return H_SUCCESS;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_emulated_validate_tce);
> +
> +/*
> + * kvmppc_emulated_put_tce() handles TCE requests for devices emulated
> + * by QEMU. It puts guest TCE values into the table and expects
> + * the QEMU to convert them later in the QEMU device implementation.
> + * Wiorks in both real and virtual modes.
> + * It cannot fail so kvmppc_emulated_validate_tce must be called before it.
>   */
> +void kvmppc_emulated_put_tce(struct kvmppc_spapr_tce_table *tt,
> +		unsigned long ioba, unsigned long tce)
> +{
> +	unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
> +	struct page *page;
> +	u64 *tbl;
> +
> +	/*
> +	 * Note on the use of page_address() in real mode,
> +	 *
> +	 * It is safe to use page_address() in real mode on ppc64 because
> +	 * page_address() is always defined as lowmem_page_address()
> +	 * which returns __va(PFN_PHYS(page_to_pfn(page))) which is arithmetial
> +	 * operation and does not access page struct.
> +	 *
> +	 * Theoretically page_address() could be defined different
> +	 * but either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL
> +	 * should be enabled.
> +	 * WANT_PAGE_VIRTUAL is never enabled on ppc32/ppc64,
> +	 * HASHED_PAGE_VIRTUAL could be enabled for ppc32 only and only
> +	 * if CONFIG_HIGHMEM is defined. As CONFIG_SPARSEMEM_VMEMMAP
> +	 * is not expected to be enabled on ppc32, page_address()
> +	 * is safe for ppc32 as well.
> +	 */
> +#if defined(HASHED_PAGE_VIRTUAL) || defined(WANT_PAGE_VIRTUAL)
> +#error TODO: fix to avoid page_address() here
> +#endif
> +	page = tt->pages[idx / TCES_PER_PAGE];
> +	tbl = (u64 *)page_address(page);
> +
> +	/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
> +	tbl[idx % TCES_PER_PAGE] = tce;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_emulated_put_tce);
> +
> +#ifdef CONFIG_KVM_BOOK3S_64_HV
> +
> +static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing,
> +			unsigned long *pte_sizep)
> +{
> +	pte_t *ptep;
> +	unsigned int shift = 0;
> +	pte_t pte, tmp, ret;
> +
> +	ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
> +	if (!ptep)
> +		return __pte(0);
> +	if (shift)
> +		*pte_sizep = 1ul << shift;
> +	else
> +		*pte_sizep = PAGE_SIZE;
> +
> +	if (!pte_present(*ptep))
> +		return __pte(0);
> +
> +	/* wait until _PAGE_BUSY is clear then set it atomically */
> +	__asm__ __volatile__ (
> +		"1:	ldarx	%0,0,%3\n"
> +		"	andi.	%1,%0,%4\n"
> +		"	bne-	1b\n"
> +		"	ori	%1,%0,%4\n"
> +		"	stdcx.	%1,0,%3\n"
> +		"	bne-	1b"
> +		: "=&r" (pte), "=&r" (tmp), "=m" (*ptep)
> +		: "r" (ptep), "i" (_PAGE_BUSY)
> +		: "cc");
> +
> +	ret = pte;
> +
> +	return ret;
> +}
> +
> +/*
> + * Converts guest physical address into host physical address.
> + * Also returns pte and page size if the page is present in page table.
> + */
> +static unsigned long kvmppc_realmode_gpa_to_hpa(struct kvm_vcpu *vcpu,
> +		unsigned long gpa)
> +{
> +	struct kvm_memory_slot *memslot;
> +	pte_t pte;
> +	unsigned long hva, hpa, pg_size = 0, offset;
> +	unsigned long gfn = gpa >> PAGE_SHIFT;
> +	bool writing = gpa & TCE_PCI_WRITE;
> +
> +	/* Find a KVM memslot */
> +	memslot = search_memslots(kvm_memslots(vcpu->kvm), gfn);
> +	if (!memslot)
> +		return ERROR_ADDR;
> +
> +	/* Convert guest physical address to host virtual */
> +	hva = __gfn_to_hva_memslot(memslot, gfn);
> +
> +	/* Find a PTE and determine the size */
> +	pte = kvmppc_lookup_pte(vcpu->arch.pgdir, hva,
> +			writing, &pg_size);
> +	if (!pte)
> +		return ERROR_ADDR;
> +
> +	/* Calculate host phys address keeping flags and offset in the page */
> +	offset = gpa & (pg_size - 1);
> +
> +	/* pte_pfn(pte) should return an address aligned to pg_size */
> +	hpa = (pte_pfn(pte) << PAGE_SHIFT) + offset;
> +
> +	return hpa;
> +}
> +
>  long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  		      unsigned long ioba, unsigned long tce)
>  {
> -	struct kvm *kvm = vcpu->kvm;
> -	struct kvmppc_spapr_tce_table *stt;
> -
> -	/* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
> -	/* 	    liobn, ioba, tce); */
> -
> -	list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
> -		if (stt->liobn == liobn) {
> -			unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
> -			struct page *page;
> -			u64 *tbl;
> -
> -			/* udbg_printf("H_PUT_TCE: liobn 0x%lx => stt=%p  window_size=0x%x\n", */
> -			/* 	    liobn, stt, stt->window_size); */
> -			if (ioba >= stt->window_size)
> -				return H_PARAMETER;
> -
> -			page = stt->pages[idx / TCES_PER_PAGE];
> -			tbl = (u64 *)page_address(page);
> -
> -			/* FIXME: Need to validate the TCE itself */
> -			/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
> -			tbl[idx % TCES_PER_PAGE] = tce;
> -			return H_SUCCESS;
> -		}
> +	long ret;
> +	struct kvmppc_spapr_tce_table *tt;
> +
> +	tt = kvmppc_find_tce_table(vcpu, liobn);
> +	/* Didn't find the liobn, put it to virtual space */
> +	if (!tt)
> +		return H_TOO_HARD;
> +
> +	/* Emulated IO */
> +	if (ioba >= tt->window_size)
> +		return H_PARAMETER;
> +
> +	ret = kvmppc_emulated_validate_tce(tce);
> +	if (ret)
> +		return ret;
> +
> +	kvmppc_emulated_put_tce(tt, ioba, tce);
> +
> +	return H_SUCCESS;
> +}
> +
> +long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_list,	unsigned long npages)
> +{
> +	struct kvmppc_spapr_tce_table *tt;
> +	long i, ret;
> +	unsigned long *tces;
> +
> +	tt = kvmppc_find_tce_table(vcpu, liobn);
> +	/* Didn't find the liobn, put it to virtual space */
> +	if (!tt)
> +		return H_TOO_HARD;
> +
> +	/*
> +	 * The spec says that the maximum size of the list is 512 TCEs so
> +	 * so the whole table addressed resides in 4K page
> +	 */
> +	if (npages > 512)
> +		return H_PARAMETER;
> +
> +	if (tce_list & ~IOMMU_PAGE_MASK)
> +		return H_PARAMETER;
> +
> +	tces = (unsigned long *) kvmppc_realmode_gpa_to_hpa(vcpu, tce_list);
> +	if ((unsigned long)tces == ERROR_ADDR)
> +		return H_TOO_HARD;
> +
> +	/* Emulated IO */
> +	if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
> +		return H_PARAMETER;
> +
> +	for (i = 0; i < npages; ++i) {
> +		ret = kvmppc_emulated_validate_tce(tces[i]);
> +		if (ret)
> +			return ret;
>  	}
>  
> -	/* Didn't find the liobn, punt it to userspace */
> -	return H_TOO_HARD;
> +	for (i = 0; i < npages; ++i)
> +		kvmppc_emulated_put_tce(tt, ioba + (i << IOMMU_PAGE_SHIFT),
> +				tces[i]);
> +
> +	return H_SUCCESS;
> +}
> +
> +long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_value, unsigned long npages)
> +{
> +	struct kvmppc_spapr_tce_table *tt;
> +	long i, ret;
> +
> +	tt = kvmppc_find_tce_table(vcpu, liobn);
> +	/* Didn't find the liobn, put it to virtual space */
> +	if (!tt)
> +		return H_TOO_HARD;
> +
> +	/* Emulated IO */
> +	if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
> +		return H_PARAMETER;
> +
> +	ret = kvmppc_emulated_validate_tce(tce_value);
> +	if (ret || (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ)))
> +		return H_PARAMETER;
> +
> +	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE)
> +		kvmppc_emulated_put_tce(tt, ioba, tce_value);
> +
> +	return H_SUCCESS;
>  }
> +#endif /* CONFIG_KVM_BOOK3S_64_HV */
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 9de24f8..9d70fd8 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -566,6 +566,30 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>  			ret = kvmppc_xics_hcall(vcpu, req);
>  			break;
>  		} /* fallthrough */
> +		return RESUME_HOST;
> +	case H_PUT_TCE:
> +		ret = kvmppc_virtmode_h_put_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
> +						kvmppc_get_gpr(vcpu, 5),
> +						kvmppc_get_gpr(vcpu, 6));
> +		if (ret == H_TOO_HARD)
> +			return RESUME_HOST;
> +		break;
> +	case H_PUT_TCE_INDIRECT:
> +		ret = kvmppc_virtmode_h_put_tce_indirect(vcpu, kvmppc_get_gpr(vcpu, 4),
> +						kvmppc_get_gpr(vcpu, 5),
> +						kvmppc_get_gpr(vcpu, 6),
> +						kvmppc_get_gpr(vcpu, 7));
> +		if (ret == H_TOO_HARD)
> +			return RESUME_HOST;
> +		break;
> +	case H_STUFF_TCE:
> +		ret = kvmppc_virtmode_h_stuff_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
> +						kvmppc_get_gpr(vcpu, 5),
> +						kvmppc_get_gpr(vcpu, 6),
> +						kvmppc_get_gpr(vcpu, 7));
> +		if (ret == H_TOO_HARD)
> +			return RESUME_HOST;
> +		break;
>  	default:
>  		return RESUME_HOST;
>  	}
> @@ -956,6 +980,20 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
>  	vcpu->arch.cpu_type = KVM_CPU_3S_64;
>  	kvmppc_sanity_check(vcpu);
>  
> +	/*
> +	 * As we want to minimize the chance of having H_PUT_TCE_INDIRECT
> +	 * half executed, we first read TCEs from the user, check them and
> +	 * return error if something went wrong and only then put TCEs into
> +	 * the TCE table.
> +	 *
> +	 * tce_tmp is a cache for TCEs to avoid stack allocation or
> +	 * kmalloc as the whole TCE list can take up to 512 items 8 bytes
> +	 * each (4096 bytes).
> +	 */
> +	vcpu->arch.tce_tmp = kmalloc(4096, GFP_KERNEL);
> +	if (!vcpu->arch.tce_tmp)
> +		goto free_vcpu;
> +
>  	return vcpu;
>  
>  free_vcpu:
> @@ -978,6 +1016,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
>  	unpin_vpa(vcpu->kvm, &vcpu->arch.slb_shadow);
>  	unpin_vpa(vcpu->kvm, &vcpu->arch.vpa);
>  	spin_unlock(&vcpu->arch.vpa_update_lock);
> +	kfree(vcpu->arch.tce_tmp);
>  	kvm_vcpu_uninit(vcpu);
>  	kmem_cache_free(kvm_vcpu_cache, vcpu);
>  }
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index b02f91e..d35554e 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1490,6 +1490,12 @@ hcall_real_table:
>  	.long	0		/* 0x11c */
>  	.long	0		/* 0x120 */
>  	.long	.kvmppc_h_bulk_remove - hcall_real_table
> +	.long	0		/* 0x128 */
> +	.long	0		/* 0x12c */
> +	.long	0		/* 0x130 */
> +	.long	0		/* 0x134 */
> +	.long	.kvmppc_h_stuff_tce - hcall_real_table
> +	.long	.kvmppc_h_put_tce_indirect - hcall_real_table
>  hcall_real_table_end:
>  
>  ignore_hdec:
> diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
> index b24309c..3750e3c 100644
> --- a/arch/powerpc/kvm/book3s_pr_papr.c
> +++ b/arch/powerpc/kvm/book3s_pr_papr.c
> @@ -220,7 +220,38 @@ static int kvmppc_h_pr_put_tce(struct kvm_vcpu *vcpu)
>  	unsigned long tce = kvmppc_get_gpr(vcpu, 6);
>  	long rc;
>  
> -	rc = kvmppc_h_put_tce(vcpu, liobn, ioba, tce);
> +	rc = kvmppc_virtmode_h_put_tce(vcpu, liobn, ioba, tce);
> +	if (rc == H_TOO_HARD)
> +		return EMULATE_FAIL;
> +	kvmppc_set_gpr(vcpu, 3, rc);
> +	return EMULATE_DONE;
> +}
> +
> +static int kvmppc_h_pr_put_tce_indirect(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
> +	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
> +	unsigned long tce = kvmppc_get_gpr(vcpu, 6);
> +	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
> +	long rc;
> +
> +	rc = kvmppc_virtmode_h_put_tce_indirect(vcpu, liobn, ioba,
> +			tce, npages);
> +	if (rc == H_TOO_HARD)
> +		return EMULATE_FAIL;
> +	kvmppc_set_gpr(vcpu, 3, rc);
> +	return EMULATE_DONE;
> +}
> +
> +static int kvmppc_h_pr_stuff_tce(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
> +	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
> +	unsigned long tce_value = kvmppc_get_gpr(vcpu, 6);
> +	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
> +	long rc;
> +
> +	rc = kvmppc_virtmode_h_stuff_tce(vcpu, liobn, ioba, tce_value, npages);
>  	if (rc == H_TOO_HARD)
>  		return EMULATE_FAIL;
>  	kvmppc_set_gpr(vcpu, 3, rc);
> @@ -247,6 +278,10 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
>  		return kvmppc_h_pr_bulk_remove(vcpu);
>  	case H_PUT_TCE:
>  		return kvmppc_h_pr_put_tce(vcpu);
> +	case H_PUT_TCE_INDIRECT:
> +		return kvmppc_h_pr_put_tce_indirect(vcpu);
> +	case H_STUFF_TCE:
> +		return kvmppc_h_pr_stuff_tce(vcpu);
>  	case H_CEDE:
>  		vcpu->arch.shared->msr |= MSR_EE;
>  		kvm_vcpu_block(vcpu);
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 6316ee3..8465c2a 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -395,6 +395,9 @@ int kvm_dev_ioctl_check_extension(long ext)
>  		r = 1;
>  		break;
>  #endif
> +	case KVM_CAP_SPAPR_MULTITCE:
> +		r = 1;
> +		break;
>  	default:
>  		r = 0;
>  		break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index a5c86fc..5a2afda 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -666,6 +666,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_IRQ_MPIC 90
>  #define KVM_CAP_PPC_RTAS 91
>  #define KVM_CAP_IRQ_XICS 92
> +#define KVM_CAP_SPAPR_MULTITCE (0x110000 + 89)
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> 

^ permalink raw reply

* Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
From: Paolo Bonzini @ 2013-05-27 10:23 UTC (permalink / raw)
  To: David Gibson
  Cc: kvm, Alexey Kardashevskiy, linux-kernel, kvm-ppc, Alexander Graf,
	Paul Mackerras, Scott Wood, linuxppc-dev
In-Reply-To: <20130525024524.GA6112@boomeroo.fritz.box>

Il 25/05/2013 04:45, David Gibson ha scritto:
>> >+	case KVM_CREATE_SPAPR_TCE_IOMMU: {
>> >+		struct kvm_create_spapr_tce_iommu create_tce_iommu;
>> >+		struct kvm *kvm = filp->private_data;
>> >+
>> >+		r = -EFAULT;
>> >+		if (copy_from_user(&create_tce_iommu, argp,
>> >+				sizeof(create_tce_iommu)))
>> >+			goto out;
>> >+		r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm,
>> >&create_tce_iommu);
>> >+		goto out;
>> >+	}

Would it make sense to make this the only interface for creating TCEs?
That is, pass both a window_size and an IOMMU group id (or e.g. -1 for
no hardware IOMMU usage), and have a single ioctl for both cases?
There's some duplicated code between kvm_vm_ioctl_create_spapr_tce and
kvm_vm_ioctl_create_spapr_tce_iommu.

KVM_CREATE_SPAPR_TCE could stay for backwards-compatibility, or you
could just use a new capability and drop the old ioctl.  I'm not sure
whether you're already considering the ABI to be stable for kvmppc.

Paolo

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox