LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 3/3] powerpc: Context switch the new EBB SPRs
From: Michael Neuling @ 2013-05-01  6:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Matt Evans
In-Reply-To: <1367389024-20391-1-git-send-email-mikey@neuling.org>

From: Michael Ellerman <michael@ellerman.id.au>

This context switches the new Event Based Branching (EBB) SPRs.  The three new
SPRs are:
  - Event Based Branch Handler Register (EBBHR)
  - Event Based Branch Return Register (EBBRR)
  - Branch Event Status and Control Register (BESCR)

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/processor.h |    3 +++
 arch/powerpc/include/asm/reg.h       |    3 +++
 arch/powerpc/kernel/asm-offsets.c    |    3 +++
 arch/powerpc/kernel/entry_64.S       |   16 ++++++++++++++++
 4 files changed, 25 insertions(+)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 0a4cc5d..d7e67ca 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -281,6 +281,9 @@ struct thread_struct {
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
 	unsigned long	tar;
+	unsigned long	ebbrr;
+	unsigned long	ebbhr;
+	unsigned long	bescr;
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index cd241ed..fa8285b 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -663,6 +663,9 @@
 #define SPRN_MMCRH	316	/* Hypervisor monitor mode control register */
 #define SPRN_MMCRS	894	/* Supervisor monitor mode control register */
 #define SPRN_MMCRC	851	/* Core monitor mode control register */
+#define SPRN_EBBHR	804	/* Event based branch handler register */
+#define SPRN_EBBRR	805	/* Event based branch return register */
+#define SPRN_BESCR	806	/* Branch event status and control register */
 
 #define SPRN_PMC1	787
 #define SPRN_PMC2	788
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index b6c17ec..172233e 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -124,6 +124,9 @@ int main(void)
 
 #ifdef CONFIG_PPC_BOOK3S_64
 	DEFINE(THREAD_TAR, offsetof(struct thread_struct, tar));
+	DEFINE(THREAD_BESCR, offsetof(struct thread_struct, bescr));
+	DEFINE(THREAD_EBBHR, offsetof(struct thread_struct, ebbhr));
+	DEFINE(THREAD_EBBRR, offsetof(struct thread_struct, ebbrr));
 #endif
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	DEFINE(PACATMSCRATCH, offsetof(struct paca_struct, tm_scratch));
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 7a6801f..3fe5259 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -458,6 +458,14 @@ BEGIN_FTR_SECTION
 	 */
 	mfspr	r0,SPRN_TAR
 	std	r0,THREAD_TAR(r3)
+
+	/* Event based branch registers */
+	mfspr	r0, SPRN_BESCR
+	std	r0, THREAD_BESCR(r3)
+	mfspr	r0, SPRN_EBBHR
+	std	r0, THREAD_EBBHR(r3)
+	mfspr	r0, SPRN_EBBRR
+	std	r0, THREAD_EBBRR(r3)
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 #endif
 
@@ -545,6 +553,14 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
 
 #ifdef CONFIG_PPC_BOOK3S_64
 BEGIN_FTR_SECTION
+	/* Event based branch registers */
+	ld	r0, THREAD_BESCR(r4)
+	mtspr	SPRN_BESCR, r0
+	ld	r0, THREAD_EBBHR(r4)
+	mtspr	SPRN_EBBHR, r0
+	ld	r0, THREAD_EBBRR(r4)
+	mtspr	SPRN_EBBRR, r0
+
 	ld	r0,THREAD_TAR(r4)
 	mtspr	SPRN_TAR,r0
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH] powerpc: Fix build error for book3e
From: Aneesh Kumar K.V @ 2013-05-01  6:26 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

We moved the definition of shift_to_mmu_psize and mmu_psize_to_shift
out of hugetlbpage.c in patch "powerpc: New hugepage directory format".

These functions are not related to hugetlbpage and we want to use them
outside hugetlbpage.c We missed a definition for book3e when we moved
these functions. Add similar functions to mmu-book3e.h

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/mmu-book3e.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
index 8bd560c..936db36 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -215,6 +215,7 @@
 #define TLBILX_T_CLASS3			7
 
 #ifndef __ASSEMBLY__
+#include <asm/bug.h>
 
 extern unsigned int tlbcam_index;
 
@@ -254,6 +255,23 @@ struct mmu_psize_def
 };
 extern struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
 
+static inline int shift_to_mmu_psize(unsigned int shift)
+{
+	int psize;
+
+	for (psize = 0; psize < MMU_PAGE_COUNT; ++psize)
+		if (mmu_psize_defs[psize].shift == shift)
+			return psize;
+	return -1;
+}
+
+static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)
+{
+	if (mmu_psize_defs[mmu_psize].shift)
+		return mmu_psize_defs[mmu_psize].shift;
+	BUG();
+}
+
 /* The page sizes use the same names as 64-bit hash but are
  * constants
  */
-- 
1.8.1.2

^ permalink raw reply related

* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: David Miller @ 2013-05-01  7:36 UTC (permalink / raw)
  To: benh; +Cc: netdev, linuxppc-dev, paulus, ambrose, eric.dumazet
In-Reply-To: <1367372393.22115.6.camel@pasglop>

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Wed, 01 May 2013 11:39:53 +1000

> I'm not even completely certain bytes are safe to be honest, though
> probably more than bitfields. I'll poke our compiler people.

Older Alpha only has 32-bit and 64-bit loads and stores, so byte sized
accesses are not atomic, and therefore use racey read-modify-write
sequences.

^ permalink raw reply

* Re: [PATCH -V7 18/18] powerpc: Update tlbie/tlbiel as per ISA doc
From: Aneesh Kumar K.V @ 2013-05-01  7:47 UTC (permalink / raw)
  To: David Gibson; +Cc: linuxppc-dev, paulus, linux-mm
In-Reply-To: <20130501052625.GC14106@truffula.fritz.box>

David Gibson <dwg@au1.ibm.com> writes:

> On Tue, Apr 30, 2013 at 10:51:00PM +0530, Aneesh Kumar K.V wrote:
>> David Gibson <dwg@au1.ibm.com> writes:
>> 
>> > On Mon, Apr 29, 2013 at 01:07:39AM +0530, Aneesh Kumar K.V wrote:
>> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> >> 
>> >> Encode the actual page correctly in tlbie/tlbiel. This make sure we handle
>> >> multiple page size segment correctly.
>> >
>> > As mentioned in previous comments, this commit message needs to give
>> > much more detail about what precisely the existing implementation is
>> > doing wrong.
>> >
>> >> 
>> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> >> ---
>> >>  arch/powerpc/mm/hash_native_64.c | 32 ++++++++++++++++++++++++++++++--
>> >>  1 file changed, 30 insertions(+), 2 deletions(-)
>> >> 
>> >> diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
>> >> index bb920ee..6a2aead 100644
>> >> --- a/arch/powerpc/mm/hash_native_64.c
>> >> +++ b/arch/powerpc/mm/hash_native_64.c
>> >> @@ -61,7 +61,10 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
>> >>  
>> >>  	switch (psize) {
>> >>  	case MMU_PAGE_4K:
>> >> +		/* clear out bits after (52) [0....52.....63] */
>> >> +		va &= ~((1ul << (64 - 52)) - 1);
>> >>  		va |= ssize << 8;
>> >> +		va |= mmu_psize_defs[apsize].sllp << 6;
>> >>  		asm volatile(ASM_FTR_IFCLR("tlbie %0,0", PPC_TLBIE(%1,%0), %2)
>> >>  			     : : "r" (va), "r"(0), "i" (CPU_FTR_ARCH_206)
>> >>  			     : "memory");
>> >> @@ -69,9 +72,20 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
>> >>  	default:
>> >>  		/* We need 14 to 14 + i bits of va */
>> >>  		penc = mmu_psize_defs[psize].penc[apsize];
>> >> -		va &= ~((1ul << mmu_psize_defs[psize].shift) - 1);
>> >> +		va &= ~((1ul << mmu_psize_defs[apsize].shift) - 1);
>> >>  		va |= penc << 12;
>> >>  		va |= ssize << 8;
>> >> +		/* Add AVAL part */
>> >> +		if (psize != apsize) {
>> >> +			/*
>> >> +			 * MPSS, 64K base page size and 16MB parge page size
>> >> +			 * We don't need all the bits, but rest of the bits
>> >> +			 * must be ignored by the processor.
>> >> +			 * vpn cover upto 65 bits of va. (0...65) and we need
>> >> +			 * 58..64 bits of va.
>> >
>> > I can't understand what this comment is saying.  Why do we need to do
>> > something different in the psize != apsize case?
>> >
>> >> +			 */
>> >> +			va |= (vpn & 0xfe);
>> >> +		}
>> 
>> That is as per ISA doc. It says if base page size == actual page size,
>> (RB)56:62 must be zeros, which must be ignored by the processor.
>> Otherwise it should be filled with the selected bits of VA as explained above.
>
> What you've just said here makes much more sense than what's written
> in the comment in the code.
>
>> We only support MPSS with base page size = 64K and actual page size = 16MB.
>
> Is that actually relevant to this code though?

In a way yes. The number of bits we we select out of VA depends on the
base page size and actual page size. We have a math around that
documented in ISA. Now since we support only 64K and 16MB we can make it
simpler by only selecting required bits and not making it a
function. But then it is also not relevant to the code in that ISA also
state other bits in (RB)56:62 must be zero. I wanted to capture both the
details in the comment. 

-aneesh

^ permalink raw reply

* Re: [PATCH -V7 18/18] powerpc: Update tlbie/tlbiel as per ISA doc
From: Simon Jeons @ 2013-05-01  7:52 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-mm, linuxppc-dev, paulus, David Gibson
In-Reply-To: <87hain9m5e.fsf@linux.vnet.ibm.com>

On 05/01/2013 03:47 PM, Aneesh Kumar K.V wrote:
> David Gibson <dwg@au1.ibm.com> writes:
>
>> On Tue, Apr 30, 2013 at 10:51:00PM +0530, Aneesh Kumar K.V wrote:
>>> David Gibson <dwg@au1.ibm.com> writes:
>>>
>>>> On Mon, Apr 29, 2013 at 01:07:39AM +0530, Aneesh Kumar K.V wrote:
>>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>>>>>
>>>>> Encode the actual page correctly in tlbie/tlbiel. This make sure we handle
>>>>> multiple page size segment correctly.
>>>> As mentioned in previous comments, this commit message needs to give
>>>> much more detail about what precisely the existing implementation is
>>>> doing wrong.
>>>>
>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>>>> ---
>>>>>   arch/powerpc/mm/hash_native_64.c | 32 ++++++++++++++++++++++++++++++--
>>>>>   1 file changed, 30 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
>>>>> index bb920ee..6a2aead 100644
>>>>> --- a/arch/powerpc/mm/hash_native_64.c
>>>>> +++ b/arch/powerpc/mm/hash_native_64.c
>>>>> @@ -61,7 +61,10 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
>>>>>   
>>>>>   	switch (psize) {
>>>>>   	case MMU_PAGE_4K:
>>>>> +		/* clear out bits after (52) [0....52.....63] */
>>>>> +		va &= ~((1ul << (64 - 52)) - 1);
>>>>>   		va |= ssize << 8;
>>>>> +		va |= mmu_psize_defs[apsize].sllp << 6;
>>>>>   		asm volatile(ASM_FTR_IFCLR("tlbie %0,0", PPC_TLBIE(%1,%0), %2)
>>>>>   			     : : "r" (va), "r"(0), "i" (CPU_FTR_ARCH_206)
>>>>>   			     : "memory");
>>>>> @@ -69,9 +72,20 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
>>>>>   	default:
>>>>>   		/* We need 14 to 14 + i bits of va */
>>>>>   		penc = mmu_psize_defs[psize].penc[apsize];
>>>>> -		va &= ~((1ul << mmu_psize_defs[psize].shift) - 1);
>>>>> +		va &= ~((1ul << mmu_psize_defs[apsize].shift) - 1);
>>>>>   		va |= penc << 12;
>>>>>   		va |= ssize << 8;
>>>>> +		/* Add AVAL part */
>>>>> +		if (psize != apsize) {
>>>>> +			/*
>>>>> +			 * MPSS, 64K base page size and 16MB parge page size
>>>>> +			 * We don't need all the bits, but rest of the bits
>>>>> +			 * must be ignored by the processor.
>>>>> +			 * vpn cover upto 65 bits of va. (0...65) and we need
>>>>> +			 * 58..64 bits of va.
>>>> I can't understand what this comment is saying.  Why do we need to do
>>>> something different in the psize != apsize case?
>>>>
>>>>> +			 */
>>>>> +			va |= (vpn & 0xfe);
>>>>> +		}
>>> That is as per ISA doc. It says if base page size == actual page size,
>>> (RB)56:62 must be zeros, which must be ignored by the processor.
>>> Otherwise it should be filled with the selected bits of VA as explained above.
>> What you've just said here makes much more sense than what's written
>> in the comment in the code.
>>
>>> We only support MPSS with base page size = 64K and actual page size = 16MB.
>> Is that actually relevant to this code though?
> In a way yes. The number of bits we we select out of VA depends on the
> base page size and actual page size. We have a math around that
> documented in ISA. Now since we support only 64K and 16MB we can make it
> simpler by only selecting required bits and not making it a
> function. But then it is also not relevant to the code in that ISA also
> state other bits in (RB)56:62 must be zero. I wanted to capture both the
> details in the comment.

What's the benefit of ppc use hash-based page table instead of 
tree-based page table? If you said that hash is faster, could x86 change 
to it?

>
> -aneesh
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Benjamin Herrenschmidt @ 2013-05-01  8:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linuxppc-dev, paulus, ambrose, eric.dumazet
In-Reply-To: <20130501.033650.703182794549888825.davem@davemloft.net>

On Wed, 2013-05-01 at 03:36 -0400, David Miller wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Wed, 01 May 2013 11:39:53 +1000
> 
> > I'm not even completely certain bytes are safe to be honest, though
> > probably more than bitfields. I'll poke our compiler people.
> 
> Older Alpha only has 32-bit and 64-bit loads and stores, so byte sized
> accesses are not atomic, and therefore use racey read-modify-write
> sequences.

In this case it depends whether the compiler will "chose" the smaller
(32-bit) size which hopefully won't overlap with the atomic/lock
provided the latter is aligned... lots of if's here, makes me nervous...

At least the bytes seem to fix it for ppc64 so far...

It would make feel generally better if we could get gcc to guarantee us
to always use the smallest access size that encompass the whole bitfield
(or at least not go larger than int when the bitfield is defined as
unsigned int). This would take care of all the cases we haven't spotted
yet (hopefully).

For all intend and purposes those two fields are bits of an unsigned
int, why the heck would the compiler use a larger access size anyway ? I
seem to recall that we have other places where such an assumption is
made that ints are accessed atomically, and Linus stating in the past
that a compiler doing anything else was not worth bothering with. I
don't see why bitfields of such int would be an exception to that rule
(though again, this is probably not a rule stated in the standard ... oh
well).

/me goes have a glass of wine and not think about this until tomorrow.

Cheers,
Ben.

^ permalink raw reply

* [PATCH 1/1] powerpc: mpc512x_dma: add support for data transfers between memory and i/o memory
From: Alexander Popov @ 2013-05-01 11:28 UTC (permalink / raw)
  To: Vinod Koul, Dan Williams; +Cc: linuxppc-dev, linux-kernel

The initial version of this driver supports only memory to memory
data transfers.

Data transfers between memory and i/o memory require more delicate TCD
(Transfer Control Descriptor) configuration and DMA channel service requests
via hardware.

dma_device.device_control callback function is needed to configure
DMA channel to work with i/o memory.

Signed-off-by: Alexander Popov <a13xp0p0v88@gmail.com>
---
 drivers/dma/mpc512x_dma.c | 230 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 192 insertions(+), 38 deletions(-)

diff --git a/drivers/dma/mpc512x_dma.c b/drivers/dma/mpc512x_dma.c
index 2d95673..8aedff1 100644
--- a/drivers/dma/mpc512x_dma.c
+++ b/drivers/dma/mpc512x_dma.c
@@ -2,6 +2,7 @@
  * Copyright (C) Freescale Semicondutor, Inc. 2007, 2008.
  * Copyright (C) Semihalf 2009
  * Copyright (C) Ilya Yanok, Emcraft Systems 2010
+ * Copyright (C) Alexander Popov, Promcontroller 2013
  *
  * Written by Piotr Ziecik <kosmo@semihalf.com>. Hardware description
  * (defines, structures and comments) was taken from MPC5121 DMA driver
@@ -28,11 +29,6 @@
  * file called COPYING.
  */
 
-/*
- * This is initial version of MPC5121 DMA driver. Only memory to memory
- * transfers are supported (tested using dmatest module).
- */
-
 #include <linux/module.h>
 #include <linux/dmaengine.h>
 #include <linux/dma-mapping.h>
@@ -183,6 +179,8 @@ struct mpc_dma_desc {
 
 struct mpc_dma_chan {
 	struct dma_chan			chan;
+	enum dma_transfer_direction	dir;
+	enum dma_slave_buswidth		slave_reg_width;
 	struct list_head		free;
 	struct list_head		prepared;
 	struct list_head		queued;
@@ -190,6 +188,7 @@ struct mpc_dma_chan {
 	struct list_head		completed;
 	struct mpc_dma_tcd		*tcd;
 	dma_addr_t			tcd_paddr;
+	u32				tcd_nunits;
 
 	/* Lock for this structure */
 	spinlock_t			lock;
@@ -268,7 +267,11 @@ static void mpc_dma_execute(struct mpc_dma_chan *mchan)
 
 	if (first != prev)
 		mdma->tcd[cid].e_sg = 1;
-	out_8(&mdma->regs->dmassrt, cid);
+
+	if (first->tcd->biter != 1) /* Request channel service by... */
+		out_8(&mdma->regs->dmaserq, cid); /* hardware */
+	else
+		out_8(&mdma->regs->dmassrt, cid); /* software */
 }
 
 /* Handle interrupt on one half of DMA controller (32 channels) */
@@ -567,7 +570,42 @@ mpc_dma_tx_status(struct dma_chan *chan, dma_cookie_t cookie,
 	return ret;
 }
 
-/* Prepare descriptor for memory to memory copy */
+static int mpc_dma_control(struct dma_chan *chan, enum dma_ctrl_cmd cmd,
+						unsigned long arg)
+{
+	struct mpc_dma_chan *mchan = dma_chan_to_mpc_dma_chan(chan);
+	struct dma_slave_config *cfg = (void *)arg;
+	int ret = 0;
+
+	if (!chan)
+		return -EINVAL;
+
+	if (cmd == DMA_SLAVE_CONFIG && cfg) {
+		if (cfg->direction == DMA_DEV_TO_MEM) {
+			if (cfg->src_addr_width != DMA_SLAVE_BUSWIDTH_UNDEFINED)
+				mchan->slave_reg_width = cfg->src_addr_width;
+			else
+				return -EINVAL;
+			mchan->dir = DMA_DEV_TO_MEM;
+			mchan->tcd_nunits = cfg->src_maxburst;
+		} else if (cfg->direction == DMA_MEM_TO_DEV) {
+			if (cfg->dst_addr_width != DMA_SLAVE_BUSWIDTH_UNDEFINED)
+				mchan->slave_reg_width = cfg->dst_addr_width;
+			else
+				return -EINVAL;
+			mchan->dir = DMA_MEM_TO_DEV;
+			mchan->tcd_nunits = cfg->dst_maxburst;
+		} else {
+			mchan->dir = DMA_MEM_TO_MEM;
+			mchan->slave_reg_width = DMA_SLAVE_BUSWIDTH_UNDEFINED;
+			mchan->tcd_nunits = 0;
+		}
+	} else
+		return -ENOSYS;
+
+	return ret;
+}
+
 static struct dma_async_tx_descriptor *
 mpc_dma_prep_memcpy(struct dma_chan *chan, dma_addr_t dst, dma_addr_t src,
 					size_t len, unsigned long flags)
@@ -577,6 +615,7 @@ mpc_dma_prep_memcpy(struct dma_chan *chan, dma_addr_t dst, dma_addr_t src,
 	struct mpc_dma_desc *mdesc = NULL;
 	struct mpc_dma_tcd *tcd;
 	unsigned long iflags;
+	u32 iter = 0;
 
 	/* Get free descriptor */
 	spin_lock_irqsave(&mchan->lock, iflags);
@@ -599,39 +638,153 @@ mpc_dma_prep_memcpy(struct dma_chan *chan, dma_addr_t dst, dma_addr_t src,
 	/* Prepare Transfer Control Descriptor for this transaction */
 	memset(tcd, 0, sizeof(struct mpc_dma_tcd));
 
-	if (IS_ALIGNED(src | dst | len, 32)) {
-		tcd->ssize = MPC_DMA_TSIZE_32;
-		tcd->dsize = MPC_DMA_TSIZE_32;
-		tcd->soff = 32;
-		tcd->doff = 32;
-	} else if (!mdma->is_mpc8308 && IS_ALIGNED(src | dst | len, 16)) {
-		/* MPC8308 doesn't support 16 byte transfers */
-		tcd->ssize = MPC_DMA_TSIZE_16;
-		tcd->dsize = MPC_DMA_TSIZE_16;
-		tcd->soff = 16;
-		tcd->doff = 16;
-	} else if (IS_ALIGNED(src | dst | len, 4)) {
-		tcd->ssize = MPC_DMA_TSIZE_4;
-		tcd->dsize = MPC_DMA_TSIZE_4;
-		tcd->soff = 4;
-		tcd->doff = 4;
-	} else if (IS_ALIGNED(src | dst | len, 2)) {
-		tcd->ssize = MPC_DMA_TSIZE_2;
-		tcd->dsize = MPC_DMA_TSIZE_2;
-		tcd->soff = 2;
-		tcd->doff = 2;
-	} else {
-		tcd->ssize = MPC_DMA_TSIZE_1;
-		tcd->dsize = MPC_DMA_TSIZE_1;
-		tcd->soff = 1;
-		tcd->doff = 1;
-	}
-
 	tcd->saddr = src;
 	tcd->daddr = dst;
-	tcd->nbytes = len;
-	tcd->biter = 1;
-	tcd->citer = 1;
+	if (mchan->dir == DMA_MEM_TO_MEM) {
+		if (IS_ALIGNED(src | dst | len, 32)) {
+			tcd->ssize = MPC_DMA_TSIZE_32;
+			tcd->dsize = MPC_DMA_TSIZE_32;
+			tcd->soff = 32;
+			tcd->doff = 32;
+		} else if (!mdma->is_mpc8308 &&
+					IS_ALIGNED(src | dst | len, 16)) {
+			/* MPC8308 doesn't support 16 byte transfers */
+			tcd->ssize = MPC_DMA_TSIZE_16;
+			tcd->dsize = MPC_DMA_TSIZE_16;
+			tcd->soff = 16;
+			tcd->doff = 16;
+		} else if (IS_ALIGNED(src | dst | len, 4)) {
+			tcd->ssize = MPC_DMA_TSIZE_4;
+			tcd->dsize = MPC_DMA_TSIZE_4;
+			tcd->soff = 4;
+			tcd->doff = 4;
+		} else if (IS_ALIGNED(src | dst | len, 2)) {
+			tcd->ssize = MPC_DMA_TSIZE_2;
+			tcd->dsize = MPC_DMA_TSIZE_2;
+			tcd->soff = 2;
+			tcd->doff = 2;
+		} else {
+			tcd->ssize = MPC_DMA_TSIZE_1;
+			tcd->dsize = MPC_DMA_TSIZE_1;
+			tcd->soff = 1;
+			tcd->doff = 1;
+		}
+		tcd->nbytes = len;
+		tcd->biter = 1;
+		tcd->citer = 1;
+	} else {
+		/* Memory to io-memory transfer or vice versa.
+		 * DMA controller is going to access io-memory via
+		 * some FIFO data register. The width of this register
+		 * is mchan->slave_reg_width.
+		 *
+		 * Since some FIFO registers require full width access,
+		 * let's firmly set the corresponding transfer size
+		 * to mchan->slave_reg_width
+		 * and prohibit transfers of packets with a length
+		 * which is not aligned on mchan->slave_reg_width boundaries
+		 * to avoid Transfer Control Descriptor inconsistency.
+		 * Moreover this will save us from playing with
+		 * source and destination address modulo.
+		 */
+
+		if (!IS_ALIGNED(len, mchan->slave_reg_width))
+			return NULL;
+
+		if (mchan->dir == DMA_DEV_TO_MEM) {
+			tcd->soff = 0;
+			if (mchan->slave_reg_width ==
+					DMA_SLAVE_BUSWIDTH_4_BYTES) {
+				tcd->ssize = MPC_DMA_TSIZE_4;
+				if (IS_ALIGNED(dst, 4)) {
+					tcd->dsize = MPC_DMA_TSIZE_4;
+					tcd->doff = 4;
+				} else if (IS_ALIGNED(dst, 2)) {
+					tcd->dsize = MPC_DMA_TSIZE_2;
+					tcd->doff = 2;
+				} else {
+					tcd->dsize = MPC_DMA_TSIZE_1;
+					tcd->doff = 1;
+				}
+			} else if (mchan->slave_reg_width ==
+					DMA_SLAVE_BUSWIDTH_2_BYTES) {
+				tcd->ssize = MPC_DMA_TSIZE_2;
+				if (IS_ALIGNED(dst, 2)) {
+					tcd->dsize = MPC_DMA_TSIZE_2;
+					tcd->doff = 2;
+				} else {
+					tcd->dsize = MPC_DMA_TSIZE_1;
+					tcd->doff = 1;
+				}
+			} else if (mchan->slave_reg_width ==
+					DMA_SLAVE_BUSWIDTH_1_BYTE) {
+				tcd->ssize = MPC_DMA_TSIZE_1;
+				tcd->dsize = MPC_DMA_TSIZE_1;
+				tcd->doff = 1;
+			} else
+				return NULL;
+		} else {
+			tcd->doff = 0;
+			if (mchan->slave_reg_width ==
+					DMA_SLAVE_BUSWIDTH_4_BYTES) {
+				tcd->dsize = MPC_DMA_TSIZE_4;
+				if (IS_ALIGNED(src, 4)) {
+					tcd->ssize = MPC_DMA_TSIZE_4;
+					tcd->soff = 4;
+				} else if (IS_ALIGNED(src, 2)) {
+					tcd->ssize = MPC_DMA_TSIZE_2;
+					tcd->soff = 2;
+				} else {
+					tcd->ssize = MPC_DMA_TSIZE_1;
+					tcd->soff = 1;
+				}
+			} else if (mchan->slave_reg_width ==
+					DMA_SLAVE_BUSWIDTH_2_BYTES) {
+				tcd->dsize = MPC_DMA_TSIZE_2;
+				if (IS_ALIGNED(src, 2)) {
+					tcd->ssize = MPC_DMA_TSIZE_2;
+					tcd->soff = 2;
+				} else {
+					tcd->ssize = MPC_DMA_TSIZE_1;
+					tcd->soff = 1;
+				}
+			} else if (mchan->slave_reg_width ==
+					DMA_SLAVE_BUSWIDTH_1_BYTE) {
+				tcd->dsize = MPC_DMA_TSIZE_1;
+				tcd->ssize = MPC_DMA_TSIZE_1;
+				tcd->soff = 1;
+			} else
+				return NULL;
+		}
+
+		if (mchan->tcd_nunits) {
+			tcd->nbytes = mchan->tcd_nunits *
+						mchan->slave_reg_width;
+			if (!IS_ALIGNED(len, tcd->nbytes)) {
+				/* mchan->tcd_nunits is inconsistent */
+				return NULL;
+			}
+
+			iter = len / tcd->nbytes;
+			if (iter > ((1 << 15) - 1)) {	/* maximum biter */
+				return NULL; /* len is too big */
+			} else {
+				tcd->biter = iter;
+				tcd->biter_linkch = iter >> 9;
+				tcd->citer = tcd->biter;
+				tcd->citer_linkch = tcd->biter_linkch;
+			}
+
+			/* DMA hardware should automatically clear
+			 * the corresponding DMAERQ bit when
+			 * the current major iteration count reaches zero. */
+			tcd->d_req = 1;
+		} else {
+			tcd->nbytes = len;
+			tcd->biter = 1;
+			tcd->citer = 1;
+		}
+	}
 
 	/* Place descriptor in prepared list */
 	spin_lock_irqsave(&mchan->lock, iflags);
@@ -725,6 +878,7 @@ static int mpc_dma_probe(struct platform_device *op)
 	dma->device_issue_pending = mpc_dma_issue_pending;
 	dma->device_tx_status = mpc_dma_tx_status;
 	dma->device_prep_dma_memcpy = mpc_dma_prep_memcpy;
+	dma->device_control = mpc_dma_control;
 
 	INIT_LIST_HEAD(&dma->channels);
 	dma_cap_set(DMA_MEMCPY, dma->cap_mask);
-- 
1.7.11.3

^ permalink raw reply related

* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Ben Hutchings @ 2013-05-01 12:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Eric Dumazet, netdev, Ambrose Feinstein, Paul Mackerras,
	linuxppc-dev, David Miller
In-Reply-To: <1367372393.22115.6.camel@pasglop>

On Wed, 2013-05-01 at 11:39 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2013-04-30 at 18:12 -0700, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > Using bit fields is dangerous on ppc64, as the compiler uses 64bit
> > instructions to manipulate them. If the 64bit word includes any
> > atomic_t or spinlock_t, we can lose critical concurrent changes.
> > 
> > This is happening in af_unix, where unix_sk(sk)->gc_candidate/
> > gc_maybe_cycle/lock share the same 64bit word.
> > 
> > This leads to fatal deadlock, as one/several cpus spin forever
> > on a spinlock that will never be available again.
> > 
> > Reported-by: Ambrose Feinstein <ambrose@google.com>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Paul Mackerras <paulus@samba.org>
> > ---
> > 
> > Could ppc64 experts confirm using byte is safe, or should we really add
> > a 32bit hole after the spinlock ? If so, I wonder how many other places
> > need a change...
> 
> Wow, nice one !
> 
> I'm not even completely certain bytes are safe to be honest, though
> probably more than bitfields. I'll poke our compiler people.

There is a longstanding and hard-to-fix bug in gcc that is specific to
bitfields.  I think that the underlying type isn't propagated, so when
it comes to code generation the compiler doesn't know the natural width
for the memory access.

As for bytes - early Alphas couldn't load/store less than 32 bits, but I
doubt anyone cares any more.

> The worry is of course how many more of these do we potentially have ? 
> We might be able to automate finding these issues with sparse, I
> suppose.
> 
> Also I'd be surprised if ppc64 is the only one with that problem... what
> about sparc64 and arm64 ?

I expect they can have the same general problem, but gcc may be more or
less keen to generate 64-bit load/store instructions for bitfields on
different architectures.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH -V7 18/18] powerpc: Update tlbie/tlbiel as per ISA doc
From: David Gibson @ 2013-05-01 11:36 UTC (permalink / raw)
  To: Simon Jeons; +Cc: linux-mm, linuxppc-dev, Aneesh Kumar K.V, paulus
In-Reply-To: <5180C9BE.5040002@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4123 bytes --]

On Wed, May 01, 2013 at 03:52:30PM +0800, Simon Jeons wrote:
> On 05/01/2013 03:47 PM, Aneesh Kumar K.V wrote:
> >David Gibson <dwg@au1.ibm.com> writes:
> >
> >>On Tue, Apr 30, 2013 at 10:51:00PM +0530, Aneesh Kumar K.V wrote:
> >>>David Gibson <dwg@au1.ibm.com> writes:
> >>>
> >>>>On Mon, Apr 29, 2013 at 01:07:39AM +0530, Aneesh Kumar K.V wrote:
> >>>>>From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >>>>>
> >>>>>Encode the actual page correctly in tlbie/tlbiel. This make sure we handle
> >>>>>multiple page size segment correctly.
> >>>>As mentioned in previous comments, this commit message needs to give
> >>>>much more detail about what precisely the existing implementation is
> >>>>doing wrong.
> >>>>
> >>>>>Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> >>>>>---
> >>>>>  arch/powerpc/mm/hash_native_64.c | 32 ++++++++++++++++++++++++++++++--
> >>>>>  1 file changed, 30 insertions(+), 2 deletions(-)
> >>>>>
> >>>>>diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
> >>>>>index bb920ee..6a2aead 100644
> >>>>>--- a/arch/powerpc/mm/hash_native_64.c
> >>>>>+++ b/arch/powerpc/mm/hash_native_64.c
> >>>>>@@ -61,7 +61,10 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
> >>>>>  	switch (psize) {
> >>>>>  	case MMU_PAGE_4K:
> >>>>>+		/* clear out bits after (52) [0....52.....63] */
> >>>>>+		va &= ~((1ul << (64 - 52)) - 1);
> >>>>>  		va |= ssize << 8;
> >>>>>+		va |= mmu_psize_defs[apsize].sllp << 6;
> >>>>>  		asm volatile(ASM_FTR_IFCLR("tlbie %0,0", PPC_TLBIE(%1,%0), %2)
> >>>>>  			     : : "r" (va), "r"(0), "i" (CPU_FTR_ARCH_206)
> >>>>>  			     : "memory");
> >>>>>@@ -69,9 +72,20 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
> >>>>>  	default:
> >>>>>  		/* We need 14 to 14 + i bits of va */
> >>>>>  		penc = mmu_psize_defs[psize].penc[apsize];
> >>>>>-		va &= ~((1ul << mmu_psize_defs[psize].shift) - 1);
> >>>>>+		va &= ~((1ul << mmu_psize_defs[apsize].shift) - 1);
> >>>>>  		va |= penc << 12;
> >>>>>  		va |= ssize << 8;
> >>>>>+		/* Add AVAL part */
> >>>>>+		if (psize != apsize) {
> >>>>>+			/*
> >>>>>+			 * MPSS, 64K base page size and 16MB parge page size
> >>>>>+			 * We don't need all the bits, but rest of the bits
> >>>>>+			 * must be ignored by the processor.
> >>>>>+			 * vpn cover upto 65 bits of va. (0...65) and we need
> >>>>>+			 * 58..64 bits of va.
> >>>>I can't understand what this comment is saying.  Why do we need to do
> >>>>something different in the psize != apsize case?
> >>>>
> >>>>>+			 */
> >>>>>+			va |= (vpn & 0xfe);
> >>>>>+		}
> >>>That is as per ISA doc. It says if base page size == actual page size,
> >>>(RB)56:62 must be zeros, which must be ignored by the processor.
> >>>Otherwise it should be filled with the selected bits of VA as explained above.
> >>What you've just said here makes much more sense than what's written
> >>in the comment in the code.
> >>
> >>>We only support MPSS with base page size = 64K and actual page size = 16MB.
> >>Is that actually relevant to this code though?
> >In a way yes. The number of bits we we select out of VA depends on the
> >base page size and actual page size. We have a math around that
> >documented in ISA. Now since we support only 64K and 16MB we can make it
> >simpler by only selecting required bits and not making it a
> >function. But then it is also not relevant to the code in that ISA also
> >state other bits in (RB)56:62 must be zero. I wanted to capture both the
> >details in the comment.
> 
> What's the benefit of ppc use hash-based page table instead of
> tree-based page table? If you said that hash is faster, could x86
> change to it?

Complex and debatable at best.  But in any case it's a fixed property
of the CPU, not something that can be changed in software.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Stephen Hemminger @ 2013-05-01 15:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alan Modra, netdev, Ambrose Feinstein, Paul Mackerras,
	Anton Blanchard, linuxppc-dev, David Miller
In-Reply-To: <1367384672.11020.34.camel@edumazet-glaptop>

On Tue, 30 Apr 2013 22:04:32 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Wed, 2013-05-01 at 13:24 +0930, Alan Modra wrote:
> > On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote:
> > > 	li 11,1
> > > 	ld 0,0(9)
> > > 	rldimi 0,11,31,32
> > > 	std 0,0(9)
> > > 	blr
> > > 	.ident	"GCC: (GNU) 4.6.3"
> > > 
> > > You can see "ld 0,0(9)" is used : its a 64 bit load.
> > 
> > Yup.  This is not a powerpc64 specific problem.  See
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
> > Fixed in 4.8.0 and 4.7.3.
> 
> Ah thanks.
> 
> This seems a pretty serious issue, is it documented somewhere that
> ppc64, sparc64 and others need such compiler version ?
> 
> These kind of errors are pretty hard to find, its a pity to spend time
> on them.

There is a checkbin target inside arch/powerpc/Makefile
Shouldn't a check be added there to block building kernel with known
bad GCC versions?

^ permalink raw reply

* [PATCH v2 net-next] af_unix: fix a fatal race with bit fields
From: Eric Dumazet @ 2013-05-01 15:24 UTC (permalink / raw)
  To: David Miller; +Cc: linuxppc-dev, paulus, ambrose, netdev
In-Reply-To: <20130501.033650.703182794549888825.davem@davemloft.net>

On Wed, 2013-05-01 at 03:36 -0400, David Miller wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Wed, 01 May 2013 11:39:53 +1000
> 
> > I'm not even completely certain bytes are safe to be honest, though
> > probably more than bitfields. I'll poke our compiler people.
> 
> Older Alpha only has 32-bit and 64-bit loads and stores, so byte sized
> accesses are not atomic, and therefore use racey read-modify-write
> sequences.

Right, so what about the following more general fix ?

Thanks !

[PATCH v2] af_unix: fix a fatal race with bit fields

Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
uses 64bit instructions to manipulate them.
If the 64bit word includes any atomic_t or spinlock_t, we can lose
critical concurrent changes.

This is happening in af_unix, where unix_sk(sk)->gc_candidate/
gc_maybe_cycle/lock share the same 64bit word.

This leads to fatal deadlock, as one/several cpus spin forever
on a spinlock that will never be available again.

A safer way would be to use a long to store flags.
This way we are sure compiler/arch wont do bad things.

As we own unix_gc_lock spinlock when clearing or setting bits,
we can use the non atomic __set_bit()/__clear_bit().

recursion_level can share the same 64bit location with the spinlock,
as it is set only with this spinlock held.

[1] bug fixed in gcc-4.8.0 :
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080

Reported-by: Ambrose Feinstein <ambrose@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
---
 include/net/af_unix.h |    5 +++--
 net/unix/garbage.c    |   12 ++++++------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index a8836e8..dbdfd2b 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -57,9 +57,10 @@ struct unix_sock {
 	struct list_head	link;
 	atomic_long_t		inflight;
 	spinlock_t		lock;
-	unsigned int		gc_candidate : 1;
-	unsigned int		gc_maybe_cycle : 1;
 	unsigned char		recursion_level;
+	unsigned long		gc_flags;
+#define UNIX_GC_CANDIDATE	0
+#define UNIX_GC_MAYBE_CYCLE	1
 	struct socket_wq	peer_wq;
 };
 #define unix_sk(__sk) ((struct unix_sock *)__sk)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index d0f6545..9c6cc08 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -185,7 +185,7 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
 					 * have been added to the queues after
 					 * starting the garbage collection
 					 */
-					if (u->gc_candidate) {
+					if (test_bit(UNIX_GC_CANDIDATE, &u->gc_flags)) {
 						hit = true;
 						func(u);
 					}
@@ -254,7 +254,7 @@ static void inc_inflight_move_tail(struct unix_sock *u)
 	 * of the list, so that it's checked even if it was already
 	 * passed over
 	 */
-	if (u->gc_maybe_cycle)
+	if (test_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags))
 		list_move_tail(&u->link, &gc_candidates);
 }
 
@@ -315,8 +315,8 @@ void unix_gc(void)
 		BUG_ON(total_refs < inflight_refs);
 		if (total_refs == inflight_refs) {
 			list_move_tail(&u->link, &gc_candidates);
-			u->gc_candidate = 1;
-			u->gc_maybe_cycle = 1;
+			__set_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
+			__set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
 		}
 	}
 
@@ -344,7 +344,7 @@ void unix_gc(void)
 
 		if (atomic_long_read(&u->inflight) > 0) {
 			list_move_tail(&u->link, &not_cycle_list);
-			u->gc_maybe_cycle = 0;
+			__clear_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
 			scan_children(&u->sk, inc_inflight_move_tail, NULL);
 		}
 	}
@@ -356,7 +356,7 @@ void unix_gc(void)
 	 */
 	while (!list_empty(&not_cycle_list)) {
 		u = list_entry(not_cycle_list.next, struct unix_sock, link);
-		u->gc_candidate = 0;
+		__clear_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
 		list_move_tail(&u->link, &gc_inflight_list);
 	}
 

^ permalink raw reply related

* Re: [PATCH] powerpc: Bring all threads online prior to migration/hibernation
From: Nathan Fontenot @ 2013-05-01 15:25 UTC (permalink / raw)
  To: Robert Jennings, linuxppc-dev, Ben Herrenschmidt, stable
In-Reply-To: <20130426213203.GC3698@linux.vnet.ibm.com>

On 04/26/2013 04:32 PM, Robert Jennings wrote:
> With this patch before a migration/hibernation all threads present but
> not online will be brought online.  After migration/hibernation those
> threads are taken back offline.
> 
> During migration/hibernation all online CPUs must call H_JOIN, this is
> required by the hypervisor.  Without this patch, threads that are offline
> (H_CEDE'd) will not be woken to make the H_JOIN call and the OS will be
> deadlocked (all threads either JOIN'd or CEDE'd).
> 

This fixes a long standing bug in partition migration.

Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>

> Cc: <stable@kernel.org>
> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/rtas.h          |    2 +
>  arch/powerpc/kernel/rtas.c               |   95 ++++++++++++++++++++++++++++++
>  arch/powerpc/platforms/pseries/suspend.c |   22 +++++++
>  3 files changed, 119 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index aef00c6..ee38f29 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -262,6 +262,8 @@ extern void rtas_progress(char *s, unsigned short hex);
>  extern void rtas_initialize(void);
>  extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
>  extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
> +extern int rtas_online_cpus_mask(cpumask_var_t cpus);
> +extern int rtas_offline_cpus_mask(cpumask_var_t cpus);
>  extern int rtas_ibm_suspend_me(struct rtas_args *);
>  
>  struct rtc_time;
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 1fd6e7b..855ee98 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -19,6 +19,7 @@
>  #include <linux/init.h>
>  #include <linux/capability.h>
>  #include <linux/delay.h>
> +#include <linux/cpu.h>
>  #include <linux/smp.h>
>  #include <linux/completion.h>
>  #include <linux/cpumask.h>
> @@ -807,6 +808,77 @@ static void rtas_percpu_suspend_me(void *info)
>  	__rtas_suspend_cpu((struct rtas_suspend_me_data *)info, 1);
>  }
>  
> +enum rtas_cpu_state {
> +	DOWN,
> +	UP,
> +};
> +
> +/* On return cpumask will be altered to indicate CPUs changed */
> +static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
> +				cpumask_var_t cpus)
> +{
> +	int cpu;
> +	int cpuret = 0;
> +	int ret = 0;
> +
> +	if (cpumask_empty(cpus))
> +		return 0;
> +
> +	for_each_cpu(cpu, cpus) {
> +		switch (state) {
> +		case DOWN:
> +			cpuret = cpu_down(cpu);
> +			break;
> +		case UP:
> +			cpuret = cpu_up(cpu);
> +			break;
> +		}
> +		if (cpuret) {
> +			pr_debug("%s: cpu_%s for cpu#%d returned %d.\n",
> +					__func__,
> +					((state == UP) ? "up" : "down"),
> +					cpu, cpuret);
> +			if (!ret)
> +				ret = cpuret;
> +			if (state == UP) {
> +				cpumask_shift_right(cpus, cpus, cpu);
> +				cpumask_shift_left(cpus, cpus, cpu);
> +				break;
> +			} else
> +				cpumask_clear_cpu(cpu, cpus);
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +int rtas_online_cpus_mask(cpumask_var_t cpus)
> +{
> +	int ret;
> +
> +	ret = rtas_cpu_state_change_mask(UP, cpus);
> +
> +	if (ret) {
> +		cpumask_var_t tmp_mask;
> +
> +		if (!alloc_cpumask_var(&tmp_mask, GFP_TEMPORARY))
> +			return ret;
> +
> +		cpumask_copy(tmp_mask, cpus);
> +		rtas_offline_cpus_mask(tmp_mask);
> +		free_cpumask_var(tmp_mask);
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(rtas_online_cpus_mask);
> +
> +int rtas_offline_cpus_mask(cpumask_var_t cpus)
> +{
> +	return rtas_cpu_state_change_mask(DOWN, cpus);
> +}
> +EXPORT_SYMBOL(rtas_offline_cpus_mask);
> +
>  int rtas_ibm_suspend_me(struct rtas_args *args)
>  {
>  	long state;
> @@ -814,6 +886,8 @@ int rtas_ibm_suspend_me(struct rtas_args *args)
>  	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
>  	struct rtas_suspend_me_data data;
>  	DECLARE_COMPLETION_ONSTACK(done);
> +	cpumask_var_t offline_mask;
> +	int cpuret;
>  
>  	if (!rtas_service_present("ibm,suspend-me"))
>  		return -ENOSYS;
> @@ -837,11 +911,24 @@ int rtas_ibm_suspend_me(struct rtas_args *args)
>  		return 0;
>  	}
>  
> +	if (!alloc_cpumask_var(&offline_mask, GFP_TEMPORARY))
> +		return -ENOMEM;
> +
>  	atomic_set(&data.working, 0);
>  	atomic_set(&data.done, 0);
>  	atomic_set(&data.error, 0);
>  	data.token = rtas_token("ibm,suspend-me");
>  	data.complete = &done;
> +
> +	/* All present CPUs must be online */
> +	cpumask_andnot(offline_mask, cpu_present_mask, cpu_online_mask);
> +	cpuret = rtas_online_cpus_mask(offline_mask);
> +	if (cpuret) {
> +		pr_err("%s: Could not bring present CPUs online.\n", __func__);
> +		atomic_set(&data.error, cpuret);
> +		goto out;
> +	}
> +
>  	stop_topology_update();
>  
>  	/* Call function on all CPUs.  One of us will make the
> @@ -857,6 +944,14 @@ int rtas_ibm_suspend_me(struct rtas_args *args)
>  
>  	start_topology_update();
>  
> +	/* Take down CPUs not online prior to suspend */
> +	cpuret = rtas_offline_cpus_mask(offline_mask);
> +	if (cpuret)
> +		pr_warn("%s: Could not restore CPUs to offline state.\n",
> +				__func__);
> +
> +out:
> +	free_cpumask_var(offline_mask);
>  	return atomic_read(&data.error);
>  }
>  #else /* CONFIG_PPC_PSERIES */
> diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
> index 47226e0..5f997e7 100644
> --- a/arch/powerpc/platforms/pseries/suspend.c
> +++ b/arch/powerpc/platforms/pseries/suspend.c
> @@ -16,6 +16,7 @@
>    * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
>    */
>  
> +#include <linux/cpu.h>
>  #include <linux/delay.h>
>  #include <linux/suspend.h>
>  #include <linux/stat.h>
> @@ -126,11 +127,15 @@ static ssize_t store_hibernate(struct device *dev,
>  			       struct device_attribute *attr,
>  			       const char *buf, size_t count)
>  {
> +	cpumask_var_t offline_mask;
>  	int rc;
>  
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
> +	if (!alloc_cpumask_var(&offline_mask, GFP_TEMPORARY))
> +		return -ENOMEM;
> +
>  	stream_id = simple_strtoul(buf, NULL, 16);
>  
>  	do {
> @@ -140,15 +145,32 @@ static ssize_t store_hibernate(struct device *dev,
>  	} while (rc == -EAGAIN);
>  
>  	if (!rc) {
> +		/* All present CPUs must be online */
> +		cpumask_andnot(offline_mask, cpu_present_mask,
> +				cpu_online_mask);
> +		rc = rtas_online_cpus_mask(offline_mask);
> +		if (rc) {
> +			pr_err("%s: Could not bring present CPUs online.\n",
> +					__func__);
> +			goto out;
> +		}
> +
>  		stop_topology_update();
>  		rc = pm_suspend(PM_SUSPEND_MEM);
>  		start_topology_update();
> +
> +		/* Take down CPUs not online prior to suspend */
> +		if (!rtas_offline_cpus_mask(offline_mask))
> +			pr_warn("%s: Could not restore CPUs to offline "
> +					"state.\n", __func__);
>  	}
>  
>  	stream_id = 0;
>  
>  	if (!rc)
>  		rc = count;
> +out:
> +	free_cpumask_var(offline_mask);
>  	return rc;
>  }
>  


-- 
-Nathan

^ permalink raw reply

* RE: [PATCH v2 net-next] af_unix: fix a fatal race with bit fields
From: Eric Dumazet @ 2013-05-01 16:00 UTC (permalink / raw)
  To: David Laight; +Cc: ambrose, paulus, netdev, linuxppc-dev, David Miller
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B721C@saturn3.aculab.com>

On Wed, 2013-05-01 at 16:53 +0100, David Laight wrote:
> Why not just change gc_candidate and gc_maybe_cycle to
> unsigned char?
> It might reduce the number of pad bytes somewhat.

You didn't quite follow the discussion.

I used bytes on V1, and we are not 100% sure its safe on all arches.

unsigned long is guaranteed to be safe. We absolutely rely on this.

Better use more bytes on a socket (with no impact on real memory use),
than spending hours to debug some strange issues.

^ permalink raw reply

* RE: [PATCH v2 net-next] af_unix: fix a fatal race with bit fields
From: David Laight @ 2013-05-01 15:53 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: linuxppc-dev, paulus, ambrose, netdev
In-Reply-To: <1367421843.11020.43.camel@edumazet-glaptop>

PiBkaWZmIC0tZ2l0IGEvaW5jbHVkZS9uZXQvYWZfdW5peC5oIGIvaW5jbHVkZS9uZXQvYWZfdW5p
eC5oDQo+IGluZGV4IGE4ODM2ZTguLmRiZGZkMmIgMTAwNjQ0DQo+IC0tLSBhL2luY2x1ZGUvbmV0
L2FmX3VuaXguaA0KPiArKysgYi9pbmNsdWRlL25ldC9hZl91bml4LmgNCj4gQEAgLTU3LDkgKzU3
LDEwIEBAIHN0cnVjdCB1bml4X3NvY2sgew0KPiAgCXN0cnVjdCBsaXN0X2hlYWQJbGluazsNCj4g
IAlhdG9taWNfbG9uZ190CQlpbmZsaWdodDsNCj4gIAlzcGlubG9ja190CQlsb2NrOw0KPiAtCXVu
c2lnbmVkIGludAkJZ2NfY2FuZGlkYXRlIDogMTsNCj4gLQl1bnNpZ25lZCBpbnQJCWdjX21heWJl
X2N5Y2xlIDogMTsNCj4gIAl1bnNpZ25lZCBjaGFyCQlyZWN1cnNpb25fbGV2ZWw7DQo+ICsJdW5z
aWduZWQgbG9uZwkJZ2NfZmxhZ3M7DQo+ICsjZGVmaW5lIFVOSVhfR0NfQ0FORElEQVRFCTANCj4g
KyNkZWZpbmUgVU5JWF9HQ19NQVlCRV9DWUNMRQkxDQo+ICAJc3RydWN0IHNvY2tldF93cQlwZWVy
X3dxOw0KPiAgfTsNCg0KV2h5IG5vdCBqdXN0IGNoYW5nZSBnY19jYW5kaWRhdGUgYW5kIGdjX21h
eWJlX2N5Y2xlIHRvDQp1bnNpZ25lZCBjaGFyPw0KSXQgbWlnaHQgcmVkdWNlIHRoZSBudW1iZXIg
b2YgcGFkIGJ5dGVzIHNvbWV3aGF0Lg0KDQoJRGF2aWQNCg0K

^ permalink raw reply

* Re: ppc/sata-fsl: orphan config value: CONFIG_MPC8315_DS
From: Scott Wood @ 2013-05-01 18:05 UTC (permalink / raw)
  To: Anthony Foiani
  Cc: Robert P.J.Day, linuxppc-dev@lists.ozlabs.org, Li Yang-R58472,
	Jeff Garzik, Adrian Bunk
In-Reply-To: <518078C0.7070805@scrye.com>

On 04/30/2013 09:06:56 PM, Anthony Foiani wrote:
> Scott --
>=20
> On 04/30/2013 06:42 PM, Scott Wood wrote:
>> I just meant that, for whatever boards you would have put this in =20
>> the device tree, put it in platform code instead (if the platform =20
>> file supports more than one board type, then check the compatible at =20
>> the top of the device tree).
>=20
> I think I understand what you're suggesting.
>=20
> Instead of a new property name, I would instead check for my specific =20
> board type (let's call it a foo-8315) in the top-level compatible =20
> list?  So I'd change my devtree to have this top-level compatible:
>=20
> / {
>     compatible =3D "example,foo-8315", "fsl,mpc8315erdb";

It should really only have compatible =3D "example,foo-8315", since it's =20
not 100% compatible with fsl,mpc8315erdb (at least due to this bug, but =20
probably there are other differences as well).

> If I saw that, I would then twiddle the bits as needed?

Yes.

> MIght work, although having it in the sata block of the device tree =20
> has the advantage of providing me exactly the OF node that I need (in =20
> ofdev->dev.of_node).  I'd have to figure out how to traverse to the =20
> dev tree root and then back down one to the root compat entry.  =20
> Probably not impossible, but I was aiming for a fairly minimal patch.

Well, if this is only seen on your board so far (or rather, your =20
vendor's board which isn't upstream), and you're OK with updating the =20
device tree, then I have no objection.

> It would also be nice if we could unravel exactly why that =20
> CONFIG_8315_DS ever showed up in the first place.

It would be nice, but I doubt that particular information is ever going =20
to surface...  IIRC I asked internally back when this first came up, =20
and didn't get an answer.

>> Or do you mean that you would not set this on any board's device =20
>> tree by default, and instead have users set it if they encounter =20
>> problems?
>=20
> No, I would expect to set it on all the boards, so using the =20
> compatibility hack above would work.

You mean all the boards that have the bug, which doesn't include any =20
upstream device tree, right?

-Scott=

^ permalink raw reply

* Re: [PATCH v2 net-next] af_unix: fix a fatal race with bit fields
From: David Miller @ 2013-05-01 19:14 UTC (permalink / raw)
  To: eric.dumazet; +Cc: linuxppc-dev, paulus, ambrose, netdev
In-Reply-To: <1367421843.11020.43.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 01 May 2013 08:24:03 -0700

> [PATCH v2] af_unix: fix a fatal race with bit fields
> 
> Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
> uses 64bit instructions to manipulate them.
> If the 64bit word includes any atomic_t or spinlock_t, we can lose
> critical concurrent changes.
> 
> This is happening in af_unix, where unix_sk(sk)->gc_candidate/
> gc_maybe_cycle/lock share the same 64bit word.
> 
> This leads to fatal deadlock, as one/several cpus spin forever
> on a spinlock that will never be available again.
> 
> A safer way would be to use a long to store flags.
> This way we are sure compiler/arch wont do bad things.
> 
> As we own unix_gc_lock spinlock when clearing or setting bits,
> we can use the non atomic __set_bit()/__clear_bit().
> 
> recursion_level can share the same 64bit location with the spinlock,
> as it is set only with this spinlock held.
> 
> [1] bug fixed in gcc-4.8.0 :
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
> 
> Reported-by: Ambrose Feinstein <ambrose@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable, thanks Eric.

^ permalink raw reply

* Re: [PATCH] powerpc: Bring all threads online prior to migration/hibernation
From: Srivatsa S. Bhat @ 2013-05-01 19:43 UTC (permalink / raw)
  To: Robert Jennings, linuxppc-dev, Ben Herrenschmidt, stable
In-Reply-To: <20130426213203.GC3698@linux.vnet.ibm.com>

On 04/27/2013 03:02 AM, Robert Jennings wrote:
> With this patch before a migration/hibernation all threads present but
> not online will be brought online.  After migration/hibernation those
> threads are taken back offline.
> 
> During migration/hibernation all online CPUs must call H_JOIN, this is
> required by the hypervisor.  Without this patch, threads that are offline
> (H_CEDE'd) will not be woken to make the H_JOIN call and the OS will be
> deadlocked (all threads either JOIN'd or CEDE'd).
> 
> Cc: <stable@kernel.org>
> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
> ---

Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Regards,
Srivatsa S. Bhat

>  arch/powerpc/include/asm/rtas.h          |    2 +
>  arch/powerpc/kernel/rtas.c               |   95 ++++++++++++++++++++++++++++++
>  arch/powerpc/platforms/pseries/suspend.c |   22 +++++++
>  3 files changed, 119 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index aef00c6..ee38f29 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -262,6 +262,8 @@ extern void rtas_progress(char *s, unsigned short hex);
>  extern void rtas_initialize(void);
>  extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
>  extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
> +extern int rtas_online_cpus_mask(cpumask_var_t cpus);
> +extern int rtas_offline_cpus_mask(cpumask_var_t cpus);
>  extern int rtas_ibm_suspend_me(struct rtas_args *);
> 
>  struct rtc_time;
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 1fd6e7b..855ee98 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -19,6 +19,7 @@
>  #include <linux/init.h>
>  #include <linux/capability.h>
>  #include <linux/delay.h>
> +#include <linux/cpu.h>
>  #include <linux/smp.h>
>  #include <linux/completion.h>
>  #include <linux/cpumask.h>
> @@ -807,6 +808,77 @@ static void rtas_percpu_suspend_me(void *info)
>  	__rtas_suspend_cpu((struct rtas_suspend_me_data *)info, 1);
>  }
> 
> +enum rtas_cpu_state {
> +	DOWN,
> +	UP,
> +};
> +
> +/* On return cpumask will be altered to indicate CPUs changed */
> +static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
> +				cpumask_var_t cpus)
> +{
> +	int cpu;
> +	int cpuret = 0;
> +	int ret = 0;
> +
> +	if (cpumask_empty(cpus))
> +		return 0;
> +
> +	for_each_cpu(cpu, cpus) {
> +		switch (state) {
> +		case DOWN:
> +			cpuret = cpu_down(cpu);
> +			break;
> +		case UP:
> +			cpuret = cpu_up(cpu);
> +			break;
> +		}
> +		if (cpuret) {
> +			pr_debug("%s: cpu_%s for cpu#%d returned %d.\n",
> +					__func__,
> +					((state == UP) ? "up" : "down"),
> +					cpu, cpuret);
> +			if (!ret)
> +				ret = cpuret;
> +			if (state == UP) {
> +				cpumask_shift_right(cpus, cpus, cpu);
> +				cpumask_shift_left(cpus, cpus, cpu);
> +				break;
> +			} else
> +				cpumask_clear_cpu(cpu, cpus);
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +int rtas_online_cpus_mask(cpumask_var_t cpus)
> +{
> +	int ret;
> +
> +	ret = rtas_cpu_state_change_mask(UP, cpus);
> +
> +	if (ret) {
> +		cpumask_var_t tmp_mask;
> +
> +		if (!alloc_cpumask_var(&tmp_mask, GFP_TEMPORARY))
> +			return ret;
> +
> +		cpumask_copy(tmp_mask, cpus);
> +		rtas_offline_cpus_mask(tmp_mask);
> +		free_cpumask_var(tmp_mask);
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(rtas_online_cpus_mask);
> +
> +int rtas_offline_cpus_mask(cpumask_var_t cpus)
> +{
> +	return rtas_cpu_state_change_mask(DOWN, cpus);
> +}
> +EXPORT_SYMBOL(rtas_offline_cpus_mask);
> +
>  int rtas_ibm_suspend_me(struct rtas_args *args)
>  {
>  	long state;
> @@ -814,6 +886,8 @@ int rtas_ibm_suspend_me(struct rtas_args *args)
>  	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
>  	struct rtas_suspend_me_data data;
>  	DECLARE_COMPLETION_ONSTACK(done);
> +	cpumask_var_t offline_mask;
> +	int cpuret;
> 
>  	if (!rtas_service_present("ibm,suspend-me"))
>  		return -ENOSYS;
> @@ -837,11 +911,24 @@ int rtas_ibm_suspend_me(struct rtas_args *args)
>  		return 0;
>  	}
> 
> +	if (!alloc_cpumask_var(&offline_mask, GFP_TEMPORARY))
> +		return -ENOMEM;
> +
>  	atomic_set(&data.working, 0);
>  	atomic_set(&data.done, 0);
>  	atomic_set(&data.error, 0);
>  	data.token = rtas_token("ibm,suspend-me");
>  	data.complete = &done;
> +
> +	/* All present CPUs must be online */
> +	cpumask_andnot(offline_mask, cpu_present_mask, cpu_online_mask);
> +	cpuret = rtas_online_cpus_mask(offline_mask);
> +	if (cpuret) {
> +		pr_err("%s: Could not bring present CPUs online.\n", __func__);
> +		atomic_set(&data.error, cpuret);
> +		goto out;
> +	}
> +
>  	stop_topology_update();
> 
>  	/* Call function on all CPUs.  One of us will make the
> @@ -857,6 +944,14 @@ int rtas_ibm_suspend_me(struct rtas_args *args)
> 
>  	start_topology_update();
> 
> +	/* Take down CPUs not online prior to suspend */
> +	cpuret = rtas_offline_cpus_mask(offline_mask);
> +	if (cpuret)
> +		pr_warn("%s: Could not restore CPUs to offline state.\n",
> +				__func__);
> +
> +out:
> +	free_cpumask_var(offline_mask);
>  	return atomic_read(&data.error);
>  }
>  #else /* CONFIG_PPC_PSERIES */
> diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
> index 47226e0..5f997e7 100644
> --- a/arch/powerpc/platforms/pseries/suspend.c
> +++ b/arch/powerpc/platforms/pseries/suspend.c
> @@ -16,6 +16,7 @@
>    * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
>    */
> 
> +#include <linux/cpu.h>
>  #include <linux/delay.h>
>  #include <linux/suspend.h>
>  #include <linux/stat.h>
> @@ -126,11 +127,15 @@ static ssize_t store_hibernate(struct device *dev,
>  			       struct device_attribute *attr,
>  			       const char *buf, size_t count)
>  {
> +	cpumask_var_t offline_mask;
>  	int rc;
> 
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
> 
> +	if (!alloc_cpumask_var(&offline_mask, GFP_TEMPORARY))
> +		return -ENOMEM;
> +
>  	stream_id = simple_strtoul(buf, NULL, 16);
> 
>  	do {
> @@ -140,15 +145,32 @@ static ssize_t store_hibernate(struct device *dev,
>  	} while (rc == -EAGAIN);
> 
>  	if (!rc) {
> +		/* All present CPUs must be online */
> +		cpumask_andnot(offline_mask, cpu_present_mask,
> +				cpu_online_mask);
> +		rc = rtas_online_cpus_mask(offline_mask);
> +		if (rc) {
> +			pr_err("%s: Could not bring present CPUs online.\n",
> +					__func__);
> +			goto out;
> +		}
> +
>  		stop_topology_update();
>  		rc = pm_suspend(PM_SUSPEND_MEM);
>  		start_topology_update();
> +
> +		/* Take down CPUs not online prior to suspend */
> +		if (!rtas_offline_cpus_mask(offline_mask))
> +			pr_warn("%s: Could not restore CPUs to offline "
> +					"state.\n", __func__);
>  	}
> 
>  	stream_id = 0;
> 
>  	if (!rc)
>  		rc = count;
> +out:
> +	free_cpumask_var(offline_mask);
>  	return rc;
>  }
> 

^ permalink raw reply

* Re: flush_icache_range on AMCC 44x targets
From: Josh Boyer @ 2013-05-01 19:53 UTC (permalink / raw)
  To: Mike; +Cc: linuxppc-dev
In-Reply-To: <CANtoAtNsjMdgdpj31myTDUBy_sbMk4nvgBzir_2ZHnSgsgZ=JQ@mail.gmail.com>

On Tue, Apr 30, 2013 at 02:17:59PM +0200, Mike wrote:
>Hi,
>
>i was reading trough arch/powerpc/kernel/misc32.S looking at the icbi and
>iccci instructions, from whats on print in
>http://s.eeweb.com/members/kvks_kumar/answers/1356585717-PPC440_UM2013.pdf(page
>272) iccci should be used once in the power-on / reset routine, and
>as far as flush_icache_range goes presumably before icbi is called?

I'm not understanding your question.

>So should not flush_icache_range go
>#ifdef CONFIG_44x
>iccci    0, r0
>#endif
>icbi    0,r6

The icbi isn't ever executed on 44x at all.

>arch/powerpc/kernel/misc32.S:
>/*
> * Write any modified data cache blocks out to memory
> * and invalidate the corresponding instruction cache blocks.
> * This is a no-op on the 601.
> *
> * flush_icache_range(unsigned long start, unsigned long stop)
> */
>_KPROBE(__flush_icache_range)
>BEGIN_FTR_SECTION
>    blr                /* for 601, do nothing */
>END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
>    li    r5,L1_CACHE_BYTES-1
>    andc    r3,r3,r5
>    subf    r4,r3,r4
>    add    r4,r4,r5
>    srwi.    r4,r4,L1_CACHE_SHIFT
>    beqlr
>    mtctr    r4
>    mr    r6,r3
>1:    dcbst    0,r3
>    addi    r3,r3,L1_CACHE_BYTES
>    bdnz    1b
>    sync                /* wait for dcbst's to get to ram */
>#ifndef CONFIG_44x

This part above is "if not defined CONFIG_44X", which means execute
everything below here until you hit #else if you are running on a
processor that isn't 4xx.

>    mtctr    r4
>2:    icbi    0,r6
>    addi    r6,r6,L1_CACHE_BYTES
>    bdnz    2b
>#else

Otherwise, use iccci.  Which is what you're sort of suggesting be done.

>    /* Flash invalidate on 44x because we are passed kmapped addresses and
>       this doesn't work for userspace pages due to the virtually tagged
>       icache.  Sigh. */
>    iccci    0, r0
>#endif

So.  I think the code is already doing what you think it should?

josh

^ permalink raw reply

* Re: PowerPC, P2020RDB, application debug when the application is in tight loop, Sysrq
From: Scott Wood @ 2013-05-01 19:54 UTC (permalink / raw)
  To: saikrishna gajula; +Cc: linuxppc-dev
In-Reply-To: <CAGstqXHrHWdiE=RvZ=1sV+=mPd9LiUZHL0gHPa89AVZRSND2Tg@mail.gmail.com>

On 04/17/2013 12:04:10 AM, saikrishna gajula wrote:
> HI All,
>=20
>              I am new to this group. I am working on Freescale P2020
> platform running linux 2.6.21.   I am looking for debug =20
> mechanism/utility,
> when a multi threaded application running on linux , appears to be =20
> hung (
> running in a tight loop,deadlock)  while not able to access the board
> through serial/SSH/Telnet.
>=20
>    I was looking at Magic sysrq option in linux to generate the stack,
> register dump when the application is hung. I am able to dump the call
> trace in normal working conditions. But i can't use  echo  t >
> /proc/sysrq-trigger and debug when the application hung.
>=20
>  I am using below piece of code(drivers/serial/8250.c) on P2020RDB to =20
> debug
> the application where in , in hung situation, when i press 'y'  =20
> followed by
> 't'  on serial console it should go to sysrq handler, and dump the =20
> call
> trace, but it is not happening.(simply board hung)
>=20
> {
>            if(sysrq_enable_flag)
>                   handle_sysrq(ch, up->port.info->tty);
>=20
>         sysrq_enable_flag =3D 0;
>=20
>         if(ch =3D=3D 'y')
>             sysrq_enable_flag =3D 1;
> }
>=20
> It would be helpful if you provide any hint on the issue, or any =20
> other way
> to debug the application in hang situations.

There's an erratum regarding breaks in Freescale's serial port =20
implementation that has been worked around in more recent kernels.  =20
2.6.21 is six years old.  If you update to a kernel that has the =20
workaround, or backport the workaround yourself to your old kernel, you =20
should be able to use a break instead of hacking in a check for 'y'.

Other than that, either the kernel is hung too badly to respond to =20
serial input at all (in which case use JTAG to debug) or there's =20
something wrong with your 'y' hack (sysrq_enable_flag doesn't even =20
appear in current kernels so I can't see if this is the case without =20
digging through old code).

In any case, upgrade to a newer kernel -- maybe you're hitting a bug =20
that has been fixed since then.

-Scott=

^ permalink raw reply

* Re: [PATCH] powerpc: Bring all threads online prior to migration/hibernation
From: Benjamin Herrenschmidt @ 2013-05-01 21:04 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: Robert Jennings, linuxppc-dev, stable
In-Reply-To: <51813403.50508@linux.vnet.ibm.com>

On Wed, 2013-05-01 at 10:25 -0500, Nathan Fontenot wrote:
> On 04/26/2013 04:32 PM, Robert Jennings wrote:
> > With this patch before a migration/hibernation all threads present but
> > not online will be brought online.  After migration/hibernation those
> > threads are taken back offline.
> > 
> > During migration/hibernation all online CPUs must call H_JOIN, this is
> > required by the hypervisor.  Without this patch, threads that are offline
> > (H_CEDE'd) will not be woken to make the H_JOIN call and the OS will be
> > deadlocked (all threads either JOIN'd or CEDE'd).
> > 
> 
> This fixes a long standing bug in partition migration.
> 
> Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>

I was about to apply this one but had to take it out of my branch as it
breaks the UP build. Can you send a new revision of the patch ? It's a
bug fix, I'll put it in at -rc1.

Cheers,
Ben.

> > Cc: <stable@kernel.org>
> > Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/include/asm/rtas.h          |    2 +
> >  arch/powerpc/kernel/rtas.c               |   95 ++++++++++++++++++++++++++++++
> >  arch/powerpc/platforms/pseries/suspend.c |   22 +++++++
> >  3 files changed, 119 insertions(+)
> > 
> > diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> > index aef00c6..ee38f29 100644
> > --- a/arch/powerpc/include/asm/rtas.h
> > +++ b/arch/powerpc/include/asm/rtas.h
> > @@ -262,6 +262,8 @@ extern void rtas_progress(char *s, unsigned short hex);
> >  extern void rtas_initialize(void);
> >  extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
> >  extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
> > +extern int rtas_online_cpus_mask(cpumask_var_t cpus);
> > +extern int rtas_offline_cpus_mask(cpumask_var_t cpus);
> >  extern int rtas_ibm_suspend_me(struct rtas_args *);
> >  
> >  struct rtc_time;
> > diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> > index 1fd6e7b..855ee98 100644
> > --- a/arch/powerpc/kernel/rtas.c
> > +++ b/arch/powerpc/kernel/rtas.c
> > @@ -19,6 +19,7 @@
> >  #include <linux/init.h>
> >  #include <linux/capability.h>
> >  #include <linux/delay.h>
> > +#include <linux/cpu.h>
> >  #include <linux/smp.h>
> >  #include <linux/completion.h>
> >  #include <linux/cpumask.h>
> > @@ -807,6 +808,77 @@ static void rtas_percpu_suspend_me(void *info)
> >  	__rtas_suspend_cpu((struct rtas_suspend_me_data *)info, 1);
> >  }
> >  
> > +enum rtas_cpu_state {
> > +	DOWN,
> > +	UP,
> > +};
> > +
> > +/* On return cpumask will be altered to indicate CPUs changed */
> > +static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
> > +				cpumask_var_t cpus)
> > +{
> > +	int cpu;
> > +	int cpuret = 0;
> > +	int ret = 0;
> > +
> > +	if (cpumask_empty(cpus))
> > +		return 0;
> > +
> > +	for_each_cpu(cpu, cpus) {
> > +		switch (state) {
> > +		case DOWN:
> > +			cpuret = cpu_down(cpu);
> > +			break;
> > +		case UP:
> > +			cpuret = cpu_up(cpu);
> > +			break;
> > +		}
> > +		if (cpuret) {
> > +			pr_debug("%s: cpu_%s for cpu#%d returned %d.\n",
> > +					__func__,
> > +					((state == UP) ? "up" : "down"),
> > +					cpu, cpuret);
> > +			if (!ret)
> > +				ret = cpuret;
> > +			if (state == UP) {
> > +				cpumask_shift_right(cpus, cpus, cpu);
> > +				cpumask_shift_left(cpus, cpus, cpu);
> > +				break;
> > +			} else
> > +				cpumask_clear_cpu(cpu, cpus);
> > +		}
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +int rtas_online_cpus_mask(cpumask_var_t cpus)
> > +{
> > +	int ret;
> > +
> > +	ret = rtas_cpu_state_change_mask(UP, cpus);
> > +
> > +	if (ret) {
> > +		cpumask_var_t tmp_mask;
> > +
> > +		if (!alloc_cpumask_var(&tmp_mask, GFP_TEMPORARY))
> > +			return ret;
> > +
> > +		cpumask_copy(tmp_mask, cpus);
> > +		rtas_offline_cpus_mask(tmp_mask);
> > +		free_cpumask_var(tmp_mask);
> > +	}
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL(rtas_online_cpus_mask);
> > +
> > +int rtas_offline_cpus_mask(cpumask_var_t cpus)
> > +{
> > +	return rtas_cpu_state_change_mask(DOWN, cpus);
> > +}
> > +EXPORT_SYMBOL(rtas_offline_cpus_mask);
> > +
> >  int rtas_ibm_suspend_me(struct rtas_args *args)
> >  {
> >  	long state;
> > @@ -814,6 +886,8 @@ int rtas_ibm_suspend_me(struct rtas_args *args)
> >  	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
> >  	struct rtas_suspend_me_data data;
> >  	DECLARE_COMPLETION_ONSTACK(done);
> > +	cpumask_var_t offline_mask;
> > +	int cpuret;
> >  
> >  	if (!rtas_service_present("ibm,suspend-me"))
> >  		return -ENOSYS;
> > @@ -837,11 +911,24 @@ int rtas_ibm_suspend_me(struct rtas_args *args)
> >  		return 0;
> >  	}
> >  
> > +	if (!alloc_cpumask_var(&offline_mask, GFP_TEMPORARY))
> > +		return -ENOMEM;
> > +
> >  	atomic_set(&data.working, 0);
> >  	atomic_set(&data.done, 0);
> >  	atomic_set(&data.error, 0);
> >  	data.token = rtas_token("ibm,suspend-me");
> >  	data.complete = &done;
> > +
> > +	/* All present CPUs must be online */
> > +	cpumask_andnot(offline_mask, cpu_present_mask, cpu_online_mask);
> > +	cpuret = rtas_online_cpus_mask(offline_mask);
> > +	if (cpuret) {
> > +		pr_err("%s: Could not bring present CPUs online.\n", __func__);
> > +		atomic_set(&data.error, cpuret);
> > +		goto out;
> > +	}
> > +
> >  	stop_topology_update();
> >  
> >  	/* Call function on all CPUs.  One of us will make the
> > @@ -857,6 +944,14 @@ int rtas_ibm_suspend_me(struct rtas_args *args)
> >  
> >  	start_topology_update();
> >  
> > +	/* Take down CPUs not online prior to suspend */
> > +	cpuret = rtas_offline_cpus_mask(offline_mask);
> > +	if (cpuret)
> > +		pr_warn("%s: Could not restore CPUs to offline state.\n",
> > +				__func__);
> > +
> > +out:
> > +	free_cpumask_var(offline_mask);
> >  	return atomic_read(&data.error);
> >  }
> >  #else /* CONFIG_PPC_PSERIES */
> > diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
> > index 47226e0..5f997e7 100644
> > --- a/arch/powerpc/platforms/pseries/suspend.c
> > +++ b/arch/powerpc/platforms/pseries/suspend.c
> > @@ -16,6 +16,7 @@
> >    * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
> >    */
> >  
> > +#include <linux/cpu.h>
> >  #include <linux/delay.h>
> >  #include <linux/suspend.h>
> >  #include <linux/stat.h>
> > @@ -126,11 +127,15 @@ static ssize_t store_hibernate(struct device *dev,
> >  			       struct device_attribute *attr,
> >  			       const char *buf, size_t count)
> >  {
> > +	cpumask_var_t offline_mask;
> >  	int rc;
> >  
> >  	if (!capable(CAP_SYS_ADMIN))
> >  		return -EPERM;
> >  
> > +	if (!alloc_cpumask_var(&offline_mask, GFP_TEMPORARY))
> > +		return -ENOMEM;
> > +
> >  	stream_id = simple_strtoul(buf, NULL, 16);
> >  
> >  	do {
> > @@ -140,15 +145,32 @@ static ssize_t store_hibernate(struct device *dev,
> >  	} while (rc == -EAGAIN);
> >  
> >  	if (!rc) {
> > +		/* All present CPUs must be online */
> > +		cpumask_andnot(offline_mask, cpu_present_mask,
> > +				cpu_online_mask);
> > +		rc = rtas_online_cpus_mask(offline_mask);
> > +		if (rc) {
> > +			pr_err("%s: Could not bring present CPUs online.\n",
> > +					__func__);
> > +			goto out;
> > +		}
> > +
> >  		stop_topology_update();
> >  		rc = pm_suspend(PM_SUSPEND_MEM);
> >  		start_topology_update();
> > +
> > +		/* Take down CPUs not online prior to suspend */
> > +		if (!rtas_offline_cpus_mask(offline_mask))
> > +			pr_warn("%s: Could not restore CPUs to offline "
> > +					"state.\n", __func__);
> >  	}
> >  
> >  	stream_id = 0;
> >  
> >  	if (!rc)
> >  		rc = count;
> > +out:
> > +	free_cpumask_var(offline_mask);
> >  	return rc;
> >  }
> >  
> 
> 

^ permalink raw reply

* Re: ppc/sata-fsl: orphan config value: CONFIG_MPC8315_DS
From: Anthony Foiani @ 2013-05-01 23:35 UTC (permalink / raw)
  To: Scott Wood
  Cc: Robert P.J.Day, linuxppc-dev@lists.ozlabs.org, Li Yang-R58472,
	Jeff Garzik, Adrian Bunk
In-Reply-To: <1367431525.29231.6@snotra>

Scott --

Thanks again for the quick reply.

On 05/01/2013 12:05 PM, Scott Wood wrote:
> On 04/30/2013 09:06:56 PM, Anthony Foiani wrote:
>> Instead of a new property name, I would instead check for my specific 
>> board type (let's call it a foo-8315) in the top-level compatible 
>> list?  So I'd change my devtree to have this top-level compatible:
>>
>> / {
>>     compatible = "example,foo-8315", "fsl,mpc8315erdb";
>
> It should really only have compatible = "example,foo-8315", since it's 
> not 100% compatible with fsl,mpc8315erdb (at least due to this bug, 
> but probably there are other differences as well).

Then I guess I don't understand the proper use of "compatible" (or is 
the root node special?)

E.g., the DTS for the "parent" board (MPC8315ERDB) has multiple entries 
for the crypto "compatible" value:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/boot/dts/mpc8315erdb.dts?id=refs/tags/v3.9#n286
(or: *http://preview.tinyurl.com/btlqxgo* )

|		crypto@30000 {
			compatible = "fsl,sec3.3", "fsl,sec3.1", "fsl,sec3.0",
				     "fsl,sec2.4", "fsl,sec2.2", "fsl,sec2.1",
				     "fsl,sec2.0";
			reg = <0x30000 0x10000>;|

I read this as meaning: "if you have to ask if a certain feature is 
compatible with some 'foo', then this board provides that 
compatibility".  Not as "if a value is in the compatibility flag, then 
it is 100% compatible with that value".  (Although maybe that's true in 
the case of the SEC, so perhaps that a bad example.)

For what it's worth, the upstream vendor did have a separate root-node 
"compatible" value -- which called a board-specific function in a 
board-specific C file, both of which were direct cut & paste copies from 
the MPC8315ERDB function / file.  My gut instinct is that this degree of 
duplication is unhealthy and incorrect, but if my solution is considered 
abuse of the device tree, then I can try to do it a different way next 
time.

To clarify what I mean by "cut and paste", here are the differences 
between the 2.6.36 MPC8315ERDB dts, and the one the vendor sent us 
(based on that kernel rev):

$ diff -u arch/powerpc/boot/dts/mpc8315erdb.dts ~/foo.dts
[tony@hum linux-stable]$ --- arch/powerpc/boot/dts/mpc8315erdb.dts    
2013-05-01 17:23:36.803167646 -0600
+++ /home/tony/foo.dts    2013-05-01 17:18:55.009632505 -0600
@@ -1,4 +1,10 @@
  /*
+ * Foo 8315 Device Tree Source
+ *
+ * Copyright 2010 FooCorp
+ *
+ * Based on:
+ *
   * MPC8315E RDB Device Tree Source
   *
   * Copyright 2007 Freescale Semiconductor Inc.
@@ -12,7 +18,7 @@
  /dts-v1/;

  / {
-    compatible = "fsl,mpc8315erdb";
+    compatible = "fsl,foo-8315";
      #address-cells = <1>;
      #size-cells = <1>;

That's it.  And for the copy from mpc831x_rdb.c to foo8315.c?

--- mpc831x_rdb.c    2013-05-01 17:23:36.860167147 -0600
+++ foo8315.c    2013-05-01 17:27:04.310352475 -0600
@@ -1,5 +1,5 @@
  /*
- * arch/powerpc/platforms/83xx/mpc831x_rdb.c
+ * arch/powerpc/platforms/83xx/foo8315.c
   *
   * Description: MPC831x RDB board specific routines.
   * This file is based on mpc834x_sys.c
@@ -26,14 +26,14 @@
  /*
   * Setup the architecture
   */
-static void __init mpc831x_rdb_setup_arch(void)
+static void __init foo8315_setup_arch(void)
  {
  #ifdef CONFIG_PCI
      struct device_node *np;
  #endif

      if (ppc_md.progress)
-        ppc_md.progress("mpc831x_rdb_setup_arch()", 0);
+        ppc_md.progress("foo8315_setup_arch()", 0);

  #ifdef CONFIG_PCI
      for_each_compatible_node(np, "pci", "fsl,mpc8349-pci")
@@ -44,7 +44,7 @@
      mpc831x_usb_cfg();
  }

-static void __init mpc831x_rdb_init_IRQ(void)
+static void __init foo8315_init_IRQ(void)
  {
      struct device_node *np;

@@ -63,12 +63,12 @@
  /*
   * Called very early, MMU is off, device-tree isn't unflattened
   */
-static int __init mpc831x_rdb_probe(void)
+static int __init foo8315_probe(void)
  {
      unsigned long root = of_get_flat_dt_root();

      return of_flat_dt_is_compatible(root, "MPC8313ERDB") ||
-           of_flat_dt_is_compatible(root, "fsl,mpc8315erdb");
+           of_flat_dt_is_compatible(root, "fsl,foo8315");
  }

  static struct of_device_id __initdata of_bus_ids[] = {
@@ -83,13 +83,13 @@
      of_platform_bus_probe(NULL, of_bus_ids, NULL);
      return 0;
  }
-machine_device_initcall(mpc831x_rdb, declare_of_platform_devices);
+machine_device_initcall(foo8315, declare_of_platform_devices);

-define_machine(mpc831x_rdb) {
-    .name            = "MPC831x RDB",
-    .probe            = mpc831x_rdb_probe,
-    .setup_arch        = mpc831x_rdb_setup_arch,
-    .init_IRQ        = mpc831x_rdb_init_IRQ,
+define_machine(foo8315) {
+    .name            = "foo-8315",
+    .probe            = foo8315_probe,
+    .setup_arch        = foo8315_setup_arch,
+    .init_IRQ        = foo8315_init_IRQ,
      .get_irq        = ipic_get_irq,
      .restart        = mpc83xx_restart,
      .time_init        = mpc83xx_time_init,

Given those diffs, it didn't seem much of a stretch to use compatible = 
"fsl,mpc8315erdb"

> Well, if this is only seen on your board so far (or rather, your 
> vendor's board which isn't upstream), and you're OK with updating the 
> device tree, then I have no objection.
Well, so far as I know, this project is the only consumer of these 
boards.  I've fed all my fixes/changes back to our vendor, but they're 
not interested in upstreaming it.

I have been trying to at least get the patches into the public arena, 
but I don't have the bandwidth to move to tip-of-tree and do testing 
there, as well as moving project along the actual release path (which is 
based on long-term releases, hence my patch being against 3.4.y).

For my use, this is a good compromise: I've documented the errata (even 
if we don't know the root cause), and it is now set up so that I can't 
lose the setting just by doing a config update (which was the problem 
with the orphan config setting).

> > It would also be nice if we could unravel exactly why that 
> CONFIG_8315_DS ever showed up in the first place.
>
> It would be nice, but I doubt that particular information is ever 
> going to surface...  IIRC I asked internally back when this first came 
> up, and didn't get an answer.
Right, that's my recollection as well.

I end up feeling a bit put-upon, though, when I'm asked for a 
comprehensive patch to cover something that the original authors can't 
explain.

>
>>> Or do you mean that you would not set this on any board's device 
>>> tree by default, and instead have users set it if they encounter 
>>> problems?
>>
>> No, I would expect to set it on all the boards, so using the 
>> compatibility hack above would work.
>
> You mean all the boards that have the bug, which doesn't include any 
> upstream device tree, right?
As mentioned above, my primary concern is the use of these cards in the 
project/product I'm working on.  My answer has been to apply this fix 
(and the matching change to the device tree I supply as a part of the 
boot image).  I feel that I'm trying to do the right thing by getting 
some of these changes publicly visible, but I fear that I'll also have 
to go down the route of "not enough time or money to properly upstream it".

"doesn't include upstream device tree" ... no, the device tree was 
supplied with the original set of patches from the vendor.

And it seems like it's a bit of a corner case -- even on other MPC8315 
boards, it seems that only 2-3 of us have ever run into this.  Maybe not 
many people use the SATA features, or they use them with older/slower 
drives that never try to negotiate above 1.5Gbps (given that it's a 6yo 
design or so).

Anyway.  Thanks for all the feedback, but at this point, I'm going to go 
with the patch I already supplied.  Hopefully it will make someone 
else's life easier at some later date.

Thanks again!

Take care,
Anthony

^ permalink raw reply

* Re: ppc/sata-fsl: orphan config value: CONFIG_MPC8315_DS
From: Scott Wood @ 2013-05-02  0:13 UTC (permalink / raw)
  To: Anthony Foiani
  Cc: Robert P.J.Day, linuxppc-dev@lists.ozlabs.org, Li Yang-R58472,
	Jeff Garzik, Adrian Bunk
In-Reply-To: <5181A6CA.9090903@scrye.com>

On 05/01/2013 06:35:38 PM, Anthony Foiani wrote:
> Scott --
>=20
> Thanks again for the quick reply.
>=20
> On 05/01/2013 12:05 PM, Scott Wood wrote:
>> On 04/30/2013 09:06:56 PM, Anthony Foiani wrote:
>>> Instead of a new property name, I would instead check for my =20
>>> specific board type (let's call it a foo-8315) in the top-level =20
>>> compatible list?  So I'd change my devtree to have this top-level =20
>>> compatible:
>>>=20
>>> / {
>>>     compatible =3D "example,foo-8315", "fsl,mpc8315erdb";
>>=20
>> It should really only have compatible =3D "example,foo-8315", since =20
>> it's not 100% compatible with fsl,mpc8315erdb (at least due to this =20
>> bug, but probably there are other differences as well).
>=20
> Then I guess I don't understand the proper use of "compatible" (or is =20
> the root node special?)

It's only special in that 100% compatibility is much less likely to be =20
true of an entire system than of a particular component.

> E.g., the DTS for the "parent" board (MPC8315ERDB) has multiple =20
> entries for the crypto "compatible" value:
>=20
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch=
/powerpc/boot/dts/mpc8315erdb.dts?id=3Drefs/tags/v3.9#n286
> (or: *http://preview.tinyurl.com/btlqxgo* )
>=20
> |		crypto@30000 {
> 			compatible =3D "fsl,sec3.3", "fsl,sec3.1", =20
> "fsl,sec3.0",
> 				     "fsl,sec2.4", "fsl,sec2.2", =20
> "fsl,sec2.1",
> 				     "fsl,sec2.0";
> 			reg =3D <0x30000 0x10000>;|
>=20
> I read this as meaning: "if you have to ask if a certain feature is =20
> compatible with some 'foo', then this board provides that =20
> compatibility".  Not as "if a value is in the compatibility flag, =20
> then it is 100% compatible with that value".  (Although maybe that's =20
> true in the case of the SEC, so perhaps that a bad example.)

AFAIK there is backwards compatibility with these SEC versions.  If =20
not, they shouldn't be listed.

> For what it's worth, the upstream vendor did have a separate =20
> root-node "compatible" value -- which called a board-specific =20
> function in a board-specific C file, both of which were direct cut & =20
> paste copies from the MPC8315ERDB function / file.  My gut instinct =20
> is that this degree of duplication is unhealthy and incorrect, but if =20
> my solution is considered abuse of the device tree, then I can try to =20
> do it a different way next time.

It's quite possible to use the same C file for multiple similar boards =20
with different compatibles.  This is done often, including =20
mpc831x_erdb.c.

> Given those diffs, it didn't seem much of a stretch to use compatible =20
> =3D "fsl,mpc8315erdb"

The criteria for claiming compatibility should be based in the hardware =20
itself, not whether a particular file in Linux needs any changes.

>>>> Or do you mean that you would not set this on any board's device =20
>>>> tree by default, and instead have users set it if they encounter =20
>>>> problems?
>>>=20
>>> No, I would expect to set it on all the boards, so using the =20
>>> compatibility hack above would work.
>>=20
>> You mean all the boards that have the bug, which doesn't include any =20
>> upstream device tree, right?
> As mentioned above, my primary concern is the use of these cards in =20
> the project/product I'm working on.  My answer has been to apply this =20
> fix (and the matching change to the device tree I supply as a part of =20
> the boot image).  I feel that I'm trying to do the right thing by =20
> getting some of these changes publicly visible, but I fear that I'll =20
> also have to go down the route of "not enough time or money to =20
> properly upstream it".
>=20
> "doesn't include upstream device tree" ... no, the device tree was =20
> supplied with the original set of patches from the vendor.

I'm not saying that the device tree not being upstream is a problem -- =20
actually the opposite, that it means we don't have compatibility to =20
maintain with an already-accepted device tree.

-Scott=

^ permalink raw reply

* Re: [PATCH] powerpc: Bring all threads online prior to migration/hibernation
From: Robert Jennings @ 2013-05-02  0:39 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Nathan Fontenot, linuxppc-dev, stable
In-Reply-To: <1367442296.22115.48.camel@pasglop>

* Benjamin Herrenschmidt (benh@kernel.crashing.org) wrote:
> On Wed, 2013-05-01 at 10:25 -0500, Nathan Fontenot wrote:
> > On 04/26/2013 04:32 PM, Robert Jennings wrote:
> > > With this patch before a migration/hibernation all threads present but
> > > not online will be brought online.  After migration/hibernation those
> > > threads are taken back offline.
> > > 
> > > During migration/hibernation all online CPUs must call H_JOIN, this is
> > > required by the hypervisor.  Without this patch, threads that are offline
> > > (H_CEDE'd) will not be woken to make the H_JOIN call and the OS will be
> > > deadlocked (all threads either JOIN'd or CEDE'd).
> > > 
> > 
> > This fixes a long standing bug in partition migration.
> > 
> > Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> 
> I was about to apply this one but had to take it out of my branch as it
> breaks the UP build. Can you send a new revision of the patch ? It's a
> bug fix, I'll put it in at -rc1.

Yes, sorry.  I'll get you a fix for the -rc1 timeframe for sure.

> Cheers,
> Ben.
> 
> > > Cc: <stable@kernel.org>
> > > Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
> > > ---
> > >  arch/powerpc/include/asm/rtas.h          |    2 +
> > >  arch/powerpc/kernel/rtas.c               |   95 ++++++++++++++++++++++++++++++
> > >  arch/powerpc/platforms/pseries/suspend.c |   22 +++++++
> > >  3 files changed, 119 insertions(+)
> > > 
> > > diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> > > index aef00c6..ee38f29 100644
> > > --- a/arch/powerpc/include/asm/rtas.h
> > > +++ b/arch/powerpc/include/asm/rtas.h
> > > @@ -262,6 +262,8 @@ extern void rtas_progress(char *s, unsigned short hex);
> > >  extern void rtas_initialize(void);
> > >  extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
> > >  extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
> > > +extern int rtas_online_cpus_mask(cpumask_var_t cpus);
> > > +extern int rtas_offline_cpus_mask(cpumask_var_t cpus);
> > >  extern int rtas_ibm_suspend_me(struct rtas_args *);
> > >  
> > >  struct rtc_time;
> > > diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> > > index 1fd6e7b..855ee98 100644
> > > --- a/arch/powerpc/kernel/rtas.c
> > > +++ b/arch/powerpc/kernel/rtas.c
> > > @@ -19,6 +19,7 @@
> > >  #include <linux/init.h>
> > >  #include <linux/capability.h>
> > >  #include <linux/delay.h>
> > > +#include <linux/cpu.h>
> > >  #include <linux/smp.h>
> > >  #include <linux/completion.h>
> > >  #include <linux/cpumask.h>
> > > @@ -807,6 +808,77 @@ static void rtas_percpu_suspend_me(void *info)
> > >  	__rtas_suspend_cpu((struct rtas_suspend_me_data *)info, 1);
> > >  }
> > >  
> > > +enum rtas_cpu_state {
> > > +	DOWN,
> > > +	UP,
> > > +};
> > > +
> > > +/* On return cpumask will be altered to indicate CPUs changed */
> > > +static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
> > > +				cpumask_var_t cpus)
> > > +{
> > > +	int cpu;
> > > +	int cpuret = 0;
> > > +	int ret = 0;
> > > +
> > > +	if (cpumask_empty(cpus))
> > > +		return 0;
> > > +
> > > +	for_each_cpu(cpu, cpus) {
> > > +		switch (state) {
> > > +		case DOWN:
> > > +			cpuret = cpu_down(cpu);
> > > +			break;
> > > +		case UP:
> > > +			cpuret = cpu_up(cpu);
> > > +			break;
> > > +		}
> > > +		if (cpuret) {
> > > +			pr_debug("%s: cpu_%s for cpu#%d returned %d.\n",
> > > +					__func__,
> > > +					((state == UP) ? "up" : "down"),
> > > +					cpu, cpuret);
> > > +			if (!ret)
> > > +				ret = cpuret;
> > > +			if (state == UP) {
> > > +				cpumask_shift_right(cpus, cpus, cpu);
> > > +				cpumask_shift_left(cpus, cpus, cpu);
> > > +				break;
> > > +			} else
> > > +				cpumask_clear_cpu(cpu, cpus);
> > > +		}
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +int rtas_online_cpus_mask(cpumask_var_t cpus)
> > > +{
> > > +	int ret;
> > > +
> > > +	ret = rtas_cpu_state_change_mask(UP, cpus);
> > > +
> > > +	if (ret) {
> > > +		cpumask_var_t tmp_mask;
> > > +
> > > +		if (!alloc_cpumask_var(&tmp_mask, GFP_TEMPORARY))
> > > +			return ret;
> > > +
> > > +		cpumask_copy(tmp_mask, cpus);
> > > +		rtas_offline_cpus_mask(tmp_mask);
> > > +		free_cpumask_var(tmp_mask);
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +EXPORT_SYMBOL(rtas_online_cpus_mask);
> > > +
> > > +int rtas_offline_cpus_mask(cpumask_var_t cpus)
> > > +{
> > > +	return rtas_cpu_state_change_mask(DOWN, cpus);
> > > +}
> > > +EXPORT_SYMBOL(rtas_offline_cpus_mask);
> > > +
> > >  int rtas_ibm_suspend_me(struct rtas_args *args)
> > >  {
> > >  	long state;
> > > @@ -814,6 +886,8 @@ int rtas_ibm_suspend_me(struct rtas_args *args)
> > >  	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
> > >  	struct rtas_suspend_me_data data;
> > >  	DECLARE_COMPLETION_ONSTACK(done);
> > > +	cpumask_var_t offline_mask;
> > > +	int cpuret;
> > >  
> > >  	if (!rtas_service_present("ibm,suspend-me"))
> > >  		return -ENOSYS;
> > > @@ -837,11 +911,24 @@ int rtas_ibm_suspend_me(struct rtas_args *args)
> > >  		return 0;
> > >  	}
> > >  
> > > +	if (!alloc_cpumask_var(&offline_mask, GFP_TEMPORARY))
> > > +		return -ENOMEM;
> > > +
> > >  	atomic_set(&data.working, 0);
> > >  	atomic_set(&data.done, 0);
> > >  	atomic_set(&data.error, 0);
> > >  	data.token = rtas_token("ibm,suspend-me");
> > >  	data.complete = &done;
> > > +
> > > +	/* All present CPUs must be online */
> > > +	cpumask_andnot(offline_mask, cpu_present_mask, cpu_online_mask);
> > > +	cpuret = rtas_online_cpus_mask(offline_mask);
> > > +	if (cpuret) {
> > > +		pr_err("%s: Could not bring present CPUs online.\n", __func__);
> > > +		atomic_set(&data.error, cpuret);
> > > +		goto out;
> > > +	}
> > > +
> > >  	stop_topology_update();
> > >  
> > >  	/* Call function on all CPUs.  One of us will make the
> > > @@ -857,6 +944,14 @@ int rtas_ibm_suspend_me(struct rtas_args *args)
> > >  
> > >  	start_topology_update();
> > >  
> > > +	/* Take down CPUs not online prior to suspend */
> > > +	cpuret = rtas_offline_cpus_mask(offline_mask);
> > > +	if (cpuret)
> > > +		pr_warn("%s: Could not restore CPUs to offline state.\n",
> > > +				__func__);
> > > +
> > > +out:
> > > +	free_cpumask_var(offline_mask);
> > >  	return atomic_read(&data.error);
> > >  }
> > >  #else /* CONFIG_PPC_PSERIES */
> > > diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
> > > index 47226e0..5f997e7 100644
> > > --- a/arch/powerpc/platforms/pseries/suspend.c
> > > +++ b/arch/powerpc/platforms/pseries/suspend.c
> > > @@ -16,6 +16,7 @@
> > >    * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
> > >    */
> > >  
> > > +#include <linux/cpu.h>
> > >  #include <linux/delay.h>
> > >  #include <linux/suspend.h>
> > >  #include <linux/stat.h>
> > > @@ -126,11 +127,15 @@ static ssize_t store_hibernate(struct device *dev,
> > >  			       struct device_attribute *attr,
> > >  			       const char *buf, size_t count)
> > >  {
> > > +	cpumask_var_t offline_mask;
> > >  	int rc;
> > >  
> > >  	if (!capable(CAP_SYS_ADMIN))
> > >  		return -EPERM;
> > >  
> > > +	if (!alloc_cpumask_var(&offline_mask, GFP_TEMPORARY))
> > > +		return -ENOMEM;
> > > +
> > >  	stream_id = simple_strtoul(buf, NULL, 16);
> > >  
> > >  	do {
> > > @@ -140,15 +145,32 @@ static ssize_t store_hibernate(struct device *dev,
> > >  	} while (rc == -EAGAIN);
> > >  
> > >  	if (!rc) {
> > > +		/* All present CPUs must be online */
> > > +		cpumask_andnot(offline_mask, cpu_present_mask,
> > > +				cpu_online_mask);
> > > +		rc = rtas_online_cpus_mask(offline_mask);
> > > +		if (rc) {
> > > +			pr_err("%s: Could not bring present CPUs online.\n",
> > > +					__func__);
> > > +			goto out;
> > > +		}
> > > +
> > >  		stop_topology_update();
> > >  		rc = pm_suspend(PM_SUSPEND_MEM);
> > >  		start_topology_update();
> > > +
> > > +		/* Take down CPUs not online prior to suspend */
> > > +		if (!rtas_offline_cpus_mask(offline_mask))
> > > +			pr_warn("%s: Could not restore CPUs to offline "
> > > +					"state.\n", __func__);
> > >  	}
> > >  
> > >  	stream_id = 0;
> > >  
> > >  	if (!rc)
> > >  		rc = count;
> > > +out:
> > > +	free_cpumask_var(offline_mask);
> > >  	return rc;
> > >  }
> > >  
> > 
> > 
> 
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

^ permalink raw reply

* net/eth/ibmveth: Fixup retrieval of MAC address
From: Benjamin Herrenschmidt @ 2013-05-02  1:35 UTC (permalink / raw)
  To: netdev; +Cc: linuxppc-dev, David Miller, David Gibson

Some ancient pHyp versions used to create a 8 bytes local-mac-address
property in the device-tree instead of a 6 bytes one for veth.

The Linux driver code to deal with that is an insane hack which also
happens to break with some choices of MAC addresses in qemu by testing
for a bit in the address rather than just looking at the size of the
property.

Sanitize this by doing the latter instead.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org>
---

I CC'ed stable bcs I'd like to switch qemu/kvm to 6-bytes properties as
soon as possible. Unfortunately doing so breaks the horrible hack unless
qemu has the right magic bit in the MAC which it may or may not have...

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index c859771..579a03e 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1324,7 +1324,7 @@ static const struct net_device_ops ibmveth_netdev_ops = {
 
 static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
 {
-	int rc, i;
+	int rc, i, mac_len;
 	struct net_device *netdev;
 	struct ibmveth_adapter *adapter;
 	unsigned char *mac_addr_p;
@@ -1334,11 +1334,19 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
 		dev->unit_address);
 
 	mac_addr_p = (unsigned char *)vio_get_attribute(dev, VETH_MAC_ADDR,
-							NULL);
+							&mac_len);
 	if (!mac_addr_p) {
 		dev_err(&dev->dev, "Can't find VETH_MAC_ADDR attribute\n");
 		return -EINVAL;
 	}
+	/* Workaround for old/broken pHyp */
+	if (mac_len == 8)
+		mac_addr_p += 2;
+	if (mac_len != 6) {
+		dev_err(&dev->dev, "VETH_MAC_ADDR attribute wrong len %d\n",
+			mac_len);
+		return -EINVAL;
+	}
 
 	mcastFilterSize_p = (unsigned int *)vio_get_attribute(dev,
 						VETH_MCAST_FILTER_SIZE, NULL);
@@ -1363,17 +1371,6 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
 
 	netif_napi_add(netdev, &adapter->napi, ibmveth_poll, 16);
 
-	/*
-	 * Some older boxes running PHYP non-natively have an OF that returns
-	 * a 8-byte local-mac-address field (and the first 2 bytes have to be
-	 * ignored) while newer boxes' OF return a 6-byte field. Note that
-	 * IEEE 1275 specifies that local-mac-address must be a 6-byte field.
-	 * The RPA doc specifies that the first byte must be 10b, so we'll
-	 * just look for it to solve this 8 vs. 6 byte field issue
-	 */
-	if ((*mac_addr_p & 0x3) != 0x02)
-		mac_addr_p += 2;
-
 	adapter->mac_addr = 0;
 	memcpy(&adapter->mac_addr, mac_addr_p, 6);
 

^ permalink raw reply related

* [PATCH -next] kvm/ppc/mpic: fix missing unlock in set_base_addr()
From: Wei Yongjun @ 2013-05-02  5:17 UTC (permalink / raw)
  To: mtosatti, gleb, agraf, benh, paulus, scottwood
  Cc: yongjun_wei, linuxppc-dev, kvm-ppc, kvm

From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

Add the missing unlock before return from function set_base_addr()
when disables the mapping.

Introduced by commit 5df554ad5b7522ea62b0ff9d5be35183494efc21
(kvm/ppc/mpic: in-kernel MPIC emulation)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
---
 arch/powerpc/kvm/mpic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/mpic.c b/arch/powerpc/kvm/mpic.c
index f3148f8..0047a70 100644
--- a/arch/powerpc/kvm/mpic.c
+++ b/arch/powerpc/kvm/mpic.c
@@ -1475,8 +1475,8 @@ static int set_base_addr(struct openpic *opp, struct kvm_device_attr *attr)
 
 	map_mmio(opp);
 
-	mutex_unlock(&opp->kvm->slots_lock);
 out:
+	mutex_unlock(&opp->kvm->slots_lock);
 	return 0;
 }
 

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox