LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 net-next] af_unix: fix a fatal race with bit fields
From: Eric Dumazet @ 2013-05-01 15:24 UTC (permalink / raw)
  To: David Miller; +Cc: linuxppc-dev, paulus, ambrose, netdev
In-Reply-To: <20130501.033650.703182794549888825.davem@davemloft.net>

On Wed, 2013-05-01 at 03:36 -0400, David Miller wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Wed, 01 May 2013 11:39:53 +1000
> 
> > I'm not even completely certain bytes are safe to be honest, though
> > probably more than bitfields. I'll poke our compiler people.
> 
> Older Alpha only has 32-bit and 64-bit loads and stores, so byte sized
> accesses are not atomic, and therefore use racey read-modify-write
> sequences.

Right, so what about the following more general fix ?

Thanks !

[PATCH v2] af_unix: fix a fatal race with bit fields

Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
uses 64bit instructions to manipulate them.
If the 64bit word includes any atomic_t or spinlock_t, we can lose
critical concurrent changes.

This is happening in af_unix, where unix_sk(sk)->gc_candidate/
gc_maybe_cycle/lock share the same 64bit word.

This leads to fatal deadlock, as one/several cpus spin forever
on a spinlock that will never be available again.

A safer way would be to use a long to store flags.
This way we are sure compiler/arch wont do bad things.

As we own unix_gc_lock spinlock when clearing or setting bits,
we can use the non atomic __set_bit()/__clear_bit().

recursion_level can share the same 64bit location with the spinlock,
as it is set only with this spinlock held.

[1] bug fixed in gcc-4.8.0 :
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080

Reported-by: Ambrose Feinstein <ambrose@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
---
 include/net/af_unix.h |    5 +++--
 net/unix/garbage.c    |   12 ++++++------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index a8836e8..dbdfd2b 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -57,9 +57,10 @@ struct unix_sock {
 	struct list_head	link;
 	atomic_long_t		inflight;
 	spinlock_t		lock;
-	unsigned int		gc_candidate : 1;
-	unsigned int		gc_maybe_cycle : 1;
 	unsigned char		recursion_level;
+	unsigned long		gc_flags;
+#define UNIX_GC_CANDIDATE	0
+#define UNIX_GC_MAYBE_CYCLE	1
 	struct socket_wq	peer_wq;
 };
 #define unix_sk(__sk) ((struct unix_sock *)__sk)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index d0f6545..9c6cc08 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -185,7 +185,7 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
 					 * have been added to the queues after
 					 * starting the garbage collection
 					 */
-					if (u->gc_candidate) {
+					if (test_bit(UNIX_GC_CANDIDATE, &u->gc_flags)) {
 						hit = true;
 						func(u);
 					}
@@ -254,7 +254,7 @@ static void inc_inflight_move_tail(struct unix_sock *u)
 	 * of the list, so that it's checked even if it was already
 	 * passed over
 	 */
-	if (u->gc_maybe_cycle)
+	if (test_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags))
 		list_move_tail(&u->link, &gc_candidates);
 }
 
@@ -315,8 +315,8 @@ void unix_gc(void)
 		BUG_ON(total_refs < inflight_refs);
 		if (total_refs == inflight_refs) {
 			list_move_tail(&u->link, &gc_candidates);
-			u->gc_candidate = 1;
-			u->gc_maybe_cycle = 1;
+			__set_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
+			__set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
 		}
 	}
 
@@ -344,7 +344,7 @@ void unix_gc(void)
 
 		if (atomic_long_read(&u->inflight) > 0) {
 			list_move_tail(&u->link, &not_cycle_list);
-			u->gc_maybe_cycle = 0;
+			__clear_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
 			scan_children(&u->sk, inc_inflight_move_tail, NULL);
 		}
 	}
@@ -356,7 +356,7 @@ void unix_gc(void)
 	 */
 	while (!list_empty(&not_cycle_list)) {
 		u = list_entry(not_cycle_list.next, struct unix_sock, link);
-		u->gc_candidate = 0;
+		__clear_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
 		list_move_tail(&u->link, &gc_inflight_list);
 	}
 

^ permalink raw reply related

* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Stephen Hemminger @ 2013-05-01 15:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alan Modra, netdev, Ambrose Feinstein, Paul Mackerras,
	Anton Blanchard, linuxppc-dev, David Miller
In-Reply-To: <1367384672.11020.34.camel@edumazet-glaptop>

On Tue, 30 Apr 2013 22:04:32 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Wed, 2013-05-01 at 13:24 +0930, Alan Modra wrote:
> > On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote:
> > > 	li 11,1
> > > 	ld 0,0(9)
> > > 	rldimi 0,11,31,32
> > > 	std 0,0(9)
> > > 	blr
> > > 	.ident	"GCC: (GNU) 4.6.3"
> > > 
> > > You can see "ld 0,0(9)" is used : its a 64 bit load.
> > 
> > Yup.  This is not a powerpc64 specific problem.  See
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
> > Fixed in 4.8.0 and 4.7.3.
> 
> Ah thanks.
> 
> This seems a pretty serious issue, is it documented somewhere that
> ppc64, sparc64 and others need such compiler version ?
> 
> These kind of errors are pretty hard to find, its a pity to spend time
> on them.

There is a checkbin target inside arch/powerpc/Makefile
Shouldn't a check be added there to block building kernel with known
bad GCC versions?

^ permalink raw reply

* Re: [PATCH -V7 18/18] powerpc: Update tlbie/tlbiel as per ISA doc
From: David Gibson @ 2013-05-01 11:36 UTC (permalink / raw)
  To: Simon Jeons; +Cc: linux-mm, linuxppc-dev, Aneesh Kumar K.V, paulus
In-Reply-To: <5180C9BE.5040002@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4123 bytes --]

On Wed, May 01, 2013 at 03:52:30PM +0800, Simon Jeons wrote:
> On 05/01/2013 03:47 PM, Aneesh Kumar K.V wrote:
> >David Gibson <dwg@au1.ibm.com> writes:
> >
> >>On Tue, Apr 30, 2013 at 10:51:00PM +0530, Aneesh Kumar K.V wrote:
> >>>David Gibson <dwg@au1.ibm.com> writes:
> >>>
> >>>>On Mon, Apr 29, 2013 at 01:07:39AM +0530, Aneesh Kumar K.V wrote:
> >>>>>From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >>>>>
> >>>>>Encode the actual page correctly in tlbie/tlbiel. This make sure we handle
> >>>>>multiple page size segment correctly.
> >>>>As mentioned in previous comments, this commit message needs to give
> >>>>much more detail about what precisely the existing implementation is
> >>>>doing wrong.
> >>>>
> >>>>>Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> >>>>>---
> >>>>>  arch/powerpc/mm/hash_native_64.c | 32 ++++++++++++++++++++++++++++++--
> >>>>>  1 file changed, 30 insertions(+), 2 deletions(-)
> >>>>>
> >>>>>diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
> >>>>>index bb920ee..6a2aead 100644
> >>>>>--- a/arch/powerpc/mm/hash_native_64.c
> >>>>>+++ b/arch/powerpc/mm/hash_native_64.c
> >>>>>@@ -61,7 +61,10 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
> >>>>>  	switch (psize) {
> >>>>>  	case MMU_PAGE_4K:
> >>>>>+		/* clear out bits after (52) [0....52.....63] */
> >>>>>+		va &= ~((1ul << (64 - 52)) - 1);
> >>>>>  		va |= ssize << 8;
> >>>>>+		va |= mmu_psize_defs[apsize].sllp << 6;
> >>>>>  		asm volatile(ASM_FTR_IFCLR("tlbie %0,0", PPC_TLBIE(%1,%0), %2)
> >>>>>  			     : : "r" (va), "r"(0), "i" (CPU_FTR_ARCH_206)
> >>>>>  			     : "memory");
> >>>>>@@ -69,9 +72,20 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
> >>>>>  	default:
> >>>>>  		/* We need 14 to 14 + i bits of va */
> >>>>>  		penc = mmu_psize_defs[psize].penc[apsize];
> >>>>>-		va &= ~((1ul << mmu_psize_defs[psize].shift) - 1);
> >>>>>+		va &= ~((1ul << mmu_psize_defs[apsize].shift) - 1);
> >>>>>  		va |= penc << 12;
> >>>>>  		va |= ssize << 8;
> >>>>>+		/* Add AVAL part */
> >>>>>+		if (psize != apsize) {
> >>>>>+			/*
> >>>>>+			 * MPSS, 64K base page size and 16MB parge page size
> >>>>>+			 * We don't need all the bits, but rest of the bits
> >>>>>+			 * must be ignored by the processor.
> >>>>>+			 * vpn cover upto 65 bits of va. (0...65) and we need
> >>>>>+			 * 58..64 bits of va.
> >>>>I can't understand what this comment is saying.  Why do we need to do
> >>>>something different in the psize != apsize case?
> >>>>
> >>>>>+			 */
> >>>>>+			va |= (vpn & 0xfe);
> >>>>>+		}
> >>>That is as per ISA doc. It says if base page size == actual page size,
> >>>(RB)56:62 must be zeros, which must be ignored by the processor.
> >>>Otherwise it should be filled with the selected bits of VA as explained above.
> >>What you've just said here makes much more sense than what's written
> >>in the comment in the code.
> >>
> >>>We only support MPSS with base page size = 64K and actual page size = 16MB.
> >>Is that actually relevant to this code though?
> >In a way yes. The number of bits we we select out of VA depends on the
> >base page size and actual page size. We have a math around that
> >documented in ISA. Now since we support only 64K and 16MB we can make it
> >simpler by only selecting required bits and not making it a
> >function. But then it is also not relevant to the code in that ISA also
> >state other bits in (RB)56:62 must be zero. I wanted to capture both the
> >details in the comment.
> 
> What's the benefit of ppc use hash-based page table instead of
> tree-based page table? If you said that hash is faster, could x86
> change to it?

Complex and debatable at best.  But in any case it's a fixed property
of the CPU, not something that can be changed in software.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Ben Hutchings @ 2013-05-01 12:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Eric Dumazet, netdev, Ambrose Feinstein, Paul Mackerras,
	linuxppc-dev, David Miller
In-Reply-To: <1367372393.22115.6.camel@pasglop>

On Wed, 2013-05-01 at 11:39 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2013-04-30 at 18:12 -0700, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > Using bit fields is dangerous on ppc64, as the compiler uses 64bit
> > instructions to manipulate them. If the 64bit word includes any
> > atomic_t or spinlock_t, we can lose critical concurrent changes.
> > 
> > This is happening in af_unix, where unix_sk(sk)->gc_candidate/
> > gc_maybe_cycle/lock share the same 64bit word.
> > 
> > This leads to fatal deadlock, as one/several cpus spin forever
> > on a spinlock that will never be available again.
> > 
> > Reported-by: Ambrose Feinstein <ambrose@google.com>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Paul Mackerras <paulus@samba.org>
> > ---
> > 
> > Could ppc64 experts confirm using byte is safe, or should we really add
> > a 32bit hole after the spinlock ? If so, I wonder how many other places
> > need a change...
> 
> Wow, nice one !
> 
> I'm not even completely certain bytes are safe to be honest, though
> probably more than bitfields. I'll poke our compiler people.

There is a longstanding and hard-to-fix bug in gcc that is specific to
bitfields.  I think that the underlying type isn't propagated, so when
it comes to code generation the compiler doesn't know the natural width
for the memory access.

As for bytes - early Alphas couldn't load/store less than 32 bits, but I
doubt anyone cares any more.

> The worry is of course how many more of these do we potentially have ? 
> We might be able to automate finding these issues with sparse, I
> suppose.
> 
> Also I'd be surprised if ppc64 is the only one with that problem... what
> about sparc64 and arm64 ?

I expect they can have the same general problem, but gcc may be more or
less keen to generate 64-bit load/store instructions for bitfields on
different architectures.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [PATCH 1/1] powerpc: mpc512x_dma: add support for data transfers between memory and i/o memory
From: Alexander Popov @ 2013-05-01 11:28 UTC (permalink / raw)
  To: Vinod Koul, Dan Williams; +Cc: linuxppc-dev, linux-kernel

The initial version of this driver supports only memory to memory
data transfers.

Data transfers between memory and i/o memory require more delicate TCD
(Transfer Control Descriptor) configuration and DMA channel service requests
via hardware.

dma_device.device_control callback function is needed to configure
DMA channel to work with i/o memory.

Signed-off-by: Alexander Popov <a13xp0p0v88@gmail.com>
---
 drivers/dma/mpc512x_dma.c | 230 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 192 insertions(+), 38 deletions(-)

diff --git a/drivers/dma/mpc512x_dma.c b/drivers/dma/mpc512x_dma.c
index 2d95673..8aedff1 100644
--- a/drivers/dma/mpc512x_dma.c
+++ b/drivers/dma/mpc512x_dma.c
@@ -2,6 +2,7 @@
  * Copyright (C) Freescale Semicondutor, Inc. 2007, 2008.
  * Copyright (C) Semihalf 2009
  * Copyright (C) Ilya Yanok, Emcraft Systems 2010
+ * Copyright (C) Alexander Popov, Promcontroller 2013
  *
  * Written by Piotr Ziecik <kosmo@semihalf.com>. Hardware description
  * (defines, structures and comments) was taken from MPC5121 DMA driver
@@ -28,11 +29,6 @@
  * file called COPYING.
  */
 
-/*
- * This is initial version of MPC5121 DMA driver. Only memory to memory
- * transfers are supported (tested using dmatest module).
- */
-
 #include <linux/module.h>
 #include <linux/dmaengine.h>
 #include <linux/dma-mapping.h>
@@ -183,6 +179,8 @@ struct mpc_dma_desc {
 
 struct mpc_dma_chan {
 	struct dma_chan			chan;
+	enum dma_transfer_direction	dir;
+	enum dma_slave_buswidth		slave_reg_width;
 	struct list_head		free;
 	struct list_head		prepared;
 	struct list_head		queued;
@@ -190,6 +188,7 @@ struct mpc_dma_chan {
 	struct list_head		completed;
 	struct mpc_dma_tcd		*tcd;
 	dma_addr_t			tcd_paddr;
+	u32				tcd_nunits;
 
 	/* Lock for this structure */
 	spinlock_t			lock;
@@ -268,7 +267,11 @@ static void mpc_dma_execute(struct mpc_dma_chan *mchan)
 
 	if (first != prev)
 		mdma->tcd[cid].e_sg = 1;
-	out_8(&mdma->regs->dmassrt, cid);
+
+	if (first->tcd->biter != 1) /* Request channel service by... */
+		out_8(&mdma->regs->dmaserq, cid); /* hardware */
+	else
+		out_8(&mdma->regs->dmassrt, cid); /* software */
 }
 
 /* Handle interrupt on one half of DMA controller (32 channels) */
@@ -567,7 +570,42 @@ mpc_dma_tx_status(struct dma_chan *chan, dma_cookie_t cookie,
 	return ret;
 }
 
-/* Prepare descriptor for memory to memory copy */
+static int mpc_dma_control(struct dma_chan *chan, enum dma_ctrl_cmd cmd,
+						unsigned long arg)
+{
+	struct mpc_dma_chan *mchan = dma_chan_to_mpc_dma_chan(chan);
+	struct dma_slave_config *cfg = (void *)arg;
+	int ret = 0;
+
+	if (!chan)
+		return -EINVAL;
+
+	if (cmd == DMA_SLAVE_CONFIG && cfg) {
+		if (cfg->direction == DMA_DEV_TO_MEM) {
+			if (cfg->src_addr_width != DMA_SLAVE_BUSWIDTH_UNDEFINED)
+				mchan->slave_reg_width = cfg->src_addr_width;
+			else
+				return -EINVAL;
+			mchan->dir = DMA_DEV_TO_MEM;
+			mchan->tcd_nunits = cfg->src_maxburst;
+		} else if (cfg->direction == DMA_MEM_TO_DEV) {
+			if (cfg->dst_addr_width != DMA_SLAVE_BUSWIDTH_UNDEFINED)
+				mchan->slave_reg_width = cfg->dst_addr_width;
+			else
+				return -EINVAL;
+			mchan->dir = DMA_MEM_TO_DEV;
+			mchan->tcd_nunits = cfg->dst_maxburst;
+		} else {
+			mchan->dir = DMA_MEM_TO_MEM;
+			mchan->slave_reg_width = DMA_SLAVE_BUSWIDTH_UNDEFINED;
+			mchan->tcd_nunits = 0;
+		}
+	} else
+		return -ENOSYS;
+
+	return ret;
+}
+
 static struct dma_async_tx_descriptor *
 mpc_dma_prep_memcpy(struct dma_chan *chan, dma_addr_t dst, dma_addr_t src,
 					size_t len, unsigned long flags)
@@ -577,6 +615,7 @@ mpc_dma_prep_memcpy(struct dma_chan *chan, dma_addr_t dst, dma_addr_t src,
 	struct mpc_dma_desc *mdesc = NULL;
 	struct mpc_dma_tcd *tcd;
 	unsigned long iflags;
+	u32 iter = 0;
 
 	/* Get free descriptor */
 	spin_lock_irqsave(&mchan->lock, iflags);
@@ -599,39 +638,153 @@ mpc_dma_prep_memcpy(struct dma_chan *chan, dma_addr_t dst, dma_addr_t src,
 	/* Prepare Transfer Control Descriptor for this transaction */
 	memset(tcd, 0, sizeof(struct mpc_dma_tcd));
 
-	if (IS_ALIGNED(src | dst | len, 32)) {
-		tcd->ssize = MPC_DMA_TSIZE_32;
-		tcd->dsize = MPC_DMA_TSIZE_32;
-		tcd->soff = 32;
-		tcd->doff = 32;
-	} else if (!mdma->is_mpc8308 && IS_ALIGNED(src | dst | len, 16)) {
-		/* MPC8308 doesn't support 16 byte transfers */
-		tcd->ssize = MPC_DMA_TSIZE_16;
-		tcd->dsize = MPC_DMA_TSIZE_16;
-		tcd->soff = 16;
-		tcd->doff = 16;
-	} else if (IS_ALIGNED(src | dst | len, 4)) {
-		tcd->ssize = MPC_DMA_TSIZE_4;
-		tcd->dsize = MPC_DMA_TSIZE_4;
-		tcd->soff = 4;
-		tcd->doff = 4;
-	} else if (IS_ALIGNED(src | dst | len, 2)) {
-		tcd->ssize = MPC_DMA_TSIZE_2;
-		tcd->dsize = MPC_DMA_TSIZE_2;
-		tcd->soff = 2;
-		tcd->doff = 2;
-	} else {
-		tcd->ssize = MPC_DMA_TSIZE_1;
-		tcd->dsize = MPC_DMA_TSIZE_1;
-		tcd->soff = 1;
-		tcd->doff = 1;
-	}
-
 	tcd->saddr = src;
 	tcd->daddr = dst;
-	tcd->nbytes = len;
-	tcd->biter = 1;
-	tcd->citer = 1;
+	if (mchan->dir == DMA_MEM_TO_MEM) {
+		if (IS_ALIGNED(src | dst | len, 32)) {
+			tcd->ssize = MPC_DMA_TSIZE_32;
+			tcd->dsize = MPC_DMA_TSIZE_32;
+			tcd->soff = 32;
+			tcd->doff = 32;
+		} else if (!mdma->is_mpc8308 &&
+					IS_ALIGNED(src | dst | len, 16)) {
+			/* MPC8308 doesn't support 16 byte transfers */
+			tcd->ssize = MPC_DMA_TSIZE_16;
+			tcd->dsize = MPC_DMA_TSIZE_16;
+			tcd->soff = 16;
+			tcd->doff = 16;
+		} else if (IS_ALIGNED(src | dst | len, 4)) {
+			tcd->ssize = MPC_DMA_TSIZE_4;
+			tcd->dsize = MPC_DMA_TSIZE_4;
+			tcd->soff = 4;
+			tcd->doff = 4;
+		} else if (IS_ALIGNED(src | dst | len, 2)) {
+			tcd->ssize = MPC_DMA_TSIZE_2;
+			tcd->dsize = MPC_DMA_TSIZE_2;
+			tcd->soff = 2;
+			tcd->doff = 2;
+		} else {
+			tcd->ssize = MPC_DMA_TSIZE_1;
+			tcd->dsize = MPC_DMA_TSIZE_1;
+			tcd->soff = 1;
+			tcd->doff = 1;
+		}
+		tcd->nbytes = len;
+		tcd->biter = 1;
+		tcd->citer = 1;
+	} else {
+		/* Memory to io-memory transfer or vice versa.
+		 * DMA controller is going to access io-memory via
+		 * some FIFO data register. The width of this register
+		 * is mchan->slave_reg_width.
+		 *
+		 * Since some FIFO registers require full width access,
+		 * let's firmly set the corresponding transfer size
+		 * to mchan->slave_reg_width
+		 * and prohibit transfers of packets with a length
+		 * which is not aligned on mchan->slave_reg_width boundaries
+		 * to avoid Transfer Control Descriptor inconsistency.
+		 * Moreover this will save us from playing with
+		 * source and destination address modulo.
+		 */
+
+		if (!IS_ALIGNED(len, mchan->slave_reg_width))
+			return NULL;
+
+		if (mchan->dir == DMA_DEV_TO_MEM) {
+			tcd->soff = 0;
+			if (mchan->slave_reg_width ==
+					DMA_SLAVE_BUSWIDTH_4_BYTES) {
+				tcd->ssize = MPC_DMA_TSIZE_4;
+				if (IS_ALIGNED(dst, 4)) {
+					tcd->dsize = MPC_DMA_TSIZE_4;
+					tcd->doff = 4;
+				} else if (IS_ALIGNED(dst, 2)) {
+					tcd->dsize = MPC_DMA_TSIZE_2;
+					tcd->doff = 2;
+				} else {
+					tcd->dsize = MPC_DMA_TSIZE_1;
+					tcd->doff = 1;
+				}
+			} else if (mchan->slave_reg_width ==
+					DMA_SLAVE_BUSWIDTH_2_BYTES) {
+				tcd->ssize = MPC_DMA_TSIZE_2;
+				if (IS_ALIGNED(dst, 2)) {
+					tcd->dsize = MPC_DMA_TSIZE_2;
+					tcd->doff = 2;
+				} else {
+					tcd->dsize = MPC_DMA_TSIZE_1;
+					tcd->doff = 1;
+				}
+			} else if (mchan->slave_reg_width ==
+					DMA_SLAVE_BUSWIDTH_1_BYTE) {
+				tcd->ssize = MPC_DMA_TSIZE_1;
+				tcd->dsize = MPC_DMA_TSIZE_1;
+				tcd->doff = 1;
+			} else
+				return NULL;
+		} else {
+			tcd->doff = 0;
+			if (mchan->slave_reg_width ==
+					DMA_SLAVE_BUSWIDTH_4_BYTES) {
+				tcd->dsize = MPC_DMA_TSIZE_4;
+				if (IS_ALIGNED(src, 4)) {
+					tcd->ssize = MPC_DMA_TSIZE_4;
+					tcd->soff = 4;
+				} else if (IS_ALIGNED(src, 2)) {
+					tcd->ssize = MPC_DMA_TSIZE_2;
+					tcd->soff = 2;
+				} else {
+					tcd->ssize = MPC_DMA_TSIZE_1;
+					tcd->soff = 1;
+				}
+			} else if (mchan->slave_reg_width ==
+					DMA_SLAVE_BUSWIDTH_2_BYTES) {
+				tcd->dsize = MPC_DMA_TSIZE_2;
+				if (IS_ALIGNED(src, 2)) {
+					tcd->ssize = MPC_DMA_TSIZE_2;
+					tcd->soff = 2;
+				} else {
+					tcd->ssize = MPC_DMA_TSIZE_1;
+					tcd->soff = 1;
+				}
+			} else if (mchan->slave_reg_width ==
+					DMA_SLAVE_BUSWIDTH_1_BYTE) {
+				tcd->dsize = MPC_DMA_TSIZE_1;
+				tcd->ssize = MPC_DMA_TSIZE_1;
+				tcd->soff = 1;
+			} else
+				return NULL;
+		}
+
+		if (mchan->tcd_nunits) {
+			tcd->nbytes = mchan->tcd_nunits *
+						mchan->slave_reg_width;
+			if (!IS_ALIGNED(len, tcd->nbytes)) {
+				/* mchan->tcd_nunits is inconsistent */
+				return NULL;
+			}
+
+			iter = len / tcd->nbytes;
+			if (iter > ((1 << 15) - 1)) {	/* maximum biter */
+				return NULL; /* len is too big */
+			} else {
+				tcd->biter = iter;
+				tcd->biter_linkch = iter >> 9;
+				tcd->citer = tcd->biter;
+				tcd->citer_linkch = tcd->biter_linkch;
+			}
+
+			/* DMA hardware should automatically clear
+			 * the corresponding DMAERQ bit when
+			 * the current major iteration count reaches zero. */
+			tcd->d_req = 1;
+		} else {
+			tcd->nbytes = len;
+			tcd->biter = 1;
+			tcd->citer = 1;
+		}
+	}
 
 	/* Place descriptor in prepared list */
 	spin_lock_irqsave(&mchan->lock, iflags);
@@ -725,6 +878,7 @@ static int mpc_dma_probe(struct platform_device *op)
 	dma->device_issue_pending = mpc_dma_issue_pending;
 	dma->device_tx_status = mpc_dma_tx_status;
 	dma->device_prep_dma_memcpy = mpc_dma_prep_memcpy;
+	dma->device_control = mpc_dma_control;
 
 	INIT_LIST_HEAD(&dma->channels);
 	dma_cap_set(DMA_MEMCPY, dma->cap_mask);
-- 
1.7.11.3

^ permalink raw reply related

* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Benjamin Herrenschmidt @ 2013-05-01  8:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linuxppc-dev, paulus, ambrose, eric.dumazet
In-Reply-To: <20130501.033650.703182794549888825.davem@davemloft.net>

On Wed, 2013-05-01 at 03:36 -0400, David Miller wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Wed, 01 May 2013 11:39:53 +1000
> 
> > I'm not even completely certain bytes are safe to be honest, though
> > probably more than bitfields. I'll poke our compiler people.
> 
> Older Alpha only has 32-bit and 64-bit loads and stores, so byte sized
> accesses are not atomic, and therefore use racey read-modify-write
> sequences.

In this case it depends whether the compiler will "chose" the smaller
(32-bit) size which hopefully won't overlap with the atomic/lock
provided the latter is aligned... lots of if's here, makes me nervous...

At least the bytes seem to fix it for ppc64 so far...

It would make feel generally better if we could get gcc to guarantee us
to always use the smallest access size that encompass the whole bitfield
(or at least not go larger than int when the bitfield is defined as
unsigned int). This would take care of all the cases we haven't spotted
yet (hopefully).

For all intend and purposes those two fields are bits of an unsigned
int, why the heck would the compiler use a larger access size anyway ? I
seem to recall that we have other places where such an assumption is
made that ints are accessed atomically, and Linus stating in the past
that a compiler doing anything else was not worth bothering with. I
don't see why bitfields of such int would be an exception to that rule
(though again, this is probably not a rule stated in the standard ... oh
well).

/me goes have a glass of wine and not think about this until tomorrow.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH -V7 18/18] powerpc: Update tlbie/tlbiel as per ISA doc
From: Simon Jeons @ 2013-05-01  7:52 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-mm, linuxppc-dev, paulus, David Gibson
In-Reply-To: <87hain9m5e.fsf@linux.vnet.ibm.com>

On 05/01/2013 03:47 PM, Aneesh Kumar K.V wrote:
> David Gibson <dwg@au1.ibm.com> writes:
>
>> On Tue, Apr 30, 2013 at 10:51:00PM +0530, Aneesh Kumar K.V wrote:
>>> David Gibson <dwg@au1.ibm.com> writes:
>>>
>>>> On Mon, Apr 29, 2013 at 01:07:39AM +0530, Aneesh Kumar K.V wrote:
>>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>>>>>
>>>>> Encode the actual page correctly in tlbie/tlbiel. This make sure we handle
>>>>> multiple page size segment correctly.
>>>> As mentioned in previous comments, this commit message needs to give
>>>> much more detail about what precisely the existing implementation is
>>>> doing wrong.
>>>>
>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>>>> ---
>>>>>   arch/powerpc/mm/hash_native_64.c | 32 ++++++++++++++++++++++++++++++--
>>>>>   1 file changed, 30 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
>>>>> index bb920ee..6a2aead 100644
>>>>> --- a/arch/powerpc/mm/hash_native_64.c
>>>>> +++ b/arch/powerpc/mm/hash_native_64.c
>>>>> @@ -61,7 +61,10 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
>>>>>   
>>>>>   	switch (psize) {
>>>>>   	case MMU_PAGE_4K:
>>>>> +		/* clear out bits after (52) [0....52.....63] */
>>>>> +		va &= ~((1ul << (64 - 52)) - 1);
>>>>>   		va |= ssize << 8;
>>>>> +		va |= mmu_psize_defs[apsize].sllp << 6;
>>>>>   		asm volatile(ASM_FTR_IFCLR("tlbie %0,0", PPC_TLBIE(%1,%0), %2)
>>>>>   			     : : "r" (va), "r"(0), "i" (CPU_FTR_ARCH_206)
>>>>>   			     : "memory");
>>>>> @@ -69,9 +72,20 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
>>>>>   	default:
>>>>>   		/* We need 14 to 14 + i bits of va */
>>>>>   		penc = mmu_psize_defs[psize].penc[apsize];
>>>>> -		va &= ~((1ul << mmu_psize_defs[psize].shift) - 1);
>>>>> +		va &= ~((1ul << mmu_psize_defs[apsize].shift) - 1);
>>>>>   		va |= penc << 12;
>>>>>   		va |= ssize << 8;
>>>>> +		/* Add AVAL part */
>>>>> +		if (psize != apsize) {
>>>>> +			/*
>>>>> +			 * MPSS, 64K base page size and 16MB parge page size
>>>>> +			 * We don't need all the bits, but rest of the bits
>>>>> +			 * must be ignored by the processor.
>>>>> +			 * vpn cover upto 65 bits of va. (0...65) and we need
>>>>> +			 * 58..64 bits of va.
>>>> I can't understand what this comment is saying.  Why do we need to do
>>>> something different in the psize != apsize case?
>>>>
>>>>> +			 */
>>>>> +			va |= (vpn & 0xfe);
>>>>> +		}
>>> That is as per ISA doc. It says if base page size == actual page size,
>>> (RB)56:62 must be zeros, which must be ignored by the processor.
>>> Otherwise it should be filled with the selected bits of VA as explained above.
>> What you've just said here makes much more sense than what's written
>> in the comment in the code.
>>
>>> We only support MPSS with base page size = 64K and actual page size = 16MB.
>> Is that actually relevant to this code though?
> In a way yes. The number of bits we we select out of VA depends on the
> base page size and actual page size. We have a math around that
> documented in ISA. Now since we support only 64K and 16MB we can make it
> simpler by only selecting required bits and not making it a
> function. But then it is also not relevant to the code in that ISA also
> state other bits in (RB)56:62 must be zero. I wanted to capture both the
> details in the comment.

What's the benefit of ppc use hash-based page table instead of 
tree-based page table? If you said that hash is faster, could x86 change 
to it?

>
> -aneesh
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH -V7 18/18] powerpc: Update tlbie/tlbiel as per ISA doc
From: Aneesh Kumar K.V @ 2013-05-01  7:47 UTC (permalink / raw)
  To: David Gibson; +Cc: linuxppc-dev, paulus, linux-mm
In-Reply-To: <20130501052625.GC14106@truffula.fritz.box>

David Gibson <dwg@au1.ibm.com> writes:

> On Tue, Apr 30, 2013 at 10:51:00PM +0530, Aneesh Kumar K.V wrote:
>> David Gibson <dwg@au1.ibm.com> writes:
>> 
>> > On Mon, Apr 29, 2013 at 01:07:39AM +0530, Aneesh Kumar K.V wrote:
>> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> >> 
>> >> Encode the actual page correctly in tlbie/tlbiel. This make sure we handle
>> >> multiple page size segment correctly.
>> >
>> > As mentioned in previous comments, this commit message needs to give
>> > much more detail about what precisely the existing implementation is
>> > doing wrong.
>> >
>> >> 
>> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> >> ---
>> >>  arch/powerpc/mm/hash_native_64.c | 32 ++++++++++++++++++++++++++++++--
>> >>  1 file changed, 30 insertions(+), 2 deletions(-)
>> >> 
>> >> diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
>> >> index bb920ee..6a2aead 100644
>> >> --- a/arch/powerpc/mm/hash_native_64.c
>> >> +++ b/arch/powerpc/mm/hash_native_64.c
>> >> @@ -61,7 +61,10 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
>> >>  
>> >>  	switch (psize) {
>> >>  	case MMU_PAGE_4K:
>> >> +		/* clear out bits after (52) [0....52.....63] */
>> >> +		va &= ~((1ul << (64 - 52)) - 1);
>> >>  		va |= ssize << 8;
>> >> +		va |= mmu_psize_defs[apsize].sllp << 6;
>> >>  		asm volatile(ASM_FTR_IFCLR("tlbie %0,0", PPC_TLBIE(%1,%0), %2)
>> >>  			     : : "r" (va), "r"(0), "i" (CPU_FTR_ARCH_206)
>> >>  			     : "memory");
>> >> @@ -69,9 +72,20 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
>> >>  	default:
>> >>  		/* We need 14 to 14 + i bits of va */
>> >>  		penc = mmu_psize_defs[psize].penc[apsize];
>> >> -		va &= ~((1ul << mmu_psize_defs[psize].shift) - 1);
>> >> +		va &= ~((1ul << mmu_psize_defs[apsize].shift) - 1);
>> >>  		va |= penc << 12;
>> >>  		va |= ssize << 8;
>> >> +		/* Add AVAL part */
>> >> +		if (psize != apsize) {
>> >> +			/*
>> >> +			 * MPSS, 64K base page size and 16MB parge page size
>> >> +			 * We don't need all the bits, but rest of the bits
>> >> +			 * must be ignored by the processor.
>> >> +			 * vpn cover upto 65 bits of va. (0...65) and we need
>> >> +			 * 58..64 bits of va.
>> >
>> > I can't understand what this comment is saying.  Why do we need to do
>> > something different in the psize != apsize case?
>> >
>> >> +			 */
>> >> +			va |= (vpn & 0xfe);
>> >> +		}
>> 
>> That is as per ISA doc. It says if base page size == actual page size,
>> (RB)56:62 must be zeros, which must be ignored by the processor.
>> Otherwise it should be filled with the selected bits of VA as explained above.
>
> What you've just said here makes much more sense than what's written
> in the comment in the code.
>
>> We only support MPSS with base page size = 64K and actual page size = 16MB.
>
> Is that actually relevant to this code though?

In a way yes. The number of bits we we select out of VA depends on the
base page size and actual page size. We have a math around that
documented in ISA. Now since we support only 64K and 16MB we can make it
simpler by only selecting required bits and not making it a
function. But then it is also not relevant to the code in that ISA also
state other bits in (RB)56:62 must be zero. I wanted to capture both the
details in the comment. 

-aneesh

^ permalink raw reply

* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: David Miller @ 2013-05-01  7:36 UTC (permalink / raw)
  To: benh; +Cc: netdev, linuxppc-dev, paulus, ambrose, eric.dumazet
In-Reply-To: <1367372393.22115.6.camel@pasglop>

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Wed, 01 May 2013 11:39:53 +1000

> I'm not even completely certain bytes are safe to be honest, though
> probably more than bitfields. I'll poke our compiler people.

Older Alpha only has 32-bit and 64-bit loads and stores, so byte sized
accesses are not atomic, and therefore use racey read-modify-write
sequences.

^ permalink raw reply

* [PATCH] powerpc: Fix build error for book3e
From: Aneesh Kumar K.V @ 2013-05-01  6:26 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

We moved the definition of shift_to_mmu_psize and mmu_psize_to_shift
out of hugetlbpage.c in patch "powerpc: New hugepage directory format".

These functions are not related to hugetlbpage and we want to use them
outside hugetlbpage.c We missed a definition for book3e when we moved
these functions. Add similar functions to mmu-book3e.h

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/mmu-book3e.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
index 8bd560c..936db36 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -215,6 +215,7 @@
 #define TLBILX_T_CLASS3			7
 
 #ifndef __ASSEMBLY__
+#include <asm/bug.h>
 
 extern unsigned int tlbcam_index;
 
@@ -254,6 +255,23 @@ struct mmu_psize_def
 };
 extern struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
 
+static inline int shift_to_mmu_psize(unsigned int shift)
+{
+	int psize;
+
+	for (psize = 0; psize < MMU_PAGE_COUNT; ++psize)
+		if (mmu_psize_defs[psize].shift == shift)
+			return psize;
+	return -1;
+}
+
+static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)
+{
+	if (mmu_psize_defs[mmu_psize].shift)
+		return mmu_psize_defs[mmu_psize].shift;
+	BUG();
+}
+
 /* The page sizes use the same names as 64-bit hash but are
  * constants
  */
-- 
1.8.1.2

^ permalink raw reply related

* [PATCH 3/3] powerpc: Context switch the new EBB SPRs
From: Michael Neuling @ 2013-05-01  6:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Matt Evans
In-Reply-To: <1367389024-20391-1-git-send-email-mikey@neuling.org>

From: Michael Ellerman <michael@ellerman.id.au>

This context switches the new Event Based Branching (EBB) SPRs.  The three new
SPRs are:
  - Event Based Branch Handler Register (EBBHR)
  - Event Based Branch Return Register (EBBRR)
  - Branch Event Status and Control Register (BESCR)

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/processor.h |    3 +++
 arch/powerpc/include/asm/reg.h       |    3 +++
 arch/powerpc/kernel/asm-offsets.c    |    3 +++
 arch/powerpc/kernel/entry_64.S       |   16 ++++++++++++++++
 4 files changed, 25 insertions(+)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 0a4cc5d..d7e67ca 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -281,6 +281,9 @@ struct thread_struct {
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
 	unsigned long	tar;
+	unsigned long	ebbrr;
+	unsigned long	ebbhr;
+	unsigned long	bescr;
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index cd241ed..fa8285b 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -663,6 +663,9 @@
 #define SPRN_MMCRH	316	/* Hypervisor monitor mode control register */
 #define SPRN_MMCRS	894	/* Supervisor monitor mode control register */
 #define SPRN_MMCRC	851	/* Core monitor mode control register */
+#define SPRN_EBBHR	804	/* Event based branch handler register */
+#define SPRN_EBBRR	805	/* Event based branch return register */
+#define SPRN_BESCR	806	/* Branch event status and control register */
 
 #define SPRN_PMC1	787
 #define SPRN_PMC2	788
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index b6c17ec..172233e 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -124,6 +124,9 @@ int main(void)
 
 #ifdef CONFIG_PPC_BOOK3S_64
 	DEFINE(THREAD_TAR, offsetof(struct thread_struct, tar));
+	DEFINE(THREAD_BESCR, offsetof(struct thread_struct, bescr));
+	DEFINE(THREAD_EBBHR, offsetof(struct thread_struct, ebbhr));
+	DEFINE(THREAD_EBBRR, offsetof(struct thread_struct, ebbrr));
 #endif
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	DEFINE(PACATMSCRATCH, offsetof(struct paca_struct, tm_scratch));
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 7a6801f..3fe5259 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -458,6 +458,14 @@ BEGIN_FTR_SECTION
 	 */
 	mfspr	r0,SPRN_TAR
 	std	r0,THREAD_TAR(r3)
+
+	/* Event based branch registers */
+	mfspr	r0, SPRN_BESCR
+	std	r0, THREAD_BESCR(r3)
+	mfspr	r0, SPRN_EBBHR
+	std	r0, THREAD_EBBHR(r3)
+	mfspr	r0, SPRN_EBBRR
+	std	r0, THREAD_EBBRR(r3)
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 #endif
 
@@ -545,6 +553,14 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
 
 #ifdef CONFIG_PPC_BOOK3S_64
 BEGIN_FTR_SECTION
+	/* Event based branch registers */
+	ld	r0, THREAD_BESCR(r4)
+	mtspr	SPRN_BESCR, r0
+	ld	r0, THREAD_EBBHR(r4)
+	mtspr	SPRN_EBBHR, r0
+	ld	r0, THREAD_EBBRR(r4)
+	mtspr	SPRN_EBBRR, r0
+
 	ld	r0,THREAD_TAR(r4)
 	mtspr	SPRN_TAR,r0
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 2/3] powerpc: Turn on the EBB H/FSCR bits
From: Michael Neuling @ 2013-05-01  6:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Matt Evans
In-Reply-To: <1367389024-20391-1-git-send-email-mikey@neuling.org>

This turns Event Based Branching (EBB) on in the Hypervisor Facility Status and
Control Register (HFSCR) and Facility Status and Control Register (FSCR).

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/reg.h        |    2 ++
 arch/powerpc/kernel/cpu_setup_power.S |    4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 0b1ea1f..cd241ed 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -267,9 +267,11 @@
 #define SPRN_HSRR1	0x13B	/* Hypervisor Save/Restore 1 */
 #define SPRN_FSCR	0x099	/* Facility Status & Control Register */
 #define   FSCR_TAR	(1 << (63-55)) /* Enable Target Address Register */
+#define   FSCR_EBB	(1 << (63-56)) /* Enable Event Based Branching */
 #define   FSCR_DSCR	(1 << (63-61)) /* Enable Data Stream Control Register */
 #define SPRN_HFSCR	0xbe	/* HV=1 Facility Status & Control Register */
 #define   HFSCR_TAR	(1 << (63-55)) /* Enable Target Address Register */
+#define   HFSCR_EBB	(1 << (63-56)) /* Enable Event Based Branching */
 #define   HFSCR_TM	(1 << (63-58)) /* Enable Transactional Memory */
 #define   HFSCR_BHRB	(1 << (63-59)) /* Enable Branch History Rolling Buffer */
 #define   HFSCR_PM	(1 << (63-60)) /* Enable prob/priv access to PMU SPRs */
diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
index 256766d..2df2276 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -123,13 +123,13 @@ __init_LPCR:
 
 __init_FSCR:
 	mfspr	r3,SPRN_FSCR
-	ori	r3,r3,FSCR_TAR|FSCR_DSCR
+	ori	r3,r3,FSCR_TAR|FSCR_EBB|FSCR_DSCR
 	mtspr	SPRN_FSCR,r3
 	blr
 
 __init_HFSCR:
 	mfspr	r3,SPRN_HFSCR
-	ori	r3,r3,HFSCR_TAR|HFSCR_TM|HFSCR_BHRB|\
+	ori	r3,r3,HFSCR_TAR|HFSCR_EBB|HFSCR_TM|HFSCR_BHRB|\
 				HFSCR_DSCR|HFSCR_VECVSX|HFSCR_FP
 	mtspr	SPRN_HFSCR,r3
 	blr
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 1/3] powerpc: Replace CPU_FTR_BCTAR with CPU_FTR_ARCH_207S
From: Michael Neuling @ 2013-05-01  6:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Matt Evans
In-Reply-To: <1367389024-20391-1-git-send-email-mikey@neuling.org>

From: Michael Ellerman <michael@ellerman.id.au>

We are getting low on cpu feature bits. So rather than add a separate bit for
every new Power8 feature, add a bit for arch 2.07 server catagory and use that
instead.

Hijack the value we had for BCTAR, but swap the value with CFAR so that all the
ARCH defines are together.

Note we don't touch CPU_FTR_TM, because it is conditionally enabled if
the kernel is built with TM support.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/cputable.h |    8 ++++----
 arch/powerpc/kernel/entry_64.S      |    4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index 284e50b..fcc54ad 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -152,7 +152,7 @@ extern const char *powerpc_base_platform;
 #define CPU_FTR_HVMODE			LONG_ASM_CONST(0x0000000100000000)
 #define CPU_FTR_ARCH_201		LONG_ASM_CONST(0x0000000200000000)
 #define CPU_FTR_ARCH_206		LONG_ASM_CONST(0x0000000400000000)
-#define CPU_FTR_CFAR			LONG_ASM_CONST(0x0000000800000000)
+#define CPU_FTR_ARCH_207S		LONG_ASM_CONST(0x0000000800000000)
 #define CPU_FTR_IABR			LONG_ASM_CONST(0x0000001000000000)
 #define CPU_FTR_MMCRA			LONG_ASM_CONST(0x0000002000000000)
 #define CPU_FTR_CTRL			LONG_ASM_CONST(0x0000004000000000)
@@ -173,7 +173,7 @@ extern const char *powerpc_base_platform;
 #define CPU_FTR_ICSWX			LONG_ASM_CONST(0x0020000000000000)
 #define CPU_FTR_VMX_COPY		LONG_ASM_CONST(0x0040000000000000)
 #define CPU_FTR_TM			LONG_ASM_CONST(0x0080000000000000)
-#define CPU_FTR_BCTAR			LONG_ASM_CONST(0x0100000000000000)
+#define CPU_FTR_CFAR			LONG_ASM_CONST(0x0100000000000000)
 #define	CPU_FTR_HAS_PPR			LONG_ASM_CONST(0x0200000000000000)
 #define CPU_FTR_DAWR			LONG_ASM_CONST(0x0400000000000000)
 
@@ -422,8 +422,8 @@ extern const char *powerpc_base_platform;
 	    CPU_FTR_DSCR | CPU_FTR_SAO  | \
 	    CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
 	    CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
-	    CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | CPU_FTR_BCTAR | \
-	    CPU_FTR_TM_COMP)
+	    CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
+	    CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP)
 #define CPU_FTRS_CELL	(CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
 	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
 	    CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 04d69c4..7a6801f 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -458,7 +458,7 @@ BEGIN_FTR_SECTION
 	 */
 	mfspr	r0,SPRN_TAR
 	std	r0,THREAD_TAR(r3)
-END_FTR_SECTION_IFSET(CPU_FTR_BCTAR)
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 #endif
 
 #ifdef CONFIG_SMP
@@ -547,7 +547,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
 BEGIN_FTR_SECTION
 	ld	r0,THREAD_TAR(r4)
 	mtspr	SPRN_TAR,r0
-END_FTR_SECTION_IFSET(CPU_FTR_BCTAR)
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 #endif
 
 #ifdef CONFIG_ALTIVEC
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 0/3] powerpc: Context switch Event Based Branch (EBB) SPRs
From: Michael Neuling @ 2013-05-01  6:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Matt Evans

Event Based Branching (EBB) is a new POWER8 feature to take asynchronous
userspace interrupt based on events.  Currently on POWER8 these events can be
PMU overflows.

This patch series context switches the three new EBB SPRs which are:
- Event Based Branch Handler Register (EBBHR)
- Event Based Branch Return Register (EBBRR)
- Branch Event Status And Control Register (BESCR)

Context switching these ensures there's no covert channel between processes.

This series developed by the BML team.

Apologies that the new EBB SPR acronyms are not humorously named.  Our powerpc
architects had an off day.

Mikey

^ permalink raw reply

* Re: [PATCH -V7 18/18] powerpc: Update tlbie/tlbiel as per ISA doc
From: David Gibson @ 2013-05-01  5:26 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, paulus, linux-mm
In-Reply-To: <87ppxc9bpf.fsf@linux.vnet.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 3088 bytes --]

On Tue, Apr 30, 2013 at 10:51:00PM +0530, Aneesh Kumar K.V wrote:
> David Gibson <dwg@au1.ibm.com> writes:
> 
> > On Mon, Apr 29, 2013 at 01:07:39AM +0530, Aneesh Kumar K.V wrote:
> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >> 
> >> Encode the actual page correctly in tlbie/tlbiel. This make sure we handle
> >> multiple page size segment correctly.
> >
> > As mentioned in previous comments, this commit message needs to give
> > much more detail about what precisely the existing implementation is
> > doing wrong.
> >
> >> 
> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> >> ---
> >>  arch/powerpc/mm/hash_native_64.c | 32 ++++++++++++++++++++++++++++++--
> >>  1 file changed, 30 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
> >> index bb920ee..6a2aead 100644
> >> --- a/arch/powerpc/mm/hash_native_64.c
> >> +++ b/arch/powerpc/mm/hash_native_64.c
> >> @@ -61,7 +61,10 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
> >>  
> >>  	switch (psize) {
> >>  	case MMU_PAGE_4K:
> >> +		/* clear out bits after (52) [0....52.....63] */
> >> +		va &= ~((1ul << (64 - 52)) - 1);
> >>  		va |= ssize << 8;
> >> +		va |= mmu_psize_defs[apsize].sllp << 6;
> >>  		asm volatile(ASM_FTR_IFCLR("tlbie %0,0", PPC_TLBIE(%1,%0), %2)
> >>  			     : : "r" (va), "r"(0), "i" (CPU_FTR_ARCH_206)
> >>  			     : "memory");
> >> @@ -69,9 +72,20 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
> >>  	default:
> >>  		/* We need 14 to 14 + i bits of va */
> >>  		penc = mmu_psize_defs[psize].penc[apsize];
> >> -		va &= ~((1ul << mmu_psize_defs[psize].shift) - 1);
> >> +		va &= ~((1ul << mmu_psize_defs[apsize].shift) - 1);
> >>  		va |= penc << 12;
> >>  		va |= ssize << 8;
> >> +		/* Add AVAL part */
> >> +		if (psize != apsize) {
> >> +			/*
> >> +			 * MPSS, 64K base page size and 16MB parge page size
> >> +			 * We don't need all the bits, but rest of the bits
> >> +			 * must be ignored by the processor.
> >> +			 * vpn cover upto 65 bits of va. (0...65) and we need
> >> +			 * 58..64 bits of va.
> >
> > I can't understand what this comment is saying.  Why do we need to do
> > something different in the psize != apsize case?
> >
> >> +			 */
> >> +			va |= (vpn & 0xfe);
> >> +		}
> 
> That is as per ISA doc. It says if base page size == actual page size,
> (RB)56:62 must be zeros, which must be ignored by the processor.
> Otherwise it should be filled with the selected bits of VA as explained above.

What you've just said here makes much more sense than what's written
in the comment in the code.

> We only support MPSS with base page size = 64K and actual page size = 16MB.

Is that actually relevant to this code though?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: linux-next: manual merge of the vfs tree with the powerpc tree
From: Al Viro @ 2013-05-01  5:10 UTC (permalink / raw)
  To: Vasant Hegde
  Cc: Stephen Rothwell, linux-kernel, David Howells, linux-next,
	linuxppc-dev
In-Reply-To: <5180A1AD.4060409@linux.vnet.ibm.com>

On Wed, May 01, 2013 at 10:31:33AM +0530, Vasant Hegde wrote:
> On 05/01/2013 07:36 AM, Stephen Rothwell wrote:
> >Hi Al,
> >
> >Today's linux-next merge of the vfs tree got a conflict in
> >arch/powerpc/kernel/rtas_flash.c between commit fb4696c39573
> >("powerpc/rtas_flash: Fix bad memory access") from the powerpc tree and
> >commits ad18a364f186 ("powerpc/rtas_flash: Free kmem upon module exit")
> >and 2352ad01409d ("ppc: Clean up rtas_flash driver somewhat") from the
> >vfs tree.
> >
> 
> Stephen,
> 
> As Ben mentioned earlier, there are outstanding comments from me which
> needs to be addressed by Dave, before its merged.

What the hell?  Your fixes *are* folded into that commit and according to
Ben it passes testing; is there anything still missing?

^ permalink raw reply

* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Eric Dumazet @ 2013-05-01  5:04 UTC (permalink / raw)
  To: Alan Modra
  Cc: netdev, Ambrose Feinstein, Paul Mackerras, Anton Blanchard,
	linuxppc-dev, David Miller
In-Reply-To: <20130501035425.GD5221@bubble.grove.modra.org>

On Wed, 2013-05-01 at 13:24 +0930, Alan Modra wrote:
> On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote:
> > 	li 11,1
> > 	ld 0,0(9)
> > 	rldimi 0,11,31,32
> > 	std 0,0(9)
> > 	blr
> > 	.ident	"GCC: (GNU) 4.6.3"
> > 
> > You can see "ld 0,0(9)" is used : its a 64 bit load.
> 
> Yup.  This is not a powerpc64 specific problem.  See
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
> Fixed in 4.8.0 and 4.7.3.

Ah thanks.

This seems a pretty serious issue, is it documented somewhere that
ppc64, sparc64 and others need such compiler version ?

These kind of errors are pretty hard to find, its a pity to spend time
on them.

^ permalink raw reply

* Re: linux-next: manual merge of the vfs tree with the powerpc tree
From: Vasant Hegde @ 2013-05-01  5:01 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: linux-kernel, David Howells, linux-next, Al Viro, linuxppc-dev
In-Reply-To: <20130501120636.0e54681b2dfb7ebe6aeb27c4@canb.auug.org.au>

On 05/01/2013 07:36 AM, Stephen Rothwell wrote:
> Hi Al,
>
> Today's linux-next merge of the vfs tree got a conflict in
> arch/powerpc/kernel/rtas_flash.c between commit fb4696c39573
> ("powerpc/rtas_flash: Fix bad memory access") from the powerpc tree and
> commits ad18a364f186 ("powerpc/rtas_flash: Free kmem upon module exit")
> and 2352ad01409d ("ppc: Clean up rtas_flash driver somewhat") from the
> vfs tree.
>

Stephen,

As Ben mentioned earlier, there are outstanding comments from me which
needs to be addressed by Dave, before its merged.


-Vasant

> I fixed it up (see below) and can carry the fix as necessary (no action
> is required).
>

^ permalink raw reply

* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Alan Modra @ 2013-05-01  3:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Ambrose Feinstein, Paul Mackerras, Anton Blanchard,
	linuxppc-dev, David Miller
In-Reply-To: <1367375060.11020.24.camel@edumazet-glaptop>

On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote:
> 	li 11,1
> 	ld 0,0(9)
> 	rldimi 0,11,31,32
> 	std 0,0(9)
> 	blr
> 	.ident	"GCC: (GNU) 4.6.3"
> 
> You can see "ld 0,0(9)" is used : its a 64 bit load.

Yup.  This is not a powerpc64 specific problem.  See
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
Fixed in 4.8.0 and 4.7.3.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply

* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Eric Dumazet @ 2013-05-01  2:24 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: amodra, netdev, Ambrose Feinstein, Paul Mackerras, linuxppc-dev,
	David Miller
In-Reply-To: <20130501115103.58e40f37@kryten>

On Wed, 2013-05-01 at 11:51 +1000, Anton Blanchard wrote:
> Hi Eric,
> 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > Using bit fields is dangerous on ppc64, as the compiler uses 64bit
> > instructions to manipulate them. If the 64bit word includes any
> > atomic_t or spinlock_t, we can lose critical concurrent changes.
> > 
> > This is happening in af_unix, where unix_sk(sk)->gc_candidate/
> > gc_maybe_cycle/lock share the same 64bit word.
> > 
> > This leads to fatal deadlock, as one/several cpus spin forever
> > on a spinlock that will never be available again.
> 
> I just spoke to Alan Modra and he suspects this is a compiler
> bug. Can you give us your compiler version info?

$ gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc -v
Using built-in specs.
COLLECT_GCC=gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc
COLLECT_LTO_WRAPPER=/usr/local/google/home/edumazet/cross/gcc-4.6.3-nolibc/powerpc64-linux/bin/../libexec/gcc/powerpc64-linux/4.6.3/lto-wrapper
Target: powerpc64-linux
Configured with: /home/tony/buildall/src/gcc/configure
--target=powerpc64-linux --host=x86_64-linux-gnu
--build=x86_64-linux-gnu --enable-targets=all
--prefix=/opt/cross/gcc-4.6.3-nolibc/powerpc64-linux/
--enable-languages=c --with-newlib --without-headers
--enable-sjlj-exceptions --with-system-libunwind --disable-nls
--disable-threads --disable-shared --disable-libmudflap --disable-libssp
--disable-libgomp --disable-decimal-float --enable-checking=release
--with-mpfr=/home/tony/buildall/src/sys-x86_64
--with-gmp=/home/tony/buildall/src/sys-x86_64 --disable-bootstrap
--disable-libquadmath
Thread model: single
gcc version 4.6.3 (GCC) 


$ cat try.c ; gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc
-O2 -S try.c ; cat try.s
struct s {
	unsigned int lock;
	unsigned int f1 : 1;
	unsigned int f2 : 1;
	void *ptr;
} *p ;

showbug()
{
	p->lock++;
	p->f1 = 1;
}
	.file	"try.c"
	.section	".toc","aw"
	.section	".text"
	.section	".toc","aw"
.LC0:
	.tc p[TC],p
	.section	".text"
	.align 2
	.globl showbug
	.section	".opd","aw"
	.align 3
showbug:
	.quad	.L.showbug,.TOC.@tocbase,0
	.previous
	.type	showbug, @function
.L.showbug:
	addis 9,2,.LC0@toc@ha
	ld 9,.LC0@toc@l(9)
	ld 9,0(9)
	lwz 11,0(9)
	addi 0,11,1
	stw 0,0(9)
	li 11,1
	ld 0,0(9)
	rldimi 0,11,31,32
	std 0,0(9)
	blr
	.long 0
	.byte 0,0,0,0,0,0,0,0
	.size	showbug,.-.L.showbug
	.comm	p,8,8
	.ident	"GCC: (GNU) 4.6.3"

You can see "ld 0,0(9)" is used : its a 64 bit load.

^ permalink raw reply

* Re: ppc/sata-fsl: orphan config value: CONFIG_MPC8315_DS
From: Anthony Foiani @ 2013-05-01  2:06 UTC (permalink / raw)
  To: Scott Wood
  Cc: Robert P.J.Day, linuxppc-dev@lists.ozlabs.org, Li Yang-R58472,
	Jeff Garzik, Adrian Bunk
In-Reply-To: <1367368958.24133.21@snotra>

Scott --

On 04/30/2013 06:42 PM, Scott Wood wrote:
> I just meant that, for whatever boards you would have put this in the 
> device tree, put it in platform code instead (if the platform file 
> supports more than one board type, then check the compatible at the 
> top of the device tree). 

I think I understand what you're suggesting.

Instead of a new property name, I would instead check for my specific 
board type (let's call it a foo-8315) in the top-level compatible list?  
So I'd change my devtree to have this top-level compatible:

/ {
     compatible = "example,foo-8315", "fsl,mpc8315erdb";

If I saw that, I would then twiddle the bits as needed?

MIght work, although having it in the sata block of the device tree has 
the advantage of providing me exactly the OF node that I need (in 
ofdev->dev.of_node).  I'd have to figure out how to traverse to the dev 
tree root and then back down one to the root compat entry.  Probably not 
impossible, but I was aiming for a fairly minimal patch.

It would also be nice if we could unravel exactly why that 
CONFIG_8315_DS ever showed up in the first place.  (The other "minimal" 
aspect of my patch was to try to make changes only around that one area, 
so that others could see that it was a simple change.)

> Or do you mean that you would not set this on any board's device tree 
> by default, and instead have users set it if they encounter problems? 

No, I would expect to set it on all the boards, so using the 
compatibility hack above would work.

> Is this a custom board you're seeing it on?

Not ours, but our vendor isn't very active on upstreaming things. (And 
yes, had I know that 5 years ago, I could possibly have changed vendors, 
or made upstreaming a part of the contract.  But at this point, we're 
stuck with this vendor, and they're not going to fix it; so I'm trying 
to fix it, and I'm trying to do the best job of upstreaming those fixes 
that I can.)

> git send-email can connect directly to an SMTP server.

Ok, I'll play around with that.

Thanks again,
Anthony

^ permalink raw reply

* linux-next: manual merge of the vfs tree with the powerpc tree
From: Stephen Rothwell @ 2013-05-01  2:06 UTC (permalink / raw)
  To: Al Viro; +Cc: Vasant Hegde, linux-kernel, David Howells, linux-next,
	linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 4740 bytes --]

Hi Al,

Today's linux-next merge of the vfs tree got a conflict in
arch/powerpc/kernel/rtas_flash.c between commit fb4696c39573
("powerpc/rtas_flash: Fix bad memory access") from the powerpc tree and
commits ad18a364f186 ("powerpc/rtas_flash: Free kmem upon module exit")
and 2352ad01409d ("ppc: Clean up rtas_flash driver somewhat") from the
vfs tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc arch/powerpc/kernel/rtas_flash.c
index 243e184,5b77026..0000000
--- a/arch/powerpc/kernel/rtas_flash.c
+++ b/arch/powerpc/kernel/rtas_flash.c
@@@ -310,9 -316,9 +328,9 @@@ static ssize_t rtas_flash_write(struct 
  	 * proc file
  	 */
  	if (uf->flist == NULL) {
 -		uf->flist = kmem_cache_alloc(flash_block_cache, GFP_KERNEL);
 +		uf->flist = kmem_cache_zalloc(flash_block_cache, GFP_KERNEL);
  		if (!uf->flist)
- 			return -ENOMEM;
+ 			goto nomem;
  	}
  
  	fl = uf->flist;
@@@ -321,18 -327,18 +339,18 @@@
  	next_free = fl->num_blocks;
  	if (next_free == FLASH_BLOCKS_PER_NODE) {
  		/* Need to allocate another block_list */
 -		fl->next = kmem_cache_alloc(flash_block_cache, GFP_KERNEL);
 +		fl->next = kmem_cache_zalloc(flash_block_cache, GFP_KERNEL);
  		if (!fl->next)
- 			return -ENOMEM;
+ 			goto nomem;
  		fl = fl->next;
  		next_free = 0;
  	}
  
  	if (count > RTAS_BLK_SIZE)
  		count = RTAS_BLK_SIZE;
 -	p = kmem_cache_alloc(flash_block_cache, GFP_KERNEL);
 +	p = kmem_cache_zalloc(flash_block_cache, GFP_KERNEL);
  	if (!p)
- 		return -ENOMEM;
+ 		goto nomem;
  	
  	if(copy_from_user(p, buffer, count)) {
  		kmem_cache_free(flash_block_cache, p);
@@@ -722,65 -694,13 +706,13 @@@ static int __init rtas_flash_init(void
  		return 1;
  	}
  
- 	firmware_flash_pde = create_flash_pde("powerpc/rtas/"
- 					      FIRMWARE_FLASH_NAME,
- 					      &rtas_flash_operations);
- 	if (firmware_flash_pde == NULL) {
- 		rc = -ENOMEM;
- 		goto cleanup;
- 	}
- 
- 	rc = initialize_flash_pde_data("ibm,update-flash-64-and-reboot",
- 			 	       sizeof(struct rtas_update_flash_t), 
- 				       firmware_flash_pde);
- 	if (rc != 0)
- 		goto cleanup;
- 
- 	firmware_update_pde = create_flash_pde("powerpc/rtas/"
- 					       FIRMWARE_UPDATE_NAME,
- 					       &rtas_flash_operations);
- 	if (firmware_update_pde == NULL) {
- 		rc = -ENOMEM;
- 		goto cleanup;
- 	}
- 
- 	rc = initialize_flash_pde_data("ibm,update-flash-64-and-reboot",
- 			 	       sizeof(struct rtas_update_flash_t), 
- 				       firmware_update_pde);
- 	if (rc != 0)
- 		goto cleanup;
- 
- 	validate_pde = create_flash_pde("powerpc/rtas/" VALIDATE_FLASH_NAME,
- 			      		&validate_flash_operations);
- 	if (validate_pde == NULL) {
- 		rc = -ENOMEM;
- 		goto cleanup;
- 	}
- 
- 	rc = initialize_flash_pde_data("ibm,validate-flash-image",
- 		                       sizeof(struct rtas_validate_flash_t), 
- 				       validate_pde);
- 	if (rc != 0)
- 		goto cleanup;
- 
- 	manage_pde = create_flash_pde("powerpc/rtas/" MANAGE_FLASH_NAME,
- 				      &manage_flash_operations);
- 	if (manage_pde == NULL) {
- 		rc = -ENOMEM;
- 		goto cleanup;
- 	}
- 
- 	rc = initialize_flash_pde_data("ibm,manage-flash-image",
- 			               sizeof(struct rtas_manage_flash_t),
- 				       manage_pde);
- 	if (rc != 0)
- 		goto cleanup;
- 
- 	rtas_flash_term_hook = rtas_flash_firmware;
+ 	rtas_validate_flash_data.buf = kzalloc(VALIDATE_BUF_SIZE, GFP_KERNEL);
+ 	if (!rtas_validate_flash_data.buf)
+ 		return -ENOMEM;
  
  	flash_block_cache = kmem_cache_create("rtas_flash_cache",
 -					      RTAS_BLK_SIZE, RTAS_BLK_SIZE, 0,
 -					      rtas_block_ctor);
 +				RTAS_BLK_SIZE, RTAS_BLK_SIZE, 0,
 +				NULL);
  	if (!flash_block_cache) {
  		printk(KERN_ERR "%s: failed to create block cache\n",
  				__func__);
@@@ -800,20 -742,17 +754,22 @@@ enomem_buf
  
  static void __exit rtas_flash_cleanup(void)
  {
+ 	int i;
+ 
  	rtas_flash_term_hook = NULL;
  
 +	if (rtas_firmware_flash_list) {
 +		free_flash_list(rtas_firmware_flash_list);
 +		rtas_firmware_flash_list = NULL;
 +	}
 +
- 	if (flash_block_cache)
- 		kmem_cache_destroy(flash_block_cache);
+ 	for (i = 0; i < ARRAY_SIZE(rtas_flash_files); i++) {
+ 		const struct rtas_flash_file *f = &rtas_flash_files[i];
+ 		remove_proc_entry(f->filename, NULL);
+ 	}
  
- 	remove_flash_pde(firmware_flash_pde);
- 	remove_flash_pde(firmware_update_pde);
- 	remove_flash_pde(validate_pde);
- 	remove_flash_pde(manage_pde);
+ 	kmem_cache_destroy(flash_block_cache);
+ 	kfree(rtas_validate_flash_data.buf);
  }
  
  module_init(rtas_flash_init);

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Anton Blanchard @ 2013-05-01  1:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: amodra, netdev, Ambrose Feinstein, Paul Mackerras, linuxppc-dev,
	David Miller
In-Reply-To: <1367370761.11020.22.camel@edumazet-glaptop>

Hi Eric,

> From: Eric Dumazet <edumazet@google.com>
> 
> Using bit fields is dangerous on ppc64, as the compiler uses 64bit
> instructions to manipulate them. If the 64bit word includes any
> atomic_t or spinlock_t, we can lose critical concurrent changes.
> 
> This is happening in af_unix, where unix_sk(sk)->gc_candidate/
> gc_maybe_cycle/lock share the same 64bit word.
> 
> This leads to fatal deadlock, as one/several cpus spin forever
> on a spinlock that will never be available again.

I just spoke to Alan Modra and he suspects this is a compiler
bug. Can you give us your compiler version info?

Anton

> Reported-by: Ambrose Feinstein <ambrose@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> ---
> 
> Could ppc64 experts confirm using byte is safe, or should we really
> add a 32bit hole after the spinlock ? If so, I wonder how many other
> places need a change...
> 
>  include/net/af_unix.h |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index a8836e8..4520a23f 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -57,8 +57,8 @@ struct unix_sock {
>  	struct list_head	link;
>  	atomic_long_t		inflight;
>  	spinlock_t		lock;
> -	unsigned int		gc_candidate : 1;
> -	unsigned int		gc_maybe_cycle : 1;
> +	unsigned char		gc_candidate;
> +	unsigned char		gc_maybe_cycle;
>  	unsigned char		recursion_level;
>  	struct socket_wq	peer_wq;
>  };
> 
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

^ permalink raw reply

* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Benjamin Herrenschmidt @ 2013-05-01  1:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linuxppc-dev, Paul Mackerras, David Miller,
	Ambrose Feinstein
In-Reply-To: <1367370761.11020.22.camel@edumazet-glaptop>

On Tue, 2013-04-30 at 18:12 -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Using bit fields is dangerous on ppc64, as the compiler uses 64bit
> instructions to manipulate them. If the 64bit word includes any
> atomic_t or spinlock_t, we can lose critical concurrent changes.
> 
> This is happening in af_unix, where unix_sk(sk)->gc_candidate/
> gc_maybe_cycle/lock share the same 64bit word.
> 
> This leads to fatal deadlock, as one/several cpus spin forever
> on a spinlock that will never be available again.
> 
> Reported-by: Ambrose Feinstein <ambrose@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> ---
> 
> Could ppc64 experts confirm using byte is safe, or should we really add
> a 32bit hole after the spinlock ? If so, I wonder how many other places
> need a change...

Wow, nice one !

I'm not even completely certain bytes are safe to be honest, though
probably more than bitfields. I'll poke our compiler people.

The worry is of course how many more of these do we potentially have ? 
We might be able to automate finding these issues with sparse, I
suppose.

Also I'd be surprised if ppc64 is the only one with that problem... what
about sparc64 and arm64 ?

Cheers,
Ben.

>  include/net/af_unix.h |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index a8836e8..4520a23f 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -57,8 +57,8 @@ struct unix_sock {
>  	struct list_head	link;
>  	atomic_long_t		inflight;
>  	spinlock_t		lock;
> -	unsigned int		gc_candidate : 1;
> -	unsigned int		gc_maybe_cycle : 1;
> +	unsigned char		gc_candidate;
> +	unsigned char		gc_maybe_cycle;
>  	unsigned char		recursion_level;
>  	struct socket_wq	peer_wq;
>  };
> 

^ permalink raw reply

* [PATCH net-next] af_unix: fix a fatal race with bit fields
From: Eric Dumazet @ 2013-05-01  1:12 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linuxppc-dev, Paul Mackerras, Ambrose Feinstein

From: Eric Dumazet <edumazet@google.com>

Using bit fields is dangerous on ppc64, as the compiler uses 64bit
instructions to manipulate them. If the 64bit word includes any
atomic_t or spinlock_t, we can lose critical concurrent changes.

This is happening in af_unix, where unix_sk(sk)->gc_candidate/
gc_maybe_cycle/lock share the same 64bit word.

This leads to fatal deadlock, as one/several cpus spin forever
on a spinlock that will never be available again.

Reported-by: Ambrose Feinstein <ambrose@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
---

Could ppc64 experts confirm using byte is safe, or should we really add
a 32bit hole after the spinlock ? If so, I wonder how many other places
need a change...

 include/net/af_unix.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index a8836e8..4520a23f 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -57,8 +57,8 @@ struct unix_sock {
 	struct list_head	link;
 	atomic_long_t		inflight;
 	spinlock_t		lock;
-	unsigned int		gc_candidate : 1;
-	unsigned int		gc_maybe_cycle : 1;
+	unsigned char		gc_candidate;
+	unsigned char		gc_maybe_cycle;
 	unsigned char		recursion_level;
 	struct socket_wq	peer_wq;
 };

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox