LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [RFC] arch: Introduce new TSO memory barrier smp_tmb()
From: Paul E. McKenney @ 2013-11-03 22:43 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael Neuling, Mathieu Desnoyers, Peter Zijlstra,
	Frederic Weisbecker, Oleg Nesterov, LKML, Linux PPC dev,
	Anton Blanchard, Victor Kaplansky, Linus Torvalds
In-Reply-To: <1383512363.4776.22.camel@pasglop>

On Mon, Nov 04, 2013 at 07:59:23AM +1100, Benjamin Herrenschmidt wrote:
> On Sun, 2013-11-03 at 16:17 +0100, Peter Zijlstra wrote:
> > On Sun, Nov 03, 2013 at 06:40:17AM -0800, Paul E. McKenney wrote:
> > > If there was an smp_tmb(), I would likely use it in rcu_assign_pointer().
> > 
> > Well, I'm obviously all for introducing this new barrier, for it will
> > reduce a full mfence on x86 to a compiler barrier. And ppc can use
> > lwsync as opposed to sync afaict. Not sure ARM can do better.
> 
> The patch at the *very least* needs a good description of the semantics
> of the barrier, what does it order vs. what etc...

Agreed.  Also it needs a name that people can live with.  We will get
there.  ;-)

							Thanx, Paul

> Cheers,
> Ben.
> 
> > ---
> > Subject: arch: Introduce new TSO memory barrier smp_tmb()
> > 
> > A few sites could be downgraded from smp_mb() to smp_tmb() and a few
> > site should be upgraded to smp_tmb() that are now using smp_wmb().
> > 
> > XXX hope PaulMck explains things better..
> > 
> > X86 (!OOSTORE), SPARC have native TSO memory models and smp_tmb()
> > reduces to barrier().
> > 
> > PPC can use lwsync instead of sync
> > 
> > For the other archs, have smp_tmb map to smp_mb, as the stronger barrier
> > is always correct but possibly suboptimal.
> > 
> > Suggested-by: Paul McKenney <paulmck@linux.vnet.ibm.com>
> > Not-Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> > ---
> >  arch/alpha/include/asm/barrier.h      | 2 ++
> >  arch/arc/include/asm/barrier.h        | 2 ++
> >  arch/arm/include/asm/barrier.h        | 2 ++
> >  arch/arm64/include/asm/barrier.h      | 2 ++
> >  arch/avr32/include/asm/barrier.h      | 1 +
> >  arch/blackfin/include/asm/barrier.h   | 1 +
> >  arch/cris/include/asm/barrier.h       | 2 ++
> >  arch/frv/include/asm/barrier.h        | 1 +
> >  arch/h8300/include/asm/barrier.h      | 2 ++
> >  arch/hexagon/include/asm/barrier.h    | 1 +
> >  arch/ia64/include/asm/barrier.h       | 2 ++
> >  arch/m32r/include/asm/barrier.h       | 2 ++
> >  arch/m68k/include/asm/barrier.h       | 1 +
> >  arch/metag/include/asm/barrier.h      | 3 +++
> >  arch/microblaze/include/asm/barrier.h | 1 +
> >  arch/mips/include/asm/barrier.h       | 3 +++
> >  arch/mn10300/include/asm/barrier.h    | 2 ++
> >  arch/parisc/include/asm/barrier.h     | 1 +
> >  arch/powerpc/include/asm/barrier.h    | 2 ++
> >  arch/s390/include/asm/barrier.h       | 1 +
> >  arch/score/include/asm/barrier.h      | 1 +
> >  arch/sh/include/asm/barrier.h         | 2 ++
> >  arch/sparc/include/asm/barrier_32.h   | 1 +
> >  arch/sparc/include/asm/barrier_64.h   | 3 +++
> >  arch/tile/include/asm/barrier.h       | 2 ++
> >  arch/unicore32/include/asm/barrier.h  | 1 +
> >  arch/x86/include/asm/barrier.h        | 3 +++
> >  arch/xtensa/include/asm/barrier.h     | 1 +
> >  28 files changed, 48 insertions(+)
> > 
> > diff --git a/arch/alpha/include/asm/barrier.h b/arch/alpha/include/asm/barrier.h
> > index ce8860a0b32d..02ea63897038 100644
> > --- a/arch/alpha/include/asm/barrier.h
> > +++ b/arch/alpha/include/asm/barrier.h
> > @@ -18,12 +18,14 @@ __asm__ __volatile__("mb": : :"memory")
> >  #ifdef CONFIG_SMP
> >  #define __ASM_SMP_MB	"\tmb\n"
> >  #define smp_mb()	mb()
> > +#define smp_tmb()	mb()
> >  #define smp_rmb()	rmb()
> >  #define smp_wmb()	wmb()
> >  #define smp_read_barrier_depends()	read_barrier_depends()
> >  #else
> >  #define __ASM_SMP_MB
> >  #define smp_mb()	barrier()
> > +#define smp_tmb()	barrier()
> >  #define smp_rmb()	barrier()
> >  #define smp_wmb()	barrier()
> >  #define smp_read_barrier_depends()	do { } while (0)
> > diff --git a/arch/arc/include/asm/barrier.h b/arch/arc/include/asm/barrier.h
> > index f6cb7c4ffb35..456c790fa1ad 100644
> > --- a/arch/arc/include/asm/barrier.h
> > +++ b/arch/arc/include/asm/barrier.h
> > @@ -22,10 +22,12 @@
> >  /* TODO-vineetg verify the correctness of macros here */
> >  #ifdef CONFIG_SMP
> >  #define smp_mb()        mb()
> > +#define smp_tmb()	mb()
> >  #define smp_rmb()       rmb()
> >  #define smp_wmb()       wmb()
> >  #else
> >  #define smp_mb()        barrier()
> > +#define smp_tmb()	barrier()
> >  #define smp_rmb()       barrier()
> >  #define smp_wmb()       barrier()
> >  #endif
> > diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
> > index 60f15e274e6d..bc88a8505673 100644
> > --- a/arch/arm/include/asm/barrier.h
> > +++ b/arch/arm/include/asm/barrier.h
> > @@ -51,10 +51,12 @@
> >  
> >  #ifndef CONFIG_SMP
> >  #define smp_mb()	barrier()
> > +#define smp_tmb()	barrier()
> >  #define smp_rmb()	barrier()
> >  #define smp_wmb()	barrier()
> >  #else
> >  #define smp_mb()	dmb(ish)
> > +#define smp_tmb()	smp_mb()
> >  #define smp_rmb()	smp_mb()
> >  #define smp_wmb()	dmb(ishst)
> >  #endif
> > diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
> > index d4a63338a53c..ec0531f4892f 100644
> > --- a/arch/arm64/include/asm/barrier.h
> > +++ b/arch/arm64/include/asm/barrier.h
> > @@ -33,10 +33,12 @@
> >  
> >  #ifndef CONFIG_SMP
> >  #define smp_mb()	barrier()
> > +#define smp_tmb()	barrier()
> >  #define smp_rmb()	barrier()
> >  #define smp_wmb()	barrier()
> >  #else
> >  #define smp_mb()	asm volatile("dmb ish" : : : "memory")
> > +#define smp_tmb()	asm volatile("dmb ish" : : : "memory")
> >  #define smp_rmb()	asm volatile("dmb ishld" : : : "memory")
> >  #define smp_wmb()	asm volatile("dmb ishst" : : : "memory")
> >  #endif
> > diff --git a/arch/avr32/include/asm/barrier.h b/arch/avr32/include/asm/barrier.h
> > index 0961275373db..6c6ccb9cf290 100644
> > --- a/arch/avr32/include/asm/barrier.h
> > +++ b/arch/avr32/include/asm/barrier.h
> > @@ -20,6 +20,7 @@
> >  # error "The AVR32 port does not support SMP"
> >  #else
> >  # define smp_mb()		barrier()
> > +# define smp_tmb()		barrier()
> >  # define smp_rmb()		barrier()
> >  # define smp_wmb()		barrier()
> >  # define smp_read_barrier_depends() do { } while(0)
> > diff --git a/arch/blackfin/include/asm/barrier.h b/arch/blackfin/include/asm/barrier.h
> > index ebb189507dd7..100f49121a18 100644
> > --- a/arch/blackfin/include/asm/barrier.h
> > +++ b/arch/blackfin/include/asm/barrier.h
> > @@ -40,6 +40,7 @@
> >  #endif /* !CONFIG_SMP */
> >  
> >  #define smp_mb()  mb()
> > +#define smp_tmb() mb()
> >  #define smp_rmb() rmb()
> >  #define smp_wmb() wmb()
> >  #define set_mb(var, value) do { var = value; mb(); } while (0)
> > diff --git a/arch/cris/include/asm/barrier.h b/arch/cris/include/asm/barrier.h
> > index 198ad7fa6b25..679c33738b4c 100644
> > --- a/arch/cris/include/asm/barrier.h
> > +++ b/arch/cris/include/asm/barrier.h
> > @@ -12,11 +12,13 @@
> >  
> >  #ifdef CONFIG_SMP
> >  #define smp_mb()        mb()
> > +#define smp_tmb()       mb()
> >  #define smp_rmb()       rmb()
> >  #define smp_wmb()       wmb()
> >  #define smp_read_barrier_depends()     read_barrier_depends()
> >  #else
> >  #define smp_mb()        barrier()
> > +#define smp_tmb()       barrier()
> >  #define smp_rmb()       barrier()
> >  #define smp_wmb()       barrier()
> >  #define smp_read_barrier_depends()     do { } while(0)
> > diff --git a/arch/frv/include/asm/barrier.h b/arch/frv/include/asm/barrier.h
> > index 06776ad9f5e9..60354ce13ba0 100644
> > --- a/arch/frv/include/asm/barrier.h
> > +++ b/arch/frv/include/asm/barrier.h
> > @@ -20,6 +20,7 @@
> >  #define read_barrier_depends()	do { } while (0)
> >  
> >  #define smp_mb()			barrier()
> > +#define smp_tmb()			barrier()
> >  #define smp_rmb()			barrier()
> >  #define smp_wmb()			barrier()
> >  #define smp_read_barrier_depends()	do {} while(0)
> > diff --git a/arch/h8300/include/asm/barrier.h b/arch/h8300/include/asm/barrier.h
> > index 9e0aa9fc195d..e8e297fa4e9a 100644
> > --- a/arch/h8300/include/asm/barrier.h
> > +++ b/arch/h8300/include/asm/barrier.h
> > @@ -16,11 +16,13 @@
> >  
> >  #ifdef CONFIG_SMP
> >  #define smp_mb()	mb()
> > +#define smp_tmb()	mb()
> >  #define smp_rmb()	rmb()
> >  #define smp_wmb()	wmb()
> >  #define smp_read_barrier_depends()	read_barrier_depends()
> >  #else
> >  #define smp_mb()	barrier()
> > +#define smp_tmb()	barrier()
> >  #define smp_rmb()	barrier()
> >  #define smp_wmb()	barrier()
> >  #define smp_read_barrier_depends()	do { } while(0)
> > diff --git a/arch/hexagon/include/asm/barrier.h b/arch/hexagon/include/asm/barrier.h
> > index 1041a8e70ce8..2dd5b2ad4d21 100644
> > --- a/arch/hexagon/include/asm/barrier.h
> > +++ b/arch/hexagon/include/asm/barrier.h
> > @@ -28,6 +28,7 @@
> >  #define smp_rmb()			barrier()
> >  #define smp_read_barrier_depends()	barrier()
> >  #define smp_wmb()			barrier()
> > +#define smp_tmb()			barrier()
> >  #define smp_mb()			barrier()
> >  #define smp_mb__before_atomic_dec()	barrier()
> >  #define smp_mb__after_atomic_dec()	barrier()
> > diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
> > index 60576e06b6fb..a5f92146b091 100644
> > --- a/arch/ia64/include/asm/barrier.h
> > +++ b/arch/ia64/include/asm/barrier.h
> > @@ -42,11 +42,13 @@
> >  
> >  #ifdef CONFIG_SMP
> >  # define smp_mb()	mb()
> > +# define smp_tmb()	mb()
> >  # define smp_rmb()	rmb()
> >  # define smp_wmb()	wmb()
> >  # define smp_read_barrier_depends()	read_barrier_depends()
> >  #else
> >  # define smp_mb()	barrier()
> > +# define smp_tmb()	barrier()
> >  # define smp_rmb()	barrier()
> >  # define smp_wmb()	barrier()
> >  # define smp_read_barrier_depends()	do { } while(0)
> > diff --git a/arch/m32r/include/asm/barrier.h b/arch/m32r/include/asm/barrier.h
> > index 6976621efd3f..a6fa29facd7a 100644
> > --- a/arch/m32r/include/asm/barrier.h
> > +++ b/arch/m32r/include/asm/barrier.h
> > @@ -79,12 +79,14 @@
> >  
> >  #ifdef CONFIG_SMP
> >  #define smp_mb()	mb()
> > +#define smp_tmb()	mb()
> >  #define smp_rmb()	rmb()
> >  #define smp_wmb()	wmb()
> >  #define smp_read_barrier_depends()	read_barrier_depends()
> >  #define set_mb(var, value) do { (void) xchg(&var, value); } while (0)
> >  #else
> >  #define smp_mb()	barrier()
> > +#define smp_tmb()	barrier()
> >  #define smp_rmb()	barrier()
> >  #define smp_wmb()	barrier()
> >  #define smp_read_barrier_depends()	do { } while (0)
> > diff --git a/arch/m68k/include/asm/barrier.h b/arch/m68k/include/asm/barrier.h
> > index 445ce22c23cb..8ecf52c87847 100644
> > --- a/arch/m68k/include/asm/barrier.h
> > +++ b/arch/m68k/include/asm/barrier.h
> > @@ -13,6 +13,7 @@
> >  #define set_mb(var, value)	({ (var) = (value); wmb(); })
> >  
> >  #define smp_mb()	barrier()
> > +#define smp_tmb()	barrier()
> >  #define smp_rmb()	barrier()
> >  #define smp_wmb()	barrier()
> >  #define smp_read_barrier_depends()	((void)0)
> > diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
> > index c90bfc6bf648..eb179fbce580 100644
> > --- a/arch/metag/include/asm/barrier.h
> > +++ b/arch/metag/include/asm/barrier.h
> > @@ -50,6 +50,7 @@ static inline void wmb(void)
> >  #ifndef CONFIG_SMP
> >  #define fence()		do { } while (0)
> >  #define smp_mb()        barrier()
> > +#define smp_tmb()       barrier()
> >  #define smp_rmb()       barrier()
> >  #define smp_wmb()       barrier()
> >  #else
> > @@ -70,11 +71,13 @@ static inline void fence(void)
> >  	*flushptr = 0;
> >  }
> >  #define smp_mb()        fence()
> > +#define smp_tmb()       fence()
> >  #define smp_rmb()       fence()
> >  #define smp_wmb()       barrier()
> >  #else
> >  #define fence()		do { } while (0)
> >  #define smp_mb()        barrier()
> > +#define smp_tmb()       barrier()
> >  #define smp_rmb()       barrier()
> >  #define smp_wmb()       barrier()
> >  #endif
> > diff --git a/arch/microblaze/include/asm/barrier.h b/arch/microblaze/include/asm/barrier.h
> > index df5be3e87044..d573c170a717 100644
> > --- a/arch/microblaze/include/asm/barrier.h
> > +++ b/arch/microblaze/include/asm/barrier.h
> > @@ -21,6 +21,7 @@
> >  #define set_wmb(var, value)	do { var = value; wmb(); } while (0)
> >  
> >  #define smp_mb()		mb()
> > +#define smp_tmb()		mb()
> >  #define smp_rmb()		rmb()
> >  #define smp_wmb()		wmb()
> >  
> > diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
> > index 314ab5532019..535e699eec3b 100644
> > --- a/arch/mips/include/asm/barrier.h
> > +++ b/arch/mips/include/asm/barrier.h
> > @@ -144,15 +144,18 @@
> >  #if defined(CONFIG_WEAK_ORDERING) && defined(CONFIG_SMP)
> >  # ifdef CONFIG_CPU_CAVIUM_OCTEON
> >  #  define smp_mb()	__sync()
> > +#  define smp_tmb()	__sync()
> >  #  define smp_rmb()	barrier()
> >  #  define smp_wmb()	__syncw()
> >  # else
> >  #  define smp_mb()	__asm__ __volatile__("sync" : : :"memory")
> > +#  define smp_tmb()	__asm__ __volatile__("sync" : : :"memory")
> >  #  define smp_rmb()	__asm__ __volatile__("sync" : : :"memory")
> >  #  define smp_wmb()	__asm__ __volatile__("sync" : : :"memory")
> >  # endif
> >  #else
> >  #define smp_mb()	barrier()
> > +#define smp_tmb()	barrier()
> >  #define smp_rmb()	barrier()
> >  #define smp_wmb()	barrier()
> >  #endif
> > diff --git a/arch/mn10300/include/asm/barrier.h b/arch/mn10300/include/asm/barrier.h
> > index 2bd97a5c8af7..a345b0776e5f 100644
> > --- a/arch/mn10300/include/asm/barrier.h
> > +++ b/arch/mn10300/include/asm/barrier.h
> > @@ -19,11 +19,13 @@
> >  
> >  #ifdef CONFIG_SMP
> >  #define smp_mb()	mb()
> > +#define smp_tmb()	mb()
> >  #define smp_rmb()	rmb()
> >  #define smp_wmb()	wmb()
> >  #define set_mb(var, value)  do { xchg(&var, value); } while (0)
> >  #else  /* CONFIG_SMP */
> >  #define smp_mb()	barrier()
> > +#define smp_tmb()	barrier()
> >  #define smp_rmb()	barrier()
> >  #define smp_wmb()	barrier()
> >  #define set_mb(var, value)  do { var = value;  mb(); } while (0)
> > diff --git a/arch/parisc/include/asm/barrier.h b/arch/parisc/include/asm/barrier.h
> > index e77d834aa803..f53196b589ec 100644
> > --- a/arch/parisc/include/asm/barrier.h
> > +++ b/arch/parisc/include/asm/barrier.h
> > @@ -25,6 +25,7 @@
> >  #define rmb()		mb()
> >  #define wmb()		mb()
> >  #define smp_mb()	mb()
> > +#define smp_tmb()	mb()
> >  #define smp_rmb()	mb()
> >  #define smp_wmb()	mb()
> >  #define smp_read_barrier_depends()	do { } while(0)
> > diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> > index ae782254e731..d7e8a560f1fe 100644
> > --- a/arch/powerpc/include/asm/barrier.h
> > +++ b/arch/powerpc/include/asm/barrier.h
> > @@ -46,11 +46,13 @@
> >  #endif
> >  
> >  #define smp_mb()	mb()
> > +#define smp_tmb()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
> >  #define smp_rmb()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
> >  #define smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
> >  #define smp_read_barrier_depends()	read_barrier_depends()
> >  #else
> >  #define smp_mb()	barrier()
> > +#define smp_tmb()	barrier()
> >  #define smp_rmb()	barrier()
> >  #define smp_wmb()	barrier()
> >  #define smp_read_barrier_depends()	do { } while(0)
> > diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
> > index 16760eeb79b0..f0409a874243 100644
> > --- a/arch/s390/include/asm/barrier.h
> > +++ b/arch/s390/include/asm/barrier.h
> > @@ -24,6 +24,7 @@
> >  #define wmb()				mb()
> >  #define read_barrier_depends()		do { } while(0)
> >  #define smp_mb()			mb()
> > +#define smp_tmb()			mb()
> >  #define smp_rmb()			rmb()
> >  #define smp_wmb()			wmb()
> >  #define smp_read_barrier_depends()	read_barrier_depends()
> > diff --git a/arch/score/include/asm/barrier.h b/arch/score/include/asm/barrier.h
> > index 0eacb6471e6d..865652083dde 100644
> > --- a/arch/score/include/asm/barrier.h
> > +++ b/arch/score/include/asm/barrier.h
> > @@ -5,6 +5,7 @@
> >  #define rmb()		barrier()
> >  #define wmb()		barrier()
> >  #define smp_mb()	barrier()
> > +#define smp_tmb()	barrier()
> >  #define smp_rmb()	barrier()
> >  #define smp_wmb()	barrier()
> >  
> > diff --git a/arch/sh/include/asm/barrier.h b/arch/sh/include/asm/barrier.h
> > index 72c103dae300..f8dce7926432 100644
> > --- a/arch/sh/include/asm/barrier.h
> > +++ b/arch/sh/include/asm/barrier.h
> > @@ -39,11 +39,13 @@
> >  
> >  #ifdef CONFIG_SMP
> >  #define smp_mb()	mb()
> > +#define smp_tmb()	mb()
> >  #define smp_rmb()	rmb()
> >  #define smp_wmb()	wmb()
> >  #define smp_read_barrier_depends()	read_barrier_depends()
> >  #else
> >  #define smp_mb()	barrier()
> > +#define smp_tmb()	barrier()
> >  #define smp_rmb()	barrier()
> >  #define smp_wmb()	barrier()
> >  #define smp_read_barrier_depends()	do { } while(0)
> > diff --git a/arch/sparc/include/asm/barrier_32.h b/arch/sparc/include/asm/barrier_32.h
> > index c1b76654ee76..1037ce189cee 100644
> > --- a/arch/sparc/include/asm/barrier_32.h
> > +++ b/arch/sparc/include/asm/barrier_32.h
> > @@ -8,6 +8,7 @@
> >  #define read_barrier_depends()	do { } while(0)
> >  #define set_mb(__var, __value)  do { __var = __value; mb(); } while(0)
> >  #define smp_mb()	__asm__ __volatile__("":::"memory")
> > +#define smp_tmb()	__asm__ __volatile__("":::"memory")
> >  #define smp_rmb()	__asm__ __volatile__("":::"memory")
> >  #define smp_wmb()	__asm__ __volatile__("":::"memory")
> >  #define smp_read_barrier_depends()	do { } while(0)
> > diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
> > index 95d45986f908..0f3c2fdb86b8 100644
> > --- a/arch/sparc/include/asm/barrier_64.h
> > +++ b/arch/sparc/include/asm/barrier_64.h
> > @@ -34,6 +34,7 @@ do {	__asm__ __volatile__("ba,pt	%%xcc, 1f\n\t" \
> >   * memory ordering than required by the specifications.
> >   */
> >  #define mb()	membar_safe("#StoreLoad")
> > +#define tmb()	__asm__ __volatile__("":::"memory")
> >  #define rmb()	__asm__ __volatile__("":::"memory")
> >  #define wmb()	__asm__ __volatile__("":::"memory")
> >  
> > @@ -43,10 +44,12 @@ do {	__asm__ __volatile__("ba,pt	%%xcc, 1f\n\t" \
> >  
> >  #ifdef CONFIG_SMP
> >  #define smp_mb()	mb()
> > +#define smp_tmb()	tmb()
> >  #define smp_rmb()	rmb()
> >  #define smp_wmb()	wmb()
> >  #else
> >  #define smp_mb()	__asm__ __volatile__("":::"memory")
> > +#define smp_tmb()	__asm__ __volatile__("":::"memory")
> >  #define smp_rmb()	__asm__ __volatile__("":::"memory")
> >  #define smp_wmb()	__asm__ __volatile__("":::"memory")
> >  #endif
> > diff --git a/arch/tile/include/asm/barrier.h b/arch/tile/include/asm/barrier.h
> > index a9a73da5865d..cad3c6ae28bf 100644
> > --- a/arch/tile/include/asm/barrier.h
> > +++ b/arch/tile/include/asm/barrier.h
> > @@ -127,11 +127,13 @@ mb_incoherent(void)
> >  
> >  #ifdef CONFIG_SMP
> >  #define smp_mb()	mb()
> > +#define smp_tmb()	mb()
> >  #define smp_rmb()	rmb()
> >  #define smp_wmb()	wmb()
> >  #define smp_read_barrier_depends()	read_barrier_depends()
> >  #else
> >  #define smp_mb()	barrier()
> > +#define smp_tmb()	barrier()
> >  #define smp_rmb()	barrier()
> >  #define smp_wmb()	barrier()
> >  #define smp_read_barrier_depends()	do { } while (0)
> > diff --git a/arch/unicore32/include/asm/barrier.h b/arch/unicore32/include/asm/barrier.h
> > index a6620e5336b6..8b341fffbda6 100644
> > --- a/arch/unicore32/include/asm/barrier.h
> > +++ b/arch/unicore32/include/asm/barrier.h
> > @@ -18,6 +18,7 @@
> >  #define rmb()				barrier()
> >  #define wmb()				barrier()
> >  #define smp_mb()			barrier()
> > +#define smp_tmb()			barrier()
> >  #define smp_rmb()			barrier()
> >  #define smp_wmb()			barrier()
> >  #define read_barrier_depends()		do { } while (0)
> > diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> > index c6cd358a1eec..480201d83af1 100644
> > --- a/arch/x86/include/asm/barrier.h
> > +++ b/arch/x86/include/asm/barrier.h
> > @@ -86,14 +86,17 @@
> >  # define smp_rmb()	barrier()
> >  #endif
> >  #ifdef CONFIG_X86_OOSTORE
> > +# define smp_tmb()	mb()
> >  # define smp_wmb() 	wmb()
> >  #else
> > +# define smp_tmb()	barrier()
> >  # define smp_wmb()	barrier()
> >  #endif
> >  #define smp_read_barrier_depends()	read_barrier_depends()
> >  #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
> >  #else
> >  #define smp_mb()	barrier()
> > +#define smp_tmb()	barrier()
> >  #define smp_rmb()	barrier()
> >  #define smp_wmb()	barrier()
> >  #define smp_read_barrier_depends()	do { } while (0)
> > diff --git a/arch/xtensa/include/asm/barrier.h b/arch/xtensa/include/asm/barrier.h
> > index ef021677d536..7839db843ea5 100644
> > --- a/arch/xtensa/include/asm/barrier.h
> > +++ b/arch/xtensa/include/asm/barrier.h
> > @@ -20,6 +20,7 @@
> >  #error smp_* not defined
> >  #else
> >  #define smp_mb()	barrier()
> > +#define smp_tmb()	barrier()
> >  #define smp_rmb()	barrier()
> >  #define smp_wmb()	barrier()
> >  #endif
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 
> 

^ permalink raw reply

* Re: [RFC] arch: Introduce new TSO memory barrier smp_tmb()
From: Linus Torvalds @ 2013-11-03 23:34 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Michael Neuling, Mathieu Desnoyers, Peter Zijlstra, Oleg Nesterov,
	LKML, Linux PPC dev, Anton Blanchard, Frederic Weisbecker,
	Victor Kaplansky
In-Reply-To: <20131103224242.GF3947@linux.vnet.ibm.com>

On Sun, Nov 3, 2013 at 2:42 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> smp_storebuffer_mb() -- A barrier that enforces those orderings
>         that do not invalidate the hardware store-buffer optimization.

Ugh. Maybe. Can you guarantee that those are the correct semantics?
And why talk about the hardware semantics, when you really want
specific semantics for the *software*.

> smp_not_w_r_mb() -- A barrier that orders everything except prior
>         writes against subsequent reads.

Ok, that sounds more along the lines of "these are the semantics we
want", but I have to say, it also doesn't make me go "ahh, ok".

> smp_acqrel_mb() -- A barrier that combines C/C++ acquire and release
>         semantics.  (C/C++ "acquire" orders a specific load against
>         subsequent loads and stores, while C/C++ "release" orders
>         a specific store against prior loads and stores.)

I don't think this is true. acquire+release is much stronger than what
you're looking for - it doesn't allow subsequent reads to move past
the write (because that would violate the acquire part). On x86, for
example, you'd need to have a locked cycle for smp_acqrel_mb().

So again, what are the guarantees you actually want? Describe those.
And then make a name.

I _think_ the guarantees you want is:
 - SMP write barrier
 - *local* read barrier for reads preceding the write.

but the problem is that the "preceding reads" part is really
specifically about the write that you had. The barrier should really
be attached to the *particular* write operation, it cannot be a
standalone barrier.

So it would *kind* of act like a "smp_wmb() + smp_rmb()", but the
problem is that a "smp_rmb()" doesn't really "attach" to the preceding
write.

This is analogous to a "acquire" operation: you cannot make an
"acquire" barrier, because it's not a barrier *between* two ops, it's
associated with one particular op.

So what I *think* you actually really really want is a "store with
release consistency, followed by a write barrier".

In TSO, afaik all stores have release consistency, and all writes are
ordered, which is why this is a no-op in TSO. And x86 also has that
"all stores have release consistency, and all writes are ordered"
model, even if TSO doesn't really describe the x86 model.

But on ARM64, for example, I think you'd really want the store itself
to be done with "stlr" (store with release), and then follow up with a
"dsb st" after that.

And notice how that requires you to mark the store itself. There is no
actual barrier *after* the store that does the optimized model.

Of course, it's entirely possible that it's not worth worrying about
this on ARM64, and that just doing it as a "normal store followed by a
full memory barrier" is good enough. But at least in *theory* a
microarchitecture might make it much cheaper to do a "store with
release consistency" followed by "write barrier".

Anyway, having talked exhaustively about exactly what semantics you
are after, I *think* the best model would be to just have a

  #define smp_store_with_release_semantics(x, y) ...

and use that *and* a "smp_wmb()" for this (possibly a special
"smp_wmb_after_release()" if that allows people to avoid double
barriers). On x86 (and TSO systems), the
smp_store_with_release_semantics() would be just a regular store, and
the smp_wmb() is obviously a no-op. Other platforms would end up doing
other things.

Hmm?

         Linus

^ permalink raw reply

* Please pull 'next' branch of 5xxx tree
From: Anatolij Gustschin @ 2013-11-04  0:11 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

Hi Ben !

Please pull mpc5xxx patches for v3.13:

Fixes for build issues when LPB FIFO driver is configured as
a module, removal of #ifdefs in mpc512x DIU platform code and
a revert of recent changes to mpc52xx PIC driver. Wolfram
provided a better fix for PIC driver build issue popping up
when older gcc-4.3.5 is used.

Thanks,
Anatolij
 
The following changes since commit 50bd6153d1a68354a0a0c8bca1fe949fa8875875:

  powerpc/powernv: Code update interface (2013-10-30 16:12:02 +1100)

are available in the git repository at:

  git://git.denx.de/linux-2.6-agust.git next

for you to fetch changes up to 7e198197ec878c720af4dc35c49c0c6a99b83f9f:

  powerpc/mpc512x: remove unnecessary #if (2013-10-30 22:56:10 +0100)

----------------------------------------------------------------
Anatolij Gustschin (1):
      powerpc/52xx: fix build breakage for MPC5200 LPBFIFO module

Brian Norris (1):
      powerpc/mpc512x: remove unnecessary #if

Gerhard Sittig (1):
      powerpc/mpc512x: silence build warning upon disabled DIU

Wolfram Sang (1):
      Kind of revert "powerpc: 52xx: provide a default in mpc52xx_irqhost_map()"

 arch/powerpc/platforms/512x/mpc512x_shared.c |   18 +++++++-----------
 arch/powerpc/platforms/52xx/Kconfig          |    2 +-
 arch/powerpc/platforms/52xx/mpc52xx_pic.c    |    5 ++---
 arch/powerpc/sysdev/fsl_soc.h                |    3 ---
 4 files changed, 10 insertions(+), 18 deletions(-)

^ permalink raw reply

* [PATCH] powerpc/scom: Improve debugfs interface
From: Benjamin Herrenschmidt @ 2013-11-04  2:20 UTC (permalink / raw)
  To: linuxppc-dev

The current debugfs interface to scom is essentially unused
and racy. It uses two different files "address" and "data"
to perform accesses which is at best impractical for anything
but manual use by a developer.

This replaces it with an "access" file which represent the entire
scom address space which can be lseek/read/writen too.

This file only supports accesses that are 8 bytes aligned and
multiple of 8 bytes in size. The offset is logically the SCOM
address multiplied by 8.

Since nothing in userspace exploits that file at the moment, the ABI
change is a no-brainer.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

This replaces the previously posted

"powerpc/scom: Replace debugfs interface with cleaner sysfs"

After discussion with Greg KH, we decided that debugfs was the
right place to put that facility.

 arch/powerpc/sysdev/scom.c | 136 +++++++++++++++++++++++++++------------------
 1 file changed, 81 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/sysdev/scom.c b/arch/powerpc/sysdev/scom.c
index 3963d99..6f5a8d1 100644
--- a/arch/powerpc/sysdev/scom.c
+++ b/arch/powerpc/sysdev/scom.c
@@ -25,6 +25,7 @@
 #include <asm/debug.h>
 #include <asm/prom.h>
 #include <asm/scom.h>
+#include <asm/uaccess.h>
 
 const struct scom_controller *scom_controller;
 EXPORT_SYMBOL_GPL(scom_controller);
@@ -98,61 +99,89 @@ EXPORT_SYMBOL_GPL(scom_map_device);
 #ifdef CONFIG_SCOM_DEBUGFS
 struct scom_debug_entry {
 	struct device_node *dn;
-	unsigned long addr;
-	scom_map_t map;
-	spinlock_t lock;
-	char name[8];
-	struct debugfs_blob_wrapper blob;
+	struct debugfs_blob_wrapper path;
+	char name[16];
 };
 
-static int scom_addr_set(void *data, u64 val)
-{
-	struct scom_debug_entry *ent = data;
-
-	ent->addr = 0;
-	scom_unmap(ent->map);
-
-	ent->map = scom_map(ent->dn, val, 1);
-	if (scom_map_ok(ent->map))
-		ent->addr = val;
-	else
-		return -EFAULT;
-
-	return 0;
-}
-
-static int scom_addr_get(void *data, u64 *val)
+static ssize_t scom_debug_read(struct file *filp, char __user *ubuf,
+			       size_t count, loff_t *ppos)
 {
-	struct scom_debug_entry *ent = data;
-	*val = ent->addr;
-	return 0;
+	struct scom_debug_entry *ent = filp->private_data;
+	u64 __user *ubuf64 = (u64 __user *)ubuf;
+	loff_t off = *ppos;
+	ssize_t done = 0; 
+	u64 reg, reg_cnt, val;
+	scom_map_t map;
+	int rc;
+
+	if (off < 0 || (off & 7) || (count & 7))
+		return -EINVAL;
+	reg = off >> 3;
+	reg_cnt = count >> 3;
+
+	map = scom_map(ent->dn, reg, reg_cnt);
+	if (!scom_map_ok(map))
+		return -ENXIO;
+
+	for (reg = 0; reg < reg_cnt; reg++) {
+		rc = scom_read(map, reg, &val);
+		if (!rc)
+			rc = put_user(val, ubuf64);
+		if (rc) {
+			if (!done)
+				done = rc;
+			break;
+		}
+		ubuf64++;
+		*ppos += 8;
+		done += 8;
+	}
+	scom_unmap(map);
+	return done;
 }
-DEFINE_SIMPLE_ATTRIBUTE(scom_addr_fops, scom_addr_get, scom_addr_set,
-			"0x%llx\n");
 
-static int scom_val_set(void *data, u64 val)
+static ssize_t scom_debug_write(struct file* filp, const char __user *ubuf,
+				size_t count, loff_t *ppos)
 {
-	struct scom_debug_entry *ent = data;
-
-	if (!scom_map_ok(ent->map))
-		return -EFAULT;
-
-	scom_write(ent->map, 0, val);
-
-	return 0;
+	struct scom_debug_entry *ent = filp->private_data;
+	u64 __user *ubuf64 = (u64 __user *)ubuf;
+	loff_t off = *ppos;
+	ssize_t done = 0; 
+	u64 reg, reg_cnt, val;
+	scom_map_t map;
+	int rc;
+
+	if (off < 0 || (off & 7) || (count & 7))
+		return -EINVAL;
+	reg = off >> 3;
+	reg_cnt = count >> 3;
+
+	map = scom_map(ent->dn, reg, reg_cnt);
+	if (!scom_map_ok(map))
+		return -ENXIO;
+
+	for (reg = 0; reg < reg_cnt; reg++) {
+		rc = get_user(val, ubuf64);
+		if (!rc)
+			rc = scom_write(map, reg,  val);
+		if (rc) {
+			if (!done)
+				done = rc;
+			break;
+		}
+		ubuf64++;
+		done += 8;
+	}
+	scom_unmap(map);
+	return done;
 }
 
-static int scom_val_get(void *data, u64 *val)
-{
-	struct scom_debug_entry *ent = data;
-
-	if (!scom_map_ok(ent->map))
-		return -EFAULT;
-
-	return scom_read(ent->map, 0, val);
-}
-DEFINE_SIMPLE_ATTRIBUTE(scom_val_fops, scom_val_get, scom_val_set,
-			"0x%llx\n");
+static const struct file_operations scom_debug_fops = {
+	.read =		scom_debug_read,
+	.write =	scom_debug_write,
+	.open =		simple_open,
+	.llseek =	default_llseek,
+};
 
 static int scom_debug_init_one(struct dentry *root, struct device_node *dn,
 			       int i)
@@ -165,11 +194,9 @@ static int scom_debug_init_one(struct dentry *root, struct device_node *dn,
 		return -ENOMEM;
 
 	ent->dn = of_node_get(dn);
-	ent->map = SCOM_MAP_INVALID;
-	spin_lock_init(&ent->lock);
-	snprintf(ent->name, 8, "scom%d", i);
-	ent->blob.data = (void*) dn->full_name;
-	ent->blob.size = strlen(dn->full_name);
+	snprintf(ent->name, 16, "%08x", i);
+	ent->path.data = (void*) dn->full_name;
+	ent->path.size = strlen(dn->full_name);
 
 	dir = debugfs_create_dir(ent->name, root);
 	if (!dir) {
@@ -178,9 +205,8 @@ static int scom_debug_init_one(struct dentry *root, struct device_node *dn,
 		return -1;
 	}
 
-	debugfs_create_file("addr", 0600, dir, ent, &scom_addr_fops);
-	debugfs_create_file("value", 0600, dir, ent, &scom_val_fops);
-	debugfs_create_blob("devspec", 0400, dir, &ent->blob);
+	debugfs_create_blob("devspec", 0400, dir, &ent->path);
+	debugfs_create_file("access", 0600, dir, ent, &scom_debug_fops);
 
 	return 0;
 }

^ permalink raw reply related

* Re: [V2 PATCH 1/3] powerpc: Enable emulate_step In Little Endian Mode
From: Paul Mackerras @ 2013-11-04  2:28 UTC (permalink / raw)
  To: Tom; +Cc: linuxppc-dev
In-Reply-To: <1383244738-5986-2-git-send-email-tommusta@gmail.com>

On Thu, Oct 31, 2013 at 01:38:56PM -0500, Tom wrote:
> From: Tom Musta <tommusta@gmail.com>
> 
> This patch modifies the endian chicken switch in the single step

"Chicken switch" is IBM jargon, perhaps best avoided where possible in
commit messages.

> emulation code (emulate_step()).  The old (big endian) code bailed
> early if a load or store instruction was to be emulated in little
> endian mode.
> 
> The new code modifies the check and only bails in a cross-endian
> situation (LE mode in a kernel compiled for BE and vice verse).

The patch adds #ifdefs inside code, which is generally frowned upon
in kernel code as it can make the code flow harder to see.  Perhaps
you could do something like

	if ((regs->msr & MSR_LE) != (MSR_KERNEL & MSR_LE))

as an alternative that wouldn't require an #ifdef.  Or, define a
symbol that is 0 in a BE kernel and MSR_LE in a LE kernel, and compare
to that.

Paul.

^ permalink raw reply

* Re: [V2 PATCH 2/3] powerpc: Fix Unaligned Fixed Point Loads and Stores
From: Benjamin Herrenschmidt @ 2013-11-04  2:34 UTC (permalink / raw)
  To: Tom; +Cc: linuxppc-dev
In-Reply-To: <1383244738-5986-3-git-send-email-tommusta@gmail.com>

On Thu, 2013-10-31 at 13:38 -0500, Tom wrote:
> From: Tom Musta <tommusta@gmail.com>
> 
> This patch modifies the unaligned access routines of the sstep.c
> module so that it properly reverses the bytes of storage operands
> in the little endian kernel kernel.

Do that patch differ from v1 ? (I already merged v1)

Cheers,
Ben.
 
> Signed-off-by: Tom Musta <tommusta@gmail.com>
> ---
>  arch/powerpc/lib/sstep.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 45 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index 7bfaa9d..c8743e1 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -212,11 +212,19 @@ static int __kprobes read_mem_unaligned(unsigned long *dest, unsigned long ea,
>  {
>  	int err;
>  	unsigned long x, b, c;
> +#ifdef __LITTLE_ENDIAN__
> +	int len = nb; /* save a copy of the length for byte reversal */
> +#endif
>  
>  	/* unaligned, do this in pieces */
>  	x = 0;
>  	for (; nb > 0; nb -= c) {
> +#ifdef __LITTLE_ENDIAN__
> +		c = 1;
> +#endif
> +#ifdef __BIG_ENDIAN__
>  		c = max_align(ea);
> +#endif
>  		if (c > nb)
>  			c = max_align(nb);
>  		err = read_mem_aligned(&b, ea, c);
> @@ -225,7 +233,24 @@ static int __kprobes read_mem_unaligned(unsigned long *dest, unsigned long ea,
>  		x = (x << (8 * c)) + b;
>  		ea += c;
>  	}
> +#ifdef __LITTLE_ENDIAN__
> +	switch (len) {
> +	case 2:
> +		*dest = byterev_2(x);
> +		break;
> +	case 4:
> +		*dest = byterev_4(x);
> +		break;
> +#ifdef __powerpc64__
> +	case 8:
> +		*dest = byterev_8(x);
> +		break;
> +#endif
> +	}
> +#endif
> +#ifdef __BIG_ENDIAN__
>  	*dest = x;
> +#endif
>  	return 0;
>  }
>  
> @@ -273,9 +298,29 @@ static int __kprobes write_mem_unaligned(unsigned long val, unsigned long ea,
>  	int err;
>  	unsigned long c;
>  
> +#ifdef __LITTLE_ENDIAN__
> +	switch (nb) {
> +	case 2:
> +		val = byterev_2(val);
> +		break;
> +	case 4:
> +		val = byterev_4(val);
> +		break;
> +#ifdef __powerpc64__
> +	case 8:
> +		val = byterev_8(val);
> +		break;
> +#endif
> +	}
> +#endif
>  	/* unaligned or little-endian, do this in pieces */
>  	for (; nb > 0; nb -= c) {
> +#ifdef __LITTLE_ENDIAN__
> +		c = 1;
> +#endif
> +#ifdef __BIG_ENDIAN__
>  		c = max_align(ea);
> +#endif
>  		if (c > nb)
>  			c = max_align(nb);
>  		err = write_mem_aligned(val >> (nb - c) * 8, ea, c);

^ permalink raw reply

* Re: [V2 PATCH 3/3] powerpc: Fix Unaligned LE Floating Point Loads and Stores
From: Benjamin Herrenschmidt @ 2013-11-04  2:34 UTC (permalink / raw)
  To: Tom; +Cc: linuxppc-dev
In-Reply-To: <1383244738-5986-4-git-send-email-tommusta@gmail.com>

On Thu, 2013-10-31 at 13:38 -0500, Tom wrote:
> From: Tom Musta <tommusta@gmail.com>
> 
> This patch addresses unaligned single precision floating point loads
> and stores in the single-step code.  The old implementation
> improperly treated an 8 byte structure as an array of two 4 byte
> words, which is a classic little endian bug.

Do that patch differ from v1 ? I also already merged v1 of this
one (the only one I didn't merge is the emulate_step one)

Cheers,
Ben.

> Signed-off-by: Tom Musta <tommusta@gmail.com>
> ---
>  arch/powerpc/lib/sstep.c |   52 +++++++++++++++++++++++++++++++++++----------
>  1 files changed, 40 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index c8743e1..1cfd150 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -355,22 +355,36 @@ static int __kprobes do_fp_load(int rn, int (*func)(int, unsigned long),
>  				struct pt_regs *regs)
>  {
>  	int err;
> -	unsigned long val[sizeof(double) / sizeof(long)];
> +	union {
> +		double dbl;
> +		unsigned long ul[2];
> +		struct {
> +#ifdef __BIG_ENDIAN__
> +			unsigned _pad_;
> +			unsigned word;
> +#endif
> +#ifdef __LITTLE_ENDIAN__
> +			unsigned word;
> +			unsigned _pad_;
> +#endif
> +		} single;
> +	} data;
>  	unsigned long ptr;
>  
>  	if (!address_ok(regs, ea, nb))
>  		return -EFAULT;
>  	if ((ea & 3) == 0)
>  		return (*func)(rn, ea);
> -	ptr = (unsigned long) &val[0];
> +	ptr = (unsigned long) &data.ul;
>  	if (sizeof(unsigned long) == 8 || nb == 4) {
> -		err = read_mem_unaligned(&val[0], ea, nb, regs);
> -		ptr += sizeof(unsigned long) - nb;
> +		err = read_mem_unaligned(&data.ul[0], ea, nb, regs);
> +		if (nb == 4)
> +			ptr = (unsigned long)&(data.single.word);
>  	} else {
>  		/* reading a double on 32-bit */
> -		err = read_mem_unaligned(&val[0], ea, 4, regs);
> +		err = read_mem_unaligned(&data.ul[0], ea, 4, regs);
>  		if (!err)
> -			err = read_mem_unaligned(&val[1], ea + 4, 4, regs);
> +			err = read_mem_unaligned(&data.ul[1], ea + 4, 4, regs);
>  	}
>  	if (err)
>  		return err;
> @@ -382,28 +396,42 @@ static int __kprobes do_fp_store(int rn, int (*func)(int, unsigned long),
>  				 struct pt_regs *regs)
>  {
>  	int err;
> -	unsigned long val[sizeof(double) / sizeof(long)];
> +	union {
> +		double dbl;
> +		unsigned long ul[2];
> +		struct {
> +#ifdef __BIG_ENDIAN__
> +			unsigned _pad_;
> +			unsigned word;
> +#endif
> +#ifdef __LITTLE_ENDIAN__
> +			unsigned word;
> +			unsigned _pad_;
> +#endif
> +		} single;
> +	} data;
>  	unsigned long ptr;
>  
>  	if (!address_ok(regs, ea, nb))
>  		return -EFAULT;
>  	if ((ea & 3) == 0)
>  		return (*func)(rn, ea);
> -	ptr = (unsigned long) &val[0];
> +	ptr = (unsigned long) &data.ul[0];
>  	if (sizeof(unsigned long) == 8 || nb == 4) {
> -		ptr += sizeof(unsigned long) - nb;
> +		if (nb == 4)
> +			ptr = (unsigned long)&(data.single.word);
>  		err = (*func)(rn, ptr);
>  		if (err)
>  			return err;
> -		err = write_mem_unaligned(val[0], ea, nb, regs);
> +		err = write_mem_unaligned(data.ul[0], ea, nb, regs);
>  	} else {
>  		/* writing a double on 32-bit */
>  		err = (*func)(rn, ptr);
>  		if (err)
>  			return err;
> -		err = write_mem_unaligned(val[0], ea, 4, regs);
> +		err = write_mem_unaligned(data.ul[0], ea, 4, regs);
>  		if (!err)
> -			err = write_mem_unaligned(val[1], ea + 4, 4, regs);
> +			err = write_mem_unaligned(data.ul[1], ea + 4, 4, regs);
>  	}
>  	return err;
>  }

^ permalink raw reply

* Re: [V2 PATCH 2/3] powerpc: Fix Unaligned Fixed Point Loads and Stores
From: Paul Mackerras @ 2013-11-04  2:43 UTC (permalink / raw)
  To: Tom; +Cc: linuxppc-dev
In-Reply-To: <1383244738-5986-3-git-send-email-tommusta@gmail.com>

On Thu, Oct 31, 2013 at 01:38:57PM -0500, Tom wrote:
> From: Tom Musta <tommusta@gmail.com>
> 
> This patch modifies the unaligned access routines of the sstep.c
> module so that it properly reverses the bytes of storage operands
> in the little endian kernel kernel.

This has rather a lot of #ifdefs inside function definitions, and for
little-endian it does the unaligned accesses one byte at a time.  You
could avoid all the #ifdefs if you define the combining function in an
endian-dependant way and make read_mem_unaligned look something like
this:

#ifdef __LITTLE_ENDIAN__
#define combine_pieces(x, b, c, nd)	((x) + ((b) << (8 * (nd))))
#else
#define combine_pieces(x, b, c, nd)	(((x) << (8 * (c))) + (b))
#endif

static int __kprobes read_mem_unaligned(unsigned long *dest, unsigned long ea,
					int nb, struct pt_regs *regs)
{
	int err;
	int nd;
	unsigned long x, b, c;

	/* unaligned, do this in pieces */
	x = 0;
	for (nd = 0; nd < nb; nd += c) {
		c = max_align(ea);
		if (c > nb - nd)
			c = max_align(nb - nd);
		err = read_mem_aligned(&b, ea, c);
		if (err)
			return err;
		x = combine_pieces(x, b, c, nd);
		ea += c;
	}
	*dest = x;
	return 0;
}

and do something analogous for write_mem_unaligned().

Paul.

^ permalink raw reply

* RE: [PATCHv2 1/8] ALSA: Add SAI SoC Digital Audio Interface driver.
From: Li Xiubo @ 2013-11-04  3:45 UTC (permalink / raw)
  To: Guangyu Chen
  Cc: mark.rutland@arm.com, alsa-devel@alsa-project.org,
	linux-doc@vger.kernel.org, tiwai@suse.de, Huan Wang,
	timur@tabi.org, perex@perex.cz, Shawn Guo, LW@KARO-electronics.de,
	linux@arm.linux.org.uk, linux-arm-kernel@lists.infradead.org,
	grant.likely@linaro.org, devicetree@vger.kernel.org,
	ian.campbell@citrix.com, pawel.moll@arm.com,
	swarren@wwwdotorg.org, rob.herring@calxeda.com,
	broonie@kernel.org, oskar@scara.com, Fabio Estevam,
	lgirdwood@gmail.com, linux-kernel@vger.kernel.org,
	rob@landley.net, Zhengxiong Jin, shawn.guo@linaro.org,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20131101085904.GC28401@MrMyself>

> > This adds Freescale SAI ASoC Audio support.
> > This implementation is only compatible with device tree definition.
> > Features:
> > o Supports playback/capture
> > o Supports 16/20/24 bit PCM
> > o Supports 8k - 96k sample rates
> > o Supports slave mode only.
> >
>=20
> Just for curiosity, I found there're quite a bit code actually support
> I2S master mode such as set_sysclk() and register configuration to FMT
> SND_SOC_DAIFMT_CBS_CFS.
>=20
> Is that really "supports slave mode only"? Or just you haven't make it
> properly work?
>=20

Sorry, this comments is not very consistent with the driver implementation.
On VF610 there only supports slave mode.

So, I will revise this.


> > diff --git a/sound/soc/fsl/Kconfig b/sound/soc/fsl/Kconfig index
> > b7ab71f..9a8851e 100644
> > --- a/sound/soc/fsl/Kconfig
> > +++ b/sound/soc/fsl/Kconfig
> > @@ -213,3 +213,19 @@ config SND_SOC_IMX_MC13783
> >  	select SND_SOC_IMX_PCM_DMA
> >
> >  endif # SND_IMX_SOC
> > +
> > +menuconfig SND_FSL_SOC
> > +	tristate "SoC Audio for Freescale FSL CPUs"
> > +	help
> > +	  Say Y or M if you want to add support for codecs attached to
> > +	  the FSL CPUs.
> > +
> > +	  This will enable Freeacale SAI, SGT15000 codec.
> > +
> > +if SND_FSL_SOC
>=20
> Why check this here? SAI should be a regular IP module which can be used
> by other SoC as well. We would never know if next i.MX SoC won't contain
> SAI.
>=20
> > +
> > +config SND_SOC_FSL_SAI
> > +	tristate
> > +	select SND_SOC_GENERIC_DMAENGINE_PCM
>=20
> I think you should keep SAI beside SSI/SPDIF directly.
>=20

That's right.

> > +
> > +endif # SND_FSL_SOC
> > diff --git a/sound/soc/fsl/Makefile b/sound/soc/fsl/Makefile index
> > 8db705b..e5acc03 100644
> > --- a/sound/soc/fsl/Makefile
> > +++ b/sound/soc/fsl/Makefile
> > @@ -56,3 +56,8 @@ obj-$(CONFIG_SND_SOC_IMX_SGTL5000) +=3D
> > snd-soc-imx-sgtl5000.o
> >  obj-$(CONFIG_SND_SOC_IMX_WM8962) +=3D snd-soc-imx-wm8962.o
> >  obj-$(CONFIG_SND_SOC_IMX_SPDIF) +=3D snd-soc-imx-spdif.o
> >  obj-$(CONFIG_SND_SOC_IMX_MC13783) +=3D snd-soc-imx-mc13783.o
> > +
> > +# FSL ARM SAI/SGT15000 Platform Support
>=20
> Should be SGTL5000, not SGT1. And SAI should not be used with SGTL5000
> only, it's a bit confusing to mention SGTL5000 here.
>=20

Yes, this will be revised then.

> > +snd-soc-fsl-sai-objs :=3D fsl-sai.o
>=20
> And I think it should be better to put it along with fsl-ssi.o and fsl-
> spdif.o
>=20

But fsl-ssi.o and fsl-spdif.o is based PowrePC platform? Which we can see f=
rom the comments.
While fsl-sai.o is base ARM platform.
Despite whether there is any different between ARM and PPC or not, the comm=
ents won't be correct.


> > +
> > +obj-$(CONFIG_SND_SOC_FSL_SAI) +=3D snd-soc-fsl-sai.o
>=20
> Ditto
>=20
> > diff --git a/sound/soc/fsl/fsl-sai.c b/sound/soc/fsl/fsl-sai.c new
> > file mode 100644 index 0000000..bb57e67
> > --- /dev/null
> > +++ b/sound/soc/fsl/fsl-sai.c
> > @@ -0,0 +1,472 @@
> > +/*
> > + * Freescale SAI ALSA SoC Digital Audio Interface driver.
> > + *
> > + * Copyright 2012-2013 Freescale Semiconductor, Inc.
> > + *
> > + * This program is free software; you can redistribute  it and/or
> > +modify it
> > + * under  the terms of  the GNU General  Public License as published
> > +by the
> > + * Free Software Foundation;  either version 2 of the  License, or
> > +(at your
> > + * option) any later version.
>=20
> There're too many double space inside. Could you pls revise it?
>=20

Yes, see the next version.

> > + *
> > + */
> > +
> > +#include <linux/clk.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/of_address.h>
> > +#include <sound/core.h>
> > +#include <sound/pcm_params.h>
> > +#include <linux/delay.h>
> > +#include <linux/dmaengine.h>
> > +#include <sound/dmaengine_pcm.h>
>=20
> I think it's better to keep <sound/xxx.h> together. And basically we can
> keep header files ordered by alphabet.
>=20

Okey.

> > +
> > +#include "fsl-sai.h"
> > +
> > +static int fsl_sai_set_dai_sysclk_tr(struct snd_soc_dai *cpu_dai,
> > +		int clk_id, unsigned int freq, int fsl_dir) {
> > +	u32 val_cr2, reg_cr2;
> > +	struct fsl_sai *sai =3D snd_soc_dai_get_drvdata(cpu_dai);
> > +
> > +	if (fsl_dir =3D=3D FSL_FMT_TRANSMITTER)
> > +		reg_cr2 =3D FSL_SAI_TCR2;
> > +	else
> > +		reg_cr2 =3D FSL_SAI_RCR2;
> > +
> > +	val_cr2 =3D readl(sai->base + reg_cr2);
> > +	switch (clk_id) {
> > +	case FSL_SAI_CLK_BUS:
> > +		val_cr2 &=3D ~FSL_SAI_CR2_MSEL_MASK;
> > +		val_cr2 |=3D FSL_SAI_CR2_MSEL_BUS;
> > +		break;
> > +	case FSL_SAI_CLK_MAST1:
> > +		val_cr2 &=3D ~FSL_SAI_CR2_MSEL_MASK;
> > +		val_cr2 |=3D FSL_SAI_CR2_MSEL_MCLK1;
> > +		break;
> > +	case FSL_SAI_CLK_MAST2:
> > +		val_cr2 &=3D ~FSL_SAI_CR2_MSEL_MASK;
> > +		val_cr2 |=3D FSL_SAI_CR2_MSEL_MCLK2;
> > +		break;
> > +	case FSL_SAI_CLK_MAST3:
> > +		val_cr2 &=3D ~FSL_SAI_CR2_MSEL_MASK;
> > +		val_cr2 |=3D FSL_SAI_CR2_MSEL_MCLK3;
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +	writel(val_cr2, sai->base + reg_cr2);
> > +
> > +	return 0;
> > +}
> > +
> > +static int fsl_sai_set_dai_sysclk(struct snd_soc_dai *cpu_dai,
> > +		int clk_id, unsigned int freq, int dir) {
> > +	int ret;
> > +
> > +	if (dir =3D=3D SND_SOC_CLOCK_IN)
> > +		return 0;
> > +
> > +	ret =3D fsl_sai_set_dai_sysclk_tr(cpu_dai, clk_id, freq,
> > +					FSL_FMT_TRANSMITTER);
> > +	if (ret) {
> > +		dev_err(cpu_dai->dev,
> > +				"Cannot set sai's transmitter sysclk: %d\n",
> > +				ret);
> > +		return ret;
> > +	}
> > +
> > +	ret =3D fsl_sai_set_dai_sysclk_tr(cpu_dai, clk_id, freq,
> > +					FSL_FMT_RECEIVER);
> > +	if (ret) {
> > +		dev_err(cpu_dai->dev,
> > +				"Cannot set sai's receiver sysclk: %d\n",
> > +				ret);
> > +		return ret;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int fsl_sai_set_dai_clkdiv(struct snd_soc_dai *cpu_dai,
> > +		int div_id, int div)
> > +{
> > +	struct fsl_sai *sai =3D snd_soc_dai_get_drvdata(cpu_dai);
> > +	u32 tcr2, rcr2;
> > +
> > +	if (div_id =3D=3D FSL_SAI_TX_DIV) {
> > +		tcr2 =3D readl(sai->base + FSL_SAI_TCR2);
> > +		tcr2 &=3D ~FSL_SAI_CR2_DIV_MASK;
> > +		tcr2 |=3D FSL_SAI_CR2_DIV(div);
> > +		writel(tcr2, sai->base + FSL_SAI_TCR2);
> > +
> > +	} else if (div_id =3D=3D FSL_SAI_RX_DIV) {
> > +		rcr2 =3D readl(sai->base + FSL_SAI_RCR2);
> > +		rcr2 &=3D ~FSL_SAI_CR2_DIV_MASK;
> > +		rcr2 |=3D FSL_SAI_CR2_DIV(div);
> > +		writel(rcr2, sai->base + FSL_SAI_RCR2);
> > +
> > +	} else
> > +		return -EINVAL;
>=20
> It would look better if using switch-case. And the last 'else'
> could be 'default:'.
>=20

I'll think it over.

> > +
> > +	return 0;
> > +}
> > +
> > +static int fsl_sai_set_dai_fmt_tr(struct snd_soc_dai *cpu_dai,
> > +				unsigned int fmt, int fsl_dir)
> > +{
> > +	u32 val_cr2, val_cr3, val_cr4, reg_cr2, reg_cr3, reg_cr4;
> > +	struct fsl_sai *sai =3D snd_soc_dai_get_drvdata(cpu_dai);
> > +
> > +	if (fsl_dir =3D=3D FSL_FMT_TRANSMITTER) {
> > +		reg_cr2 =3D FSL_SAI_TCR2;
> > +		reg_cr3 =3D FSL_SAI_TCR3;
> > +		reg_cr4 =3D FSL_SAI_TCR4;
> > +	} else {
> > +		reg_cr2 =3D FSL_SAI_RCR2;
> > +		reg_cr3 =3D FSL_SAI_RCR3;
> > +		reg_cr4 =3D FSL_SAI_RCR4;
> > +	}
> > +
> > +	val_cr2 =3D readl(sai->base + reg_cr2);
> > +	val_cr3 =3D readl(sai->base + reg_cr3);
> > +	val_cr4 =3D readl(sai->base + reg_cr4);
> > +
> > +	if (sai->fbt =3D=3D FSL_SAI_FBT_MSB)
> > +		val_cr4 |=3D FSL_SAI_CR4_MF;
> > +	else if (sai->fbt =3D=3D FSL_SAI_FBT_LSB)
> > +		val_cr4 &=3D ~FSL_SAI_CR4_MF;
> > +
> > +	switch (fmt & SND_SOC_DAIFMT_FORMAT_MASK) {
> > +	case SND_SOC_DAIFMT_I2S:
> > +		val_cr4 |=3D FSL_SAI_CR4_FSE;
> > +		val_cr4 |=3D FSL_SAI_CR4_FSP;
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	switch (fmt & SND_SOC_DAIFMT_INV_MASK) {
> > +	case SND_SOC_DAIFMT_IB_IF:
> > +		val_cr4 |=3D FSL_SAI_CR4_FSP;
> > +		val_cr2 &=3D ~FSL_SAI_CR2_BCP;
> > +		break;
> > +	case SND_SOC_DAIFMT_IB_NF:
> > +		val_cr4 &=3D ~FSL_SAI_CR4_FSP;
> > +		val_cr2 &=3D ~FSL_SAI_CR2_BCP;
> > +		break;
> > +	case SND_SOC_DAIFMT_NB_IF:
> > +		val_cr4 |=3D FSL_SAI_CR4_FSP;
> > +		val_cr2 |=3D FSL_SAI_CR2_BCP;
> > +		break;
> > +	case SND_SOC_DAIFMT_NB_NF:
> > +		val_cr4 &=3D ~FSL_SAI_CR4_FSP;
> > +		val_cr2 |=3D FSL_SAI_CR2_BCP;
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	switch (fmt & SND_SOC_DAIFMT_MASTER_MASK) {
> > +	case SND_SOC_DAIFMT_CBS_CFS:
> > +		val_cr2 |=3D FSL_SAI_CR2_BCD_MSTR;
> > +		val_cr4 |=3D FSL_SAI_CR4_FSD_MSTR;
> > +		break;
> > +	case SND_SOC_DAIFMT_CBM_CFM:
> > +		val_cr2 &=3D ~FSL_SAI_CR2_BCD_MSTR;
> > +		val_cr4 &=3D ~FSL_SAI_CR4_FSD_MSTR;
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	val_cr3 |=3D FSL_SAI_CR3_TRCE;
> > +
> > +	if (fsl_dir =3D=3D FSL_FMT_RECEIVER)
> > +		val_cr2 |=3D FSL_SAI_CR2_SYNC;
> > +
> > +	writel(val_cr2, sai->base + reg_cr2);
> > +	writel(val_cr3, sai->base + reg_cr3);
> > +	writel(val_cr4, sai->base + reg_cr4);
> > +
> > +	return 0;
>=20
> > +
>=20
> Pls drop this extra line.
>

See the next version.

>=20
> > +}
> > +
> > +static int fsl_sai_set_dai_fmt(struct snd_soc_dai *cpu_dai, unsigned
> > +int fmt) {
> > +	int ret;
> > +
> > +	ret =3D fsl_sai_set_dai_fmt_tr(cpu_dai, fmt, FSL_FMT_TRANSMITTER);
> > +	if (ret) {
> > +		dev_err(cpu_dai->dev,
> > +				"Cannot set sai's transmitter format: %d\n",
> > +				ret);
> > +		return ret;
> > +	}
> > +
> > +	ret =3D fsl_sai_set_dai_fmt_tr(cpu_dai, fmt, FSL_FMT_RECEIVER);
> > +	if (ret) {
> > +		dev_err(cpu_dai->dev,
> > +				"Cannot set sai's receiver format: %d\n",
> > +				ret);
> > +		return ret;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int fsl_sai_set_dai_tdm_slot(struct snd_soc_dai *cpu_dai,
> > +		unsigned int tx_mask, unsigned int rx_mask,
> > +		int slots, int slot_width)
> > +{
> > +	struct fsl_sai *sai =3D snd_soc_dai_get_drvdata(cpu_dai);
> > +	u32 tcr4, rcr4;
> > +
> > +	tcr4 =3D readl(sai->base + FSL_SAI_TCR4);
> > +	tcr4 &=3D ~FSL_SAI_CR4_FRSZ_MASK;
> > +	tcr4 |=3D FSL_SAI_CR4_FRSZ(2);
> > +	writel(tcr4, sai->base + FSL_SAI_TCR4);
> > +	writel(tx_mask, sai->base + FSL_SAI_TMR);
> > +
> > +	rcr4 =3D readl(sai->base + FSL_SAI_RCR4);
> > +	rcr4 &=3D ~FSL_SAI_CR4_FRSZ_MASK;
> > +	rcr4 |=3D FSL_SAI_CR4_FRSZ(2);
> > +	writel(rcr4, sai->base + FSL_SAI_RCR4);
> > +	writel(rx_mask, sai->base + FSL_SAI_RMR);
> > +
> > +	return 0;
> > +}
> > +
> > +static int fsl_sai_hw_params(struct snd_pcm_substream *substream,
> > +		struct snd_pcm_hw_params *params,
> > +		struct snd_soc_dai *cpu_dai)
> > +{
> > +	u32 val_cr4, val_cr5, reg_cr4, reg_cr5, word_width;
> > +	struct fsl_sai *sai =3D snd_soc_dai_get_drvdata(cpu_dai);
> > +
> > +	if (substream->stream =3D=3D SNDRV_PCM_STREAM_PLAYBACK) {
> > +		reg_cr4 =3D FSL_SAI_TCR4;
> > +		reg_cr5 =3D FSL_SAI_TCR5;
> > +	} else {
> > +		reg_cr4 =3D FSL_SAI_RCR4;
> > +		reg_cr5 =3D FSL_SAI_RCR5;
> > +	}
> > +
> > +	val_cr4 =3D readl(sai->base + reg_cr4);
> > +	val_cr4 &=3D ~FSL_SAI_CR4_SYWD_MASK;
> > +
> > +	val_cr5 =3D readl(sai->base + reg_cr5);
> > +	val_cr5 &=3D ~FSL_SAI_CR5_WNW_MASK;
> > +	val_cr5 &=3D ~FSL_SAI_CR5_W0W_MASK;
> > +	val_cr5 &=3D ~FSL_SAI_CR5_FBT_MASK;
> > +
> > +	switch (params_format(params)) {
> > +	case SNDRV_PCM_FORMAT_S16_LE:
> > +		word_width =3D 16;
> > +		break;
> > +	case SNDRV_PCM_FORMAT_S20_3LE:
> > +		word_width =3D 20;
> > +		break;
> > +	case SNDRV_PCM_FORMAT_S24_LE:
> > +		word_width =3D 24;
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	val_cr4 |=3D FSL_SAI_CR4_SYWD(word_width);
> > +	val_cr5 |=3D FSL_SAI_CR5_WNW(word_width);
> > +	val_cr5 |=3D FSL_SAI_CR5_W0W(word_width);
> > +
> > +	if (sai->fbt =3D=3D FSL_SAI_FBT_MSB)
> > +		val_cr5 |=3D FSL_SAI_CR5_FBT(word_width - 1);
> > +	else if (sai->fbt =3D=3D FSL_SAI_FBT_LSB)
> > +		val_cr5 |=3D FSL_SAI_CR5_FBT(0);
> > +
> > +	writel(val_cr4, sai->base + reg_cr4);
> > +	writel(val_cr5, sai->base + reg_cr5);
> > +
> > +	return 0;
> > +}
> > +
> > +static int fsl_sai_trigger(struct snd_pcm_substream *substream, int
> cmd,
> > +		struct snd_soc_dai *dai)
> > +{
> > +	struct fsl_sai *sai =3D snd_soc_dai_get_drvdata(dai);
> > +	unsigned int tcsr, rcsr;
> > +
> > +	tcsr =3D readl(sai->base + FSL_SAI_TCSR);
> > +	rcsr =3D readl(sai->base + FSL_SAI_RCSR);
> > +
> > +	if (substream->stream =3D=3D SNDRV_PCM_STREAM_PLAYBACK) {
> > +		tcsr |=3D FSL_SAI_CSR_FRDE;
> > +		rcsr &=3D ~FSL_SAI_CSR_FRDE;
> > +	} else {
> > +		rcsr |=3D FSL_SAI_CSR_FRDE;
> > +		tcsr &=3D ~FSL_SAI_CSR_FRDE;
> > +	}
> > +
> > +	switch (cmd) {
> > +	case SNDRV_PCM_TRIGGER_START:
> > +	case SNDRV_PCM_TRIGGER_RESUME:
> > +	case SNDRV_PCM_TRIGGER_PAUSE_RELEASE:
> > +		tcsr |=3D FSL_SAI_CSR_TERE;
> > +		rcsr |=3D FSL_SAI_CSR_TERE;
> > +		writel(rcsr, sai->base + FSL_SAI_RCSR);
> > +		udelay(10);
>=20
> Does SAI really needs this udelay() here? Required by IP's operation flow=
?
> If so, I think it's better to add comments here to explain it.
>=20
+++++++++++++++++
Synchronous mode
The SAI transmitter and receiver can be configured to operate with synchron=
ous bit clock
and frame sync.

1,
If the transmitter bit clock and frame sync are to be used by both the tran=
smitter and
receiver:
	...
* It is recommended that the transmitter is the last enabled and the first =
disabled.

2,
If the receiver bit clock and frame sync are to be used by both the transmi=
tter and
receiver:
	...
* It is recommended that the receiver is the last enabled and the first dis=
abled.
------------------

So I think the delay is needed, and I still tunning it.



> > +		writel(tcsr, sai->base + FSL_SAI_TCSR);
> > +		break;
> > +
> > +	case SNDRV_PCM_TRIGGER_STOP:
> > +	case SNDRV_PCM_TRIGGER_SUSPEND:
> > +	case SNDRV_PCM_TRIGGER_PAUSE_PUSH:
> > +		if (!(dai->playback_active || dai->capture_active)) {
> > +			tcsr &=3D ~FSL_SAI_CSR_TERE;
> > +			rcsr &=3D ~FSL_SAI_CSR_TERE;
> > +		}
> > +		writel(rcsr, sai->base + FSL_SAI_RCSR);
> > +		udelay(10);
> > +		writel(tcsr, sai->base + FSL_SAI_TCSR);
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static const struct snd_soc_dai_ops fsl_sai_pcm_dai_ops =3D {
> > +	.set_sysclk	=3D fsl_sai_set_dai_sysclk,
> > +	.set_clkdiv	=3D fsl_sai_set_dai_clkdiv,
> > +	.set_fmt	=3D fsl_sai_set_dai_fmt,
> > +	.set_tdm_slot	=3D fsl_sai_set_dai_tdm_slot,
> > +	.hw_params	=3D fsl_sai_hw_params,
> > +	.trigger	=3D fsl_sai_trigger,
> > +};
> > +
> > +static int fsl_sai_dai_probe(struct snd_soc_dai *dai) {
> > +	int ret;
> > +	struct fsl_sai *sai =3D dev_get_drvdata(dai->dev);
> > +
> > +	ret =3D clk_prepare_enable(sai->clk);
> > +	if (ret)
> > +		return ret;
> > +
> > +	writel(0x0, sai->base + FSL_SAI_RCSR);
> > +	writel(0x0, sai->base + FSL_SAI_TCSR);
> > +	writel(sai->dma_params_tx.maxburst, sai->base + FSL_SAI_TCR1);
> > +	writel(sai->dma_params_rx.maxburst, sai->base + FSL_SAI_RCR1);
> > +
> > +	dai->playback_dma_data =3D &sai->dma_params_tx;
> > +	dai->capture_dma_data =3D &sai->dma_params_rx;
> > +
> > +	snd_soc_dai_set_drvdata(dai, sai);
> > +
> > +	return 0;
> > +}
> > +
> > +int fsl_sai_dai_remove(struct snd_soc_dai *dai)
>=20
> static
>=20

Yes.

> > +{
> > +	struct fsl_sai *sai =3D dev_get_drvdata(dai->dev);
> > +
> > +	clk_disable_unprepare(sai->clk);
> > +
> > +	return 0;
> > +}
> > +
> > +static struct snd_soc_dai_driver fsl_sai_dai =3D {
> > +	.probe =3D fsl_sai_dai_probe,
> > +	.remove =3D fsl_sai_dai_remove,
> > +	.playback =3D {
> > +		.channels_min =3D 1,
> > +		.channels_max =3D 2,
> > +		.rates =3D SNDRV_PCM_RATE_8000_96000,
> > +		.formats =3D FSL_SAI_FORMATS,
> > +	},
> > +	.capture =3D {
> > +		.channels_min =3D 1,
> > +		.channels_max =3D 2,
> > +		.rates =3D SNDRV_PCM_RATE_8000_96000,
> > +		.formats =3D FSL_SAI_FORMATS,
> > +	},
> > +	.ops =3D &fsl_sai_pcm_dai_ops,
> > +};
> > +
> > +static const struct snd_soc_component_driver fsl_component =3D {
> > +	.name           =3D "fsl-sai",
> > +};
> > +
> > +static int fsl_sai_probe(struct platform_device *pdev) {
> > +	struct resource *res;
> > +	struct fsl_sai *sai;
> > +	int ret =3D 0;
> > +
> > +	sai =3D devm_kzalloc(&pdev->dev, sizeof(*sai), GFP_KERNEL);
> > +	if (!sai)
> > +		return -ENOMEM;
> > +
> > +	sai->fbt =3D FSL_SAI_FBT_MSB;
> > +
> > +	res =3D platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > +	sai->base =3D devm_ioremap_resource(&pdev->dev, res);
> > +	if (IS_ERR(sai->base))
> > +		return PTR_ERR(sai->base);
> > +
> > +	sai->clk =3D devm_clk_get(&pdev->dev, "sai");
> > +	if (IS_ERR(sai->clk)) {
> > +		dev_err(&pdev->dev, "Cannot get sai's clock: %d\n", ret);
> > +		return PTR_ERR(sai->clk);
> > +	}
> > +
> > +	sai->dma_params_rx.addr =3D res->start + FSL_SAI_RDR;
> > +	sai->dma_params_rx.maxburst =3D 6;
> > +
> > +	sai->dma_params_tx.addr =3D res->start + FSL_SAI_TDR;
> > +	sai->dma_params_tx.maxburst =3D 6;
> > +
> > +	platform_set_drvdata(pdev, sai);
> > +
> > +	ret =3D devm_snd_soc_register_component(&pdev->dev, &fsl_component,
> > +			&fsl_sai_dai, 1);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret =3D snd_dmaengine_pcm_register(&pdev->dev, NULL,
> > +			SND_DMAENGINE_PCM_FLAG_NO_RESIDUE);
> > +	if (ret)
> > +		return ret;
> > +
> > +	return 0;
> > +}
> > +
> > +static int fsl_sai_remove(struct platform_device *pdev) {
> > +	snd_dmaengine_pcm_unregister(&pdev->dev);
> > +
> > +	return 0;
> > +}
> > +
> > +static const struct of_device_id fsl_sai_ids[] =3D {
> > +	{ .compatible =3D "fsl,vf610-sai", },
>=20
> > +	{ /*sentinel*/ }
>=20
> I think this could be left in blank without comments inside.
> And if you really want to add it, pls add like: /* sentinel */
> 						  ^        ^

Okey.

> > +};
> > +
> > +static struct platform_driver fsl_sai_driver =3D {
> > +	.probe =3D fsl_sai_probe,
> > +	.remove =3D fsl_sai_remove,
> > +
> > +	.driver =3D {
> > +		.name =3D "fsl-sai",
> > +		.owner =3D THIS_MODULE,
> > +		.of_match_table =3D fsl_sai_ids,
> > +	},
> > +};
> > +module_platform_driver(fsl_sai_driver);
> > +
> > +MODULE_AUTHOR("Xiubo Li, <Li.Xiubo@freescale.com>");
> > +MODULE_AUTHOR("Alison Wang, <b18965@freescale.com>");
> > +MODULE_DESCRIPTION("Freescale Soc SAI Interface");
> > +MODULE_LICENSE("GPL");
>=20
> Should be better if added:
> MODULE_ALIAS("platform:fsl-sai");
>=20
> This would support module auto-load feature in some Linux-OS.
>=20

I will think it over.

BRS,
Xiubo

^ permalink raw reply

* RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle
From: Dongsheng Wang @ 2013-11-04  4:04 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev@lists.ozlabs.org, Bharat Bhushan
In-Reply-To: <1382124094.7979.885.camel@snotra.buserror.net>

DQo+IC0tLS0tT3JpZ2luYWwgTWVzc2FnZS0tLS0tDQo+IEZyb206IFdhbmcgRG9uZ3NoZW5nLUI0
MDUzNA0KPiBTZW50OiBNb25kYXksIE9jdG9iZXIgMjEsIDIwMTMgMTE6MTEgQU0NCj4gVG86IFdv
b2QgU2NvdHQtQjA3NDIxDQo+IENjOiBCaHVzaGFuIEJoYXJhdC1SNjU3Nzc7IGxpbnV4cHBjLWRl
dkBsaXN0cy5vemxhYnMub3JnDQo+IFN1YmplY3Q6IFJFOiBbUEFUQ0ggdjUgNC80XSBwb3dlcnBj
Lzg1eHg6IGFkZCBzeXNmcyBmb3IgcHcyMCBzdGF0ZSBhbmQNCj4gYWx0aXZlYyBpZGxlDQo+IA0K
PiANCj4gDQo+ID4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gPiBGcm9tOiBXb29kIFNj
b3R0LUIwNzQyMQ0KPiA+IFNlbnQ6IFNhdHVyZGF5LCBPY3RvYmVyIDE5LCAyMDEzIDM6MjIgQU0N
Cj4gPiBUbzogV2FuZyBEb25nc2hlbmctQjQwNTM0DQo+ID4gQ2M6IEJodXNoYW4gQmhhcmF0LVI2
NTc3NzsgV29vZCBTY290dC1CMDc0MjE7IGxpbnV4cHBjLQ0KPiA+IGRldkBsaXN0cy5vemxhYnMu
b3JnDQo+ID4gU3ViamVjdDogUmU6IFtQQVRDSCB2NSA0LzRdIHBvd2VycGMvODV4eDogYWRkIHN5
c2ZzIGZvciBwdzIwIHN0YXRlIGFuZA0KPiA+IGFsdGl2ZWMgaWRsZQ0KPiA+DQo+ID4gT24gVGh1
LCAyMDEzLTEwLTE3IGF0IDIyOjAyIC0wNTAwLCBXYW5nIERvbmdzaGVuZy1CNDA1MzQgd3JvdGU6
DQo+ID4gPg0KPiA+ID4gPiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiA+ID4gPiBGcm9t
OiBCaHVzaGFuIEJoYXJhdC1SNjU3NzcNCj4gPiA+ID4gU2VudDogVGh1cnNkYXksIE9jdG9iZXIg
MTcsIDIwMTMgMjo0NiBQTQ0KPiA+ID4gPiBUbzogV2FuZyBEb25nc2hlbmctQjQwNTM0OyBXb29k
IFNjb3R0LUIwNzQyMQ0KPiA+ID4gPiBDYzogbGludXhwcGMtZGV2QGxpc3RzLm96bGFicy5vcmcN
Cj4gPiA+ID4gU3ViamVjdDogUkU6IFtQQVRDSCB2NSA0LzRdIHBvd2VycGMvODV4eDogYWRkIHN5
c2ZzIGZvciBwdzIwIHN0YXRlDQo+ID4gPiA+IGFuZCBhbHRpdmVjIGlkbGUNCj4gPiA+ID4NCj4g
PiA+ID4NCj4gPiA+ID4NCj4gPiA+ID4gPiA+ID4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0N
Cj4gPiA+ID4gPiA+ID4gRnJvbTogV2FuZyBEb25nc2hlbmctQjQwNTM0DQo+ID4gPiA+ID4gPiA+
IFNlbnQ6IFRodXJzZGF5LCBPY3RvYmVyIDE3LCAyMDEzIDExOjIyIEFNDQo+ID4gPiA+ID4gPiA+
IFRvOiBCaHVzaGFuIEJoYXJhdC1SNjU3Nzc7IFdvb2QgU2NvdHQtQjA3NDIxDQo+ID4gPiA+ID4g
PiA+IENjOiBsaW51eHBwYy1kZXZAbGlzdHMub3psYWJzLm9yZw0KPiA+ID4gPiA+ID4gPiBTdWJq
ZWN0OiBSRTogW1BBVENIIHY1IDQvNF0gcG93ZXJwYy84NXh4OiBhZGQgc3lzZnMgZm9yIHB3MjAN
Cj4gPiA+ID4gPiA+ID4gc3RhdGUgYW5kIGFsdGl2ZWMgaWRsZQ0KPiA+ID4gPiA+ID4gPg0KPiA+
ID4gPiA+ID4gPg0KPiA+ID4gPiA+ID4gPg0KPiA+ID4gPiA+ID4gPiA+IC0tLS0tT3JpZ2luYWwg
TWVzc2FnZS0tLS0tDQo+ID4gPiA+ID4gPiA+ID4gRnJvbTogQmh1c2hhbiBCaGFyYXQtUjY1Nzc3
DQo+ID4gPiA+ID4gPiA+ID4gU2VudDogVGh1cnNkYXksIE9jdG9iZXIgMTcsIDIwMTMgMTE6MjAg
QU0NCj4gPiA+ID4gPiA+ID4gPiBUbzogV2FuZyBEb25nc2hlbmctQjQwNTM0OyBXb29kIFNjb3R0
LUIwNzQyMQ0KPiA+ID4gPiA+ID4gPiA+IENjOiBsaW51eHBwYy1kZXZAbGlzdHMub3psYWJzLm9y
Zw0KPiA+ID4gPiA+ID4gPiA+IFN1YmplY3Q6IFJFOiBbUEFUQ0ggdjUgNC80XSBwb3dlcnBjLzg1
eHg6IGFkZCBzeXNmcyBmb3INCj4gPiA+ID4gPiA+ID4gPiBwdzIwIHN0YXRlIGFuZCBhbHRpdmVj
IGlkbGUNCj4gPiA+ID4gPiA+ID4gPg0KPiA+ID4gPiA+ID4gPiA+DQo+ID4gPiA+ID4gPiA+ID4N
Cj4gPiA+ID4gPiA+ID4gPiA+IC0tLS0tT3JpZ2luYWwgTWVzc2FnZS0tLS0tDQo+ID4gPiA+ID4g
PiA+ID4gPiBGcm9tOiBXYW5nIERvbmdzaGVuZy1CNDA1MzQNCj4gPiA+ID4gPiA+ID4gPiA+IFNl
bnQ6IFRodXJzZGF5LCBPY3RvYmVyIDE3LCAyMDEzIDg6MTYgQU0NCj4gPiA+ID4gPiA+ID4gPiA+
IFRvOiBCaHVzaGFuIEJoYXJhdC1SNjU3Nzc7IFdvb2QgU2NvdHQtQjA3NDIxDQo+ID4gPiA+ID4g
PiA+ID4gPiBDYzogbGludXhwcGMtZGV2QGxpc3RzLm96bGFicy5vcmcNCj4gPiA+ID4gPiA+ID4g
PiA+IFN1YmplY3Q6IFJFOiBbUEFUQ0ggdjUgNC80XSBwb3dlcnBjLzg1eHg6IGFkZCBzeXNmcyBm
b3INCj4gPiA+ID4gPiA+ID4gPiA+IHB3MjAgc3RhdGUgYW5kIGFsdGl2ZWMgaWRsZQ0KPiA+ID4g
PiA+ID4gPiA+ID4NCj4gPiA+ID4gPiA+ID4gPiA+DQo+ID4gPiA+ID4gPiA+ID4gPg0KPiA+ID4g
PiA+ID4gPiA+ID4gPiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiA+ID4gPiA+ID4gPiA+
ID4gPiBGcm9tOiBCaHVzaGFuIEJoYXJhdC1SNjU3NzcNCj4gPiA+ID4gPiA+ID4gPiA+ID4gU2Vu
dDogVGh1cnNkYXksIE9jdG9iZXIgMTcsIDIwMTMgMTowMSBBTQ0KPiA+ID4gPiA+ID4gPiA+ID4g
PiBUbzogV2FuZyBEb25nc2hlbmctQjQwNTM0OyBXb29kIFNjb3R0LUIwNzQyMQ0KPiA+ID4gPiA+
ID4gPiA+ID4gPiBDYzogbGludXhwcGMtZGV2QGxpc3RzLm96bGFicy5vcmcNCj4gPiA+ID4gPiA+
ID4gPiA+ID4gU3ViamVjdDogUkU6IFtQQVRDSCB2NSA0LzRdIHBvd2VycGMvODV4eDogYWRkIHN5
c2ZzDQo+ID4gPiA+ID4gPiA+ID4gPiA+IGZvcg0KPiA+ID4gPiA+ID4gPiA+ID4gPiBwdzIwIHN0
YXRlIGFuZCBhbHRpdmVjIGlkbGUNCj4gPiA+ID4gPiA+ID4gPiA+ID4NCj4gPiA+ID4gPiA+ID4g
PiA+ID4NCj4gPiA+ID4gPiA+ID4gPiA+ID4NCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiAtLS0tLU9y
aWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+IEZyb206IFdhbmcgRG9u
Z3NoZW5nLUI0MDUzNA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+IFNlbnQ6IFR1ZXNkYXksIE9jdG9i
ZXIgMTUsIDIwMTMgMjo1MSBQTQ0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+IFRvOiBXb29kIFNjb3R0
LUIwNzQyMQ0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+IENjOiBCaHVzaGFuIEJoYXJhdC1SNjU3Nzc7
DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gbGludXhwcGMtZGV2QGxpc3RzLm96bGFicy5vcmc7IFdh
bmcNCj4gPiA+ID4gPiA+ID4gPiA+ID4gRG9uZ3NoZW5nLUI0MDUzNA0KPiA+ID4gPiA+ID4gPiA+
ID4gPiA+IFN1YmplY3Q6IFtQQVRDSCB2NSA0LzRdIHBvd2VycGMvODV4eDogYWRkIHN5c2ZzIGZv
cg0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+IHB3MjAgc3RhdGUgYW5kDQo+ID4gPiA+ID4gPiA+ID4g
PiA+IGFsdGl2ZWMgaWRsZQ0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+DQo+ID4gPiA+ID4gPiA+ID4g
PiA+ID4gRnJvbTogV2FuZyBEb25nc2hlbmcgPGRvbmdzaGVuZy53YW5nQGZyZWVzY2FsZS5jb20+
DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4NCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiBBZGQgYSBzeXMg
aW50ZXJmYWNlIHRvIGVuYWJsZS9kaWFibGUgcHcyMCBzdGF0ZSBvcg0KPiA+ID4gPiA+ID4gPiA+
ID4gPiA+IGFsdGl2ZWMgaWRsZSwgYW5kDQo+ID4gPiA+ID4gPiA+ID4gPiA+IGNvbnRyb2wgdGhl
DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gd2FpdCBlbnRyeSB0aW1lLg0KPiA+ID4gPiA+ID4gPiA+
ID4gPiA+DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gRW5hYmxlL0Rpc2FibGUgaW50ZXJmYWNlOg0K
PiA+ID4gPiA+ID4gPiA+ID4gPiA+IDAsIGRpc2FibGUuIDEsIGVuYWJsZS4NCj4gPiA+ID4gPiA+
ID4gPiA+ID4gPiAvc3lzL2RldmljZXMvc3lzdGVtL2NwdS9jcHVYL3B3MjBfc3RhdGUNCj4gPiA+
ID4gPiA+ID4gPiA+ID4gPiAvc3lzL2RldmljZXMvc3lzdGVtL2NwdS9jcHVYL2FsdGl2ZWNfaWRs
ZQ0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gU2V0IHdhaXQg
dGltZSBpbnRlcmZhY2U6KE5hbm9zZWNvbmQpDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gL3N5cy9k
ZXZpY2VzL3N5c3RlbS9jcHUvY3B1WC9wdzIwX3dhaXRfdGltZQ0KPiA+ID4gPiA+ID4gPiA+ID4g
PiA+IC9zeXMvZGV2aWNlcy9zeXN0ZW0vY3B1L2NwdVgvYWx0aXZlY19pZGxlX3dhaXRfdGltZQ0K
PiA+ID4gPiA+ID4gPiA+ID4gPiA+IEV4YW1wbGU6IEJhc2Ugb24gVEJmcmVxIGlzIDQxTUhaLg0K
PiA+ID4gPiA+ID4gPiA+ID4gPiA+IDF+NDgobnMpOiBUQls2M10NCj4gPiA+ID4gPiA+ID4gPiA+
ID4gPiA0OX45Nyhucyk6IFRCWzYyXQ0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+IDk4fjE5NShucyk6
IFRCWzYxXQ0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+IDE5Nn4zOTAobnMpOiBUQls2MF0NCj4gPiA+
ID4gPiA+ID4gPiA+ID4gPiAzOTF+NzgwKG5zKTogVEJbNTldDQo+ID4gPiA+ID4gPiA+ID4gPiA+
ID4gNzgxfjE1NjAobnMpOiBUQls1OF0gLi4uDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4NCj4gPiA+
ID4gPiA+ID4gPiA+ID4gPiBTaWduZWQtb2ZmLWJ5OiBXYW5nIERvbmdzaGVuZw0KPiA+ID4gPiA+
ID4gPiA+ID4gPiA+IDxkb25nc2hlbmcud2FuZ0BmcmVlc2NhbGUuY29tPg0KPiA+ID4gPiA+ID4g
PiA+ID4gPiA+IC0tLQ0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ICp2NToNCj4gPiA+ID4gPiA+ID4g
PiA+ID4gPiBDaGFuZ2UgZ2V0X2lkbGVfdGlja3NfYml0IGZ1bmN0aW9uIGltcGxlbWVudGF0aW9u
Lg0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gKnY0Og0KPiA+
ID4gPiA+ID4gPiA+ID4gPiA+IE1vdmUgY29kZSBmcm9tIDg1eHgvY29tbW9uLmMgdG8ga2VybmVs
L3N5c2ZzLmMuDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4NCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiBS
ZW1vdmUgaGFzX3B3MjBfYWx0aXZlY19pZGxlIGZ1bmN0aW9uLg0KPiA+ID4gPiA+ID4gPiA+ID4g
PiA+DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gQ2hhbmdlIHdhaXQgImVudHJ5X2JpdCIgdG8gd2Fp
dCB0aW1lLg0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gZGlm
ZiAtLWdpdCBhL2FyY2gvcG93ZXJwYy9rZXJuZWwvc3lzZnMuYw0KPiA+ID4gPiA+ID4gPiA+ID4g
PiA+IGIvYXJjaC9wb3dlcnBjL2tlcm5lbC9zeXNmcy5jDQo+ID4gPiA+ID4gPiA+ID4gPiA+IGlu
ZGV4DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gMjdhOTBiOS4uMTBkMTEyOCAxMDA2NDQNCj4gPiA+
ID4gPiA+ID4gPiA+ID4gPiAtLS0gYS9hcmNoL3Bvd2VycGMva2VybmVsL3N5c2ZzLmMNCj4gPiA+
ID4gPiA+ID4gPiA+ID4gPiArKysgYi9hcmNoL3Bvd2VycGMva2VybmVsL3N5c2ZzLmMNCj4gPiA+
ID4gPiA+ID4gPiA+ID4gPiBAQCAtODUsNiArODUsMjg0IEBAIF9fc2V0dXAoInNtdC1zbm9vemUt
ZGVsYXk9IiwNCj4gPiA+ID4gPiA+ID4gPiA+ID4gc2V0dXBfc210X3Nub296ZV9kZWxheSk7DQo+
ID4gPiA+ID4gPiA+ID4gPiA+ID4NCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiAgI2VuZGlmIC8qIENP
TkZJR19QUEM2NCAqLw0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+DQo+ID4gPiA+ID4gPiA+ID4gPiA+
ID4gKyNpZmRlZiBDT05GSUdfRlNMX1NPQw0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ICsjZGVmaW5l
IE1BWF9CSVQJCQkJNjMNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiArDQo+ID4gPiA+ID4gPiA+ID4g
PiA+ID4gK3N0YXRpYyB1NjQgcHcyMF93dDsgc3RhdGljIHU2NCBhbHRpdmVjX2lkbGVfd3Q7DQo+
ID4gPiA+ID4gPiA+ID4gPiA+ID4gKw0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ICtzdGF0aWMgdW5z
aWduZWQgaW50IGdldF9pZGxlX3RpY2tzX2JpdCh1NjQgbnMpIHsNCj4gPiA+ID4gPiA+ID4gPiA+
ID4gPiArCXU2NCBjeWNsZTsNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiArDQo+ID4gPiA+ID4gPiA+
ID4gPiA+ID4gKwlpZiAobnMgPj0gMTAwMDApDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gKwkJY3lj
bGUgPSBkaXZfdTY0KG5zICsgNTAwLCAxMDAwKSAqDQo+ID4gPiA+IHRiX3RpY2tzX3Blcl91c2Vj
Ow0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ICsJZWxzZQ0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ICsJ
CWN5Y2xlID0gZGl2X3U2NChucyAqIHRiX3RpY2tzX3Blcl91c2VjLA0KPiA+IDEwMDApOw0KPiA+
ID4gPiA+ID4gPiA+ID4gPiA+ICsNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiArCWlmICghY3ljbGUp
DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gKwkJcmV0dXJuIDA7DQo+ID4gPiA+ID4gPiA+ID4gPiA+
ID4gKw0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ICsJcmV0dXJuIGlsb2cyKGN5Y2xlKTsgfQ0KPiA+
ID4gPiA+ID4gPiA+ID4gPiA+ICsNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiArc3RhdGljIHZvaWQg
ZG9fc2hvd19wd3JtZ3RjcjAodm9pZCAqdmFsKSB7DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gKwl1
MzIgKnZhbHVlID0gdmFsOw0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ICsNCj4gPiA+ID4gPiA+ID4g
PiA+ID4gPiArCSp2YWx1ZSA9IG1mc3ByKFNQUk5fUFdSTUdUQ1IwKTsgfQ0KPiA+ID4gPiA+ID4g
PiA+ID4gPiA+ICsNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiArc3RhdGljIHNzaXplX3Qgc2hvd19w
dzIwX3N0YXRlKHN0cnVjdCBkZXZpY2UgKmRldiwNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiArCQkJ
CXN0cnVjdCBkZXZpY2VfYXR0cmlidXRlICphdHRyLA0KPiA+IGNoYXINCj4gPiA+ID4gKmJ1Zikg
ew0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ICsJdTMyIHZhbHVlOw0KPiA+ID4gPiA+ID4gPiA+ID4g
PiA+ICsJdW5zaWduZWQgaW50IGNwdSA9IGRldi0+aWQ7DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4g
Kw0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ICsJc21wX2NhbGxfZnVuY3Rpb25fc2luZ2xlKGNwdSwg
ZG9fc2hvd19wd3JtZ3RjcjAsDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gKyZ2YWx1ZSwgMSk7DQo+
ID4gPiA+ID4gPiA+ID4gPiA+ID4gKw0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ICsJdmFsdWUgJj0g
UFdSTUdUQ1IwX1BXMjBfV0FJVDsNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiArDQo+ID4gPiA+ID4g
PiA+ID4gPiA+ID4gKwlyZXR1cm4gc3ByaW50ZihidWYsICIldVxuIiwgdmFsdWUgPyAxIDogMCk7
IH0NCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiArDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gK3N0YXRp
YyB2b2lkIGRvX3N0b3JlX3B3MjBfc3RhdGUodm9pZCAqdmFsKSB7DQo+ID4gPiA+ID4gPiA+ID4g
PiA+ID4gKwl1MzIgKnZhbHVlID0gdmFsOw0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ICsJdTMyIHB3
MjBfc3RhdGU7DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gKw0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+
ICsJcHcyMF9zdGF0ZSA9IG1mc3ByKFNQUk5fUFdSTUdUQ1IwKTsNCj4gPiA+ID4gPiA+ID4gPiA+
ID4gPiArDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gKwlpZiAoKnZhbHVlKQ0KPiA+ID4gPiA+ID4g
PiA+ID4gPiA+ICsJCXB3MjBfc3RhdGUgfD0gUFdSTUdUQ1IwX1BXMjBfV0FJVDsNCj4gPiA+ID4g
PiA+ID4gPiA+ID4gPiArCWVsc2UNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiArCQlwdzIwX3N0YXRl
ICY9IH5QV1JNR1RDUjBfUFcyMF9XQUlUOw0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ICsNCj4gPiA+
ID4gPiA+ID4gPiA+ID4gPiArCW10c3ByKFNQUk5fUFdSTUdUQ1IwLCBwdzIwX3N0YXRlKTsgfQ0K
PiA+ID4gPiA+ID4gPiA+ID4gPiA+ICsNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiArc3RhdGljIHNz
aXplX3Qgc3RvcmVfcHcyMF9zdGF0ZShzdHJ1Y3QgZGV2aWNlICpkZXYsDQo+ID4gPiA+ID4gPiA+
ID4gPiA+ID4gKwkJCQlzdHJ1Y3QgZGV2aWNlX2F0dHJpYnV0ZSAqYXR0ciwNCj4gPiA+ID4gPiA+
ID4gPiA+ID4gPiArCQkJCWNvbnN0IGNoYXIgKmJ1Ziwgc2l6ZV90IGNvdW50KQ0KPiA+IHsNCj4g
PiA+ID4gPiA+ID4gPiA+ID4gPiArCXUzMiB2YWx1ZTsNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiAr
CXVuc2lnbmVkIGludCBjcHUgPSBkZXYtPmlkOw0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ICsNCj4g
PiA+ID4gPiA+ID4gPiA+ID4gPiArCWlmIChrc3RydG91MzIoYnVmLCAwLCAmdmFsdWUpKQ0KPiA+
ID4gPiA+ID4gPiA+ID4gPiA+ICsJCXJldHVybiAtRUlOVkFMOw0KPiA+ID4gPiA+ID4gPiA+ID4g
PiA+ICsNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiArCWlmICh2YWx1ZSA+IDEpDQo+ID4gPiA+ID4g
PiA+ID4gPiA+ID4gKwkJcmV0dXJuIC1FSU5WQUw7DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gKw0K
PiA+ID4gPiA+ID4gPiA+ID4gPiA+ICsJc21wX2NhbGxfZnVuY3Rpb25fc2luZ2xlKGNwdSwgZG9f
c3RvcmVfcHcyMF9zdGF0ZSwNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiArJnZhbHVlLCAxKTsNCj4g
PiA+ID4gPiA+ID4gPiA+ID4gPiArDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gKwlyZXR1cm4gY291
bnQ7DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gK30NCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiArDQo+
ID4gPiA+ID4gPiA+ID4gPiA+ID4gK3N0YXRpYyBzc2l6ZV90IHNob3dfcHcyMF93YWl0X3RpbWUo
c3RydWN0IGRldmljZQ0KPiAqZGV2LA0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ICsJCQkJc3RydWN0
IGRldmljZV9hdHRyaWJ1dGUgKmF0dHIsDQo+ID4gY2hhcg0KPiA+ID4gPiAqYnVmKSB7DQo+ID4g
PiA+ID4gPiA+ID4gPiA+ID4gKwl1MzIgdmFsdWU7DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gKwl1
NjQgdGJfY3ljbGU7DQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gKwlzNjQgdGltZTsNCj4gPiA+ID4g
PiA+ID4gPiA+ID4gPiArDQo+ID4gPiA+ID4gPiA+ID4gPiA+ID4gKwl1bnNpZ25lZCBpbnQgY3B1
ID0gZGV2LT5pZDsNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiArDQo+ID4gPiA+ID4gPiA+ID4gPiA+
ID4gKwlpZiAoIXB3MjBfd3QpIHsNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiArCQlzbXBfY2FsbF9m
dW5jdGlvbl9zaW5nbGUoY3B1LA0KPiA+IGRvX3Nob3dfcHdybWd0Y3IwLA0KPiA+ID4gPiA+ID4g
PiA+ID4gPiA+ICsmdmFsdWUsDQo+ID4gPiA+ID4gPiA+ID4gMSk7DQo+ID4gPiA+ID4gPiA+ID4g
PiA+ID4gKwkJdmFsdWUgPSAodmFsdWUgJiBQV1JNR1RDUjBfUFcyMF9FTlQpID4+DQo+ID4gPiA+
ID4gPiA+ID4gPiA+ID4gKwkJCQkJUFdSTUdUQ1IwX1BXMjBfRU5UX1NISUZUOw0KPiA+ID4gPiA+
ID4gPiA+ID4gPiA+ICsNCj4gPiA+ID4gPiA+ID4gPiA+ID4gPiArCQl0Yl9jeWNsZSA9ICgxIDw8
IChNQVhfQklUIC0gdmFsdWUpKSAqIDI7DQo+ID4gPiA+ID4gPiA+ID4gPiA+DQo+ID4gPiA+ID4g
PiA+ID4gPiA+IElzIHZhbHVlID0gMCBhbmQgdmFsdWUgPSAxIGxlZ2FsPyBUaGVzZSB3aWxsIG1h
a2UNCj4gPiA+ID4gPiA+ID4gPiA+ID4gdGJfY3ljbGUgPSAwLA0KPiA+ID4gPiA+ID4gPiA+ID4g
Pg0KPiA+ID4gPiA+ID4gPiA+ID4gPiA+ICsJCXRpbWUgPSBkaXZfdTY0KHRiX2N5Y2xlICogMTAw
MCwNCj4gPiB0Yl90aWNrc19wZXJfdXNlYykNCj4gPiA+ID4gLSAxOw0KPiA+ID4gPiA+ID4gPiA+
ID4gPg0KPiA+ID4gPiA+ID4gPiA+ID4gPiBBbmQgdGltZSA9IC0xOw0KPiA+ID4gPiA+ID4gPiA+
ID4gPg0KPiA+ID4gPiA+ID4gPiA+ID4gUGxlYXNlIGxvb2sgYXQgdGhlIGVuZCBvZiB0aGUgZnVu
Y3Rpb24sIDopDQo+ID4gPiA+ID4gPiA+ID4gPg0KPiA+ID4gPiA+ID4gPiA+ID4gInJldHVybiBz
cHJpbnRmKGJ1ZiwgIiVsbHVcbiIsIHRpbWUgPiAwID8gdGltZSA6IDApOyINCj4gPiA+ID4gPiA+
ID4gPg0KPiA+ID4gPiA+ID4gPiA+IEkga25vdyB5b3UgcmV0dXJuIDAgaWYgdmFsdWUgPSAwLzEs
IG15IHF1ZXN0aW9uIHdhcyB0aGF0LA0KPiA+ID4gPiA+ID4gPiA+IGlzIHRoaXMgY29ycmVjdCBh
cyBwZXIgc3BlY2lmaWNhdGlvbj8NCj4gPiA+ID4gPiA+ID4gPg0KPiA+ID4gPiA+ID4gPiA+IEFo
aCwgYWxzbyBmb3IgInZhbHVlIiB1cHRvIDcgeW91IHdpbGwgcmV0dXJuIDAsIG5vPw0KPiA+ID4g
PiA+ID4gPiA+DQo+ID4gPiA+ID4gPiA+IElmIHZhbHVlID0gMCwgTUFYX0JJVCAtIHZhbHVlID0g
NjMgdGJfY3ljbGUgPQ0KPiA+ID4gPiA+ID4gPiAweGZmZmZmZmZmX2ZmZmZmZmZmLCB0Yl9jeWNs
ZSAqIDEwMDAgd2lsbCBvdmVyZmxvdywgYnV0IHRoaXMNCj4gPiBzaXR1YXRpb24gaXMgbm90IHBv
c3NpYmxlLg0KPiA+ID4gPiA+ID4gPiBCZWNhdXNlIGlmIHRoZSAidmFsdWUgPSAwIiBtZWFucyB0
aGlzIGZlYXR1cmUgd2lsbCBiZQ0KPiAiZGlzYWJsZSIuDQo+ID4gPiA+ID4gPiA+IE5vdyBUaGUg
ZGVmYXVsdCB3YWl0IGJpdCBpcyA1MChNQVhfQklUIC0gdmFsdWUsIHZhbHVlID0gMTMpLA0KPiA+
ID4gPiA+ID4gPiB0aGUgUFcyMC9BbHRpdmVjIElkbGUgd2FpdCBlbnRyeSB0aW1lIGlzIGFib3V0
IDFtcywgdGhpcw0KPiA+ID4gPiA+ID4gPiB0aW1lIGlzIHZlcnkgbG9uZyBmb3Igd2FpdCBpZGxl
IHRpbWUsIGFuZCBpdCdzIGNhbm5vdCBiZQ0KPiA+ID4gPiA+ID4gPiBpbmNyZWFzZWQobWVhbnMg
KE1BWF9CSVQNCj4gPiA+ID4gPiA+ID4gLSB2YWx1ZSkNCj4gPiA+ID4gPiA+IGNhbm5vdCBncmVh
dGVyIHRoYW4gNTApLg0KPiA+ID4gPiA+ID4NCj4gPiA+ID4gPiA+IFdoYXQgeW91IHNhaWQgaXMg
bm90IG9idmlvdXMgZnJvbSBjb2RlIGFuZCBzbyBhdCBsZWFzdCB3cml0ZSBhDQo+ID4gPiA+ID4g
PiBjb21tZW50IHRoYXQgdmFsdWUgd2lsbCBiZSBhbHdheXMgPj0gMTMgb3IgdmFsdWUgd2lsbCBu
ZXZlciBiZQ0KPiA+ID4gPiA+ID4gbGVzcyB0aGFuIDwgOCBhbmQgYmVsb3cgY2FsY3VsYXRpb24g
d2lsbCBub3Qgb3ZlcmZsb3cuIG1heSBiZQ0KPiA+ID4gPiA+ID4gZXJyb3Igb3V0IGlmIHZhbHVl
IGlzIGxlc3MgdGhhbiA4Lg0KPiA+ID4gPiA+ID4NCj4gPiA+ID4gPiBUaGUgInZhbHVlIiBsZXNz
IHRoYW4gMTAsIHRoaXMgd2lsbCBvdmVyZmxvdy4NCj4gPiA+ID4gPiBUaGVyZSBpcyBub3QgZXJy
b3IsIFRoZSBjb2RlIEkga25ldyBpdCBjb3VsZCBub3QgYmUgbGVzcyB0aGFuDQo+ID4gPiA+ID4g
MTAsIHRoYXQncyB3aHkgSSB1c2UgdGhlIGZvbGxvd2luZyBjb2RlLiA6KQ0KPiA+ID4gPg0KPiA+
ID4gPiBJIGFtIHNvcnJ5IHRvIHBlcnNpc3QgYnV0IHRoaXMgaXMgbm90IGFib3V0IHdoYXQgeW91
IGtub3csIHRoaXMgaXMNCj4gPiA+ID4gYWJvdXQgaG93IGNvZGUgaXMgcmVhZCBhbmQgY29kZSBk
b2VzIG5vdCBzYXkgd2hhdCB5b3Uga25vdywgc28gYWRkDQo+ID4gPiA+IGEgY29tbWVudCBhdCBs
ZWFzdCBhbmQgZXJyb3Igb3V0L3dhcm4gd2hlbiAidmFsdWUiIGlzIGxlc3MgdGhhbiBhDQo+ID4g
Y2VydGFpbiBudW1iZXIuDQo+ID4gPiA+DQo+ID4gPiBTb3JyeSBmb3IgdGhlIGxhdGUgdG8gcmVz
cG9uc2UgdGhlIG1haWwuIElmIGl0IGNhdXNlZCBjb25mdXNpb24sIHdlDQo+ID4gPiBjYW4NCj4g
PiBhZGQgYSBjb21tZW50Lg0KPiA+ID4NCj4gPiA+IEhvdyBhYm91dCB0aGUgZm9sbG93aW5nIGNv
bW1lbnQ/DQo+ID4gPiAvKg0KPiA+ID4gICogSWYgdGhlICJ2YWx1ZSIgbGVzcyB0aGFuIDEwLCB0
aGlzIHdpbGwgb3ZlcmZsb3cuDQo+ID4gPiAgKiBGcm9tIGJlbmNobWFyayB0ZXN0LCB0aGUgZGVm
YXVsdCB3YWl0IGJpdCB3aWxsIG5vdCBiZSBzZXQgbGVzcw0KPiA+ID4gdGhhbg0KPiA+IDEwYml0
Lg0KPiA+ID4gICogQmVjYXVzZSAxMCBiaXQgY29ycmVzcG9uZHMgdG8gdGhlIHdhaXQgZW50cnkg
dGltZSBpcw0KPiA+ID4gNDM5Mzc1NTczNDAxOTk5NjA5KG5zKSwNCj4gPiA+ICAqIGZvciB3YWl0
LWVudHJ5LWlkbGUgdGltZSB0aGlzIHZhbHVlIGxvb2tzIHRvbyBsb25nLCBhbmQgd2UgY2Fubm90
DQo+ID4gPiB1c2UgdGhvc2UNCj4gPiA+ICAqICJsb25nIiB0aW1lIGFzIGEgZGVmYXVsdCB3YWl0
LWVudHJ5IHRpbWUuIFNvIG92ZXJmbG93IGNvdWxkIG5vdA0KPiA+ID4gaGF2ZSBoYXBwZW5lZA0K
PiA+ID4gICogYW5kIHdlIHVzZSB0aGlzIGNhbGN1bGF0aW9uIG1ldGhvZCB0byBnZXQgd2FpdC1l
bnRyeS1pZGxlIHRpbWUuDQo+ID4gPiAgKi8NCj4gPg0KPiA+IElmIHRoZXJlJ3MgdG8gYmUgYSBs
aW1pdCBvbiB0aGUgdGltZXMgd2UgYWNjZXB0LCBtYWtlIGl0IGV4cGxpY2l0Lg0KPiA+IENoZWNr
IGZvciBpdCBiZWZvcmUgZG9pbmcgYW55IGNvbnZlcnNpb25zLCBhbmQgcmV0dXJuIGFuIGVycm9y
IGlmDQo+ID4gdXNlcnNwYWNlIHRyaWVzIHRvIHNldCBpdC4NCj4gPg0KPiBUaGUgYnJhbmNoIG9u
bHkgdXNlIHRvIHJlYWQgZGVmYXVsdCB3YWl0LWVudHJ5LXRpbWUuDQo+IFdlIGhhdmUgbm8gbGlt
aXQgdGhlIHVzZXIncyBpbnB1dCwgYW5kIHdlIGNhbid0IHJlc3RyaWN0LiBPbmNlIHRoZSB1c2Vy
DQo+IHNldCB0aGUgd2FpdC1lbnRyeS10aW1lLCB0aGUgY29kZSB3aWxsIGRvIGFub3RoZXIgYnJh
bmNoLg0KPiANCg0KSGkgc2NvdHQsDQpEbyB5b3UgaGF2ZSBhbnkgY29tbWVudHMgYWJvdXQgdGhp
cyBwYXRjaD8NCkkgd2lsbCBhZGQgdGhlIGNvbW1lbnQgYW5kIHNlbmQgdGhpcyBwYXRjaCBhZ2Fp
bi4NCg0KLWRvbmdzaGVuZw0KDQo+ID4NCj4gPg0KDQo=

^ permalink raw reply

* Re: [PATCHv2 1/8] ALSA: Add SAI SoC Digital Audio Interface driver.
From: Nicolin Chen @ 2013-11-04  4:33 UTC (permalink / raw)
  To: Xiubo Li-B47053
  Cc: mark.rutland@arm.com, alsa-devel@alsa-project.org,
	linux-doc@vger.kernel.org, tiwai@suse.de, Wang Huan-B18965,
	timur@tabi.org, perex@perex.cz, Guo Shawn-R65073,
	LW@KARO-electronics.de, linux@arm.linux.org.uk,
	linux-arm-kernel@lists.infradead.org, grant.likely@linaro.org,
	devicetree@vger.kernel.org, ian.campbell@citrix.com,
	pawel.moll@arm.com, swarren@wwwdotorg.org,
	rob.herring@calxeda.com, broonie@kernel.org, oskar@scara.com,
	Estevam Fabio-R49496, lgirdwood@gmail.com,
	linux-kernel@vger.kernel.org, rob@landley.net,
	Jin Zhengxiong-R64188, shawn.guo@linaro.org,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1DD289F6464F0949A2FCA5AA6DC23F82874201@039-SN2MPN1-013.039d.mgd.msft.net>

On Mon, Nov 04, 2013 at 11:45:10AM +0800, Xiubo Li-B47053 wrote:
> > > +snd-soc-fsl-sai-objs := fsl-sai.o
> > 
> > And I think it should be better to put it along with fsl-ssi.o and fsl-
> > spdif.o
> > 
> 
> But fsl-ssi.o and fsl-spdif.o is based PowrePC platform? Which we can see from the comments.

No. fsl-ssi.c is compatible to both ARM and PPC. And fsl-spdif.c is currently
used on ARM only. So please just put along with them.

> > > +	case SNDRV_PCM_TRIGGER_START:
> > > +	case SNDRV_PCM_TRIGGER_RESUME:
> > > +	case SNDRV_PCM_TRIGGER_PAUSE_RELEASE:
> > > +		tcsr |= FSL_SAI_CSR_TERE;
> > > +		rcsr |= FSL_SAI_CSR_TERE;
> > > +		writel(rcsr, sai->base + FSL_SAI_RCSR);
> > > +		udelay(10);
> > 
> > Does SAI really needs this udelay() here? Required by IP's operation flow?
> > If so, I think it's better to add comments here to explain it.
> > 
> +++++++++++++++++
> Synchronous mode
> The SAI transmitter and receiver can be configured to operate with synchronous bit clock
> and frame sync.
> 
> 1,
> If the transmitter bit clock and frame sync are to be used by both the transmitter and
> receiver:
> 	...
> * It is recommended that the transmitter is the last enabled and the first disabled.
> 
> 2,
> If the receiver bit clock and frame sync are to be used by both the transmitter and
> receiver:
> 	...
> * It is recommended that the receiver is the last enabled and the first disabled.
> ------------------
> 
> So I think the delay is needed, and I still tunning it.
> 

The udelay just doesn't make sense to what you are talking about.

Does SAI really need 10us delay between two register-updating?

We basically use udelay only if the IP hardware actually needs it: some
IP needs time to boot itself up after doing software reset for example.
But it doesn't look reasonable to me by using udelay to make sure "the 
last enabled".

And from the 'Synchronous mode' you just provided, there're another issue:

+	case SNDRV_PCM_TRIGGER_START:
+	case SNDRV_PCM_TRIGGER_RESUME:
+	case SNDRV_PCM_TRIGGER_PAUSE_RELEASE:
+		tcsr |= FSL_SAI_CSR_TERE;
+		rcsr |= FSL_SAI_CSR_TERE;
+		writel(rcsr, sai->base + FSL_SAI_RCSR);
+		udelay(10);
+		writel(tcsr, sai->base + FSL_SAI_TCSR);
+		break;
+
+	case SNDRV_PCM_TRIGGER_STOP:
+	case SNDRV_PCM_TRIGGER_SUSPEND:
+	case SNDRV_PCM_TRIGGER_PAUSE_PUSH:
+		if (!(dai->playback_active || dai->capture_active)) {
+			tcsr &= ~FSL_SAI_CSR_TERE;
+			rcsr &= ~FSL_SAI_CSR_TERE;
+		}
+		writel(rcsr, sai->base + FSL_SAI_RCSR);
+		udelay(10);
+		writel(tcsr, sai->base + FSL_SAI_TCSR);
+		break;

ISSUE 1: You might make sure transmitter is the last enabled.
	 However, it's not the first disabled. Is this okay?

ISSUE 2: There are two cases listed in 'Synchronous mode'.
	 However, your driver doesn't take care of them.
	 The SAI's synchronous mode looks like more flexible
	 than SSI's. The driver needs to be more sophisticated
	 so that it can handle multiple cases when TX/RX clocks
	 are controlled by either TX or RX, and surely, the
	 asynchronous mode as well.


And there's another personal tip: I think you can first try to focus on
this SAI driver and pend the others. There might be two many things you
need to refine if you are doing them at the same time.

Best regards,
Nicolin Chen

^ permalink raw reply

* RE: [PATCHv2 1/8] ALSA: Add SAI SoC Digital Audio Interface driver.
From: Li Xiubo @ 2013-11-04  7:35 UTC (permalink / raw)
  To: Mark Brown
  Cc: mark.rutland@arm.com, alsa-devel@alsa-project.org,
	linux-doc@vger.kernel.org, tiwai@suse.de, Huan Wang,
	timur@tabi.org, perex@perex.cz, Shawn Guo, LW@KARO-electronics.de,
	linux@arm.linux.org.uk, Guangyu Chen,
	linux-arm-kernel@lists.infradead.org, grant.likely@linaro.org,
	devicetree@vger.kernel.org, ian.campbell@citrix.com,
	pawel.moll@arm.com, swarren@wwwdotorg.org,
	rob.herring@calxeda.com, oskar@scara.com, Fabio Estevam,
	lgirdwood@gmail.com, linux-kernel@vger.kernel.org,
	rob@landley.net, Zhengxiong Jin, shawn.guo@linaro.org,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20131101182514.GE2493@sirena.org.uk>

> > +static int fsl_sai_set_dai_clkdiv(struct snd_soc_dai *cpu_dai,
> > +		int div_id, int div)
> > +{
> > +	struct fsl_sai *sai =3D snd_soc_dai_get_drvdata(cpu_dai);
> > +	u32 tcr2, rcr2;
> > +
> > +	if (div_id =3D=3D FSL_SAI_TX_DIV) {
> > +		tcr2 =3D readl(sai->base + FSL_SAI_TCR2);
> > +		tcr2 &=3D ~FSL_SAI_CR2_DIV_MASK;
> > +		tcr2 |=3D FSL_SAI_CR2_DIV(div);
> > +		writel(tcr2, sai->base + FSL_SAI_TCR2);
>=20
> What is this divider and why does the user have to set it manually?
>=20

This is the bit clock divider. I'll add some comments or rename them to be =
more readable.
>From the IP spec:
++++++
Bit Clock Divide
Divides down the audio master clock to generate the bit clock when configur=
ed for an internal bit clock.
------

>From the ASoC subsystem comments we can see that:
++++++
Configures the clock dividers. This is used to derive the best DAI bit and
frame clocks from the system or master clock. It's best to set the DAI bit
and frame clocks as low as possible to save system power.
------


> > +static int fsl_sai_dai_probe(struct snd_soc_dai *dai) {
> > +	int ret;
> > +	struct fsl_sai *sai =3D dev_get_drvdata(dai->dev);
> > +
> > +	ret =3D clk_prepare_enable(sai->clk);
> > +	if (ret)
> > +		return ret;
>=20
> It'd be nicer to only enable the clock while the device is in active use.
>=20

While if the module clock is not enabled here, the followed registers canno=
t read/write in the same function.
And this _probe function is the _dai_probe not the driver's module _probe.

If the clk_prepare_enable(sai->clk) is not here, where should it be will be=
 nicer ?
One of the following functions ?
        .set_sysclk     =3D fsl_sai_set_dai_sysclk,
        .set_clkdiv     =3D fsl_sai_set_dai_clkdiv,
        .set_fmt        =3D fsl_sai_set_dai_fmt,
        .set_tdm_slot   =3D fsl_sai_set_dai_tdm_slot,
        .hw_params      =3D fsl_sai_hw_params,
        .trigger        =3D fsl_sai_trigger,


> > +	ret =3D snd_dmaengine_pcm_register(&pdev->dev, NULL,
> > +			SND_DMAENGINE_PCM_FLAG_NO_RESIDUE);
> > +	if (ret)
> > +		return ret;
>=20
> We should have a devm_ version of this.

Sorry, is there one patch for adding the devm_ version of snd_dmaengine_pcm=
_register() already ?
In the -next and other topics branches I could not find it.



Best Regards,
Xiubo

^ permalink raw reply

* Re: [PATCH v3 1/1] powerpc/embedded6xx: Add support for Motorola/Emerson MVME5100
From: Geert Uytterhoeven @ 2013-11-04  7:59 UTC (permalink / raw)
  To: Stephen Chivers; +Cc: linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20131103210748.E74FAE06C0@canberra.localdomain>

On Sun, Nov 3, 2013 at 10:07 PM, Stephen Chivers <schivers@csc.com> wrote:
> +++ b/arch/powerpc/boot/dts/mvme5100.dts
> @@ -0,0 +1,185 @@
> +/*
> + * Device Tree Souce for Motorola/Emerson MVME5100.

Source

(unless this expresses your personal appreciation for device trees ;-)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* [PATCH 2/2] powerpc/powernv: Reserve the correct PE number
From: Gavin Shan @ 2013-11-04  8:32 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan
In-Reply-To: <1383553967-16644-1-git-send-email-shangw@linux.vnet.ibm.com>

We're assigning PE numbers after the completion of PCI probe. During
the PCI probe, we had PE#0 as the super container to encompass all
PCI devices. However, that's inappropriate since PELTM has ascending
order of priority on search on P7IOC. So we need PE#127 takes the
role that PE#0 has previously. For PHB3, we still have PE#0 as the
reserved PE.

The patch supposes that the underly firmware has built the RID to
PE# mapping after resetting IODA tables: all PELTM entries except
last one has invalid mapping on P7IOC, but all RTEs have binding
to PE#0. The reserved PE# is being exported by firmware by device
tree.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   22 +++++++++-------------
 arch/powerpc/platforms/powernv/pci.c      |   10 +++++++---
 arch/powerpc/platforms/powernv/pci.h      |    1 +
 3 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 198566e..084cdfa 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1209,12 +1209,13 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np,
 		pr_err("  Failed to map registers !\n");
 
 	/* Initialize more IODA stuff */
+	phb->ioda.total_pe = 1;
 	prop32 = of_get_property(np, "ibm,opal-num-pes", NULL);
-	if (!prop32)
-		phb->ioda.total_pe = 1;
-	else
+	if (prop32)
 		phb->ioda.total_pe = be32_to_cpup(prop32);
-
+	prop32 = of_get_property(np, "ibm,opal-reserved-pe", NULL);
+	if (prop32)
+		phb->ioda.reserved_pe = be32_to_cpup(prop32);
 	phb->ioda.m32_size = resource_size(&hose->mem_resources[0]);
 	/* FW Has already off top 64k of M32 space (MSI space) */
 	phb->ioda.m32_size += 0x10000;
@@ -1243,7 +1244,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	if (phb->type == PNV_PHB_IODA1)
 		phb->ioda.io_segmap = aux + iomap_off;
 	phb->ioda.pe_array = aux + pemap_off;
-	set_bit(0, phb->ioda.pe_alloc);
+	set_bit(phb->ioda.reserved_pe, phb->ioda.pe_alloc);
 
 	INIT_LIST_HEAD(&phb->ioda.pe_dma_list);
 	INIT_LIST_HEAD(&phb->ioda.pe_list);
@@ -1268,8 +1269,10 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np,
 					 segment_size);
 #endif
 
-	pr_info("  %d PE's M32: 0x%x [segment=0x%x] IO: 0x%x [segment=0x%x]\n",
+	pr_info("  %d (%d) PE's M32: 0x%x [segment=0x%x]"
+		" IO: 0x%x [segment=0x%x]\n",
 		phb->ioda.total_pe,
+		phb->ioda.reserved_pe,
 		phb->ioda.m32_size, phb->ioda.m32_segsize,
 		phb->ioda.io_size, phb->ioda.io_segsize);
 
@@ -1306,13 +1309,6 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	rc = opal_pci_reset(phb_id, OPAL_PCI_IODA_TABLE_RESET, OPAL_ASSERT_RESET);
 	if (rc)
 		pr_warning("  OPAL Error %ld performing IODA table reset !\n", rc);
-
-	/*
-	 * On IODA1 map everything to PE#0, on IODA2 we assume the IODA reset
-	 * has cleared the RTT which has the same effect
-	 */
-	if (ioda_type == PNV_PHB_IODA1)
-		opal_pci_set_pe(phb_id, 0, 0, 7, 1, 1 , OPAL_MAP_PE);
 }
 
 void __init pnv_pci_init_ioda2_phb(struct device_node *np)
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 921ae67..4eb33a9 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -242,11 +242,15 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb,
 	/*
 	 * Get the PE#. During the PCI probe stage, we might not
 	 * setup that yet. So all ER errors should be mapped to
-	 * PE#0
+	 * reserved PE.
 	 */
 	pe_no = PCI_DN(dn)->pe_number;
-	if (pe_no == IODA_INVALID_PE)
-		pe_no = 0;
+	if (pe_no == IODA_INVALID_PE) {
+		if (phb->type == PNV_PHB_P5IOC2)
+			pe_no = 0;
+		else
+			pe_no = phb->ioda.reserved_pe;
+	}
 
 	/* Read freeze status */
 	rc = opal_pci_eeh_freeze_status(phb->opal_id, pe_no, &fstate, &pcierr,
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 64d3b12..911c24e 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -125,6 +125,7 @@ struct pnv_phb {
 		struct {
 			/* Global bridge info */
 			unsigned int		total_pe;
+			unsigned int		reserved_pe;
 			unsigned int		m32_size;
 			unsigned int		m32_segsize;
 			unsigned int		m32_pci_base;
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 1/2] powerpc/powernv: Add PE to its own PELTV
From: Gavin Shan @ 2013-11-04  8:32 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan, stable

We need add PE to its own PELTV. Otherwise, the errors originated
from the PE might contribute to other PEs. In the result, we can't
clear up the error successfully even we're checking and clearing
errors during access to PCI config space.

Cc: stable@vger.kernel.org
Reported-by: kalshett@in.ibm.com
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index c639af7..198566e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -163,13 +163,23 @@ static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 		rid_end = pe->rid + 1;
 	}
 
-	/* Associate PE in PELT */
+	/*
+	 * Associate PE in PELT. We need add the PE into the
+	 * corresponding PELT-V as well. Otherwise, the error
+	 * originated from the PE might contribute to other
+	 * PEs.
+	 */
 	rc = opal_pci_set_pe(phb->opal_id, pe->pe_number, pe->rid,
 			     bcomp, dcomp, fcomp, OPAL_MAP_PE);
 	if (rc) {
 		pe_err(pe, "OPAL error %ld trying to setup PELT table\n", rc);
 		return -ENXIO;
 	}
+
+	rc = opal_pci_set_peltv(phb->opal_id, pe->pe_number,
+				pe->pe_number, OPAL_ADD_PE_TO_DOMAIN);
+	if (rc)
+		pe_warn(pe, "OPAL error %d adding self to PELTV\n", rc);
 	opal_pci_eeh_freeze_clear(phb->opal_id, pe->pe_number,
 				  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
 
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH] powerpc: remove unused REDBOOT Kconfig parameter
From: Michael Opdenacker @ 2013-11-04  8:38 UTC (permalink / raw)
  To: benh, paulus, galak, vitb, marcelo
  Cc: Michael Opdenacker, linuxppc-dev, linux-kernel

This removes the REDBOOT Kconfig parameter,
which was no longer used anywhere in the source code
and Makefiles.

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
---
 arch/powerpc/Kconfig                | 3 ---
 arch/powerpc/platforms/83xx/Kconfig | 1 -
 arch/powerpc/platforms/8xx/Kconfig  | 1 -
 3 files changed, 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 38f3b7e47ec5..f02a41935c95 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -205,9 +205,6 @@ config DEFAULT_UIMAGE
 	  Used to allow a board to specify it wants a uImage built by default
 	default n
 
-config REDBOOT
-	bool
-
 config ARCH_HIBERNATION_POSSIBLE
 	bool
 	default y
diff --git a/arch/powerpc/platforms/83xx/Kconfig b/arch/powerpc/platforms/83xx/Kconfig
index 670a033264c0..2bdc8c862c46 100644
--- a/arch/powerpc/platforms/83xx/Kconfig
+++ b/arch/powerpc/platforms/83xx/Kconfig
@@ -99,7 +99,6 @@ config SBC834x
 config ASP834x
 	bool "Analogue & Micro ASP 834x"
 	select PPC_MPC834x
-	select REDBOOT
 	help
 	  This enables support for the Analogue & Micro ASP 83xx
 	  board.
diff --git a/arch/powerpc/platforms/8xx/Kconfig b/arch/powerpc/platforms/8xx/Kconfig
index 8dec3c0911ad..bd6f1a1cf922 100644
--- a/arch/powerpc/platforms/8xx/Kconfig
+++ b/arch/powerpc/platforms/8xx/Kconfig
@@ -45,7 +45,6 @@ config PPC_EP88XC
 config PPC_ADDER875
 	bool "Analogue & Micro Adder 875"
 	select CPM1
-	select REDBOOT
 	help
 	  This enables support for the Analogue & Micro Adder 875
 	  board.
-- 
1.8.1.2

^ permalink raw reply related

* Re: perf events ring buffer memory barrier on powerpc
From: Peter Zijlstra @ 2013-11-04  9:07 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Michael Neuling, Mathieu Desnoyers, LKML, Oleg Nesterov,
	Linux PPC dev, Anton Blanchard, Frederic Weisbecker,
	Victor Kaplansky
In-Reply-To: <20131102152048.GI4067@linux.vnet.ibm.com>

On Sat, Nov 02, 2013 at 08:20:48AM -0700, Paul E. McKenney wrote:
> On Fri, Nov 01, 2013 at 11:30:17AM +0100, Peter Zijlstra wrote:
> > Furthermore there's a gazillion parallel userspace programs.
> 
> Most of which have very unaggressive concurrency designs.

pthread_mutex_t A, B;

char data_A[x];
int  counter_B = 1;

void funA(void)
{
	pthread_mutex_lock(&A);
	memset(data_A, 0, sizeof(data_A));
	pthread_mutex_unlock(&A);
}

void funB(void)
{
	pthread_mutex_lock(&B);
	counter_B++;
	pthread_mutex_unlock(&B);
}

void funC(void)
{
	pthread_mutex_lock(&B)
	printf("%d\n", counter_B);
	pthread_mutex_unlock(&B);
}

Then run: funA, funB, funC concurrently, and end with a funC.

Then explain to userman than his unaggressive program can return:
0
1

Because the memset() thought it might be a cute idea to overwrite
counter_B and fix it up 'later'. Which if I understood you right is
valid in C/C++ :-(

Not that any actual memset implementation exhibiting this trait wouldn't
be shot on the spot.

> > > By marking "ptr" as atomic, thus telling the compiler not to mess with it.
> > > And thus requiring that all accesses to it be decorated, which in the
> > > case of RCU could be buried in the RCU accessors.
> > 
> > This seems contradictory; marking it atomic would look like:
> > 
> > struct foo {
> > 	unsigned long value;
> > 	__atomic void *ptr;
> > 	unsigned long value1;
> > };
> > 
> > Clearly we cannot hide this definition in accessors, because then
> > accesses to value* won't see the annotation.
> 
> #define __rcu __atomic

Yeah, except we don't use __rcu all that consistently; in fact I don't
know if I ever added it.

^ permalink raw reply

* Re: perf events ring buffer memory barrier on powerpc
From: Paul E. McKenney @ 2013-11-04 10:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Michael Neuling, Mathieu Desnoyers, LKML, Oleg Nesterov,
	Linux PPC dev, Anton Blanchard, Frederic Weisbecker,
	Victor Kaplansky
In-Reply-To: <20131104090744.GE10651@twins.programming.kicks-ass.net>

On Mon, Nov 04, 2013 at 10:07:44AM +0100, Peter Zijlstra wrote:
> On Sat, Nov 02, 2013 at 08:20:48AM -0700, Paul E. McKenney wrote:
> > On Fri, Nov 01, 2013 at 11:30:17AM +0100, Peter Zijlstra wrote:
> > > Furthermore there's a gazillion parallel userspace programs.
> > 
> > Most of which have very unaggressive concurrency designs.
> 
> pthread_mutex_t A, B;
> 
> char data_A[x];
> int  counter_B = 1;
> 
> void funA(void)
> {
> 	pthread_mutex_lock(&A);
> 	memset(data_A, 0, sizeof(data_A));
> 	pthread_mutex_unlock(&A);
> }
> 
> void funB(void)
> {
> 	pthread_mutex_lock(&B);
> 	counter_B++;
> 	pthread_mutex_unlock(&B);
> }
> 
> void funC(void)
> {
> 	pthread_mutex_lock(&B)
> 	printf("%d\n", counter_B);
> 	pthread_mutex_unlock(&B);
> }
> 
> Then run: funA, funB, funC concurrently, and end with a funC.
> 
> Then explain to userman than his unaggressive program can return:
> 0
> 1
> 
> Because the memset() thought it might be a cute idea to overwrite
> counter_B and fix it up 'later'. Which if I understood you right is
> valid in C/C++ :-(
> 
> Not that any actual memset implementation exhibiting this trait wouldn't
> be shot on the spot.

Even without such a malicious memcpy() implementation I must still explain
about false sharing when the developer notices that the unaggressive
program isn't running as fast as expected.

> > > > By marking "ptr" as atomic, thus telling the compiler not to mess with it.
> > > > And thus requiring that all accesses to it be decorated, which in the
> > > > case of RCU could be buried in the RCU accessors.
> > > 
> > > This seems contradictory; marking it atomic would look like:
> > > 
> > > struct foo {
> > > 	unsigned long value;
> > > 	__atomic void *ptr;
> > > 	unsigned long value1;
> > > };
> > > 
> > > Clearly we cannot hide this definition in accessors, because then
> > > accesses to value* won't see the annotation.
> > 
> > #define __rcu __atomic
> 
> Yeah, except we don't use __rcu all that consistently; in fact I don't
> know if I ever added it.

There are more than 300 of them in the kernel.  Plus sparse can be
convinced to yell at you if you don't use them.  So lack of __rcu could
be fixed without too much trouble.

The C/C++11 need to annotate functions that take arguments or return
values taken from rcu_dereference() is another story.  But the compilers
have to get significantly more aggressive or developers have to be doing
unusual things that result in rcu_dereference() returning something whose
value the compiler can predict exactly.

							Thanx, Paul

^ permalink raw reply

* RE: [PATCHv2 6/8] ASoC: fsl: add SGTL5000 based audio machine driver.
From: Li Xiubo @ 2013-11-04  9:52 UTC (permalink / raw)
  To: Mark Brown
  Cc: mark.rutland@arm.com, alsa-devel@alsa-project.org,
	linux-doc@vger.kernel.org, tiwai@suse.de, Huan Wang,
	timur@tabi.org, perex@perex.cz, Shawn Guo, LW@KARO-electronics.de,
	linux@arm.linux.org.uk, Guangyu Chen,
	linux-arm-kernel@lists.infradead.org, grant.likely@linaro.org,
	devicetree@vger.kernel.org, ian.campbell@citrix.com,
	pawel.moll@arm.com, swarren@wwwdotorg.org,
	rob.herring@calxeda.com, oskar@scara.com, Fabio Estevam,
	lgirdwood@gmail.com, linux-kernel@vger.kernel.org,
	rob@landley.net, Zhengxiong Jin, shawn.guo@linaro.org,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20131101184008.GF2493@sirena.org.uk>


> > Conflicts:
> > 	sound/soc/fsl/Makefile
>=20
> Ahem.
>=20

This will be removed.

> > +static int fsl_sgtl5000_remove(struct platform_device *pdev) {
> > +	snd_soc_unregister_card(&fsl_sgt1500_card);
> > +
> > +	return 0;
> > +}
>=20
> You're using snd_soc_unregister_card() so you don't need to do this.
>

See the next version.

^ permalink raw reply

* Re: [RFC] arch: Introduce new TSO memory barrier smp_tmb()
From: Paul E. McKenney @ 2013-11-04 10:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michael Neuling, Mathieu Desnoyers, Peter Zijlstra, Oleg Nesterov,
	LKML, Linux PPC dev, Anton Blanchard, Frederic Weisbecker,
	Victor Kaplansky
In-Reply-To: <CA+55aFyD_kCkAHQwHCUBrumO-pH6LaZikTNvyWDW_tWsHdqk6Q@mail.gmail.com>

On Sun, Nov 03, 2013 at 03:34:00PM -0800, Linus Torvalds wrote:
> On Sun, Nov 3, 2013 at 2:42 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >
> > smp_storebuffer_mb() -- A barrier that enforces those orderings
> >         that do not invalidate the hardware store-buffer optimization.
> 
> Ugh. Maybe. Can you guarantee that those are the correct semantics?
> And why talk about the hardware semantics, when you really want
> specific semantics for the *software*.
> 
> > smp_not_w_r_mb() -- A barrier that orders everything except prior
> >         writes against subsequent reads.
> 
> Ok, that sounds more along the lines of "these are the semantics we
> want", but I have to say, it also doesn't make me go "ahh, ok".
> 
> > smp_acqrel_mb() -- A barrier that combines C/C++ acquire and release
> >         semantics.  (C/C++ "acquire" orders a specific load against
> >         subsequent loads and stores, while C/C++ "release" orders
> >         a specific store against prior loads and stores.)
> 
> I don't think this is true. acquire+release is much stronger than what
> you're looking for - it doesn't allow subsequent reads to move past
> the write (because that would violate the acquire part). On x86, for
> example, you'd need to have a locked cycle for smp_acqrel_mb().
> 
> So again, what are the guarantees you actually want? Describe those.
> And then make a name.

I was thinking in terms of the guarantee that TSO systems provide
given a barrier() directive, and that PowerPC provides given the lwsync
instruction.  This guarantee is that loads preceding the barrier will
not be reordered with memory referenced following the barrier, and that
stores preceding the barrier will not be reordered with stores following
the barrier.  But given how much easier RCU reviews became after burying
smp_wmb() and smp_read_barrier_depends() into rcu_assign_pointer() and
rcu_dereference(), respectively, I think I prefer an extension of your
idea below.

> I _think_ the guarantees you want is:
>  - SMP write barrier
>  - *local* read barrier for reads preceding the write.
> 
> but the problem is that the "preceding reads" part is really
> specifically about the write that you had. The barrier should really
> be attached to the *particular* write operation, it cannot be a
> standalone barrier.

Indeed, neither rcu_assign_pointer() nor the circular queue really needs a
standalone barrier, so that attaching the barrier to a particular memory
reference would work.  And as you note below, in the case of ARM this
would turn into one of their new memory-reference instructions.

> So it would *kind* of act like a "smp_wmb() + smp_rmb()", but the
> problem is that a "smp_rmb()" doesn't really "attach" to the preceding
> write.
> 
> This is analogous to a "acquire" operation: you cannot make an
> "acquire" barrier, because it's not a barrier *between* two ops, it's
> associated with one particular op.

But you -could- use any barrier that prevented reordering of any preceding
load with any subsequent memory reference.  Please note that I am -not-
advocating this anymore, because I like the idea of attaching the barrier
to a particular memory operation.  However, for completeness, here it is
in the case of TSO systems and PowerPC, respectively:

#define smp_acquire_mb() barrier();

#define smp_acquire_mb() \
	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory");

This functions correctly, but is a pain to review because you have to
figure out which of many possible preceding loads the smp_acquire_mb()
is supposed to attach to.  As you say, it is -way- better to attach the
barrier to a particular memory operation.

> So what I *think* you actually really really want is a "store with
> release consistency, followed by a write barrier".

I believe that the combination of "store with release consistency" and
"load with acquire consistency" should do the trick for the two use cases
at this point, which again are circular buffers and rcu_assign_pointer().
At this point, I don't see the need for "followed by a write barrier".
But I step through the circular buffers below.

> In TSO, afaik all stores have release consistency, and all writes are
> ordered, which is why this is a no-op in TSO. And x86 also has that
> "all stores have release consistency, and all writes are ordered"
> model, even if TSO doesn't really describe the x86 model.

Yep, as does the mainframe.  And these architectures also have all reads
having acquire consistency.

> But on ARM64, for example, I think you'd really want the store itself
> to be done with "stlr" (store with release), and then follow up with a
> "dsb st" after that.

Agree with the "stlr" but don't (yet, anyway) understand the need for
a subsequent "dsb st".

> And notice how that requires you to mark the store itself. There is no
> actual barrier *after* the store that does the optimized model.

And marking the store itself is a very good thing from my viewpoint.

> Of course, it's entirely possible that it's not worth worrying about
> this on ARM64, and that just doing it as a "normal store followed by a
> full memory barrier" is good enough. But at least in *theory* a
> microarchitecture might make it much cheaper to do a "store with
> release consistency" followed by "write barrier".
> 
> Anyway, having talked exhaustively about exactly what semantics you
> are after, I *think* the best model would be to just have a
> 
>   #define smp_store_with_release_semantics(x, y) ...
> 
> and use that *and* a "smp_wmb()" for this (possibly a special
> "smp_wmb_after_release()" if that allows people to avoid double
> barriers). On x86 (and TSO systems), the
> smp_store_with_release_semantics() would be just a regular store, and
> the smp_wmb() is obviously a no-op. Other platforms would end up doing
> other things.
> 
> Hmm?

OK, something like this for the definitions (though PowerPC might want
to locally abstract the lwsync expansion):

	#define smp_store_with_release_semantics(p, v) /* x86, s390, etc. */ \
	do { \
		barrier(); \
		ACCESS_ONCE(p) = (v); \
	} while (0)

	#define smp_store_with_release_semantics(p, v) /* PowerPC. */ \
	do { \
		__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory"); \
		ACCESS_ONCE(p) = (v); \
	} while (0)

	#define smp_load_with_acquire_semantics(p) /* x86, s390, etc. */ \
	({ \
		typeof(*p) *_________p1 = ACCESS_ONCE(p); \
		barrier(); \
		_________p1; \
	})

	#define smp_load_with_acquire_semantics(p) /* PowerPC. */ \
	({ \
		typeof(*p) *_________p1 = ACCESS_ONCE(p); \
		__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory"); \
		_________p1; \
	})

For ARM, smp_load_with_acquire_semantics() is a wrapper around the ARM
"ldar" instruction and smp_store_with_release_semantics() is a wrapper
around the ARM "stlr" instruction.

Then if I am not too confused (and I would expect Victor to let me know
in short order if I am), the following patch to the current mainline
version of Documentation/circular-buffers.txt would suffice.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/Documentation/circular-buffers.txt b/Documentation/circular-buffers.txt
index 8117e5bf6065..1846044bf6cc 100644
--- a/Documentation/circular-buffers.txt
+++ b/Documentation/circular-buffers.txt
@@ -160,6 +160,7 @@ The producer will look something like this:
 	spin_lock(&producer_lock);
 
 	unsigned long head = buffer->head;
+	/* The spin_unlock() and next spin_lock() provide needed ordering. */
 	unsigned long tail = ACCESS_ONCE(buffer->tail);
 
 	if (CIRC_SPACE(head, tail, buffer->size) >= 1) {
@@ -168,9 +169,8 @@ The producer will look something like this:
 
 		produce_item(item);
 
-		smp_wmb(); /* commit the item before incrementing the head */
-
-		buffer->head = (head + 1) & (buffer->size - 1);
+		smp_store_with_release_semantics(buffer->head,
+						 (head + 1) & (buffer->size - 1));
 
 		/* wake_up() will make sure that the head is committed before
 		 * waking anyone up */
@@ -183,9 +183,14 @@ This will instruct the CPU that the contents of the new item must be written
 before the head index makes it available to the consumer and then instructs the
 CPU that the revised head index must be written before the consumer is woken.
 
-Note that wake_up() doesn't have to be the exact mechanism used, but whatever
-is used must guarantee a (write) memory barrier between the update of the head
-index and the change of state of the consumer, if a change of state occurs.
+Note that wake_up() does not guarantee any sort of barrier unless something
+is actually awakened.  We therefore cannot rely on it for ordering.  However,
+there is always one element of the array left empty.  Therefore, the
+producer must produce two elements before it could possibly corrupt the
+element currently being read by the consumer.  Therefore, the unlock-lock
+pair between consecutive invocations of the consumer provides the necessary
+ordering between the read of the index indicating that the consumer has
+vacated a given element and the write by the producer to that same element.
 
 
 THE CONSUMER
@@ -195,21 +200,18 @@ The consumer will look something like this:
 
 	spin_lock(&consumer_lock);
 
-	unsigned long head = ACCESS_ONCE(buffer->head);
+	unsigned long head = smp_load_with_acquire_semantics(buffer->head);
 	unsigned long tail = buffer->tail;
 
 	if (CIRC_CNT(head, tail, buffer->size) >= 1) {
-		/* read index before reading contents at that index */
-		smp_read_barrier_depends();
 
 		/* extract one item from the buffer */
 		struct item *item = buffer[tail];
 
 		consume_item(item);
 
-		smp_mb(); /* finish reading descriptor before incrementing tail */
-
-		buffer->tail = (tail + 1) & (buffer->size - 1);
+		smp_store_with_release_semantics(buffer->tail,
+						 (tail + 1) & (buffer->size - 1));
 	}
 
 	spin_unlock(&consumer_lock);
@@ -218,12 +220,17 @@ This will instruct the CPU to make sure the index is up to date before reading
 the new item, and then it shall make sure the CPU has finished reading the item
 before it writes the new tail pointer, which will erase the item.
 
-
-Note the use of ACCESS_ONCE() in both algorithms to read the opposition index.
-This prevents the compiler from discarding and reloading its cached value -
-which some compilers will do across smp_read_barrier_depends().  This isn't
-strictly needed if you can be sure that the opposition index will _only_ be
-used the once.
+Note the use of ACCESS_ONCE() and smp_load_with_acquire_semantics()
+to read the opposition index.  This prevents the compiler from
+discarding and reloading its cached value - which some compilers will
+do across smp_read_barrier_depends().  This isn't strictly needed
+if you can be sure that the opposition index will _only_ be used
+the once.  The smp_load_with_acquire_semantics() additionally forces
+the CPU to order against subsequent memory references.  Similarly,
+smp_store_with_release_semantics() is used in both algorithms to write
+the thread's index.  This documents the fact that we are writing to
+something that can be read concurrently, prevents the compiler from
+tearing the store, and enforces ordering against previous accesses.
 
 
 ===============

^ permalink raw reply related

* Re: [RFC] arch: Introduce new TSO memory barrier smp_tmb()
From: Will Deacon @ 2013-11-04 11:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michael Neuling, Mathieu Desnoyers, Peter Zijlstra, Oleg Nesterov,
	LKML, Linux PPC dev, Anton Blanchard, Frederic Weisbecker,
	Victor Kaplansky, Paul McKenney
In-Reply-To: <CA+55aFyD_kCkAHQwHCUBrumO-pH6LaZikTNvyWDW_tWsHdqk6Q@mail.gmail.com>

On Sun, Nov 03, 2013 at 11:34:00PM +0000, Linus Torvalds wrote:
> So it would *kind* of act like a "smp_wmb() + smp_rmb()", but the
> problem is that a "smp_rmb()" doesn't really "attach" to the preceding
> write.

Agreed.

> This is analogous to a "acquire" operation: you cannot make an
> "acquire" barrier, because it's not a barrier *between* two ops, it's
> associated with one particular op.
> 
> So what I *think* you actually really really want is a "store with
> release consistency, followed by a write barrier".

How does that order reads against reads? (Paul mentioned this as a
requirement). I not clear about the use case for this, so perhaps there is a
dependency that I'm not aware of.

> In TSO, afaik all stores have release consistency, and all writes are
> ordered, which is why this is a no-op in TSO. And x86 also has that
> "all stores have release consistency, and all writes are ordered"
> model, even if TSO doesn't really describe the x86 model.
> 
> But on ARM64, for example, I think you'd really want the store itself
> to be done with "stlr" (store with release), and then follow up with a
> "dsb st" after that.

So a dsb is pretty heavyweight here (it prevents execution of *any* further
instructions until all preceeding stores have completed, as well as
ensuring completion of any ongoing cache flushes). In conjunction with the
store-release, that's going to hold everything up until the store-release
(and therefore any preceeding memory accesses) have completed. Granted, I
think that gives Paul his read/read ordering, but it's a lot heavier than
what's required.

> And notice how that requires you to mark the store itself. There is no
> actual barrier *after* the store that does the optimized model.
> 
> Of course, it's entirely possible that it's not worth worrying about
> this on ARM64, and that just doing it as a "normal store followed by a
> full memory barrier" is good enough. But at least in *theory* a
> microarchitecture might make it much cheaper to do a "store with
> release consistency" followed by "write barrier".

I agree with the sentiment but, given that this stuff is so heavily
microarchitecture-dependent (and not simple to probe), a simple dmb ish
might be the best option after all. That's especially true if the
microarchitecture decided to ignore the barrier options and treat everything
as `all accesses, full system' in order to keep the hardware design simple.

Will

^ permalink raw reply

* Re: [RFC] arch: Introduce new TSO memory barrier smp_tmb()
From: Peter Zijlstra @ 2013-11-04 11:22 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Michael Neuling, Mathieu Desnoyers, Oleg Nesterov, LKML,
	Linux PPC dev, Anton Blanchard, Frederic Weisbecker,
	Victor Kaplansky, Linus Torvalds
In-Reply-To: <20131104105059.GL3947@linux.vnet.ibm.com>

On Mon, Nov 04, 2013 at 02:51:00AM -0800, Paul E. McKenney wrote:
> OK, something like this for the definitions (though PowerPC might want
> to locally abstract the lwsync expansion):
> 
> 	#define smp_store_with_release_semantics(p, v) /* x86, s390, etc. */ \
> 	do { \
> 		barrier(); \
> 		ACCESS_ONCE(p) = (v); \
> 	} while (0)
> 
> 	#define smp_store_with_release_semantics(p, v) /* PowerPC. */ \
> 	do { \
> 		__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory"); \
> 		ACCESS_ONCE(p) = (v); \
> 	} while (0)
> 
> 	#define smp_load_with_acquire_semantics(p) /* x86, s390, etc. */ \
> 	({ \
> 		typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> 		barrier(); \
> 		_________p1; \
> 	})
> 
> 	#define smp_load_with_acquire_semantics(p) /* PowerPC. */ \
> 	({ \
> 		typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> 		__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory"); \
> 		_________p1; \
> 	})
> 
> For ARM, smp_load_with_acquire_semantics() is a wrapper around the ARM
> "ldar" instruction and smp_store_with_release_semantics() is a wrapper
> around the ARM "stlr" instruction.

This still leaves me confused as to what to do with my case :/

Slightly modified since last time -- as the simplified version was maybe
simplified too far.

To recap, I'd like to get rid of barrier A where possible, since that's
now a full barrier for every event written.

However, there's no immediate store I can attach it to; the obvious one
would be the kbuf->head store, but that's complicated by the
local_cmpxchg() thing.

And we need that cmpxchg loop because a hardware NMI event can
interleave with a software event.

And to be honest, I'm still totally confused about memory barriers vs
control flow vs C/C++. The only way we're ever getting to that memcpy is
if we've already observed ubuf->tail, so that LOAD has to be fully
processes and completed.

I'm really not seeing how a STORE from the memcpy() could possibly go
wrong; and if C/C++ can hoist the memcpy() over a compiler barrier()
then I suppose we should all just go home.

/me who wants A to be a barrier() but is terminally confused.

---


/*
 * One important detail is that the kbuf part and the kbuf_writer() are
 * strictly per cpu and we can thus rely on program order for those.
 *
 * Only the userspace consumer can possibly run on another cpu, and thus we
 * need to ensure data consistency for those.
 */

struct buffer {
        u64 size;
        u64 tail;
        u64 head;
        void *data;
};

struct buffer *kbuf, *ubuf;

/*
 * If there's space in the buffer; store the data @buf; otherwise
 * discard it.
 */
void kbuf_write(int sz, void *buf)
{
	u64 tail, head, offset;

	do {
		tail = ACCESS_ONCE(ubuf->tail);
		offset = head = kbuf->head;
		if (CIRC_SPACE(head, tail, kbuf->size) < sz) {
			/* discard @buf */
			return;
		}
		head += sz;
	} while (local_cmpxchg(&kbuf->head, offset, head) != offset)

        /*
         * Ensure that if we see the userspace tail (ubuf->tail) such
         * that there is space to write @buf without overwriting data
         * userspace hasn't seen yet, we won't in fact store data before
         * that read completes.
         */

        smp_mb(); /* A, matches with D */

        memcpy(kbuf->data + offset, buf, sz);

        /*
         * Ensure that we write all the @buf data before we update the
         * userspace visible ubuf->head pointer.
         */
        smp_wmb(); /* B, matches with C */

        ubuf->head = kbuf->head;
}

/*
 * Consume the buffer data and update the tail pointer to indicate to
 * kernel space there's 'free' space.
 */
void ubuf_read(void)
{
        u64 head, tail;

        tail = ACCESS_ONCE(ubuf->tail);
        head = ACCESS_ONCE(ubuf->head);

        /*
         * Ensure we read the buffer boundaries before the actual buffer
         * data...
         */
        smp_rmb(); /* C, matches with B */

        while (tail != head) {
                obj = ubuf->data + tail;
                /* process obj */
                tail += obj->size;
                tail %= ubuf->size;
        }

        /*
         * Ensure all data reads are complete before we issue the
         * ubuf->tail update; once that update hits, kbuf_write() can
         * observe and overwrite data.
         */
        smp_mb(); /* D, matches with A */

        ubuf->tail = tail;
}

^ permalink raw reply

* Re: [V2 PATCH 3/3] powerpc: Fix Unaligned LE Floating Point Loads and Stores
From: Tom Musta @ 2013-11-04 13:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1383532491.4776.42.camel@pasglop>

On 11/3/2013 8:34 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2013-10-31 at 13:38 -0500, Tom wrote:
>> From: Tom Musta <tommusta@gmail.com>
>>
>> This patch addresses unaligned single precision floating point loads
>> and stores in the single-step code.  The old implementation
>> improperly treated an 8 byte structure as an array of two 4 byte
>> words, which is a classic little endian bug.
> 
> Do that patch differ from v1 ? I also already merged v1 of this
> one (the only one I didn't merge is the emulate_step one)
> 
> Cheers,
> Ben.
>

Ben:  Only patch 1/3 (Enable emulate_step in Little Endian Mode) differs in V2.

^ permalink raw reply

* Re: [PATCHv2 1/8] ALSA: Add SAI SoC Digital Audio Interface driver.
From: Mark Brown @ 2013-11-04 16:15 UTC (permalink / raw)
  To: Li Xiubo
  Cc: mark.rutland@arm.com, alsa-devel@alsa-project.org,
	linux-doc@vger.kernel.org, tiwai@suse.de, Huan Wang,
	timur@tabi.org, perex@perex.cz, Shawn Guo, LW@KARO-electronics.de,
	linux@arm.linux.org.uk, Guangyu Chen,
	linux-arm-kernel@lists.infradead.org, grant.likely@linaro.org,
	devicetree@vger.kernel.org, ian.campbell@citrix.com,
	pawel.moll@arm.com, swarren@wwwdotorg.org,
	rob.herring@calxeda.com, oskar@scara.com, Fabio Estevam,
	lgirdwood@gmail.com, linux-kernel@vger.kernel.org,
	rob@landley.net, Zhengxiong Jin, shawn.guo@linaro.org,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1DD289F6464F0949A2FCA5AA6DC23F8287439D@039-SN2MPN1-013.039d.mgd.msft.net>

[-- Attachment #1: Type: text/plain, Size: 2209 bytes --]

On Mon, Nov 04, 2013 at 07:35:12AM +0000, Li Xiubo wrote:

> From the ASoC subsystem comments we can see that:
> ++++++
> Configures the clock dividers. This is used to derive the best DAI bit and
> frame clocks from the system or master clock. It's best to set the DAI bit
> and frame clocks as low as possible to save system power.
> ------

You should never use this unless you have to, there is no point in every
single machine driver using your driver having to duplicate the same
calculations.

> 
> > > +static int fsl_sai_dai_probe(struct snd_soc_dai *dai) {
> > > +	int ret;
> > > +	struct fsl_sai *sai = dev_get_drvdata(dai->dev);
> > > +
> > > +	ret = clk_prepare_enable(sai->clk);
> > > +	if (ret)
> > > +		return ret;

> > It'd be nicer to only enable the clock while the device is in active use.

> While if the module clock is not enabled here, the followed registers cannot read/write in the same function.
> And this _probe function is the _dai_probe not the driver's module _probe.

So you can enable the clock when you explicitly need to write to the
registers...

> If the clk_prepare_enable(sai->clk) is not here, where should it be will be nicer ?
> One of the following functions ?
>         .set_sysclk     = fsl_sai_set_dai_sysclk,
>         .set_clkdiv     = fsl_sai_set_dai_clkdiv,
>         .set_fmt        = fsl_sai_set_dai_fmt,
>         .set_tdm_slot   = fsl_sai_set_dai_tdm_slot,
>         .hw_params      = fsl_sai_hw_params,
>         .trigger        = fsl_sai_trigger,

It could be in any or all of them except trigger (where the core should
hold a runtime PM reference anyway).  You can always take a reference
for the duration of the function if you're concerned it may be called
when the referent isn't otherwise held.

> > > +	ret = snd_dmaengine_pcm_register(&pdev->dev, NULL,
> > > +			SND_DMAENGINE_PCM_FLAG_NO_RESIDUE);
> > > +	if (ret)
> > > +		return ret;

> > We should have a devm_ version of this.

> Sorry, is there one patch for adding the devm_ version of snd_dmaengine_pcm_register() already ?
> In the -next and other topics branches I could not find it.

No, there isn't one but there should be one.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [RFC] arch: Introduce new TSO memory barrier smp_tmb()
From: Paul E. McKenney @ 2013-11-04 16:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Michael Neuling, Mathieu Desnoyers, Oleg Nesterov, LKML,
	Linux PPC dev, Anton Blanchard, Frederic Weisbecker,
	Victor Kaplansky, Linus Torvalds
In-Reply-To: <20131104112254.GK28601@twins.programming.kicks-ass.net>

On Mon, Nov 04, 2013 at 12:22:54PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 04, 2013 at 02:51:00AM -0800, Paul E. McKenney wrote:
> > OK, something like this for the definitions (though PowerPC might want
> > to locally abstract the lwsync expansion):
> > 
> > 	#define smp_store_with_release_semantics(p, v) /* x86, s390, etc. */ \
> > 	do { \
> > 		barrier(); \
> > 		ACCESS_ONCE(p) = (v); \
> > 	} while (0)
> > 
> > 	#define smp_store_with_release_semantics(p, v) /* PowerPC. */ \
> > 	do { \
> > 		__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory"); \
> > 		ACCESS_ONCE(p) = (v); \
> > 	} while (0)
> > 
> > 	#define smp_load_with_acquire_semantics(p) /* x86, s390, etc. */ \
> > 	({ \
> > 		typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> > 		barrier(); \
> > 		_________p1; \
> > 	})
> > 
> > 	#define smp_load_with_acquire_semantics(p) /* PowerPC. */ \
> > 	({ \
> > 		typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> > 		__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory"); \
> > 		_________p1; \
> > 	})
> > 
> > For ARM, smp_load_with_acquire_semantics() is a wrapper around the ARM
> > "ldar" instruction and smp_store_with_release_semantics() is a wrapper
> > around the ARM "stlr" instruction.
> 
> This still leaves me confused as to what to do with my case :/
> 
> Slightly modified since last time -- as the simplified version was maybe
> simplified too far.
> 
> To recap, I'd like to get rid of barrier A where possible, since that's
> now a full barrier for every event written.
> 
> However, there's no immediate store I can attach it to; the obvious one
> would be the kbuf->head store, but that's complicated by the
> local_cmpxchg() thing.
> 
> And we need that cmpxchg loop because a hardware NMI event can
> interleave with a software event.
> 
> And to be honest, I'm still totally confused about memory barriers vs
> control flow vs C/C++. The only way we're ever getting to that memcpy is
> if we've already observed ubuf->tail, so that LOAD has to be fully
> processes and completed.
> 
> I'm really not seeing how a STORE from the memcpy() could possibly go
> wrong; and if C/C++ can hoist the memcpy() over a compiler barrier()
> then I suppose we should all just go home.
> 
> /me who wants A to be a barrier() but is terminally confused.

Well, let's see...

> ---
> 
> 
> /*
>  * One important detail is that the kbuf part and the kbuf_writer() are
>  * strictly per cpu and we can thus rely on program order for those.
>  *
>  * Only the userspace consumer can possibly run on another cpu, and thus we
>  * need to ensure data consistency for those.
>  */
> 
> struct buffer {
>         u64 size;
>         u64 tail;
>         u64 head;
>         void *data;
> };
> 
> struct buffer *kbuf, *ubuf;
> 
> /*
>  * If there's space in the buffer; store the data @buf; otherwise
>  * discard it.
>  */
> void kbuf_write(int sz, void *buf)
> {
> 	u64 tail, head, offset;
> 
> 	do {
> 		tail = ACCESS_ONCE(ubuf->tail);

So the above load is the key load.  It determines whether or not we
have space in the buffer.  This of course assumes that only this CPU
writes to ->head.

If so, then:

		tail = smp_load_with_acquire_semantics(ubuf->tail); /* A -> D */

> 		offset = head = kbuf->head;
> 		if (CIRC_SPACE(head, tail, kbuf->size) < sz) {
> 			/* discard @buf */
> 			return;
> 		}
> 		head += sz;
> 	} while (local_cmpxchg(&kbuf->head, offset, head) != offset)

If there is an issue with kbuf->head, presumably local_cmpxchg() fails
and we retry.

But sheesh, do you think we could have buried the definitions of
local_cmpxchg() under a few more layers of macro expansion just to
keep things more obscure?  Anyway, griping aside...

o	__cmpxchg_local_generic() in include/asm-generic/cmpxchg-local.h
	doesn't seem to exclude NMIs, so is not safe for this usage.

o	__cmpxchg_local() in ARM handles NMI as long as the
	argument is 32 bits, otherwise, it uses the aforementionted
	__cmpxchg_local_generic(), which does not handle NMI.  Given your
	u64, this does not look good...

	And some ARM subarches (e.g., metag) seem to fail to handle NMI
	even in the 32-bit case.

o	FRV and M32r seem to act similar to ARM.

Or maybe these architectures don't do NMIs?  If they do, local_cmpxchg()
does not seem to be safe against NMIs in general.  :-/

That said, powerpc, 64-bit s390, sparc, and x86 seem to handle it.

Of course, x86's local_cmpxchg() has full memory barriers implicitly.

> 
>         /*
>          * Ensure that if we see the userspace tail (ubuf->tail) such
>          * that there is space to write @buf without overwriting data
>          * userspace hasn't seen yet, we won't in fact store data before
>          * that read completes.
>          */
> 
>         smp_mb(); /* A, matches with D */

Given a change to smp_load_with_acquire_semantics() above, you should not
need this smp_mb().

>         memcpy(kbuf->data + offset, buf, sz);
> 
>         /*
>          * Ensure that we write all the @buf data before we update the
>          * userspace visible ubuf->head pointer.
>          */
>         smp_wmb(); /* B, matches with C */
> 
>         ubuf->head = kbuf->head;

Replace the smp_wmb() and the assignment with:

	smp_store_with_release_semantics(ubuf->head, kbuf->head); /* B -> C */

> }
> 
> /*
>  * Consume the buffer data and update the tail pointer to indicate to
>  * kernel space there's 'free' space.
>  */
> void ubuf_read(void)
> {
>         u64 head, tail;
> 
>         tail = ACCESS_ONCE(ubuf->tail);

Does anyone else write tail?  Or is this defense against NMIs?

If no one else writes to tail and if NMIs cannot muck things up, then
the above ACCESS_ONCE() is not needed, though I would not object to its
staying.

>         head = ACCESS_ONCE(ubuf->head);

Make the above be:

	head = smp_load_with_acquire_semantics(ubuf->head);  /* C -> B */

>         /*
>          * Ensure we read the buffer boundaries before the actual buffer
>          * data...
>          */
>         smp_rmb(); /* C, matches with B */

And drop the above memory barrier.

>         while (tail != head) {
>                 obj = ubuf->data + tail;
>                 /* process obj */
>                 tail += obj->size;
>                 tail %= ubuf->size;
>         }
> 
>         /*
>          * Ensure all data reads are complete before we issue the
>          * ubuf->tail update; once that update hits, kbuf_write() can
>          * observe and overwrite data.
>          */
>         smp_mb(); /* D, matches with A */
> 
>         ubuf->tail = tail;

Replace the above barrier and the assignment with:

	smp_store_with_release_semantics(ubuf->tail, tail); /* D -> B. */

> }

All this is leading me to suggest the following shortenings of names:

	smp_load_with_acquire_semantics() -> smp_load_acquire()

	smp_store_with_release_semantics() -> smp_store_release()

But names aside, the above gets rid of explicit barriers on TSO architectures,
allows ARM to avoid full DMB, and allows PowerPC to use lwsync instead of
the heavier-weight sync.

								Thanx, Paul

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox