linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH] ARM: fix cpu_relax() in case of doing dmb
@ 2012-08-22 14:52 Shawn Guo
  2012-08-23  2:47 ` Hui Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Shawn Guo @ 2012-08-22 14:52 UTC
  To: linux-arm-kernel

There is an issue reported with the imx6q restart function.  The issue is only
seen with an image built for ARMv7 and ARMv6 together, where cpu_relax()
is defined to do a dmb.  It's been root-caused by Russell as below.

Russell King - ARM Linux wrote:
> I suspect having this dmb inside cpu_relax() is flooding the
> interconnects with traffic, which then prevents other CPUs getting
> a look-in (maybe there's no fairness when it comes to dmbs).

Fix the issue by inserting a few NOPs before the dmb in cpu_relax().

Cc: <stable@vger.kernel.org>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
---
 arch/arm/include/asm/processor.h |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 99afa74..7cc67ce 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *);
 unsigned long get_wchan(struct task_struct *p);
 
 #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
-#define cpu_relax()			smp_mb()
+#define cpu_relax()		do {					\
+					asm("nop");			\
+					asm("nop");			\
+					asm("nop");			\
+					asm("nop");			\
+					asm("nop");			\
+					smp_mb();			\
+				} while (0)
 #else
 #define cpu_relax()			barrier()
 #endif
-- 
1.7.5.4
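For readers who want to experiment outside the kernel, the patched macro can be modelled in userspace roughly as follows. This is a sketch under stated assumptions: `nop()` and `smp_mb()` here are stand-ins (an inline `nop` instruction and a C11 sequentially consistent fence, which compilers typically lower to a `dmb` on ARMv7), not the kernel's own definitions, and `spin_until_set()` is a hypothetical helper showing the busy-wait pattern in which cpu_relax() runs on every iteration.

```c
#include <stdatomic.h>

/* Userspace stand-ins (assumptions, not the kernel's definitions). */
#define nop()    __asm__ __volatile__("nop")
#define smp_mb() atomic_thread_fence(memory_order_seq_cst)

/* Model of the patched ARMv6 / ARM_ERRATA_754327 variant: five nops
 * before the full barrier, so a spinning CPU issues barriers less often. */
#define cpu_relax()                                     \
	do {                                            \
		nop(); nop(); nop(); nop(); nop();      \
		smp_mb();                               \
	} while (0)

/* Hypothetical helper: spin until *flag becomes non-zero, calling
 * cpu_relax() once per iteration. Returns the number of iterations. */
static unsigned long spin_until_set(atomic_int *flag)
{
	unsigned long spins = 0;

	while (!atomic_load_explicit(flag, memory_order_relaxed)) {
		cpu_relax();
		spins++;
	}
	return spins;
}
```

The `do { ... } while (0)` wrapper mirrors the patch, so `cpu_relax();` behaves as a single statement inside `if`/`else` bodies.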


* [PATCH] ARM: fix cpu_relax() in case of doing dmb
  2012-08-22 14:52 [PATCH] ARM: fix cpu_relax() in case of doing dmb Shawn Guo
@ 2012-08-23  2:47 ` Hui Wang
  2012-08-23 10:23 ` Dirk Behme
  2012-08-23 10:43 ` Will Deacon
  2 siblings, 0 replies; 9+ messages in thread
From: Hui Wang @ 2012-08-23  2:47 UTC
  To: linux-arm-kernel

Shawn Guo wrote:
> There is an issue reported with the imx6q restart function.  The issue is only
> seen with an image built for ARMv7 and ARMv6 together, where cpu_relax()
> is defined to do a dmb.  It's been root-caused by Russell as below.
>
> Russell King - ARM Linux wrote:
>   
>> I suspect having this dmb inside cpu_relax() is flooding the
>> interconnects with traffic, which then prevents other CPUs getting
>> a look-in (maybe there's no fairness when it comes to dmbs).
>>     
>
> Fix the issue by inserting a few NOPs before the dmb in cpu_relax().
>   
Tested-by: Hui Wang <jason77.wang@gmail.com>


* [PATCH] ARM: fix cpu_relax() in case of doing dmb
  2012-08-22 14:52 [PATCH] ARM: fix cpu_relax() in case of doing dmb Shawn Guo
  2012-08-23  2:47 ` Hui Wang
@ 2012-08-23 10:23 ` Dirk Behme
  2012-08-23 10:43 ` Will Deacon
  2 siblings, 0 replies; 9+ messages in thread
From: Dirk Behme @ 2012-08-23 10:23 UTC
  To: linux-arm-kernel

On 22.08.2012 16:52, Shawn Guo wrote:
> There is an issue reported with the imx6q restart function.  The issue is only
> seen with an image built for ARMv7 and ARMv6 together, where cpu_relax()
> is defined to do a dmb.  It's been root-caused by Russell as below.
> 
> Russell King - ARM Linux wrote:
>> I suspect having this dmb inside cpu_relax() is flooding the
>> interconnects with traffic, which then prevents other CPUs getting
>> a look-in (maybe there's no fairness when it comes to dmbs).
> 
> Fix the issue by inserting a few NOPs before the dmb in cpu_relax().
> 
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>

Tested on an i.MX6 SabreLite board together with the patch 'ARM: imx6q: remove imx_src_prepare_restart() call':

Tested-by: Dirk Behme <dirk.behme@de.bosch.com>

Thanks

Dirk



* [PATCH] ARM: fix cpu_relax() in case of doing dmb
  2012-08-22 14:52 [PATCH] ARM: fix cpu_relax() in case of doing dmb Shawn Guo
  2012-08-23  2:47 ` Hui Wang
  2012-08-23 10:23 ` Dirk Behme
@ 2012-08-23 10:43 ` Will Deacon
  2012-08-23 13:58   ` Shawn Guo
  2012-08-24  1:10   ` Hui Wang
  2 siblings, 2 replies; 9+ messages in thread
From: Will Deacon @ 2012-08-23 10:43 UTC
  To: linux-arm-kernel

On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote:
> diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
> index 99afa74..7cc67ce 100644
> --- a/arch/arm/include/asm/processor.h
> +++ b/arch/arm/include/asm/processor.h
> @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *);
>  unsigned long get_wchan(struct task_struct *p);
>  
>  #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
> -#define cpu_relax()			smp_mb()
> +#define cpu_relax()		do {					\
> +					asm("nop");			\
> +					asm("nop");			\
> +					asm("nop");			\
> +					asm("nop");			\
> +					asm("nop");			\

Can you use nop() instead of the explicit asm? Also, I think we should try
and use some methodology on deciding the number of nops to insert. Without
having a full handle on the problem at the moment, it would seem that we
need at least NR_CPUS worth (since the number of spinning secondaries is
NR_CPUS-1 and they may execute their barriers in lock-step).

Will
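Will's proposal can be sketched by making the padding length a compile-time parameter instead of a hand-written list of nops. Everything here is hypothetical: `nops()` uses the GNU assembler's `.rept` directive with GCC's `%c` operand modifier to emit exactly `n` back-to-back nops, `NR_CPUS` stands in for CONFIG_NR_CPUS, and the C11 fence stands in for `smp_mb()`.

```c
#include <stdatomic.h>

#ifndef NR_CPUS
#define NR_CPUS 4 /* hypothetical stand-in for CONFIG_NR_CPUS (quad-core i.MX6Q) */
#endif

/* Emit exactly n nops with no loop overhead; n must be an integer
 * constant expression because of the "i" constraint. */
#define nops(n) __asm__ __volatile__(".rept %c0\n\tnop\n\t.endr" : : "i"(n))

/* Userspace stand-in for the kernel's full barrier (assumption). */
#define smp_mb() atomic_thread_fence(memory_order_seq_cst)

/* Hypothesis under test: pad with NR_CPUS nops so that up to NR_CPUS - 1
 * secondaries spinning in lock-step do not issue back-to-back barriers. */
#define cpu_relax()                     \
	do {                            \
		nops(NR_CPUS);          \
		smp_mb();               \
	} while (0)
```

Rebuilding with, for example, `-DNR_CPUS=2` changes only the padding, which is the kind of experiment Hui reports later in the thread; his result was that NR_CPUS = 2 and NR_CPUS = 4 needed the same number of nops.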


* [PATCH] ARM: fix cpu_relax() in case of doing dmb
  2012-08-23 10:43 ` Will Deacon
@ 2012-08-23 13:58   ` Shawn Guo
  2012-08-23 18:31     ` Jon Medhurst (Tixy)
  2012-08-24  1:10   ` Hui Wang
  1 sibling, 1 reply; 9+ messages in thread
From: Shawn Guo @ 2012-08-23 13:58 UTC
  To: linux-arm-kernel

On Thu, Aug 23, 2012 at 11:43:56AM +0100, Will Deacon wrote:
> On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote:
> > diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
> > index 99afa74..7cc67ce 100644
> > --- a/arch/arm/include/asm/processor.h
> > +++ b/arch/arm/include/asm/processor.h
> > @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *);
> >  unsigned long get_wchan(struct task_struct *p);
> >  
> >  #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
> > -#define cpu_relax()			smp_mb()
> > +#define cpu_relax()		do {					\
> > +					asm("nop");			\
> > +					asm("nop");			\
> > +					asm("nop");			\
> > +					asm("nop");			\
> > +					asm("nop");			\
> 
> Can you use nop() instead of the explicit asm?

Yes.  I just tried, and it works too.

> Also, I think we should try
> and use some methodology on deciding the number of nops to insert. Without
> having a full handle on the problem at the moment, it would seem that we
> need at least NR_CPUS worth (since the number of spinning secondaries is
> NR_CPUS-1 and they may execute their barriers in lock-step).
> 
I'm not sure we get something like that.  In my testing here, I need
at least 5 nops to get rid of the issue.

-- 
Regards,
Shawn


* [PATCH] ARM: fix cpu_relax() in case of doing dmb
  2012-08-23 13:58   ` Shawn Guo
@ 2012-08-23 18:31     ` Jon Medhurst (Tixy)
  2012-08-24  1:15       ` Shawn Guo
  0 siblings, 1 reply; 9+ messages in thread
From: Jon Medhurst (Tixy) @ 2012-08-23 18:31 UTC
  To: linux-arm-kernel

On Thu, 2012-08-23 at 21:58 +0800, Shawn Guo wrote:
> On Thu, Aug 23, 2012 at 11:43:56AM +0100, Will Deacon wrote:
> > On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote:
> > > diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
> > > index 99afa74..7cc67ce 100644
> > > --- a/arch/arm/include/asm/processor.h
> > > +++ b/arch/arm/include/asm/processor.h
> > > @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *);
> > >  unsigned long get_wchan(struct task_struct *p);
> > >  
> > >  #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
> > > -#define cpu_relax()			smp_mb()
> > > +#define cpu_relax()		do {					\
> > > +					asm("nop");			\
> > > +					asm("nop");			\
> > > +					asm("nop");			\
> > > +					asm("nop");			\
> > > +					asm("nop");			\
> > 
> > Can you use nop() instead of the explicit asm?
> 
> Yes.  I just tried, and it works too.
> 
> > Also, I think we should try
> > and use some methodology on deciding the number of nops to insert. Without
> > having a full handle on the problem at the moment, it would seem that we
> > need at least NR_CPUS worth (since the number of spinning secondaries is
> > NR_CPUS-1 and they may execute their barriers in lock-step).
> > 
> I'm not sure we get something like that.  In my testing here, I need
> at least 5 nops to get rid of the issue.

Doesn't the A9 do dual issue? If so, the maths for your 4-core i.MX6Q might
match up with Will's hypothesis. You could test the theory by building
with, say, CONFIG_NR_CPUS == 3.

-- 
Tixy 
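Tixy's dual-issue arithmetic can be made concrete with a toy model. It assumes (as a hypothesis, not a measured property of the Cortex-A9) that nops issue `width` per cycle with no other pipeline effects, and that Will's theory requires the padding to occupy at least NR_CPUS - 1 issue cycles. Both helper names are hypothetical.

```c
/* Cycles needed to issue n back-to-back nops at `width` per cycle
 * (ceiling division; ignores fetch, pipeline and barrier effects). */
static unsigned int nop_issue_cycles(unsigned int n, unsigned int width)
{
	return (n + width - 1) / width;
}

/* Smallest nop count whose issue time reaches `cycles` at `width`
 * per cycle, i.e. the least n with nop_issue_cycles(n, width) >= cycles. */
static unsigned int min_nops_for_cycles(unsigned int cycles, unsigned int width)
{
	return cycles == 0 ? 0 : (cycles - 1) * width + 1;
}
```

Under these assumptions, a 4-core part at two nops per cycle needs `min_nops_for_cycles(4 - 1, 2)` = 5 nops, matching Shawn's observation, while building with NR_CPUS = 3 would predict `min_nops_for_cycles(3 - 1, 2)` = 3 nops; that is the test Tixy proposes.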


* [PATCH] ARM: fix cpu_relax() in case of doing dmb
  2012-08-23 10:43 ` Will Deacon
  2012-08-23 13:58   ` Shawn Guo
@ 2012-08-24  1:10   ` Hui Wang
  1 sibling, 0 replies; 9+ messages in thread
From: Hui Wang @ 2012-08-24  1:10 UTC
  To: linux-arm-kernel

Will Deacon wrote:
> On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote:
>   
>> diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
>> index 99afa74..7cc67ce 100644
>> --- a/arch/arm/include/asm/processor.h
>> +++ b/arch/arm/include/asm/processor.h
>> @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *);
>>  unsigned long get_wchan(struct task_struct *p);
>>  
>>  #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
>> -#define cpu_relax()			smp_mb()
>> +#define cpu_relax()		do {					\
>> +					asm("nop");			\
>> +					asm("nop");			\
>> +					asm("nop");			\
>> +					asm("nop");			\
>> +					asm("nop");			\
>>     
>
> Can you use nop() instead of the explicit asm? Also, I think we should try
> and use some methodology on deciding the number of nops to insert. Without
> having a full handle on the problem at the moment, it would seem that we
> need at least NR_CPUS worth (since the number of spinning secondaries is
> NR_CPUS-1 and they may execute their barriers in lock-step).
>   
Your concern sounds reasonable, but I did a test, and the results show there
is no explicit relation between NR_CPUS and the number of nops needed.

NR_CPUS = 4 and NR_CPUS = 2 both need the same minimum number of nops.


Regards,
Hui.

> Will
>
>   


* [PATCH] ARM: fix cpu_relax() in case of doing dmb
  2012-08-23 18:31     ` Jon Medhurst (Tixy)
@ 2012-08-24  1:15       ` Shawn Guo
  2012-08-24  9:14         ` Jon Medhurst (Tixy)
  0 siblings, 1 reply; 9+ messages in thread
From: Shawn Guo @ 2012-08-24  1:15 UTC
  To: linux-arm-kernel

On Thu, Aug 23, 2012 at 07:31:26PM +0100, Jon Medhurst (Tixy) wrote:
> On Thu, 2012-08-23 at 21:58 +0800, Shawn Guo wrote:
> > On Thu, Aug 23, 2012 at 11:43:56AM +0100, Will Deacon wrote:
> > > On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote:
> > > > diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
> > > > index 99afa74..7cc67ce 100644
> > > > --- a/arch/arm/include/asm/processor.h
> > > > +++ b/arch/arm/include/asm/processor.h
> > > > @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *);
> > > >  unsigned long get_wchan(struct task_struct *p);
> > > >  
> > > >  #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
> > > > -#define cpu_relax()			smp_mb()
> > > > +#define cpu_relax()		do {					\
> > > > +					asm("nop");			\
> > > > +					asm("nop");			\
> > > > +					asm("nop");			\
> > > > +					asm("nop");			\
> > > > +					asm("nop");			\
> > > 
> > > Can you use nop() instead of the explicit asm?
> > 
> > Yes.  I just tried, and it works too.
> > 
> > > Also, I think we should try
> > > and use some methodology on deciding the number of nops to insert. Without
> > > having a full handle on the problem at the moment, it would seem that we
> > > need at least NR_CPUS worth (since the number of spinning secondaries is
> > > NR_CPUS-1 and they may execute their barriers in lock-step).
> > > 
> > I'm not sure we get something like that.  In my testing here, I need
> > at least 5 nops to get rid of the issue.
> 
> Doesn't A9 do dual issue?

Do you have some details about the issue to share?

> If so, the maths for your 4-core i.MX6Q might
> match up with Will's hypothesis. You could test the theory by building
> with, say, CONFIG_NR_CPUS == 3.
> 
I'm still not quite sure about the hypothesis, but I assume you are
asking whether 3 NOPs will fix the issue.  If so, the answer is NO.
I increased the number of NOPs incrementally, starting from 1, and the
issue remained until there were 5 NOPs in there.

-- 
Regards,
Shawn


* [PATCH] ARM: fix cpu_relax() in case of doing dmb
  2012-08-24  1:15       ` Shawn Guo
@ 2012-08-24  9:14         ` Jon Medhurst (Tixy)
  0 siblings, 0 replies; 9+ messages in thread
From: Jon Medhurst (Tixy) @ 2012-08-24  9:14 UTC
  To: linux-arm-kernel

On Fri, 2012-08-24 at 09:15 +0800, Shawn Guo wrote:
> On Thu, Aug 23, 2012 at 07:31:26PM +0100, Jon Medhurst (Tixy) wrote:
> > On Thu, 2012-08-23 at 21:58 +0800, Shawn Guo wrote:
> > > On Thu, Aug 23, 2012 at 11:43:56AM +0100, Will Deacon wrote:
> > > > On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote:
> > > > > diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
> > > > > index 99afa74..7cc67ce 100644
> > > > > --- a/arch/arm/include/asm/processor.h
> > > > > +++ b/arch/arm/include/asm/processor.h
> > > > > @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *);
> > > > >  unsigned long get_wchan(struct task_struct *p);
> > > > >  
> > > > >  #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
> > > > > -#define cpu_relax()			smp_mb()
> > > > > +#define cpu_relax()		do {					\
> > > > > +					asm("nop");			\
> > > > > +					asm("nop");			\
> > > > > +					asm("nop");			\
> > > > > +					asm("nop");			\
> > > > > +					asm("nop");			\
> > > > 
> > > > Can you use nop() instead of the explicit asm?
> > > 
> > > Yes.  I just tried, and it works too.
> > > 
> > > > Also, I think we should try
> > > > and use some methodology on deciding the number of nops to insert. Without
> > > > having a full handle on the problem at the moment, it would seem that we
> > > > need at least NR_CPUS worth (since the number of spinning secondaries is
> > > > NR_CPUS-1 and they may execute their barriers in lock-step).
> > > > 
> > > I'm not sure we get something like that.  In my testing here, I need
> > > at least 5 nops to get rid of the issue.
> > 
> > Doesn't A9 do dual issue?
> 
> Do you have some details about the issue to share?

I don't have any particular insight, I was just making the observation
that if CPU clock cycles executed in the loop were a consideration, then
the fact that the A9 would probably execute two nops in a clock cycle
would be pertinent.

> > If so, the maths for your 4-core i.MX6Q might
> > match up with Will's hypothesis. You could test the theory by building
> > with, say, CONFIG_NR_CPUS == 3.
> > 
> I'm still not quite sure about the hypothesis, but I assume you are
> asking whether 3 NOPs will fix the issue.  If so, the answer is NO.
> I increased the number of NOPs incrementally, starting from 1, and the
> issue remained until there were 5 NOPs in there.

Right, your and Hui's tests seem to scupper the idea that the number of nops
should be related to the number of CPUs. (Which I didn't really understand
either.)

Perhaps the issue is related to the size of the loop buffer,
or possibly cache line boundaries?

-- 
Tixy 
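The cache-line idea can be checked on paper by counting how many instruction-cache lines a run of straight-line code touches. This is a hypothetical helper: the 32-byte line size and 4-byte (ARM-state) instruction size used in the examples are assumptions to verify against the Cortex-A9 Technical Reference Manual, and the function ignores branch targets and prefetch.

```c
#include <stdint.h>

/* Number of cache lines touched by n_insns consecutive instructions of
 * insn_bytes each, starting at address `start`, with line_bytes per line. */
static unsigned int lines_spanned(uintptr_t start, unsigned int n_insns,
				  unsigned int insn_bytes, unsigned int line_bytes)
{
	if (n_insns == 0)
		return 0;

	uintptr_t first = start / line_bytes;
	uintptr_t last = (start + (uintptr_t)n_insns * insn_bytes - 1) / line_bytes;

	return (unsigned int)(last - first + 1);
}
```

For example, eight 4-byte instructions fit in one 32-byte line when they start at 0x1000 but straddle two lines when they start at 0x1004, so adding or removing nops can move a loop across a line boundary without any change in its logic.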



Thread overview: 9+ messages
2012-08-22 14:52 [PATCH] ARM: fix cpu_relax() in case of doing dmb Shawn Guo
2012-08-23  2:47 ` Hui Wang
2012-08-23 10:23 ` Dirk Behme
2012-08-23 10:43 ` Will Deacon
2012-08-23 13:58   ` Shawn Guo
2012-08-23 18:31     ` Jon Medhurst (Tixy)
2012-08-24  1:15       ` Shawn Guo
2012-08-24  9:14         ` Jon Medhurst (Tixy)
2012-08-24  1:10   ` Hui Wang
