* [PATCH] ARM: fix cpu_relax() in case of doing dmb
@ 2012-08-22 14:52 Shawn Guo
2012-08-23 2:47 ` Hui Wang
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Shawn Guo @ 2012-08-22 14:52 UTC (permalink / raw)
To: linux-arm-kernel
There is an issue reported on imx6q restart function. The issue is only
seen with the image building ARMv7 and ARMv6 together, where cpu_relax()
is define to do dmb. It's been root-caused by Russell as below.
Russell King - ARM Linux wrote:
> I suspect having this dmb inside cpu_relax() is flooding the
> interconnects with traffic, which then prevents other CPUs getting
> a look-in (maybe there's no fairness when it comes to dmb's.
Fix the issue by insert a few NOPs into cpu_relax() where doing dmb.
Cc: <stable@vger.kernel.org>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
---
arch/arm/include/asm/processor.h | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 99afa74..7cc67ce 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *);
unsigned long get_wchan(struct task_struct *p);
#if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
-#define cpu_relax() smp_mb()
+#define cpu_relax() do { \
+ asm("nop"); \
+ asm("nop"); \
+ asm("nop"); \
+ asm("nop"); \
+ asm("nop"); \
+ smp_mb(); \
+ } while (0)
#else
#define cpu_relax() barrier()
#endif
--
1.7.5.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH] ARM: fix cpu_relax() in case of doing dmb
2012-08-22 14:52 [PATCH] ARM: fix cpu_relax() in case of doing dmb Shawn Guo
@ 2012-08-23 2:47 ` Hui Wang
2012-08-23 10:23 ` Dirk Behme
2012-08-23 10:43 ` Will Deacon
2 siblings, 0 replies; 9+ messages in thread
From: Hui Wang @ 2012-08-23 2:47 UTC (permalink / raw)
To: linux-arm-kernel
Shawn Guo wrote:
> There is an issue reported on imx6q restart function. The issue is only
> seen with the image building ARMv7 and ARMv6 together, where cpu_relax()
> is define to do dmb. It's been root-caused by Russell as below.
>
> Russell King - ARM Linux wrote:
>
>> I suspect having this dmb inside cpu_relax() is flooding the
>> interconnects with traffic, which then prevents other CPUs getting
>> a look-in (maybe there's no fairness when it comes to dmb's.
>>
>
> Fix the issue by insert a few NOPs into cpu_relax() where doing dmb.
>
Tested-by: Hui Wang <jason77.wang@gmail.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] ARM: fix cpu_relax() in case of doing dmb
2012-08-22 14:52 [PATCH] ARM: fix cpu_relax() in case of doing dmb Shawn Guo
2012-08-23 2:47 ` Hui Wang
@ 2012-08-23 10:23 ` Dirk Behme
2012-08-23 10:43 ` Will Deacon
2 siblings, 0 replies; 9+ messages in thread
From: Dirk Behme @ 2012-08-23 10:23 UTC (permalink / raw)
To: linux-arm-kernel
On 22.08.2012 16:52, Shawn Guo wrote:
> There is an issue reported on imx6q restart function. The issue is only
> seen with the image building ARMv7 and ARMv6 together, where cpu_relax()
> is define to do dmb. It's been root-caused by Russell as below.
>
> Russell King - ARM Linux wrote:
>> I suspect having this dmb inside cpu_relax() is flooding the
>> interconnects with traffic, which then prevents other CPUs getting
>> a look-in (maybe there's no fairness when it comes to dmb's.
>
> Fix the issue by insert a few NOPs into cpu_relax() where doing dmb.
>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Tested on a i.MX6 SabreLite board together with the patch 'ARM: imx6q:
remove imx_src_prepare_restart() call':
Tested-by: Dirk Behme <dirk.behme@de.bosch.com>
Thanks
Dirk
> ---
> arch/arm/include/asm/processor.h | 9 ++++++++-
> 1 files changed, 8 insertions(+), 1 deletions(-)
>
> diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
> index 99afa74..7cc67ce 100644
> --- a/arch/arm/include/asm/processor.h
> +++ b/arch/arm/include/asm/processor.h
> @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *);
> unsigned long get_wchan(struct task_struct *p);
>
> #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
> -#define cpu_relax() smp_mb()
> +#define cpu_relax() do { \
> + asm("nop"); \
> + asm("nop"); \
> + asm("nop"); \
> + asm("nop"); \
> + asm("nop"); \
> + smp_mb(); \
> + } while (0)
> #else
> #define cpu_relax() barrier()
> #endif
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] ARM: fix cpu_relax() in case of doing dmb
2012-08-22 14:52 [PATCH] ARM: fix cpu_relax() in case of doing dmb Shawn Guo
2012-08-23 2:47 ` Hui Wang
2012-08-23 10:23 ` Dirk Behme
@ 2012-08-23 10:43 ` Will Deacon
2012-08-23 13:58 ` Shawn Guo
2012-08-24 1:10 ` Hui Wang
2 siblings, 2 replies; 9+ messages in thread
From: Will Deacon @ 2012-08-23 10:43 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote:
> diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
> index 99afa74..7cc67ce 100644
> --- a/arch/arm/include/asm/processor.h
> +++ b/arch/arm/include/asm/processor.h
> @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *);
> unsigned long get_wchan(struct task_struct *p);
>
> #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
> -#define cpu_relax() smp_mb()
> +#define cpu_relax() do { \
> + asm("nop"); \
> + asm("nop"); \
> + asm("nop"); \
> + asm("nop"); \
> + asm("nop"); \
Can you use nop() instead of the explicit asm? Also, I think we should try
and use some methodology on deciding the number of nops to insert. Without
having a full handle on the problem at the moment, it would seem that we
need at least NR_CPUS worth (since the number of spinning secondaries is
NR_CPUS-1 and they may execute their barriers in lock-step).
Will
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] ARM: fix cpu_relax() in case of doing dmb
2012-08-23 10:43 ` Will Deacon
@ 2012-08-23 13:58 ` Shawn Guo
2012-08-23 18:31 ` Jon Medhurst (Tixy)
2012-08-24 1:10 ` Hui Wang
1 sibling, 1 reply; 9+ messages in thread
From: Shawn Guo @ 2012-08-23 13:58 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Aug 23, 2012 at 11:43:56AM +0100, Will Deacon wrote:
> On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote:
> > diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
> > index 99afa74..7cc67ce 100644
> > --- a/arch/arm/include/asm/processor.h
> > +++ b/arch/arm/include/asm/processor.h
> > @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *);
> > unsigned long get_wchan(struct task_struct *p);
> >
> > #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
> > -#define cpu_relax() smp_mb()
> > +#define cpu_relax() do { \
> > + asm("nop"); \
> > + asm("nop"); \
> > + asm("nop"); \
> > + asm("nop"); \
> > + asm("nop"); \
>
> Can you use nop() instead of the explicit asm?
Yes. I just tried, and it works too.
> Also, I think we should try
> and use some methodology on deciding the number of nops to insert. Without
> having a full handle on the problem at the moment, it would seem that we
> need at least NR_CPUS worth (since the number of spinning secondaries is
> NR_CPUS-1 and they may execute their barriers in lock-step).
>
I'm not sure we get something like that. In my testing here, I need
at least 5 nops to get rid of the issue.
--
Regards,
Shawn
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] ARM: fix cpu_relax() in case of doing dmb
2012-08-23 13:58 ` Shawn Guo
@ 2012-08-23 18:31 ` Jon Medhurst (Tixy)
2012-08-24 1:15 ` Shawn Guo
0 siblings, 1 reply; 9+ messages in thread
From: Jon Medhurst (Tixy) @ 2012-08-23 18:31 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 2012-08-23 at 21:58 +0800, Shawn Guo wrote:
> On Thu, Aug 23, 2012 at 11:43:56AM +0100, Will Deacon wrote:
> > On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote:
> > > diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
> > > index 99afa74..7cc67ce 100644
> > > --- a/arch/arm/include/asm/processor.h
> > > +++ b/arch/arm/include/asm/processor.h
> > > @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *);
> > > unsigned long get_wchan(struct task_struct *p);
> > >
> > > #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
> > > -#define cpu_relax() smp_mb()
> > > +#define cpu_relax() do { \
> > > + asm("nop"); \
> > > + asm("nop"); \
> > > + asm("nop"); \
> > > + asm("nop"); \
> > > + asm("nop"); \
> >
> > Can you use nop() instead of the explicit asm?
>
> Yes. I just tried, and it works too.
>
> > Also, I think we should try
> > and use some methodology on deciding the number of nops to insert. Without
> > having a full handle on the problem at the moment, it would seem that we
> > need at least NR_CPUS worth (since the number of spinning secondaries is
> > NR_CPUS-1 and they may execute their barriers in lock-step).
> >
> I'm not sure we get something like that. In my testing here, I need
> at least 5 nops to get rid of the issue.
Doesn't A9 do dual issue? If so, the maths for your 4 core iMX6Q might
match up with Will's hypothesis. You could try the theory by building
say with CONFIG_NR_CPUS == 3.
--
Tixy
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] ARM: fix cpu_relax() in case of doing dmb
2012-08-23 10:43 ` Will Deacon
2012-08-23 13:58 ` Shawn Guo
@ 2012-08-24 1:10 ` Hui Wang
1 sibling, 0 replies; 9+ messages in thread
From: Hui Wang @ 2012-08-24 1:10 UTC (permalink / raw)
To: linux-arm-kernel
Will Deacon wrote:
> On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote:
>
>> diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
>> index 99afa74..7cc67ce 100644
>> --- a/arch/arm/include/asm/processor.h
>> +++ b/arch/arm/include/asm/processor.h
>> @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *);
>> unsigned long get_wchan(struct task_struct *p);
>>
>> #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
>> -#define cpu_relax() smp_mb()
>> +#define cpu_relax() do { \
>> + asm("nop"); \
>> + asm("nop"); \
>> + asm("nop"); \
>> + asm("nop"); \
>> + asm("nop"); \
>>
>
> Can you use nop() instead of the explicit asm? Also, I think we should try
> and use some methodology on deciding the number of nops to insert. Without
> having a full handle on the problem at the moment, it would seem that we
> need at least NR_CPUS worth (since the number of spinning secondaries is
> NR_CPUS-1 and they may execute their barriers in lock-step).
>
Your concern sounds reasonable, but i did a test, the result show there
is no explicit relation between NR_CPUS and the number of nop needed.
NR_CPUS = 4 and NR_CPUS = 2 need at least the same number of nop.
Regards,
Hui.
> Will
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] ARM: fix cpu_relax() in case of doing dmb
2012-08-23 18:31 ` Jon Medhurst (Tixy)
@ 2012-08-24 1:15 ` Shawn Guo
2012-08-24 9:14 ` Jon Medhurst (Tixy)
0 siblings, 1 reply; 9+ messages in thread
From: Shawn Guo @ 2012-08-24 1:15 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Aug 23, 2012 at 07:31:26PM +0100, Jon Medhurst (Tixy) wrote:
> On Thu, 2012-08-23 at 21:58 +0800, Shawn Guo wrote:
> > On Thu, Aug 23, 2012 at 11:43:56AM +0100, Will Deacon wrote:
> > > On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote:
> > > > diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
> > > > index 99afa74..7cc67ce 100644
> > > > --- a/arch/arm/include/asm/processor.h
> > > > +++ b/arch/arm/include/asm/processor.h
> > > > @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *);
> > > > unsigned long get_wchan(struct task_struct *p);
> > > >
> > > > #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
> > > > -#define cpu_relax() smp_mb()
> > > > +#define cpu_relax() do { \
> > > > + asm("nop"); \
> > > > + asm("nop"); \
> > > > + asm("nop"); \
> > > > + asm("nop"); \
> > > > + asm("nop"); \
> > >
> > > Can you use nop() instead of the explicit asm?
> >
> > Yes. I just tried, and it works too.
> >
> > > Also, I think we should try
> > > and use some methodology on deciding the number of nops to insert. Without
> > > having a full handle on the problem at the moment, it would seem that we
> > > need at least NR_CPUS worth (since the number of spinning secondaries is
> > > NR_CPUS-1 and they may execute their barriers in lock-step).
> > >
> > I'm not sure we get something like that. In my testing here, I need
> > at least 5 nops to get rid of the issue.
>
> Doesn't A9 do dual issue?
Do you have some details about the issue to share?
> If so, the maths for your 4 core iMX6Q might
> match up with Will's hypothesis. You could try the theory by building
> say with CONFIG_NR_CPUS == 3.
>
I'm still not quite sure about the hypothesis, but I assume you are
asking if 3 NOPs will fix the issue. If so, the answer is NO.
I increase the number of NOP incrementally starting from 1, and the
issue remains until we have 5 NOPs in there.
--
Regards,
Shawn
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] ARM: fix cpu_relax() in case of doing dmb
2012-08-24 1:15 ` Shawn Guo
@ 2012-08-24 9:14 ` Jon Medhurst (Tixy)
0 siblings, 0 replies; 9+ messages in thread
From: Jon Medhurst (Tixy) @ 2012-08-24 9:14 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, 2012-08-24 at 09:15 +0800, Shawn Guo wrote:
> On Thu, Aug 23, 2012 at 07:31:26PM +0100, Jon Medhurst (Tixy) wrote:
> > On Thu, 2012-08-23 at 21:58 +0800, Shawn Guo wrote:
> > > On Thu, Aug 23, 2012 at 11:43:56AM +0100, Will Deacon wrote:
> > > > On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote:
> > > > > diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
> > > > > index 99afa74..7cc67ce 100644
> > > > > --- a/arch/arm/include/asm/processor.h
> > > > > +++ b/arch/arm/include/asm/processor.h
> > > > > @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *);
> > > > > unsigned long get_wchan(struct task_struct *p);
> > > > >
> > > > > #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
> > > > > -#define cpu_relax() smp_mb()
> > > > > +#define cpu_relax() do { \
> > > > > + asm("nop"); \
> > > > > + asm("nop"); \
> > > > > + asm("nop"); \
> > > > > + asm("nop"); \
> > > > > + asm("nop"); \
> > > >
> > > > Can you use nop() instead of the explicit asm?
> > >
> > > Yes. I just tried, and it works too.
> > >
> > > > Also, I think we should try
> > > > and use some methodology on deciding the number of nops to insert. Without
> > > > having a full handle on the problem at the moment, it would seem that we
> > > > need at least NR_CPUS worth (since the number of spinning secondaries is
> > > > NR_CPUS-1 and they may execute their barriers in lock-step).
> > > >
> > > I'm not sure we get something like that. In my testing here, I need
> > > at least 5 nops to get rid of the issue.
> >
> > Doesn't A9 do dual issue?
>
> Do you have some details about the issue to share?
I don't have any particular insight, I was just making the observation
that if CPU clock cycles executed in the loop were a consideration, then
the fact that the A9 would probably execute two nops in a clock cycle
would be pertinent.
> > If so, the maths for your 4 core iMX6Q might
> > match up with Will's hypothesis. You could try the theory by building
> > say with CONFIG_NR_CPUS == 3.
> >
> I'm still not quite sure about the hypothesis, but I assume you are
> asking if 3 NOPs will fix the issue. If so, the answer is NO.
> I increase the number of NOP incrementally starting from 1, and the
> issue remains until we have 5 NOPs in there.
Right, your and Hui test seems to scupper the idea that number of nops
should be related to number of CPUs. (Which I didn't really under
either.)
Perhaps the issue is related to the size of the loop buffer,
or possibly cache line boundaries?
--
Tixy
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2012-08-24 9:14 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-08-22 14:52 [PATCH] ARM: fix cpu_relax() in case of doing dmb Shawn Guo
2012-08-23 2:47 ` Hui Wang
2012-08-23 10:23 ` Dirk Behme
2012-08-23 10:43 ` Will Deacon
2012-08-23 13:58 ` Shawn Guo
2012-08-23 18:31 ` Jon Medhurst (Tixy)
2012-08-24 1:15 ` Shawn Guo
2012-08-24 9:14 ` Jon Medhurst (Tixy)
2012-08-24 1:10 ` Hui Wang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).