* [PATCH] ARM: fix cpu_relax() in case of doing dmb @ 2012-08-22 14:52 Shawn Guo 2012-08-23 2:47 ` Hui Wang ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Shawn Guo @ 2012-08-22 14:52 UTC (permalink / raw) To: linux-arm-kernel There is an issue reported on imx6q restart function. The issue is only seen with the image building ARMv7 and ARMv6 together, where cpu_relax() is define to do dmb. It's been root-caused by Russell as below. Russell King - ARM Linux wrote: > I suspect having this dmb inside cpu_relax() is flooding the > interconnects with traffic, which then prevents other CPUs getting > a look-in (maybe there's no fairness when it comes to dmb's. Fix the issue by insert a few NOPs into cpu_relax() where doing dmb. Cc: <stable@vger.kernel.org> Signed-off-by: Shawn Guo <shawn.guo@linaro.org> --- arch/arm/include/asm/processor.h | 9 ++++++++- 1 files changed, 8 insertions(+), 1 deletions(-) diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h index 99afa74..7cc67ce 100644 --- a/arch/arm/include/asm/processor.h +++ b/arch/arm/include/asm/processor.h @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *); unsigned long get_wchan(struct task_struct *p); #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327) -#define cpu_relax() smp_mb() +#define cpu_relax() do { \ + asm("nop"); \ + asm("nop"); \ + asm("nop"); \ + asm("nop"); \ + asm("nop"); \ + smp_mb(); \ + } while (0) #else #define cpu_relax() barrier() #endif -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH] ARM: fix cpu_relax() in case of doing dmb 2012-08-22 14:52 [PATCH] ARM: fix cpu_relax() in case of doing dmb Shawn Guo @ 2012-08-23 2:47 ` Hui Wang 2012-08-23 10:23 ` Dirk Behme 2012-08-23 10:43 ` Will Deacon 2 siblings, 0 replies; 9+ messages in thread From: Hui Wang @ 2012-08-23 2:47 UTC (permalink / raw) To: linux-arm-kernel Shawn Guo wrote: > There is an issue reported on imx6q restart function. The issue is only > seen with the image building ARMv7 and ARMv6 together, where cpu_relax() > is define to do dmb. It's been root-caused by Russell as below. > > Russell King - ARM Linux wrote: > >> I suspect having this dmb inside cpu_relax() is flooding the >> interconnects with traffic, which then prevents other CPUs getting >> a look-in (maybe there's no fairness when it comes to dmb's. >> > > Fix the issue by insert a few NOPs into cpu_relax() where doing dmb. > Tested-by: Hui Wang <jason77.wang@gmail.com> ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] ARM: fix cpu_relax() in case of doing dmb 2012-08-22 14:52 [PATCH] ARM: fix cpu_relax() in case of doing dmb Shawn Guo 2012-08-23 2:47 ` Hui Wang @ 2012-08-23 10:23 ` Dirk Behme 2012-08-23 10:43 ` Will Deacon 2 siblings, 0 replies; 9+ messages in thread From: Dirk Behme @ 2012-08-23 10:23 UTC (permalink / raw) To: linux-arm-kernel On 22.08.2012 16:52, Shawn Guo wrote: > There is an issue reported on imx6q restart function. The issue is only > seen with the image building ARMv7 and ARMv6 together, where cpu_relax() > is define to do dmb. It's been root-caused by Russell as below. > > Russell King - ARM Linux wrote: >> I suspect having this dmb inside cpu_relax() is flooding the >> interconnects with traffic, which then prevents other CPUs getting >> a look-in (maybe there's no fairness when it comes to dmb's. > > Fix the issue by insert a few NOPs into cpu_relax() where doing dmb. > > Cc: <stable@vger.kernel.org> > Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Tested on a i.MX6 SabreLite board together with the patch 'ARM: imx6q: remove imx_src_prepare_restart() call': Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Thanks Dirk > --- > arch/arm/include/asm/processor.h | 9 ++++++++- > 1 files changed, 8 insertions(+), 1 deletions(-) > > diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h > index 99afa74..7cc67ce 100644 > --- a/arch/arm/include/asm/processor.h > +++ b/arch/arm/include/asm/processor.h > @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *); > unsigned long get_wchan(struct task_struct *p); > > #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327) > -#define cpu_relax() smp_mb() > +#define cpu_relax() do { \ > + asm("nop"); \ > + asm("nop"); \ > + asm("nop"); \ > + asm("nop"); \ > + asm("nop"); \ > + smp_mb(); \ > + } while (0) > #else > #define cpu_relax() barrier() > #endif ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] ARM: fix cpu_relax() in case of doing dmb 2012-08-22 14:52 [PATCH] ARM: fix cpu_relax() in case of doing dmb Shawn Guo 2012-08-23 2:47 ` Hui Wang 2012-08-23 10:23 ` Dirk Behme @ 2012-08-23 10:43 ` Will Deacon 2012-08-23 13:58 ` Shawn Guo 2012-08-24 1:10 ` Hui Wang 2 siblings, 2 replies; 9+ messages in thread From: Will Deacon @ 2012-08-23 10:43 UTC (permalink / raw) To: linux-arm-kernel On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote: > diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h > index 99afa74..7cc67ce 100644 > --- a/arch/arm/include/asm/processor.h > +++ b/arch/arm/include/asm/processor.h > @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *); > unsigned long get_wchan(struct task_struct *p); > > #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327) > -#define cpu_relax() smp_mb() > +#define cpu_relax() do { \ > + asm("nop"); \ > + asm("nop"); \ > + asm("nop"); \ > + asm("nop"); \ > + asm("nop"); \ Can you use nop() instead of the explicit asm? Also, I think we should try and use some methodology on deciding the number of nops to insert. Without having a full handle on the problem at the moment, it would seem that we need at least NR_CPUS worth (since the number of spinning secondaries is NR_CPUS-1 and they may execute their barriers in lock-step). Will ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] ARM: fix cpu_relax() in case of doing dmb 2012-08-23 10:43 ` Will Deacon @ 2012-08-23 13:58 ` Shawn Guo 2012-08-23 18:31 ` Jon Medhurst (Tixy) 2012-08-24 1:10 ` Hui Wang 1 sibling, 1 reply; 9+ messages in thread From: Shawn Guo @ 2012-08-23 13:58 UTC (permalink / raw) To: linux-arm-kernel On Thu, Aug 23, 2012 at 11:43:56AM +0100, Will Deacon wrote: > On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote: > > diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h > > index 99afa74..7cc67ce 100644 > > --- a/arch/arm/include/asm/processor.h > > +++ b/arch/arm/include/asm/processor.h > > @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *); > > unsigned long get_wchan(struct task_struct *p); > > > > #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327) > > -#define cpu_relax() smp_mb() > > +#define cpu_relax() do { \ > > + asm("nop"); \ > > + asm("nop"); \ > > + asm("nop"); \ > > + asm("nop"); \ > > + asm("nop"); \ > > Can you use nop() instead of the explicit asm? Yes. I just tried, and it works too. > Also, I think we should try > and use some methodology on deciding the number of nops to insert. Without > having a full handle on the problem at the moment, it would seem that we > need at least NR_CPUS worth (since the number of spinning secondaries is > NR_CPUS-1 and they may execute their barriers in lock-step). > I'm not sure we get something like that. In my testing here, I need at least 5 nops to get rid of the issue. -- Regards, Shawn ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] ARM: fix cpu_relax() in case of doing dmb 2012-08-23 13:58 ` Shawn Guo @ 2012-08-23 18:31 ` Jon Medhurst (Tixy) 2012-08-24 1:15 ` Shawn Guo 0 siblings, 1 reply; 9+ messages in thread From: Jon Medhurst (Tixy) @ 2012-08-23 18:31 UTC (permalink / raw) To: linux-arm-kernel On Thu, 2012-08-23 at 21:58 +0800, Shawn Guo wrote: > On Thu, Aug 23, 2012 at 11:43:56AM +0100, Will Deacon wrote: > > On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote: > > > diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h > > > index 99afa74..7cc67ce 100644 > > > --- a/arch/arm/include/asm/processor.h > > > +++ b/arch/arm/include/asm/processor.h > > > @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *); > > > unsigned long get_wchan(struct task_struct *p); > > > > > > #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327) > > > -#define cpu_relax() smp_mb() > > > +#define cpu_relax() do { \ > > > + asm("nop"); \ > > > + asm("nop"); \ > > > + asm("nop"); \ > > > + asm("nop"); \ > > > + asm("nop"); \ > > > > Can you use nop() instead of the explicit asm? > > Yes. I just tried, and it works too. > > > Also, I think we should try > > and use some methodology on deciding the number of nops to insert. Without > > having a full handle on the problem at the moment, it would seem that we > > need at least NR_CPUS worth (since the number of spinning secondaries is > > NR_CPUS-1 and they may execute their barriers in lock-step). > > > I'm not sure we get something like that. In my testing here, I need > at least 5 nops to get rid of the issue. Doesn't A9 do dual issue? If so, the maths for your 4 core iMX6Q might match up with Will's hypothesis. You could try the theory by building say with CONFIG_NR_CPUS == 3. -- Tixy ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] ARM: fix cpu_relax() in case of doing dmb 2012-08-23 18:31 ` Jon Medhurst (Tixy) @ 2012-08-24 1:15 ` Shawn Guo 2012-08-24 9:14 ` Jon Medhurst (Tixy) 0 siblings, 1 reply; 9+ messages in thread From: Shawn Guo @ 2012-08-24 1:15 UTC (permalink / raw) To: linux-arm-kernel On Thu, Aug 23, 2012 at 07:31:26PM +0100, Jon Medhurst (Tixy) wrote: > On Thu, 2012-08-23 at 21:58 +0800, Shawn Guo wrote: > > On Thu, Aug 23, 2012 at 11:43:56AM +0100, Will Deacon wrote: > > > On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote: > > > > diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h > > > > index 99afa74..7cc67ce 100644 > > > > --- a/arch/arm/include/asm/processor.h > > > > +++ b/arch/arm/include/asm/processor.h > > > > @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *); > > > > unsigned long get_wchan(struct task_struct *p); > > > > > > > > #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327) > > > > -#define cpu_relax() smp_mb() > > > > +#define cpu_relax() do { \ > > > > + asm("nop"); \ > > > > + asm("nop"); \ > > > > + asm("nop"); \ > > > > + asm("nop"); \ > > > > + asm("nop"); \ > > > > > > Can you use nop() instead of the explicit asm? > > > > Yes. I just tried, and it works too. > > > > > Also, I think we should try > > > and use some methodology on deciding the number of nops to insert. Without > > > having a full handle on the problem at the moment, it would seem that we > > > need at least NR_CPUS worth (since the number of spinning secondaries is > > > NR_CPUS-1 and they may execute their barriers in lock-step). > > > > > I'm not sure we get something like that. In my testing here, I need > > at least 5 nops to get rid of the issue. > > Doesn't A9 do dual issue? Do you have some details about the issue to share? > If so, the maths for your 4 core iMX6Q might > match up with Will's hypothesis. You could try the theory by building > say with CONFIG_NR_CPUS == 3. > I'm still not quite sure about the hypothesis, but I assume you are asking if 3 NOPs will fix the issue. If so, the answer is NO. I increase the number of NOP incrementally starting from 1, and the issue remains until we have 5 NOPs in there. -- Regards, Shawn ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] ARM: fix cpu_relax() in case of doing dmb 2012-08-24 1:15 ` Shawn Guo @ 2012-08-24 9:14 ` Jon Medhurst (Tixy) 0 siblings, 0 replies; 9+ messages in thread From: Jon Medhurst (Tixy) @ 2012-08-24 9:14 UTC (permalink / raw) To: linux-arm-kernel On Fri, 2012-08-24 at 09:15 +0800, Shawn Guo wrote: > On Thu, Aug 23, 2012 at 07:31:26PM +0100, Jon Medhurst (Tixy) wrote: > > On Thu, 2012-08-23 at 21:58 +0800, Shawn Guo wrote: > > > On Thu, Aug 23, 2012 at 11:43:56AM +0100, Will Deacon wrote: > > > > On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote: > > > > > diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h > > > > > index 99afa74..7cc67ce 100644 > > > > > --- a/arch/arm/include/asm/processor.h > > > > > +++ b/arch/arm/include/asm/processor.h > > > > > @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *); > > > > > unsigned long get_wchan(struct task_struct *p); > > > > > > > > > > #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327) > > > > > -#define cpu_relax() smp_mb() > > > > > +#define cpu_relax() do { \ > > > > > + asm("nop"); \ > > > > > + asm("nop"); \ > > > > > + asm("nop"); \ > > > > > + asm("nop"); \ > > > > > + asm("nop"); \ > > > > > > > > Can you use nop() instead of the explicit asm? > > > > > > Yes. I just tried, and it works too. > > > > > > > Also, I think we should try > > > > and use some methodology on deciding the number of nops to insert. Without > > > > having a full handle on the problem at the moment, it would seem that we > > > > need at least NR_CPUS worth (since the number of spinning secondaries is > > > > NR_CPUS-1 and they may execute their barriers in lock-step). > > > > > > > I'm not sure we get something like that. In my testing here, I need > > > at least 5 nops to get rid of the issue. > > > > Doesn't A9 do dual issue? > > Do you have some details about the issue to share? I don't have any particular insight, I was just making the observation that if CPU clock cycles executed in the loop were a consideration, then the fact that the A9 would probably execute two nops in a clock cycle would be pertinent. > > If so, the maths for your 4 core iMX6Q might > > match up with Will's hypothesis. You could try the theory by building > > say with CONFIG_NR_CPUS == 3. > > > I'm still not quite sure about the hypothesis, but I assume you are > asking if 3 NOPs will fix the issue. If so, the answer is NO. > I increase the number of NOP incrementally starting from 1, and the > issue remains until we have 5 NOPs in there. Right, your and Hui test seems to scupper the idea that number of nops should be related to number of CPUs. (Which I didn't really under either.) Perhaps the issue is related to the size of the loop buffer, or possibly cache line boundaries? -- Tixy ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] ARM: fix cpu_relax() in case of doing dmb 2012-08-23 10:43 ` Will Deacon 2012-08-23 13:58 ` Shawn Guo @ 2012-08-24 1:10 ` Hui Wang 1 sibling, 0 replies; 9+ messages in thread From: Hui Wang @ 2012-08-24 1:10 UTC (permalink / raw) To: linux-arm-kernel Will Deacon wrote: > On Wed, Aug 22, 2012 at 03:52:18PM +0100, Shawn Guo wrote: > >> diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h >> index 99afa74..7cc67ce 100644 >> --- a/arch/arm/include/asm/processor.h >> +++ b/arch/arm/include/asm/processor.h >> @@ -80,7 +80,14 @@ extern void release_thread(struct task_struct *); >> unsigned long get_wchan(struct task_struct *p); >> >> #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327) >> -#define cpu_relax() smp_mb() >> +#define cpu_relax() do { \ >> + asm("nop"); \ >> + asm("nop"); \ >> + asm("nop"); \ >> + asm("nop"); \ >> + asm("nop"); \ >> > > Can you use nop() instead of the explicit asm? Also, I think we should try > and use some methodology on deciding the number of nops to insert. Without > having a full handle on the problem at the moment, it would seem that we > need at least NR_CPUS worth (since the number of spinning secondaries is > NR_CPUS-1 and they may execute their barriers in lock-step). > Your concern sounds reasonable, but i did a test, the result show there is no explicit relation between NR_CPUS and the number of nop needed. NR_CPUS = 4 and NR_CPUS = 2 need at least the same number of nop. Regards, Hui. > Will > > ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2012-08-24 9:14 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-08-22 14:52 [PATCH] ARM: fix cpu_relax() in case of doing dmb Shawn Guo 2012-08-23 2:47 ` Hui Wang 2012-08-23 10:23 ` Dirk Behme 2012-08-23 10:43 ` Will Deacon 2012-08-23 13:58 ` Shawn Guo 2012-08-23 18:31 ` Jon Medhurst (Tixy) 2012-08-24 1:15 ` Shawn Guo 2012-08-24 9:14 ` Jon Medhurst (Tixy) 2012-08-24 1:10 ` Hui Wang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).