* spin_lock behavior with ARM64 big.Little/HMP
@ 2016-11-18 2:22 ` Vikram Mulukutla
0 siblings, 0 replies; 12+ messages in thread
From: Vikram Mulukutla @ 2016-11-18 2:22 UTC (permalink / raw)
To: linux-arm-kernel
Hello,
This isn't really a bug report, but just a description of a
frequency/IPC
dependent behavior that I'm curious if we should worry about. The
behavior
is exposed by questionable design so I'm leaning towards don't-care.
Consider these threads running in parallel on two ARM64 CPUs running
mainline
Linux:
(Ordering of lines between the two columns does not indicate a sequence
of
execution. Assume flag=0 initially.)
LittleARM64_CPU @ 300MHz (e.g.A53) | BigARM64_CPU @ 1.5GHz (e.g. A57)
-------------------------------------+----------------------------------
spin_lock_irqsave(s) | local_irq_save()
/* critical section */
flag = 1 | spin_lock(s)
spin_unlock_irqrestore(s) | while (!flag) {
| spin_unlock(s)
| cpu_relax();
| spin_lock(s)
| }
| spin_unlock(s)
| local_irq_restore()
I see a livelock occurring where the LittleCPU is never able to acquire
the
lock, and the BigCPU is stuck forever waiting on 'flag' to be set.
Even with ticket spinlocks, this bit of code can cause a livelock (or
very
long delays) if BigCPU runs fast enough. Afaics this can only happen if
the
LittleCPU is unable to put its ticket in the queue (i.e, increment the
next
field) since the store-exclusive keeps failing.
The problem is not present on SMP, and is mitigated by adding enough
additional clock cycles between the unlock and lock in the loop running
on the BigCPU. On big.Little, if both threads are scheduled on the same
cluster within the same clock domain, the problem is avoided.
Now the infinite loop may seem like questionable design but the problem
isn't entirely hypothetical; if BigCPU calls hrtimer_cancel with
interrupts disabled, this scenario can result if the hrtimer is about
to run on a littleCPU. It's of course possible that there's just enough
intervening code for the problem to not occur. At the very least it
seems
that loops like the one running in the BigCPU above should come with a
WARN_ON(irqs_disabled) or with a sufficient udelay() instead of the
cpu_relax.
Thoughts?
Thanks,
Vikram
^ permalink raw reply [flat|nested] 12+ messages in thread* spin_lock behavior with ARM64 big.Little/HMP @ 2016-11-18 2:22 ` Vikram Mulukutla 0 siblings, 0 replies; 12+ messages in thread From: Vikram Mulukutla @ 2016-11-18 2:22 UTC (permalink / raw) To: Catalin Marinas, Will Deacon; +Cc: linux-arm-kernel, linux-kernel Hello, This isn't really a bug report, but just a description of a frequency/IPC dependent behavior that I'm curious if we should worry about. The behavior is exposed by questionable design so I'm leaning towards don't-care. Consider these threads running in parallel on two ARM64 CPUs running mainline Linux: (Ordering of lines between the two columns does not indicate a sequence of execution. Assume flag=0 initially.) LittleARM64_CPU @ 300MHz (e.g.A53) | BigARM64_CPU @ 1.5GHz (e.g. A57) -------------------------------------+---------------------------------- spin_lock_irqsave(s) | local_irq_save() /* critical section */ flag = 1 | spin_lock(s) spin_unlock_irqrestore(s) | while (!flag) { | spin_unlock(s) | cpu_relax(); | spin_lock(s) | } | spin_unlock(s) | local_irq_restore() I see a livelock occurring where the LittleCPU is never able to acquire the lock, and the BigCPU is stuck forever waiting on 'flag' to be set. Even with ticket spinlocks, this bit of code can cause a livelock (or very long delays) if BigCPU runs fast enough. Afaics this can only happen if the LittleCPU is unable to put its ticket in the queue (i.e, increment the next field) since the store-exclusive keeps failing. The problem is not present on SMP, and is mitigated by adding enough additional clock cycles between the unlock and lock in the loop running on the BigCPU. On big.Little, if both threads are scheduled on the same cluster within the same clock domain, the problem is avoided. Now the infinite loop may seem like questionable design but the problem isn't entirely hypothetical; if BigCPU calls hrtimer_cancel with interrupts disabled, this scenario can result if the hrtimer is about to run on a littleCPU. It's of course possible that there's just enough intervening code for the problem to not occur. At the very least it seems that loops like the one running in the BigCPU above should come with a WARN_ON(irqs_disabled) or with a sufficient udelay() instead of the cpu_relax. Thoughts? Thanks, Vikram ^ permalink raw reply [flat|nested] 12+ messages in thread
* spin_lock behavior with ARM64 big.Little/HMP 2016-11-18 2:22 ` Vikram Mulukutla @ 2016-11-18 10:30 ` Sudeep Holla -1 siblings, 0 replies; 12+ messages in thread From: Sudeep Holla @ 2016-11-18 10:30 UTC (permalink / raw) To: linux-arm-kernel Hi Vikram, On 18/11/16 02:22, Vikram Mulukutla wrote: > Hello, > > This isn't really a bug report, but just a description of a frequency/IPC > dependent behavior that I'm curious if we should worry about. The behavior > is exposed by questionable design so I'm leaning towards don't-care. > > Consider these threads running in parallel on two ARM64 CPUs running > mainline > Linux: > Are you seeing this behavior with the mainline kernel on any platforms as we have a sort of workaround for this ? > (Ordering of lines between the two columns does not indicate a sequence of > execution. Assume flag=0 initially.) > > LittleARM64_CPU @ 300MHz (e.g.A53) | BigARM64_CPU @ 1.5GHz (e.g. A57) > -------------------------------------+---------------------------------- > spin_lock_irqsave(s) | local_irq_save() > /* critical section */ > flag = 1 | spin_lock(s) > spin_unlock_irqrestore(s) | while (!flag) { > | spin_unlock(s) > | cpu_relax(); > | spin_lock(s) > | } > | spin_unlock(s) > | local_irq_restore() > > I see a livelock occurring where the LittleCPU is never able to acquire the > lock, and the BigCPU is stuck forever waiting on 'flag' to be set. > Yes we saw this issue 3 years back on TC2 which has A7(with lowest frequency of 300MHz IIRC) and A15(with 1.2 GHz). We were observing that inter-cluster events are missed since the two clusters are operating at different frequencies (details below). The hardware recommendation is that there should be glue logic between the two clusters which captures events from one cluster and replays then on the other if its operating at a different frequency. Generally EVENTO from cluster 1 is connected to the EVENTI of the cluster 2 and vice versa. The only extra logic required is the double synchronizer in the receiving clock domain. This issue arise in reality if the synchronizer is missing and different CPUs hold EVENTO for different clock cycles. However there was a different requirement to implement timer event stream in Linux for some user-space locking and that indirectly help to resolve the issue on TC2. That event stream feature is enabled by default in Linux and should fix the issue and hence I asked you if you still see that issue. -- Regards, Sudeep ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: spin_lock behavior with ARM64 big.Little/HMP @ 2016-11-18 10:30 ` Sudeep Holla 0 siblings, 0 replies; 12+ messages in thread From: Sudeep Holla @ 2016-11-18 10:30 UTC (permalink / raw) To: Vikram Mulukutla Cc: Catalin Marinas, Will Deacon, Sudeep Holla, linux-kernel, linux-arm-kernel Hi Vikram, On 18/11/16 02:22, Vikram Mulukutla wrote: > Hello, > > This isn't really a bug report, but just a description of a frequency/IPC > dependent behavior that I'm curious if we should worry about. The behavior > is exposed by questionable design so I'm leaning towards don't-care. > > Consider these threads running in parallel on two ARM64 CPUs running > mainline > Linux: > Are you seeing this behavior with the mainline kernel on any platforms as we have a sort of workaround for this ? > (Ordering of lines between the two columns does not indicate a sequence of > execution. Assume flag=0 initially.) > > LittleARM64_CPU @ 300MHz (e.g.A53) | BigARM64_CPU @ 1.5GHz (e.g. A57) > -------------------------------------+---------------------------------- > spin_lock_irqsave(s) | local_irq_save() > /* critical section */ > flag = 1 | spin_lock(s) > spin_unlock_irqrestore(s) | while (!flag) { > | spin_unlock(s) > | cpu_relax(); > | spin_lock(s) > | } > | spin_unlock(s) > | local_irq_restore() > > I see a livelock occurring where the LittleCPU is never able to acquire the > lock, and the BigCPU is stuck forever waiting on 'flag' to be set. > Yes we saw this issue 3 years back on TC2 which has A7(with lowest frequency of 300MHz IIRC) and A15(with 1.2 GHz). We were observing that inter-cluster events are missed since the two clusters are operating at different frequencies (details below). The hardware recommendation is that there should be glue logic between the two clusters which captures events from one cluster and replays then on the other if its operating at a different frequency. Generally EVENTO from cluster 1 is connected to the EVENTI of the cluster 2 and vice versa. The only extra logic required is the double synchronizer in the receiving clock domain. This issue arise in reality if the synchronizer is missing and different CPUs hold EVENTO for different clock cycles. However there was a different requirement to implement timer event stream in Linux for some user-space locking and that indirectly help to resolve the issue on TC2. That event stream feature is enabled by default in Linux and should fix the issue and hence I asked you if you still see that issue. -- Regards, Sudeep ^ permalink raw reply [flat|nested] 12+ messages in thread
* spin_lock behavior with ARM64 big.Little/HMP 2016-11-18 10:30 ` Sudeep Holla @ 2016-11-18 20:22 ` Vikram Mulukutla -1 siblings, 0 replies; 12+ messages in thread From: Vikram Mulukutla @ 2016-11-18 20:22 UTC (permalink / raw) To: linux-arm-kernel Hi Sudeep, Thanks for taking a look! On 2016-11-18 02:30, Sudeep Holla wrote: > Hi Vikram, > > On 18/11/16 02:22, Vikram Mulukutla wrote: >> Hello, >> >> This isn't really a bug report, but just a description of a >> frequency/IPC >> dependent behavior that I'm curious if we should worry about. The >> behavior >> is exposed by questionable design so I'm leaning towards don't-care. >> >> Consider these threads running in parallel on two ARM64 CPUs running >> mainline >> Linux: >> > > Are you seeing this behavior with the mainline kernel on any platforms > as we have a sort of workaround for this ? > If I understand that workaround correctly, the ARM timer event stream is used to periodically wake up CPUs that are waiting in WFE, is that right? I think my scenario below may be different because LittleCPU doesn't actually wait on a WFE event in the loop that is trying to increment lock->next, i.e. it's stuck in the following loop: ARM64_LSE_ATOMIC_INSN( /* LL/SC */ " prfm pstl1strm, %3\n" "1: ldaxr %w0, %3\n" " add %w1, %w0, %w5\n" " stxr %w2, %w1, %3\n" " cbnz %w2, 1b\n", I have been testing internal platforms; I'll try to test on something available publicly that's b.L. In any case, the timer event stream was enabled when I tried this out. >> (Ordering of lines between the two columns does not indicate a >> sequence of >> execution. Assume flag=0 initially.) >> >> LittleARM64_CPU @ 300MHz (e.g.A53) | BigARM64_CPU @ 1.5GHz (e.g. >> A57) >> -------------------------------------+---------------------------------- >> spin_lock_irqsave(s) | local_irq_save() >> /* critical section */ >> flag = 1 | spin_lock(s) >> spin_unlock_irqrestore(s) | while (!flag) { >> | spin_unlock(s) >> | cpu_relax(); >> | spin_lock(s) >> | } >> | spin_unlock(s) >> | local_irq_restore() >> [...] Thanks, Vikram ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: spin_lock behavior with ARM64 big.Little/HMP @ 2016-11-18 20:22 ` Vikram Mulukutla 0 siblings, 0 replies; 12+ messages in thread From: Vikram Mulukutla @ 2016-11-18 20:22 UTC (permalink / raw) To: Sudeep Holla Cc: Catalin Marinas, Will Deacon, linux-kernel, linux-arm-kernel, linux-kernel-owner Hi Sudeep, Thanks for taking a look! On 2016-11-18 02:30, Sudeep Holla wrote: > Hi Vikram, > > On 18/11/16 02:22, Vikram Mulukutla wrote: >> Hello, >> >> This isn't really a bug report, but just a description of a >> frequency/IPC >> dependent behavior that I'm curious if we should worry about. The >> behavior >> is exposed by questionable design so I'm leaning towards don't-care. >> >> Consider these threads running in parallel on two ARM64 CPUs running >> mainline >> Linux: >> > > Are you seeing this behavior with the mainline kernel on any platforms > as we have a sort of workaround for this ? > If I understand that workaround correctly, the ARM timer event stream is used to periodically wake up CPUs that are waiting in WFE, is that right? I think my scenario below may be different because LittleCPU doesn't actually wait on a WFE event in the loop that is trying to increment lock->next, i.e. it's stuck in the following loop: ARM64_LSE_ATOMIC_INSN( /* LL/SC */ " prfm pstl1strm, %3\n" "1: ldaxr %w0, %3\n" " add %w1, %w0, %w5\n" " stxr %w2, %w1, %3\n" " cbnz %w2, 1b\n", I have been testing internal platforms; I'll try to test on something available publicly that's b.L. In any case, the timer event stream was enabled when I tried this out. >> (Ordering of lines between the two columns does not indicate a >> sequence of >> execution. Assume flag=0 initially.) >> >> LittleARM64_CPU @ 300MHz (e.g.A53) | BigARM64_CPU @ 1.5GHz (e.g. >> A57) >> -------------------------------------+---------------------------------- >> spin_lock_irqsave(s) | local_irq_save() >> /* critical section */ >> flag = 1 | spin_lock(s) >> spin_unlock_irqrestore(s) | while (!flag) { >> | spin_unlock(s) >> | cpu_relax(); >> | spin_lock(s) >> | } >> | spin_unlock(s) >> | local_irq_restore() >> [...] Thanks, Vikram ^ permalink raw reply [flat|nested] 12+ messages in thread
* spin_lock behavior with ARM64 big.Little/HMP 2016-11-18 20:22 ` Vikram Mulukutla @ 2016-11-21 15:21 ` Sudeep Holla -1 siblings, 0 replies; 12+ messages in thread From: Sudeep Holla @ 2016-11-21 15:21 UTC (permalink / raw) To: linux-arm-kernel On 18/11/16 20:22, Vikram Mulukutla wrote: > > Hi Sudeep, > > Thanks for taking a look! > > On 2016-11-18 02:30, Sudeep Holla wrote: >> Hi Vikram, >> >> On 18/11/16 02:22, Vikram Mulukutla wrote: >>> Hello, >>> >>> This isn't really a bug report, but just a description of a >>> frequency/IPC >>> dependent behavior that I'm curious if we should worry about. The >>> behavior >>> is exposed by questionable design so I'm leaning towards don't-care. >>> >>> Consider these threads running in parallel on two ARM64 CPUs running >>> mainline >>> Linux: >>> >> >> Are you seeing this behavior with the mainline kernel on any platforms >> as we have a sort of workaround for this ? >> > > If I understand that workaround correctly, the ARM timer event stream is > used > to periodically wake up CPUs that are waiting in WFE, is that right? I > think > my scenario below may be different because LittleCPU doesn't actually wait > on a WFE event in the loop that is trying to increment lock->next, i.e. > it's > stuck in the following loop: > > ARM64_LSE_ATOMIC_INSN( > /* LL/SC */ > " prfm pstl1strm, %3\n" > "1: ldaxr %w0, %3\n" > " add %w1, %w0, %w5\n" > " stxr %w2, %w1, %3\n" > " cbnz %w2, 1b\n", > Interesting. Just curious if this is r0p0/p1 A53 ? If so, is the errata 819472 enabled ? -- Regards, Sudeep ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: spin_lock behavior with ARM64 big.Little/HMP @ 2016-11-21 15:21 ` Sudeep Holla 0 siblings, 0 replies; 12+ messages in thread From: Sudeep Holla @ 2016-11-21 15:21 UTC (permalink / raw) To: Vikram Mulukutla Cc: Sudeep Holla, Catalin Marinas, Will Deacon, linux-kernel, linux-arm-kernel, linux-kernel-owner On 18/11/16 20:22, Vikram Mulukutla wrote: > > Hi Sudeep, > > Thanks for taking a look! > > On 2016-11-18 02:30, Sudeep Holla wrote: >> Hi Vikram, >> >> On 18/11/16 02:22, Vikram Mulukutla wrote: >>> Hello, >>> >>> This isn't really a bug report, but just a description of a >>> frequency/IPC >>> dependent behavior that I'm curious if we should worry about. The >>> behavior >>> is exposed by questionable design so I'm leaning towards don't-care. >>> >>> Consider these threads running in parallel on two ARM64 CPUs running >>> mainline >>> Linux: >>> >> >> Are you seeing this behavior with the mainline kernel on any platforms >> as we have a sort of workaround for this ? >> > > If I understand that workaround correctly, the ARM timer event stream is > used > to periodically wake up CPUs that are waiting in WFE, is that right? I > think > my scenario below may be different because LittleCPU doesn't actually wait > on a WFE event in the loop that is trying to increment lock->next, i.e. > it's > stuck in the following loop: > > ARM64_LSE_ATOMIC_INSN( > /* LL/SC */ > " prfm pstl1strm, %3\n" > "1: ldaxr %w0, %3\n" > " add %w1, %w0, %w5\n" > " stxr %w2, %w1, %3\n" > " cbnz %w2, 1b\n", > Interesting. Just curious if this is r0p0/p1 A53 ? If so, is the errata 819472 enabled ? -- Regards, Sudeep ^ permalink raw reply [flat|nested] 12+ messages in thread
* spin_lock behavior with ARM64 big.Little/HMP 2016-11-21 15:21 ` Sudeep Holla @ 2017-03-30 4:12 ` Vikram Mulukutla -1 siblings, 0 replies; 12+ messages in thread From: Vikram Mulukutla @ 2017-03-30 4:12 UTC (permalink / raw) To: linux-arm-kernel Hi Sudeep, > > Interesting. Just curious if this is r0p0/p1 A53 ? If so, is the errata > 819472 enabled ? Sorry for bringing this up after the loo-ong delay, but I've been assured that the A53 involved is > r0p1. I've also confirmed this problem on multiple internal platforms, and I'm pretty sure that it occurs on any b.L out there today. Also, we found the same problematic lock design used in the workqueue code in the kernel, causing the same livelock. It's very very rare and requires a perfect set of circumstances. If it would help I can provide a unit test if you folks would be generous enough to test it on the latest Juno or something b.L that's also upstream. Thanks, Vikram ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: spin_lock behavior with ARM64 big.Little/HMP @ 2017-03-30 4:12 ` Vikram Mulukutla 0 siblings, 0 replies; 12+ messages in thread From: Vikram Mulukutla @ 2017-03-30 4:12 UTC (permalink / raw) To: Sudeep Holla Cc: Catalin Marinas, Will Deacon, linux-kernel, linux-arm-kernel, linux-kernel-owner Hi Sudeep, > > Interesting. Just curious if this is r0p0/p1 A53 ? If so, is the errata > 819472 enabled ? Sorry for bringing this up after the loo-ong delay, but I've been assured that the A53 involved is > r0p1. I've also confirmed this problem on multiple internal platforms, and I'm pretty sure that it occurs on any b.L out there today. Also, we found the same problematic lock design used in the workqueue code in the kernel, causing the same livelock. It's very very rare and requires a perfect set of circumstances. If it would help I can provide a unit test if you folks would be generous enough to test it on the latest Juno or something b.L that's also upstream. Thanks, Vikram ^ permalink raw reply [flat|nested] 12+ messages in thread
* spin_lock behavior with ARM64 big.Little/HMP 2017-03-30 4:12 ` Vikram Mulukutla @ 2017-03-30 10:23 ` Sudeep Holla -1 siblings, 0 replies; 12+ messages in thread From: Sudeep Holla @ 2017-03-30 10:23 UTC (permalink / raw) To: linux-arm-kernel On 30/03/17 05:12, Vikram Mulukutla wrote: > > Hi Sudeep, > >> >> Interesting. Just curious if this is r0p0/p1 A53 ? If so, is the errata >> 819472 enabled ? > > Sorry for bringing this up after the loo-ong delay, but I've been > assured that the A53 involved is > r0p1. I've also confirmed this > problem on multiple internal platforms, and I'm pretty sure that it > occurs on any b.L out there today. Also, we found the same problematic > lock design used in the workqueue code in the kernel, causing the same > livelock. It's very very rare and requires a perfect set of circumstances. > > If it would help I can provide a unit test if you folks would be > generous enough to test it on the latest Juno or something b.L that's > also upstream. > Sure, please do share the unit test. I will give that a try on Juno. -- Regards, Sudeep ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: spin_lock behavior with ARM64 big.Little/HMP @ 2017-03-30 10:23 ` Sudeep Holla 0 siblings, 0 replies; 12+ messages in thread From: Sudeep Holla @ 2017-03-30 10:23 UTC (permalink / raw) To: Vikram Mulukutla Cc: Sudeep Holla, Catalin Marinas, Will Deacon, linux-kernel, linux-arm-kernel, linux-kernel-owner On 30/03/17 05:12, Vikram Mulukutla wrote: > > Hi Sudeep, > >> >> Interesting. Just curious if this is r0p0/p1 A53 ? If so, is the errata >> 819472 enabled ? > > Sorry for bringing this up after the loo-ong delay, but I've been > assured that the A53 involved is > r0p1. I've also confirmed this > problem on multiple internal platforms, and I'm pretty sure that it > occurs on any b.L out there today. Also, we found the same problematic > lock design used in the workqueue code in the kernel, causing the same > livelock. It's very very rare and requires a perfect set of circumstances. > > If it would help I can provide a unit test if you folks would be > generous enough to test it on the latest Juno or something b.L that's > also upstream. > Sure, please do share the unit test. I will give that a try on Juno. -- Regards, Sudeep ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2017-03-30 10:24 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-11-18 2:22 spin_lock behavior with ARM64 big.Little/HMP Vikram Mulukutla 2016-11-18 2:22 ` Vikram Mulukutla 2016-11-18 10:30 ` Sudeep Holla 2016-11-18 10:30 ` Sudeep Holla 2016-11-18 20:22 ` Vikram Mulukutla 2016-11-18 20:22 ` Vikram Mulukutla 2016-11-21 15:21 ` Sudeep Holla 2016-11-21 15:21 ` Sudeep Holla 2017-03-30 4:12 ` Vikram Mulukutla 2017-03-30 4:12 ` Vikram Mulukutla 2017-03-30 10:23 ` Sudeep Holla 2017-03-30 10:23 ` Sudeep Holla
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.