spin_lock behavior with ARM64 big.Little/HMP

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* spin_lock behavior with ARM64 big.Little/HMP
@ 2016-11-18  2:22 Vikram Mulukutla
  2016-11-18 10:30 ` Sudeep Holla
  0 siblings, 1 reply; 6+ messages in thread
From: Vikram Mulukutla @ 2016-11-18  2:22 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

This isn't really a bug report, but just a description of a 
frequency/IPC
dependent behavior that I'm curious if we should worry about. The 
behavior
is exposed by questionable design so I'm leaning towards don't-care.

Consider these threads running in parallel on two ARM64 CPUs running 
mainline
Linux:

(Ordering of lines between the two columns does not indicate a sequence 
of
execution. Assume flag=0 initially.)

LittleARM64_CPU @ 300MHz (e.g.A53)   |  BigARM64_CPU @ 1.5GHz (e.g. A57)
-------------------------------------+----------------------------------
spin_lock_irqsave(s)                 |  local_irq_save()
/* critical section */
flag = 1                             |  spin_lock(s)
spin_unlock_irqrestore(s)            |  while (!flag) {
                                      |      spin_unlock(s)
                                      |      cpu_relax();
                                      |      spin_lock(s)
                                      |  }
                                      |  spin_unlock(s)
                                      |  local_irq_restore()

I see a livelock occurring where the LittleCPU is never able to acquire 
the
lock, and the BigCPU is stuck forever waiting on 'flag' to be set.

Even with ticket spinlocks, this bit of code can cause a livelock (or 
very
long delays) if BigCPU runs fast enough. Afaics this can only happen if 
the
LittleCPU is unable to put its ticket in the queue (i.e, increment the 
next
field) since the store-exclusive keeps failing.

The problem is not present on SMP, and is mitigated by adding enough
additional clock cycles between the unlock and lock in the loop running
on the BigCPU. On big.Little, if both threads are scheduled on the same
cluster within the same clock domain, the problem is avoided.

Now the infinite loop may seem like questionable design but the problem
isn't entirely hypothetical; if BigCPU calls hrtimer_cancel with
interrupts disabled, this scenario can result if the hrtimer is about
to run on a littleCPU. It's of course possible that there's just enough
intervening code for the problem to not occur. At the very least it 
seems
that loops like the one running in the BigCPU above should come with a
WARN_ON(irqs_disabled) or with a sufficient udelay() instead of the 
cpu_relax.

Thoughts?

Thanks,
Vikram

^ permalink raw reply	[flat|nested] 6+ messages in thread

* spin_lock behavior with ARM64 big.Little/HMP
  2016-11-18  2:22 spin_lock behavior with ARM64 big.Little/HMP Vikram Mulukutla
@ 2016-11-18 10:30 ` Sudeep Holla
  2016-11-18 20:22   ` Vikram Mulukutla
  0 siblings, 1 reply; 6+ messages in thread
From: Sudeep Holla @ 2016-11-18 10:30 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Vikram,

On 18/11/16 02:22, Vikram Mulukutla wrote:
> Hello,
>
> This isn't really a bug report, but just a description of a frequency/IPC
> dependent behavior that I'm curious if we should worry about. The behavior
> is exposed by questionable design so I'm leaning towards don't-care.
>
> Consider these threads running in parallel on two ARM64 CPUs running
> mainline
> Linux:
>

Are you seeing this behavior with the mainline kernel on any platforms
as we have a sort of workaround for this ?

> (Ordering of lines between the two columns does not indicate a sequence of
> execution. Assume flag=0 initially.)
>
> LittleARM64_CPU @ 300MHz (e.g.A53)   |  BigARM64_CPU @ 1.5GHz (e.g. A57)
> -------------------------------------+----------------------------------
> spin_lock_irqsave(s)                 |  local_irq_save()
> /* critical section */
> flag = 1                             |  spin_lock(s)
> spin_unlock_irqrestore(s)            |  while (!flag) {
>                                      |      spin_unlock(s)
>                                      |      cpu_relax();
>                                      |      spin_lock(s)
>                                      |  }
>                                      |  spin_unlock(s)
>                                      |  local_irq_restore()
>
> I see a livelock occurring where the LittleCPU is never able to acquire the
> lock, and the BigCPU is stuck forever waiting on 'flag' to be set.
>

Yes we saw this issue 3 years back on TC2 which has A7(with lowest
frequency of 300MHz IIRC) and A15(with 1.2 GHz). We were observing that
inter-cluster events are missed since the two clusters are operating at
different frequencies (details below).

The hardware recommendation is that there should be glue logic between
the two clusters which captures events from one cluster and replays then
on the other if its operating at a different frequency.

Generally EVENTO from cluster 1 is connected to the EVENTI of the
cluster 2 and vice versa. The only extra logic required is the double
synchronizer in the receiving clock domain.

This issue arise in reality if the synchronizer is missing and different
CPUs hold EVENTO for different clock cycles.

However there was a different requirement to implement timer event
stream in Linux for some user-space locking and that indirectly help to
resolve the issue on TC2. That event stream feature is enabled by
default in Linux and should fix the issue and hence I asked you if you
still see that issue.

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 6+ messages in thread

* spin_lock behavior with ARM64 big.Little/HMP
  2016-11-18 10:30 ` Sudeep Holla
@ 2016-11-18 20:22   ` Vikram Mulukutla
  2016-11-21 15:21     ` Sudeep Holla
  0 siblings, 1 reply; 6+ messages in thread
From: Vikram Mulukutla @ 2016-11-18 20:22 UTC (permalink / raw)
  To: linux-arm-kernel


Hi Sudeep,

Thanks for taking a look!

On 2016-11-18 02:30, Sudeep Holla wrote:
> Hi Vikram,
> 
> On 18/11/16 02:22, Vikram Mulukutla wrote:
>> Hello,
>> 
>> This isn't really a bug report, but just a description of a 
>> frequency/IPC
>> dependent behavior that I'm curious if we should worry about. The 
>> behavior
>> is exposed by questionable design so I'm leaning towards don't-care.
>> 
>> Consider these threads running in parallel on two ARM64 CPUs running
>> mainline
>> Linux:
>> 
> 
> Are you seeing this behavior with the mainline kernel on any platforms
> as we have a sort of workaround for this ?
> 

If I understand that workaround correctly, the ARM timer event stream is 
used
to periodically wake up CPUs that are waiting in WFE, is that right? I 
think
my scenario below may be different because LittleCPU doesn't actually 
wait
on a WFE event in the loop that is trying to increment lock->next, i.e. 
it's
stuck in the following loop:

         ARM64_LSE_ATOMIC_INSN(
         /* LL/SC */
"       prfm    pstl1strm, %3\n"
"1:     ldaxr   %w0, %3\n"
"       add     %w1, %w0, %w5\n"
"       stxr    %w2, %w1, %3\n"
"       cbnz    %w2, 1b\n",


I have been testing internal platforms; I'll try to test on something
available publicly that's b.L. In any case, the timer event stream was 
enabled
when I tried this out.

>> (Ordering of lines between the two columns does not indicate a 
>> sequence of
>> execution. Assume flag=0 initially.)
>> 
>> LittleARM64_CPU @ 300MHz (e.g.A53)   |  BigARM64_CPU @ 1.5GHz (e.g. 
>> A57)
>> -------------------------------------+----------------------------------
>> spin_lock_irqsave(s)                 |  local_irq_save()
>> /* critical section */
>> flag = 1                             |  spin_lock(s)
>> spin_unlock_irqrestore(s)            |  while (!flag) {
>>                                      |      spin_unlock(s)
>>                                      |      cpu_relax();
>>                                      |      spin_lock(s)
>>                                      |  }
>>                                      |  spin_unlock(s)
>>                                      |  local_irq_restore()
>> 

[...]

Thanks,
Vikram

^ permalink raw reply	[flat|nested] 6+ messages in thread

* spin_lock behavior with ARM64 big.Little/HMP
  2016-11-18 20:22   ` Vikram Mulukutla
@ 2016-11-21 15:21     ` Sudeep Holla
  2017-03-30  4:12       ` Vikram Mulukutla
  0 siblings, 1 reply; 6+ messages in thread
From: Sudeep Holla @ 2016-11-21 15:21 UTC (permalink / raw)
  To: linux-arm-kernel



On 18/11/16 20:22, Vikram Mulukutla wrote:
>
> Hi Sudeep,
>
> Thanks for taking a look!
>
> On 2016-11-18 02:30, Sudeep Holla wrote:
>> Hi Vikram,
>>
>> On 18/11/16 02:22, Vikram Mulukutla wrote:
>>> Hello,
>>>
>>> This isn't really a bug report, but just a description of a
>>> frequency/IPC
>>> dependent behavior that I'm curious if we should worry about. The
>>> behavior
>>> is exposed by questionable design so I'm leaning towards don't-care.
>>>
>>> Consider these threads running in parallel on two ARM64 CPUs running
>>> mainline
>>> Linux:
>>>
>>
>> Are you seeing this behavior with the mainline kernel on any platforms
>> as we have a sort of workaround for this ?
>>
>
> If I understand that workaround correctly, the ARM timer event stream is
> used
> to periodically wake up CPUs that are waiting in WFE, is that right? I
> think
> my scenario below may be different because LittleCPU doesn't actually wait
> on a WFE event in the loop that is trying to increment lock->next, i.e.
> it's
> stuck in the following loop:
>
>         ARM64_LSE_ATOMIC_INSN(
>         /* LL/SC */
> "       prfm    pstl1strm, %3\n"
> "1:     ldaxr   %w0, %3\n"
> "       add     %w1, %w0, %w5\n"
> "       stxr    %w2, %w1, %3\n"
> "       cbnz    %w2, 1b\n",
>

Interesting. Just curious if this is r0p0/p1 A53 ? If so, is the errata
819472 enabled ?

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 6+ messages in thread

* spin_lock behavior with ARM64 big.Little/HMP
  2016-11-21 15:21     ` Sudeep Holla
@ 2017-03-30  4:12       ` Vikram Mulukutla
  2017-03-30 10:23         ` Sudeep Holla
  0 siblings, 1 reply; 6+ messages in thread
From: Vikram Mulukutla @ 2017-03-30  4:12 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Sudeep,

> 
> Interesting. Just curious if this is r0p0/p1 A53 ? If so, is the errata
> 819472 enabled ?

Sorry for bringing this up after the loo-ong delay, but I've been 
assured that the A53 involved is > r0p1. I've also confirmed this 
problem on multiple internal platforms, and I'm pretty sure that it 
occurs on any b.L out there today. Also, we found the same problematic 
lock design used in the workqueue code in the kernel, causing the same 
livelock. It's very very rare and requires a perfect set of 
circumstances.

If it would help I can provide a unit test if you folks would be 
generous enough to test it on the latest Juno or something b.L that's 
also upstream.

Thanks,
Vikram

^ permalink raw reply	[flat|nested] 6+ messages in thread

* spin_lock behavior with ARM64 big.Little/HMP
  2017-03-30  4:12       ` Vikram Mulukutla
@ 2017-03-30 10:23         ` Sudeep Holla
  0 siblings, 0 replies; 6+ messages in thread
From: Sudeep Holla @ 2017-03-30 10:23 UTC (permalink / raw)
  To: linux-arm-kernel



On 30/03/17 05:12, Vikram Mulukutla wrote:
> 
> Hi Sudeep,
> 
>>
>> Interesting. Just curious if this is r0p0/p1 A53 ? If so, is the errata
>> 819472 enabled ?
> 
> Sorry for bringing this up after the loo-ong delay, but I've been
> assured that the A53 involved is > r0p1. I've also confirmed this
> problem on multiple internal platforms, and I'm pretty sure that it
> occurs on any b.L out there today. Also, we found the same problematic
> lock design used in the workqueue code in the kernel, causing the same
> livelock. It's very very rare and requires a perfect set of circumstances.
> 
> If it would help I can provide a unit test if you folks would be
> generous enough to test it on the latest Juno or something b.L that's
> also upstream.
> 

Sure, please do share the unit test. I will give that a try on Juno.

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-03-30 10:23 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-18  2:22 spin_lock behavior with ARM64 big.Little/HMP Vikram Mulukutla
2016-11-18 10:30 ` Sudeep Holla
2016-11-18 20:22   ` Vikram Mulukutla
2016-11-21 15:21     ` Sudeep Holla
2017-03-30  4:12       ` Vikram Mulukutla
2017-03-30 10:23         ` Sudeep Holla

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).