* [PATCH] clk: fix spin_lock/unlock imbalance on bad clk_enable() reentrancy
From: David Lechner @ 2017-12-12 23:43 UTC (permalink / raw)
To: linux-clk; +Cc: David Lechner, Michael Turquette, Stephen Boyd, linux-kernel

If clk_enable() is called in a reentrant way and spin_trylock_irqsave() is
not working as expected, it is possible to get a negative enable_refcnt,
which results in a missed call to spin_unlock_irqrestore().

It works like this:

1. clk_enable() is called.
2. clk_enable_lock() calls spin_trylock_irqsave() and sets
   enable_refcnt = 1.
3. Another clk_enable() is called before the first has returned
   (reentrant), but somehow spin_trylock_irqsave() is returning true.
   (I'm not sure how/why this is happening yet, but it is happening to me
   with the arch/arm/mach-davinci clocks that I am working on.)
4. Because spin_trylock_irqsave() returned true, enable_lock has been
   locked twice without being unlocked, and enable_refcnt = 1 is executed
   instead of enable_refcnt++.
5. After the inner clock is enabled, clk_enable_unlock() is called, which
   decrements enable_refcnt to 0 and calls spin_unlock_irqrestore().
6. The inner clk_enable() function returns.
7. clk_enable_unlock() is called again for the outer clock. enable_refcnt
   is decremented to -1 and spin_unlock_irqrestore() is *not* called.
8. The outer clk_enable() function returns.
9. Unrelated code called later issues a BUG warning about sleeping in an
   atomic context because of the unbalanced calls on the spin lock.

This patch fixes the unbalanced calls by calling spin_unlock_irqrestore()
if enable_refcnt <= 0 instead of only when it is == 0.

The BUG warning about sleeping in an atomic context in the unrelated code
is eliminated with this patch, but warnings are still printed from
clk_enable_lock() and clk_enable_unlock() because of the reference
counting problems.

Signed-off-by: David Lechner <david@lechnology.com>
---
 drivers/clk/clk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 647d056..bb1b1f9 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -162,7 +162,7 @@ static void clk_enable_unlock(unsigned long flags)
 	WARN_ON_ONCE(enable_owner != current);
 	WARN_ON_ONCE(enable_refcnt == 0);
 
-	if (--enable_refcnt) {
+	if (--enable_refcnt > 0) {
 		__release(enable_lock);
 		return;
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread
* Re: [PATCH] clk: fix spin_lock/unlock imbalance on bad clk_enable() reentrancy
From: David Lechner @ 2017-12-13  4:14 UTC (permalink / raw)
To: linux-clk; +Cc: Michael Turquette, Stephen Boyd, linux-kernel

On 12/12/2017 05:43 PM, David Lechner wrote:
> If clk_enable() is called in a reentrant way and spin_trylock_irqsave() is
> not working as expected, it is possible to get a negative enable_refcnt,
> which results in a missed call to spin_unlock_irqrestore().
>
> It works like this:
>
> 1. clk_enable() is called.
> 2. clk_enable_lock() calls spin_trylock_irqsave() and sets
>    enable_refcnt = 1.
> 3. Another clk_enable() is called before the first has returned
>    (reentrant), but somehow spin_trylock_irqsave() is returning true.
>    (I'm not sure how/why this is happening yet, but it is happening to me
>    with the arch/arm/mach-davinci clocks that I am working on.)

I think I have figured out that, since CONFIG_SMP=n and
CONFIG_DEBUG_SPINLOCK=n on my kernel,

#define arch_spin_trylock(lock)	({ barrier(); (void)(lock); 1; })

in include/linux/spinlock_up.h is causing the problem.

So, basically, reentrancy of clk_enable() is broken for non-SMP systems,
but I'm not sure I know how to fix it.

> 4. Because spin_trylock_irqsave() returned true, enable_lock has been
>    locked twice without being unlocked, and enable_refcnt = 1 is executed
>    instead of enable_refcnt++.
> 5. After the inner clock is enabled, clk_enable_unlock() is called, which
>    decrements enable_refcnt to 0 and calls spin_unlock_irqrestore().
> 6. The inner clk_enable() function returns.
> 7. clk_enable_unlock() is called again for the outer clock. enable_refcnt
>    is decremented to -1 and spin_unlock_irqrestore() is *not* called.
> 8. The outer clk_enable() function returns.
> 9. Unrelated code called later issues a BUG warning about sleeping in an
>    atomic context because of the unbalanced calls on the spin lock.
[...]

^ permalink raw reply	[flat|nested] 14+ messages in thread
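The mechanism above is small enough to model outside the kernel. The
following is a user-space sketch, illustrative only: the function names
mirror (but are not) the kernel's clk_enable_lock()/clk_enable_unlock(),
and the trylock stub is assumed to behave like the UP definition quoted
above. It replays the nested enable sequence from the commit message and
ends with the refcount at -1, the skipped-unlock state:

/*
 * Minimal user-space model of the failure. Assumption: this reduces
 * clk_enable_lock()/clk_enable_unlock() to their refcount logic, with
 * the UP arch_spin_trylock() stub standing in for the real spinlock.
 */
#include <stdio.h>

/* UP stub: "succeeds" unconditionally, exactly like spinlock_up.h */
#define up_spin_trylock()	1

static int enable_refcnt;

static void model_enable_lock(void)
{
	if (!up_spin_trylock()) {
		/* never reached on UP, so reentrancy is never detected */
		enable_refcnt++;
		return;
	}
	enable_refcnt = 1;	/* inner call clobbers the outer count */
}

static void model_enable_unlock(void)
{
	if (--enable_refcnt)	/* pre-patch test: any nonzero skips unlock */
		return;
	printf("unlock (refcnt hit 0)\n");
}

int main(void)
{
	model_enable_lock();	/* outer clk_enable(): refcnt = 1 */
	model_enable_lock();	/* inner clk_enable(): refcnt = 1, not 2 */
	model_enable_unlock();	/* inner unlock: 0, unlocks too early */
	model_enable_unlock();	/* outer unlock: -1, real unlock skipped */
	printf("enable_refcnt = %d\n", enable_refcnt);	/* prints -1 */
	return 0;
}

Compiled as-is, the last line prints enable_refcnt = -1, matching steps
1-8 of the commit message; on SMP the second trylock would fail and the
enable_refcnt++ branch would run instead.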
* Re: [PATCH] clk: fix spin_lock/unlock imbalance on bad clk_enable() reentrancy
From: Jerome Brunet @ 2017-12-15 13:47 UTC (permalink / raw)
To: David Lechner, linux-clk; +Cc: Michael Turquette, Stephen Boyd, linux-kernel

On Tue, 2017-12-12 at 22:14 -0600, David Lechner wrote:
> On 12/12/2017 05:43 PM, David Lechner wrote:
>> If clk_enable() is called in a reentrant way and spin_trylock_irqsave() is
>> not working as expected, it is possible to get a negative enable_refcnt,
>> which results in a missed call to spin_unlock_irqrestore().
>>
>> It works like this:
>>
>> 1. clk_enable() is called.
>> 2. clk_enable_lock() calls spin_trylock_irqsave() and sets
>>    enable_refcnt = 1.
>> 3. Another clk_enable() is called before the first has returned
>>    (reentrant), but somehow spin_trylock_irqsave() is returning true.
>>    (I'm not sure how/why this is happening yet, but it is happening to me
>>    with the arch/arm/mach-davinci clocks that I am working on.)
>
> I think I have figured out that, since CONFIG_SMP=n and
> CONFIG_DEBUG_SPINLOCK=n on my kernel,
>
> #define arch_spin_trylock(lock)	({ barrier(); (void)(lock); 1; })
>
> in include/linux/spinlock_up.h is causing the problem.
>
> So, basically, reentrancy of clk_enable() is broken for non-SMP systems,
> but I'm not sure I know how to fix it.

Hi David,

Correct me if I'm wrong, but in uni-processor mode, a call to
spin_trylock_irqsave() should disable preemption; see
_raw_spin_trylock() in spinlock_api_up.h:71.

In this case, I don't understand how you could possibly get another call
to clk_enable() ... unless the implementation of your clock ops
re-enables preemption or calls the scheduler.

>> 4. Because spin_trylock_irqsave() returned true, enable_lock has been
>>    locked twice without being unlocked, and enable_refcnt = 1 is executed
>>    instead of enable_refcnt++.
>> 5. After the inner clock is enabled, clk_enable_unlock() is called, which
>>    decrements enable_refcnt to 0 and calls spin_unlock_irqrestore().
>> 6. The inner clk_enable() function returns.
>> 7. clk_enable_unlock() is called again for the outer clock. enable_refcnt
>>    is decremented to -1 and spin_unlock_irqrestore() is *not* called.
>> 8. The outer clk_enable() function returns.
>> 9. Unrelated code called later issues a BUG warning about sleeping in an
>>    atomic context because of the unbalanced calls on the spin lock.
>>
>> This patch fixes the unbalanced calls by calling spin_unlock_irqrestore()
>> if enable_refcnt <= 0 instead of only when it is == 0.

A negative refcount is just illegal, which is why we have this line:
WARN_ON_ONCE(enable_refcnt != 0);

If it ever happens, it means you've got a bug to fix some place else.
Unless I missed something, the proposed fix is not right.

>> The BUG warning about sleeping in an atomic context in the unrelated code
>> is eliminated with this patch, but warnings are still printed from
>> clk_enable_lock() and clk_enable_unlock() because of the reference
>> counting problems.
[...]

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] clk: fix spin_lock/unlock imbalance on bad clk_enable() reentrancy
From: David Lechner @ 2017-12-15 16:26 UTC (permalink / raw)
To: Jerome Brunet, linux-clk; +Cc: Michael Turquette, Stephen Boyd, linux-kernel

On 12/15/2017 07:47 AM, Jerome Brunet wrote:
> On Tue, 2017-12-12 at 22:14 -0600, David Lechner wrote:
[...]
>> So, basically, reentrancy of clk_enable() is broken for non-SMP systems,
>> but I'm not sure I know how to fix it.
>
> Hi David,
>
> Correct me if I'm wrong, but in uni-processor mode, a call to
> spin_trylock_irqsave() should disable preemption; see
> _raw_spin_trylock() in spinlock_api_up.h:71.
>
> In this case, I don't understand how you could possibly get another call
> to clk_enable() ... unless the implementation of your clock ops
> re-enables preemption or calls the scheduler.
[...]
> A negative refcount is just illegal, which is why we have this line:
> WARN_ON_ONCE(enable_refcnt != 0);
>
> If it ever happens, it means you've got a bug to fix some place else.
> Unless I missed something, the proposed fix is not right.

You are correct that this does not fix the actual problem, and the
WARN_ON_ONCE() lines are still triggered. But it does remove a red
herring, in that it fixes the BUG warning about sleeping in an atomic
context in the unrelated code.

The part you are missing is that clk_enable() is called in a reentrant
way by design. This means that the first clk_enable() calls another
clk_enable() (and clk_disable()) before the first clk_enable() returns.

This is needed for a special case on the SoC I am working on. There is a
PLL that supplies 48MHz for USB. To enable the PLL, another clock domain
needs to be enabled temporarily while the PLL is being configured, but
the other clock domain can be turned back off after the PLL has locked.
It is not your typical case of having a parent clock (in fact, this
clock already has a parent clock that is different from the one that is
enabled temporarily).

Here is the code:

static void usb20_phy_clk_enable(struct davinci_clk *clk)
{
	u32 val;
	u32 timeout = 500000; /* 500 msec */

	val = readl(DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));

	/* The USB 2.0 PLL requires that the USB 2.0 PSC is enabled as well. */
	clk_enable(usb20_clk);

	/*
	 * Turn on the USB 2.0 PHY, but just the PLL, and not OTG. The USB 1.1
	 * host may use the PLL clock without USB 2.0 OTG being used.
	 */
	val &= ~(CFGCHIP2_RESET | CFGCHIP2_PHYPWRDN);
	val |= CFGCHIP2_PHY_PLLON;
	writel(val, DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));

	while (--timeout) {
		val = readl(DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
		if (val & CFGCHIP2_PHYCLKGD)
			goto done;
		udelay(1);
	}

	pr_err("Timeout waiting for USB 2.0 PHY clock good\n");
done:
	clk_disable(usb20_clk);
}

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] clk: fix spin_lock/unlock imbalance on bad clk_enable() reentrancy
From: David Lechner @ 2017-12-15 16:29 UTC (permalink / raw)
To: linux-clk; +Cc: Michael Turquette, Stephen Boyd, linux-kernel, Jerome Brunet

On 12/12/2017 10:14 PM, David Lechner wrote:
> On 12/12/2017 05:43 PM, David Lechner wrote:
>> If clk_enable() is called in a reentrant way and spin_trylock_irqsave() is
>> not working as expected, it is possible to get a negative enable_refcnt,
>> which results in a missed call to spin_unlock_irqrestore().
[...]
>
> I think I have figured out that, since CONFIG_SMP=n and
> CONFIG_DEBUG_SPINLOCK=n on my kernel,
>
> #define arch_spin_trylock(lock)	({ barrier(); (void)(lock); 1; })
>
> in include/linux/spinlock_up.h is causing the problem.
>
> So, basically, reentrancy of clk_enable() is broken for non-SMP systems,
> but I'm not sure I know how to fix it.

Here is what I came up with for a fix. If it looks reasonable, I will
resend as a proper patch.

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index bb1b1f9..53fad5f 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -136,12 +136,23 @@ static void clk_prepare_unlock(void)
 	mutex_unlock(&prepare_lock);
 }
 
+#ifdef CONFIG_SMP
+#define NO_SMP 0
+#else
+#define NO_SMP 1
+#endif
+
 static unsigned long clk_enable_lock(void)
 	__acquires(enable_lock)
 {
-	unsigned long flags;
+	unsigned long flags = 0;
 
-	if (!spin_trylock_irqsave(&enable_lock, flags)) {
+	/*
+	 * spin_trylock_irqsave() always returns true on non-SMP system
(unless
+	 * spin lock debugging is enabled) so don't call
spin_trylock_irqsave()
+	 * unless we are on an SMP system.
+	 */
+	if (NO_SMP || !spin_trylock_irqsave(&enable_lock, flags)) {
 		if (enable_owner == current) {
 			enable_refcnt++;
 			__acquire(enable_lock);

^ permalink raw reply related	[flat|nested] 14+ messages in thread
* Re: [PATCH] clk: fix spin_lock/unlock imbalance on bad clk_enable() reentrancy
From: Michael Turquette @ 2017-12-19 22:29 UTC (permalink / raw)
To: David Lechner, linux-clk; +Cc: Stephen Boyd, linux-kernel, Jerome Brunet

Hi David,

Quoting David Lechner (2017-12-15 08:29:56)
> On 12/12/2017 10:14 PM, David Lechner wrote:
>> I think I have figured out that, since CONFIG_SMP=n and
>> CONFIG_DEBUG_SPINLOCK=n on my kernel,
>>
>> #define arch_spin_trylock(lock)	({ barrier(); (void)(lock); 1; })
>>
>> in include/linux/spinlock_up.h is causing the problem.
>>
>> So, basically, reentrancy of clk_enable() is broken for non-SMP systems,
>> but I'm not sure I know how to fix it.
>
> Here is what I came up with for a fix. If it looks reasonable, I will
> resend as a proper patch.
>
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index bb1b1f9..53fad5f 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -136,12 +136,23 @@ static void clk_prepare_unlock(void)
>  	mutex_unlock(&prepare_lock);
>  }
>
> +#ifdef CONFIG_SMP
> +#define NO_SMP 0
> +#else
> +#define NO_SMP 1
> +#endif
> +
>  static unsigned long clk_enable_lock(void)
>  	__acquires(enable_lock)
>  {
> -	unsigned long flags;
> +	unsigned long flags = 0;
>
> -	if (!spin_trylock_irqsave(&enable_lock, flags)) {
> +	/*
> +	 * spin_trylock_irqsave() always returns true on non-SMP system
> (unless

Ugh, wrapped lines in patch make me sad.

> +	 * spin lock debugging is enabled) so don't call
> spin_trylock_irqsave()
> +	 * unless we are on an SMP system.
> +	 */
> +	if (NO_SMP || !spin_trylock_irqsave(&enable_lock, flags)) {

I'm not sure that this looks reasonable. The inverse logic (NO_SMP = 0
being equivalent to SMP = 1) just makes things harder to read for no
reason.

More to the point, did you fix your enable/disable call imbalance? If
so, did you test that fix without this patch? I'd like to know if fixing
the enable/disable imbalance is Good Enough. I'd prefer to take only
that change and not this patch.

Best regards,
Mike

> 		if (enable_owner == current) {
> 			enable_refcnt++;
> 			__acquire(enable_lock);

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] clk: fix spin_lock/unlock imbalance on bad clk_enable() reentrancy
From: David Lechner @ 2017-12-20 18:53 UTC (permalink / raw)
To: Michael Turquette, linux-clk; +Cc: Stephen Boyd, linux-kernel, Jerome Brunet

On 12/19/2017 04:29 PM, Michael Turquette wrote:
> Hi David,
>
> Quoting David Lechner (2017-12-15 08:29:56)
[...]
>> +	/*
>> +	 * spin_trylock_irqsave() always returns true on non-SMP system
>> (unless
>
> Ugh, wrapped lines in patch make me sad.

Sorry, I was being lazy. :-/

>> +	 * spin lock debugging is enabled) so don't call
>> spin_trylock_irqsave()
>> +	 * unless we are on an SMP system.
>> +	 */
>> +	if (NO_SMP || !spin_trylock_irqsave(&enable_lock, flags)) {
>
> I'm not sure that this looks reasonable. The inverse logic (NO_SMP = 0
> being equivalent to SMP = 1) just makes things harder to read for no
> reason.
>
> More to the point, did you fix your enable/disable call imbalance? If
> so, did you test that fix without this patch? I'd like to know if fixing
> the enable/disable imbalance is Good Enough. I'd prefer to take only
> that change and not this patch.

Without this patch, the imbalance in calls to spin lock/unlock is fixed,
but I still get several WARN_ON_ONCE() warnings because the reference
count becomes negative, so I would not call it Good Enough.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] clk: fix spin_lock/unlock imbalance on bad clk_enable() reentrancy
From: Michael Turquette @ 2017-12-20 19:24 UTC (permalink / raw)
To: David Lechner, linux-clk; +Cc: Stephen Boyd, linux-kernel, Jerome Brunet

Quoting David Lechner (2017-12-20 10:53:27)
> On 12/19/2017 04:29 PM, Michael Turquette wrote:
[...]
>> More to the point, did you fix your enable/disable call imbalance? If
>> so, did you test that fix without this patch? I'd like to know if fixing
>> the enable/disable imbalance is Good Enough. I'd prefer to take only
>> that change and not this patch.
>
> Without this patch, the imbalance in calls to spin lock/unlock is fixed,
> but I still get several WARN_ON_ONCE() warnings because the reference
> count becomes negative, so I would not call it Good Enough.

Several WARN_ON_ONCE? Have you narrowed down exactly which conditions in
the lock/unlock path are firing?

Also, I wasn't referring to the lock/unlock imbalance in my email above.
I was referring to the enable count mismatch. Have you fixed that bug? I
assume it's an incorrect clk consumer driver causing it.

Regards,
Mike

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] clk: fix spin_lock/unlock imbalance on bad clk_enable() reentrancy
From: David Lechner @ 2017-12-20 20:33 UTC (permalink / raw)
To: Michael Turquette, linux-clk; +Cc: Stephen Boyd, linux-kernel, Jerome Brunet

On 12/20/2017 01:24 PM, Michael Turquette wrote:
[...]
> Several WARN_ON_ONCE? Have you narrowed down exactly which conditions in
> the lock/unlock path are firing?
>
> Also, I wasn't referring to the lock/unlock imbalance in my email above.
> I was referring to the enable count mismatch. Have you fixed that bug? I
> assume it's an incorrect clk consumer driver causing it.

OK, explaining from the beginning once again.

This bug was discovered because we need to call clk_enable() in a
reentrant way on a non-SMP system on purpose (by design, not by chance).
The call path is this:

1. clk_enable() is called.

int clk_enable(struct clk *clk)
{
	if (!clk)
		return 0;

	return clk_core_enable_lock(clk->core);
}

2. Which in turn calls clk_core_enable_lock()

static int clk_core_enable_lock(struct clk_core *core)
{
	unsigned long flags;
	int ret;

	flags = clk_enable_lock();
	ret = clk_core_enable(core);
	clk_enable_unlock(flags);

	return ret;
}

3. Which calls clk_enable_lock()

static unsigned long clk_enable_lock(void)
	__acquires(enable_lock)
{
	unsigned long flags = 0;

	if (!spin_trylock_irqsave(&enable_lock, flags)) {
		if (enable_owner == current) {
			enable_refcnt++;
			__acquire(enable_lock);
			return flags;
		}
		spin_lock_irqsave(&enable_lock, flags);
	}
	WARN_ON_ONCE(enable_owner != NULL);
	WARN_ON_ONCE(enable_refcnt != 0);
	enable_owner = current;
	enable_refcnt = 1;
	return flags;
}

4. spin_trylock_irqsave() returns true; enable_owner was NULL and
enable_refcnt was 0, so no warnings. When the function exits,
enable_owner == current and enable_refcnt == 1.

5. Now we are back in clk_core_enable_lock() and clk_core_enable() is
called.

static int clk_core_enable(struct clk_core *core)
{
	int ret = 0;

	lockdep_assert_held(&enable_lock);

	if (!core)
		return 0;

	if (WARN_ON(core->prepare_count == 0))
		return -ESHUTDOWN;

	if (core->enable_count == 0) {
		ret = clk_core_enable(core->parent);

		if (ret)
			return ret;

		trace_clk_enable_rcuidle(core);

		if (core->ops->enable)
			ret = core->ops->enable(core->hw);

		trace_clk_enable_complete_rcuidle(core);

		if (ret) {
			clk_core_disable(core->parent);
			return ret;
		}
	}

	core->enable_count++;
	return 0;
}

6. This results in calling the callback core->ops->enable(), which in
this case is the following function. This clock has to enable another
clock temporarily in order for this clock to start.

static void usb20_phy_clk_enable(struct davinci_clk *clk)
{
	u32 val;
	u32 timeout = 500000; /* 500 msec */

	val = readl(DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));

	/*
	 * Starting the USB 2.0 PLL requires that the USB 2.0 PSC is
	 * enabled. The PSC can be disabled after the PLL has locked.
	 */
	clk_enable(usb20_clk);

	/*
	 * Turn on the USB 2.0 PHY, but just the PLL, and not OTG. The
	 * USB 1.1 host may use the PLL clock without USB 2.0 OTG being
	 * used.
	 */
	val &= ~(CFGCHIP2_RESET | CFGCHIP2_PHYPWRDN);
	val |= CFGCHIP2_PHY_PLLON;
	writel(val, DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));

	while (--timeout) {
		val = readl(DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
		if (val & CFGCHIP2_PHYCLKGD)
			goto done;
		udelay(1);
	}

	pr_err("Timeout waiting for USB 2.0 PHY clock good\n");
done:
	clk_disable(usb20_clk);
}

7. usb20_phy_clk_enable() calls clk_enable(). We have reentrancy because
we have not returned from the first clk_enable() yet.

8. As before, clk_enable() calls clk_core_enable_lock().

9. Which calls clk_enable_lock() (shown in step 3).

10. If this were running on an SMP system, spin_trylock_irqsave() would
return false because we already hold enable_lock, which we acquired in
step 3, and everything would be OK. But this is not an SMP system, so
spin_trylock_irqsave() always returns true. So the if block is skipped,
and we get a warning because enable_owner == current and then another
warning because enable_refcnt == 1.

11. clk_enable_lock() returns back to clk_core_enable_lock().

12. clk_core_enable() is called. There is nothing notable about this
call.

13. clk_enable_unlock() is called.

static void clk_enable_unlock(unsigned long flags)
	__releases(enable_lock)
{
	WARN_ON_ONCE(enable_owner != current);
	WARN_ON_ONCE(enable_refcnt == 0);

	if (--enable_refcnt) {
		__release(enable_lock);
		return;
	}

	enable_owner = NULL;
	spin_unlock_irqrestore(&enable_lock, flags);
}

14. enable_owner == current and enable_refcnt == 1, so no warnings. When
the function returns, enable_owner == NULL and enable_refcnt == 0.

15. clk_core_enable_lock() returns to clk_enable().

16. clk_enable() returns to usb20_phy_clk_enable().

17. usb20_phy_clk_enable() waits for the PLL to lock, then calls
clk_disable().

void clk_disable(struct clk *clk)
{
	if (IS_ERR_OR_NULL(clk))
		return;

	clk_core_disable_lock(clk->core);
}

18. Which calls clk_core_disable_lock()

static void clk_core_disable_lock(struct clk_core *core)
{
	unsigned long flags;

	flags = clk_enable_lock();
	clk_core_disable(core);
	clk_enable_unlock(flags);
}

19. Which calls clk_enable_lock() (again, shown in step 3).

20. spin_trylock_irqsave() always returns true, so the if block is
skipped. enable_owner == NULL and enable_refcnt == 0, so no warnings. On
return, enable_owner == current and enable_refcnt == 1.

21. clk_core_disable() is called. Nothing notable about this.

22. clk_enable_unlock() is called (shown in step 13).

23. enable_owner == current and enable_refcnt == 1, so no warnings. On
return, enable_owner == NULL and enable_refcnt == 0.

24. clk_core_disable_lock() returns to clk_disable().

25. clk_disable() returns to usb20_phy_clk_enable().

26. usb20_phy_clk_enable() returns to clk_core_enable().

27. clk_core_enable() returns to clk_core_enable_lock().

28. clk_core_enable_lock() calls clk_enable_unlock() (shown in step 13).

29. enable_owner == NULL, so we get a warning. enable_refcnt == 0, so we
get another warning. --enable_refcnt != 0, so we return early in the if
statement. On return, enable_owner == NULL and enable_refcnt == -1.

30. clk_enable_unlock() returns to clk_core_enable_lock().

31. clk_core_enable_lock() returns to clk_enable(). This is the original
clk_enable() from step 1, so we are done, but we have left
enable_refcnt == -1. The next call to clk_enable() will reset this, and
the warnings will be suppressed because of WARN_ON_ONCE().

So, as you can see, we get 4 warnings here. There is no problem with any
clock provider or consumer (as far as I can tell). The bug here is that
spin_trylock_irqsave() always returns true on non-SMP systems, which
messes up the reference counting.

usb20_phy_clk_enable() currently works because mach-davinci does not use
the common clock framework. However, I am trying to move it to the
common clock framework, which is how I discovered this bug.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] clk: fix spin_lock/unlock imbalance on bad clk_enable() reentrancy
From: David Lechner @ 2017-12-20 21:50 UTC (permalink / raw)
To: Michael Turquette, linux-clk; +Cc: Stephen Boyd, linux-kernel, Jerome Brunet

On 12/20/2017 02:33 PM, David Lechner wrote:
>
> So, as you can see, we get 4 warnings here. There is no problem with any
> clock provider or consumer (as far as I can tell). The bug here is that
> spin_trylock_irqsave() always returns true on non-SMP systems, which
> messes up the reference counting.
>
> usb20_phy_clk_enable() currently works because mach-davinci does not use
> the common clock framework. However, I am trying to move it to the
> common clock framework, which is how I discovered this bug.

One more thing I mentioned previously, but that is worth explaining
again in detail: enabling CONFIG_DEBUG_SPINLOCK changes the behavior of
spin_trylock_irqsave() on non-SMP systems. It no longer always returns
true, and so everything works as expected in the call chain that I
described previously.

The difference is that with CONFIG_DEBUG_SPINLOCK=n, we have

#define arch_spin_trylock(lock)	({ barrier(); (void)(lock); 1; })

But if CONFIG_DEBUG_SPINLOCK=y, then we have

static inline int arch_spin_trylock(arch_spinlock_t *lock)
{
	char oldval = lock->slock;

	lock->slock = 0;
	barrier();

	return oldval > 0;
}

This comes from include/linux/spinlock_up.h, which is included from
include/linux/spinlock.h:

#ifdef CONFIG_SMP
# include <asm/spinlock.h>
#else
# include <linux/spinlock_up.h>
#endif

So, the question I have is: what is the actual "correct" behavior of
spin_trylock_irqsave()? Is it really supposed to always return true when
CONFIG_DEBUG_SPINLOCK=n and CONFIG_SMP=n, or is this a bug?

^ permalink raw reply	[flat|nested] 14+ messages in thread
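The difference between the two variants is easy to demonstrate. The
sketch below is a user-space reduction under stated assumptions:
arch_spinlock_t is shrunk to the one-byte slock used by the debug
variant, and the function names are illustrative rather than the real
kernel symbols. It makes a nested trylock attempt against each variant:

#include <stdio.h>

typedef struct { char slock; } arch_spinlock_t;

/* CONFIG_DEBUG_SPINLOCK=n: the lock state is never even examined */
static int trylock_nodebug(arch_spinlock_t *lock)
{
	(void)lock;
	return 1;
}

/* CONFIG_DEBUG_SPINLOCK=y: the lock byte is actually consumed */
static int trylock_debug(arch_spinlock_t *lock)
{
	char oldval = lock->slock;

	lock->slock = 0;
	return oldval > 0;
}

int main(void)
{
	arch_spinlock_t a = { .slock = 1 };
	arch_spinlock_t b = { .slock = 1 };
	int first, second;

	first = trylock_nodebug(&a);	/* 1 */
	second = trylock_nodebug(&a);	/* 1: nested caller "succeeds" too */
	printf("no debug: %d then %d\n", first, second);

	first = trylock_debug(&b);	/* 1 */
	second = trylock_debug(&b);	/* 0: reentrancy is detected */
	printf("debug:    %d then %d\n", first, second);

	return 0;
}

With the debug variant, the second attempt fails, which is why the
reentrancy path in clk_enable_lock() behaves as intended when
CONFIG_DEBUG_SPINLOCK=y.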
* Re: [PATCH] clk: fix spin_lock/unlock imbalance on bad clk_enable() reentrancy
From: Stephen Boyd @ 2017-12-21  0:23 UTC (permalink / raw)
To: David Lechner; +Cc: Michael Turquette, linux-clk, linux-kernel, Jerome Brunet

On 12/20, David Lechner wrote:
> On 12/20/2017 02:33 PM, David Lechner wrote:
[...]
> So, the question I have is: what is the actual "correct" behavior of
> spin_trylock_irqsave()? Is it really supposed to always return true when
> CONFIG_DEBUG_SPINLOCK=n and CONFIG_SMP=n, or is this a bug?

Thanks for doing the analysis in this thread.

When CONFIG_DEBUG_SPINLOCK=n and CONFIG_SMP=n, spinlocks are compiler
barriers, that's it. So even if it is a bug to always return true, I
fail to see how we can detect that a spinlock is already held in this
configuration and return true or false.

I suppose the best option is to make clk_enable_lock() and
clk_enable_unlock() into nops or pure owner/refcount/barrier updates
when CONFIG_SMP=n. We pretty much just need the barrier semantics when
there's only a single CPU.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] clk: fix spin_lock/unlock imbalance on bad clk_enable() reentrancy
From: Stephen Boyd @ 2017-12-22  1:39 UTC (permalink / raw)
To: David Lechner; +Cc: Michael Turquette, linux-clk, linux-kernel, Jerome Brunet

On 12/20, Stephen Boyd wrote:
> On 12/20, David Lechner wrote:
>> So, the question I have is: what is the actual "correct" behavior of
>> spin_trylock_irqsave()? Is it really supposed to always return true when
>> CONFIG_DEBUG_SPINLOCK=n and CONFIG_SMP=n, or is this a bug?
>
> Thanks for doing the analysis in this thread.
>
> When CONFIG_DEBUG_SPINLOCK=n and CONFIG_SMP=n, spinlocks are compiler
> barriers, that's it. So even if it is a bug to always return true, I
> fail to see how we can detect that a spinlock is already held in this
> configuration and return true or false.
>
> I suppose the best option is to make clk_enable_lock() and
> clk_enable_unlock() into nops or pure owner/refcount/barrier updates
> when CONFIG_SMP=n. We pretty much just need the barrier semantics when
> there's only a single CPU.

How about this patch? It should make the trylock go away on UP configs
and then we keep everything else for refcount and ownership. We would
test enable_owner outside of any irqs/preemption disabled section
though. That needs a think.

---8<----
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 3526bc068f30..b6f61367aa8d 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -143,7 +143,8 @@ static unsigned long clk_enable_lock(void)
 {
 	unsigned long flags;
 
-	if (!spin_trylock_irqsave(&enable_lock, flags)) {
+	if (!IS_ENABLED(CONFIG_SMP) ||
+	    !spin_trylock_irqsave(&enable_lock, flags)) {
 		if (enable_owner == current) {
 			enable_refcnt++;
 			__acquire(enable_lock);

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related	[flat|nested] 14+ messages in thread
* Re: [PATCH] clk: fix spin_lock/unlock imbalance on bad clk_enable() reentrancy
From: David Lechner @ 2017-12-22  3:29 UTC (permalink / raw)
To: Stephen Boyd; +Cc: Michael Turquette, linux-clk, linux-kernel, Jerome Brunet

On 12/21/2017 07:39 PM, Stephen Boyd wrote:
[...]
> How about this patch? It should make the trylock go away on UP configs
> and then we keep everything else for refcount and ownership. We would
> test enable_owner outside of any irqs/preemption disabled section
> though. That needs a think.
>
> ---8<----
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 3526bc068f30..b6f61367aa8d 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -143,7 +143,8 @@ static unsigned long clk_enable_lock(void)
>  {
>  	unsigned long flags;
>  
> -	if (!spin_trylock_irqsave(&enable_lock, flags)) {
> +	if (!IS_ENABLED(CONFIG_SMP) ||
> +	    !spin_trylock_irqsave(&enable_lock, flags)) {
>  		if (enable_owner == current) {
>  			enable_refcnt++;
>  			__acquire(enable_lock);

I came up with the exact same patch earlier today, but did not have a
chance to send it. I've tested it, and it fixes the problem for me.

I'm afraid I don't know enough about how preemption works yet to be of
much help in saying what, if anything, else is needed to protect
enable_owner/enable_refcnt.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] clk: fix spin_lock/unlock imbalance on bad clk_enable() reentrancy
From: David Lechner @ 2017-12-22 18:42 UTC (permalink / raw)
To: Stephen Boyd; +Cc: Michael Turquette, linux-clk, linux-kernel, Jerome Brunet

On 12/21/2017 07:39 PM, Stephen Boyd wrote:
> On 12/20, Stephen Boyd wrote:
>> I suppose the best option is to make clk_enable_lock() and
>> clk_enable_unlock() into nops or pure owner/refcount/barrier updates
>> when CONFIG_SMP=n. We pretty much just need the barrier semantics when
>> there's only a single CPU.
>
> How about this patch? It should make the trylock go away on UP configs
> and then we keep everything else for refcount and ownership. We would
> test enable_owner outside of any irqs/preemption disabled section
> though. That needs a think.

After sleeping on it, this is what I came up with. This keeps
enable_owner and enable_refcnt protected and basically does the same
thing that spin_lock_irqsave()/spin_unlock_irqrestore() would do on a UP
system anyway, just more explicitly.

---
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index bb1b1f9..adbace3 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -136,6 +136,8 @@ static void clk_prepare_unlock(void)
 	mutex_unlock(&prepare_lock);
 }
 
+#ifdef CONFIG_SMP
+
 static unsigned long clk_enable_lock(void)
 	__acquires(enable_lock)
 {
@@ -170,6 +172,43 @@ static void clk_enable_unlock(unsigned long flags)
 	spin_unlock_irqrestore(&enable_lock, flags);
 }
 
+#else
+
+static unsigned long clk_enable_lock(void)
+	__acquires(enable_lock)
+{
+	unsigned long flags;
+
+	__acquire(enable_lock);
+	local_irq_save(flags);
+	preempt_disable();
+
+	if (enable_refcnt++ == 0) {
+		WARN_ON_ONCE(enable_owner != NULL);
+		enable_owner = current;
+	} else {
+		WARN_ON_ONCE(enable_owner != current);
+	}
+
+	return flags;
+}
+
+static void clk_enable_unlock(unsigned long flags)
+	__releases(enable_lock)
+{
+	WARN_ON_ONCE(enable_owner != current);
+	WARN_ON_ONCE(enable_refcnt == 0);
+
+	if (--enable_refcnt == 0)
+		enable_owner = NULL;
+
+	__release(enable_lock);
+	local_irq_restore(flags);
+	preempt_enable();
+}
+
+#endif
+
 static bool clk_core_is_prepared(struct clk_core *core)
 {
 	bool ret = false;

^ permalink raw reply related	[flat|nested] 14+ messages in thread
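For completeness, replaying the reentrant usb20_phy_clk_enable()
sequence against the UP-only helpers above can be done in user space.
The sketch below is illustrative only: 'current' is reduced to a dummy
pointer, and the irq/preempt calls, which do not affect the counting,
are left as comments. It shows the refcount staying balanced:

#include <assert.h>
#include <stddef.h>

static void *enable_owner;
static int enable_refcnt;

static char the_task;			/* stand-in for the 'current' task */
#define current_task ((void *)&the_task)

static void up_enable_lock(void)
{
	/* local_irq_save() and preempt_disable() would go here */
	if (enable_refcnt++ == 0) {
		assert(enable_owner == NULL);
		enable_owner = current_task;
	} else {
		assert(enable_owner == current_task);
	}
}

static void up_enable_unlock(void)
{
	assert(enable_owner == current_task);
	assert(enable_refcnt != 0);
	if (--enable_refcnt == 0)
		enable_owner = NULL;
	/* local_irq_restore() and preempt_enable() would go here */
}

int main(void)
{
	up_enable_lock();	/* outer clk_enable()             refcnt = 1 */
	up_enable_lock();	/*   inner clk_enable(usb20_clk)  refcnt = 2 */
	up_enable_unlock();	/*   inner unlock                 refcnt = 1 */
	up_enable_lock();	/*   inner clk_disable(usb20_clk) refcnt = 2 */
	up_enable_unlock();	/*   inner unlock                 refcnt = 1 */
	up_enable_unlock();	/* outer unlock                   refcnt = 0 */
	assert(enable_refcnt == 0 && enable_owner == NULL);
	return 0;
}

Every assert passes, and the count returns to zero at the outer unlock,
which is the behavior the warnings in the thread were demanding.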
end of thread, other threads:[~2017-12-22 18:42 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-12 23:43 [PATCH] clk: fix spin_lock/unlock imbalance on bad clk_enable() reentrancy David Lechner
2017-12-13  4:14 ` David Lechner
2017-12-15 13:47   ` Jerome Brunet
2017-12-15 16:26     ` David Lechner
2017-12-15 16:29   ` David Lechner
2017-12-19 22:29     ` Michael Turquette
2017-12-20 18:53       ` David Lechner
2017-12-20 19:24         ` Michael Turquette
2017-12-20 20:33           ` David Lechner
2017-12-20 21:50             ` David Lechner
2017-12-21  0:23               ` Stephen Boyd
2017-12-22  1:39                 ` Stephen Boyd
2017-12-22  3:29                   ` David Lechner
2017-12-22 18:42                   ` David Lechner