linux-i2c.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RESEND] Do not mark ACPI devices as irq safe
@ 2024-08-08 12:14 Breno Leitao
  2024-08-08 23:57 ` Andi Shyti
  0 siblings, 1 reply; 9+ messages in thread
From: Breno Leitao @ 2024-08-08 12:14 UTC (permalink / raw)
  To: Laxman Dewangan, Dmitry Osipenko, Andi Shyti, Thierry Reding,
	Jonathan Hunter
  Cc: leit, Michael van der Westhuizen,
	open list:I2C SUBSYSTEM HOST DRIVERS,
	open list:TEGRA ARCHITECTURE SUPPORT, open list

On ACPI machines, the tegra i2c module encounters an issue due to a
mutex being called inside a spinlock. This leads to the following bug:

	BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
	in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1282, name: kssif0010
	preempt_count: 0, expected: 0
	RCU nest depth: 0, expected: 0
	irq event stamp: 0

	Call trace:
	dump_backtrace+0xf0/0x140
	show_stack (./arch/x86/include/asm/current.h:49
		     arch/x86/kernel/dumpstack.c:312)
	dump_stack_lvl (lib/dump_stack.c:89 lib/dump_stack.c:115)
	dump_stack (lib/earlycpio.c:61)
	__might_resched (./arch/x86/include/asm/current.h:49
			 kernel/sched/core.c:10297)
	__might_sleep (./include/linux/lockdep.h:231
			 kernel/sched/core.c:10236)
	__mutex_lock_common+0x5c/0x2190
	mutex_lock_nested (kernel/locking/mutex.c:751)
	acpi_subsys_runtime_resume+0xb8/0x160
	__rpm_callback+0x1cc/0x4b0
	rpm_resume+0xa60/0x1078
	__pm_runtime_resume+0xbc/0x130
	tegra_i2c_xfer+0x74/0x398
	__i2c_transfer (./include/trace/events/i2c.h:122 drivers/i2c/i2c-core-base.c:2258)

The problem arises because during __pm_runtime_resume(), the spinlock
&dev->power.lock is acquired before rpm_resume() is called. Later,
rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on
mutexes, triggering the error.

To address this issue, devices on ACPI are now marked as not IRQ-safe,
considering the dependency of acpi_subsys_runtime_resume() on mutexes.

Co-developed-by: Michael van der Westhuizen <rmikey@meta.com>
Signed-off-by: Michael van der Westhuizen <rmikey@meta.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 drivers/i2c/busses/i2c-tegra.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
index 85b31edc558d..6d783ecc3431 100644
--- a/drivers/i2c/busses/i2c-tegra.c
+++ b/drivers/i2c/busses/i2c-tegra.c
@@ -1804,7 +1804,7 @@ static int tegra_i2c_probe(struct platform_device *pdev)
 	 * VI I2C device shouldn't be marked as IRQ-safe because VI I2C won't
 	 * be used for atomic transfers.
 	 */
-	if (!IS_VI(i2c_dev))
+	if (!IS_VI(i2c_dev) && !ACPI_HANDLE(i2c_dev->dev))
 		pm_runtime_irq_safe(i2c_dev->dev);
 
 	pm_runtime_enable(i2c_dev->dev);
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH RESEND] Do not mark ACPI devices as irq safe
  2024-08-08 12:14 [PATCH RESEND] Do not mark ACPI devices as irq safe Breno Leitao
@ 2024-08-08 23:57 ` Andi Shyti
  2024-08-09 11:03   ` Andy Shevchenko
  0 siblings, 1 reply; 9+ messages in thread
From: Andi Shyti @ 2024-08-08 23:57 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Laxman Dewangan, Dmitry Osipenko, Thierry Reding, Jonathan Hunter,
	leit, Michael van der Westhuizen,
	open list:I2C SUBSYSTEM HOST DRIVERS,
	open list:TEGRA ARCHITECTURE SUPPORT, open list, Andy Shevchenko

Hi,

On Thu, Aug 08, 2024 at 05:14:46AM GMT, Breno Leitao wrote:
> On ACPI machines, the tegra i2c module encounters an issue due to a
> mutex being called inside a spinlock. This leads to the following bug:
> 
> 	BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
> 	in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1282, name: kssif0010
> 	preempt_count: 0, expected: 0
> 	RCU nest depth: 0, expected: 0
> 	irq event stamp: 0
> 
> 	Call trace:
> 	dump_backtrace+0xf0/0x140
> 	show_stack (./arch/x86/include/asm/current.h:49
> 		     arch/x86/kernel/dumpstack.c:312)
> 	dump_stack_lvl (lib/dump_stack.c:89 lib/dump_stack.c:115)
> 	dump_stack (lib/earlycpio.c:61)
> 	__might_resched (./arch/x86/include/asm/current.h:49
> 			 kernel/sched/core.c:10297)
> 	__might_sleep (./include/linux/lockdep.h:231
> 			 kernel/sched/core.c:10236)
> 	__mutex_lock_common+0x5c/0x2190
> 	mutex_lock_nested (kernel/locking/mutex.c:751)
> 	acpi_subsys_runtime_resume+0xb8/0x160
> 	__rpm_callback+0x1cc/0x4b0
> 	rpm_resume+0xa60/0x1078
> 	__pm_runtime_resume+0xbc/0x130
> 	tegra_i2c_xfer+0x74/0x398
> 	__i2c_transfer (./include/trace/events/i2c.h:122 drivers/i2c/i2c-core-base.c:2258)
> 
> The problem arises because during __pm_runtime_resume(), the spinlock
> &dev->power.lock is acquired before rpm_resume() is called. Later,
> rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on
> mutexes, triggering the error.
> 
> To address this issue, devices on ACPI are now marked as not IRQ-safe,
> considering the dependency of acpi_subsys_runtime_resume() on mutexes.
> 
> Co-developed-by: Michael van der Westhuizen <rmikey@meta.com>
> Signed-off-by: Michael van der Westhuizen <rmikey@meta.com>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  drivers/i2c/busses/i2c-tegra.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
> index 85b31edc558d..6d783ecc3431 100644
> --- a/drivers/i2c/busses/i2c-tegra.c
> +++ b/drivers/i2c/busses/i2c-tegra.c
> @@ -1804,7 +1804,7 @@ static int tegra_i2c_probe(struct platform_device *pdev)
>  	 * VI I2C device shouldn't be marked as IRQ-safe because VI I2C won't
>  	 * be used for atomic transfers.
>  	 */
> -	if (!IS_VI(i2c_dev))
> +	if (!IS_VI(i2c_dev) && !ACPI_HANDLE(i2c_dev->dev))

looks good to me, can I have an ack from Andy here?

Thanks,
Andi

>  		pm_runtime_irq_safe(i2c_dev->dev);
>  
>  	pm_runtime_enable(i2c_dev->dev);
> -- 
> 2.43.5
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RESEND] Do not mark ACPI devices as irq safe
  2024-08-08 23:57 ` Andi Shyti
@ 2024-08-09 11:03   ` Andy Shevchenko
  2024-08-13 13:32     ` Breno Leitao
  0 siblings, 1 reply; 9+ messages in thread
From: Andy Shevchenko @ 2024-08-09 11:03 UTC (permalink / raw)
  To: Andi Shyti
  Cc: Breno Leitao, Laxman Dewangan, Dmitry Osipenko, Thierry Reding,
	Jonathan Hunter, leit, Michael van der Westhuizen,
	open list:I2C SUBSYSTEM HOST DRIVERS,
	open list:TEGRA ARCHITECTURE SUPPORT, open list

On Fri, Aug 9, 2024 at 2:57 AM Andi Shyti <andi.shyti@kernel.org> wrote:
> On Thu, Aug 08, 2024 at 05:14:46AM GMT, Breno Leitao wrote:
> > On ACPI machines, the tegra i2c module encounters an issue due to a
> > mutex being called inside a spinlock. This leads to the following bug:

First of all, ...

> >       BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
> >       in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1282, name: kssif0010
> >       preempt_count: 0, expected: 0
> >       RCU nest depth: 0, expected: 0
> >       irq event stamp: 0
> >
> >       Call trace:
> >       dump_backtrace+0xf0/0x140
> >       show_stack (./arch/x86/include/asm/current.h:49
> >                    arch/x86/kernel/dumpstack.c:312)
> >       dump_stack_lvl (lib/dump_stack.c:89 lib/dump_stack.c:115)
> >       dump_stack (lib/earlycpio.c:61)
> >       __might_resched (./arch/x86/include/asm/current.h:49
> >                        kernel/sched/core.c:10297)
> >       __might_sleep (./include/linux/lockdep.h:231
> >                        kernel/sched/core.c:10236)
> >       __mutex_lock_common+0x5c/0x2190
> >       mutex_lock_nested (kernel/locking/mutex.c:751)
> >       acpi_subsys_runtime_resume+0xb8/0x160
> >       __rpm_callback+0x1cc/0x4b0
> >       rpm_resume+0xa60/0x1078
> >       __pm_runtime_resume+0xbc/0x130
> >       tegra_i2c_xfer+0x74/0x398
> >       __i2c_transfer (./include/trace/events/i2c.h:122 drivers/i2c/i2c-core-base.c:2258)

...read Submitting Patches and make the above to be ~5-6 significant
(useful) lines only.

> > The problem arises because during __pm_runtime_resume(), the spinlock
> > &dev->power.lock is acquired before rpm_resume() is called. Later,
> > rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on
> > mutexes, triggering the error.
> >
> > To address this issue, devices on ACPI are now marked as not IRQ-safe,
> > considering the dependency of acpi_subsys_runtime_resume() on mutexes.

This is a step in the right direction, but somewhere in the replies
here I would like to hear about roadmap to get rid of the
pm_runtime_irq_safe() in all Tegra related code.

...

> >        * VI I2C device shouldn't be marked as IRQ-safe because VI I2C won't
> >        * be used for atomic transfers.
> >        */

Does the comment need an expansion?

> > -     if (!IS_VI(i2c_dev))
> > +     if (!IS_VI(i2c_dev) && !ACPI_HANDLE(i2c_dev->dev))
>
> looks good to me, can I have an ack from Andy here?

I prefer to see something like
is_acpi_node() / is_acpi_device_node() / is_acpi_data_node() /
has_acpi_companion()
instead depending on the actual ACPI representation of the device.

Otherwise no objections.
Please, Cc me (andy@kernel.org) for the next version.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RESEND] Do not mark ACPI devices as irq safe
  2024-08-09 11:03   ` Andy Shevchenko
@ 2024-08-13 13:32     ` Breno Leitao
  2024-08-13 15:28       ` Dmitry Osipenko
  0 siblings, 1 reply; 9+ messages in thread
From: Breno Leitao @ 2024-08-13 13:32 UTC (permalink / raw)
  To: Andy Shevchenko, dmitry.osipenko
  Cc: Andi Shyti, Laxman Dewangan, Dmitry Osipenko, Thierry Reding,
	Jonathan Hunter, leit, Michael van der Westhuizen,
	open list:I2C SUBSYSTEM HOST DRIVERS,
	open list:TEGRA ARCHITECTURE SUPPORT, open list

Hello Andy,

On Fri, Aug 09, 2024 at 02:03:27PM +0300, Andy Shevchenko wrote:
> On Fri, Aug 9, 2024 at 2:57 AM Andi Shyti <andi.shyti@kernel.org> wrote:
> > On Thu, Aug 08, 2024 at 05:14:46AM GMT, Breno Leitao wrote:

> > > The problem arises because during __pm_runtime_resume(), the spinlock
> > > &dev->power.lock is acquired before rpm_resume() is called. Later,
> > > rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on
> > > mutexes, triggering the error.
> > >
> > > To address this issue, devices on ACPI are now marked as not IRQ-safe,
> > > considering the dependency of acpi_subsys_runtime_resume() on mutexes.
> 
> This is a step in the right direction

Thanks

> but somewhere in the replies
> here I would like to hear about roadmap to get rid of the
> pm_runtime_irq_safe() in all Tegra related code.

Agree, that seems the right way to go, but this is a question to
maintainers, Laxman and Dmitry.

By the way, looking at lore, I found that the last email from Laxman is
from 2022. And Dmitry seems to be using a different email!? Let me copy
the Dmitry's other email (dmitry.osipenko@collabora.com) here.

> > > +     if (!IS_VI(i2c_dev) && !ACPI_HANDLE(i2c_dev->dev))
> >
> > looks good to me, can I have an ack from Andy here?
> 
> I prefer to see something like
> is_acpi_node() / is_acpi_device_node() / is_acpi_data_node() /
> has_acpi_companion()
> instead depending on the actual ACPI representation of the device.
> 
> Otherwise no objections.
> Please, Cc me (andy@kernel.org) for the next version.

Thanks for the feedback, I agree that leveraging the functions about
should be better. What about something as:

Author: Breno Leitao <leitao@debian.org>
Date:   Thu Jun 6 06:27:07 2024 -0700

    Do not mark ACPI devices as irq safe
    
    On ACPI machines, the tegra i2c module encounters an issue due to a
    mutex being called inside a spinlock. This leads to the following bug:
    
            BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
            in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1282, name: kssif0010
            preempt_count: 0, expected: 0
            RCU nest depth: 0, expected: 0
            irq event stamp: 0
    
            Call trace:
            __might_sleep
            __mutex_lock_common
            mutex_lock_nested
            acpi_subsys_runtime_resume
            rpm_resume
            tegra_i2c_xfer
    
    The problem arises because during __pm_runtime_resume(), the spinlock
    &dev->power.lock is acquired before rpm_resume() is called. Later,
    rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on
    mutexes, triggering the error.
    
    To address this issue, devices on ACPI are now marked as not IRQ-safe,
    considering the dependency of acpi_subsys_runtime_resume() on mutexes.
    
    Co-developed-by: Michael van der Westhuizen <rmikey@meta.com>
    Signed-off-by: Michael van der Westhuizen <rmikey@meta.com>
    Signed-off-by: Breno Leitao <leitao@debian.org>

diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
index 85b31edc558d..1df5b4204142 100644
--- a/drivers/i2c/busses/i2c-tegra.c
+++ b/drivers/i2c/busses/i2c-tegra.c
@@ -1802,9 +1802,9 @@ static int tegra_i2c_probe(struct platform_device *pdev)
 	 * domain.
 	 *
 	 * VI I2C device shouldn't be marked as IRQ-safe because VI I2C won't
-	 * be used for atomic transfers.
+	 * be used for atomic transfers. ACPI device is not IRQ safe also.
 	 */
-	if (!IS_VI(i2c_dev))
+	if (!IS_VI(i2c_dev) && !has_acpi_companion(i2c_dev->dev))
 		pm_runtime_irq_safe(i2c_dev->dev);
 
 	pm_runtime_enable(i2c_dev->dev);


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH RESEND] Do not mark ACPI devices as irq safe
  2024-08-13 13:32     ` Breno Leitao
@ 2024-08-13 15:28       ` Dmitry Osipenko
  2024-08-13 15:52         ` Andy Shevchenko
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Osipenko @ 2024-08-13 15:28 UTC (permalink / raw)
  To: Breno Leitao, Andy Shevchenko, dmitry.osipenko
  Cc: Andi Shyti, Laxman Dewangan, Thierry Reding, Jonathan Hunter,
	leit, Michael van der Westhuizen,
	open list:I2C SUBSYSTEM HOST DRIVERS,
	open list:TEGRA ARCHITECTURE SUPPORT, open list

13.08.2024 16:32, Breno Leitao пишет:
> Hello Andy,
> 
> On Fri, Aug 09, 2024 at 02:03:27PM +0300, Andy Shevchenko wrote:
>> On Fri, Aug 9, 2024 at 2:57 AM Andi Shyti <andi.shyti@kernel.org> wrote:
>>> On Thu, Aug 08, 2024 at 05:14:46AM GMT, Breno Leitao wrote:
> 
>>>> The problem arises because during __pm_runtime_resume(), the spinlock
>>>> &dev->power.lock is acquired before rpm_resume() is called. Later,
>>>> rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on
>>>> mutexes, triggering the error.
>>>>
>>>> To address this issue, devices on ACPI are now marked as not IRQ-safe,
>>>> considering the dependency of acpi_subsys_runtime_resume() on mutexes.
>>
>> This is a step in the right direction
> 
> Thanks
> 
>> but somewhere in the replies
>> here I would like to hear about roadmap to get rid of the
>> pm_runtime_irq_safe() in all Tegra related code.
> 
> Agree, that seems the right way to go, but this is a question to
> maintainers, Laxman and Dmitry.
> 
> By the way, looking at lore, I found that the last email from Laxman is
> from 2022. And Dmitry seems to be using a different email!? Let me copy
> the Dmitry's other email (dmitry.osipenko@collabora.com) here.
> 
>>>> +     if (!IS_VI(i2c_dev) && !ACPI_HANDLE(i2c_dev->dev))
>>>
>>> looks good to me, can I have an ack from Andy here?
>>
>> I prefer to see something like
>> is_acpi_node() / is_acpi_device_node() / is_acpi_data_node() /
>> has_acpi_companion()
>> instead depending on the actual ACPI representation of the device.
>>
>> Otherwise no objections.
>> Please, Cc me (andy@kernel.org) for the next version.
> 
> Thanks for the feedback, I agree that leveraging the functions about
> should be better. What about something as:
> 
> Author: Breno Leitao <leitao@debian.org>
> Date:   Thu Jun 6 06:27:07 2024 -0700
> 
>     Do not mark ACPI devices as irq safe
>     
>     On ACPI machines, the tegra i2c module encounters an issue due to a
>     mutex being called inside a spinlock. This leads to the following bug:
>     
>             BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
>             in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1282, name: kssif0010
>             preempt_count: 0, expected: 0
>             RCU nest depth: 0, expected: 0
>             irq event stamp: 0
>     
>             Call trace:
>             __might_sleep
>             __mutex_lock_common
>             mutex_lock_nested
>             acpi_subsys_runtime_resume
>             rpm_resume
>             tegra_i2c_xfer
>     
>     The problem arises because during __pm_runtime_resume(), the spinlock
>     &dev->power.lock is acquired before rpm_resume() is called. Later,
>     rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on
>     mutexes, triggering the error.
>     
>     To address this issue, devices on ACPI are now marked as not IRQ-safe,
>     considering the dependency of acpi_subsys_runtime_resume() on mutexes.
>     
>     Co-developed-by: Michael van der Westhuizen <rmikey@meta.com>
>     Signed-off-by: Michael van der Westhuizen <rmikey@meta.com>
>     Signed-off-by: Breno Leitao <leitao@debian.org>
> 
> diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
> index 85b31edc558d..1df5b4204142 100644
> --- a/drivers/i2c/busses/i2c-tegra.c
> +++ b/drivers/i2c/busses/i2c-tegra.c
> @@ -1802,9 +1802,9 @@ static int tegra_i2c_probe(struct platform_device *pdev)
>  	 * domain.
>  	 *
>  	 * VI I2C device shouldn't be marked as IRQ-safe because VI I2C won't
> -	 * be used for atomic transfers.
> +	 * be used for atomic transfers. ACPI device is not IRQ safe also.
>  	 */
> -	if (!IS_VI(i2c_dev))
> +	if (!IS_VI(i2c_dev) && !has_acpi_companion(i2c_dev->dev))
>  		pm_runtime_irq_safe(i2c_dev->dev);
>  
>  	pm_runtime_enable(i2c_dev->dev);
> 

Looks good, thanks

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>

> but somewhere in the replies
> here I would like to hear about roadmap to get rid of the
> pm_runtime_irq_safe() in all Tegra related code.

What is the problem with pm_runtime_irq_safe()? There were multiple
problems with RPM for this driver in the past, it wasn't trivial to make
it work for all Tegra HW generations. Don't expect anyone would want to
invest time into doing it all over again.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RESEND] Do not mark ACPI devices as irq safe
  2024-08-13 15:28       ` Dmitry Osipenko
@ 2024-08-13 15:52         ` Andy Shevchenko
  2024-08-15  2:48           ` Dmitry Osipenko
  0 siblings, 1 reply; 9+ messages in thread
From: Andy Shevchenko @ 2024-08-13 15:52 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Breno Leitao, dmitry.osipenko, Andi Shyti, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, leit, Michael van der Westhuizen,
	open list:I2C SUBSYSTEM HOST DRIVERS,
	open list:TEGRA ARCHITECTURE SUPPORT, open list

On Tue, Aug 13, 2024 at 6:28 PM Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 13.08.2024 16:32, Breno Leitao пишет:
> > Hello Andy,
> >
> > On Fri, Aug 09, 2024 at 02:03:27PM +0300, Andy Shevchenko wrote:
> >> On Fri, Aug 9, 2024 at 2:57 AM Andi Shyti <andi.shyti@kernel.org> wrote:
> >>> On Thu, Aug 08, 2024 at 05:14:46AM GMT, Breno Leitao wrote:
> >
> >>>> The problem arises because during __pm_runtime_resume(), the spinlock
> >>>> &dev->power.lock is acquired before rpm_resume() is called. Later,
> >>>> rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on
> >>>> mutexes, triggering the error.
> >>>>
> >>>> To address this issue, devices on ACPI are now marked as not IRQ-safe,
> >>>> considering the dependency of acpi_subsys_runtime_resume() on mutexes.
> >>
> >> This is a step in the right direction
> >
> > Thanks
> >
> >> but somewhere in the replies
> >> here I would like to hear about roadmap to get rid of the
> >> pm_runtime_irq_safe() in all Tegra related code.
> >
> > Agree, that seems the right way to go, but this is a question to
> > maintainers, Laxman and Dmitry.
> >
> > By the way, looking at lore, I found that the last email from Laxman is
> > from 2022. And Dmitry seems to be using a different email!? Let me copy
> > the Dmitry's other email (dmitry.osipenko@collabora.com) here.
> >
> >>>> +     if (!IS_VI(i2c_dev) && !ACPI_HANDLE(i2c_dev->dev))
> >>>
> >>> looks good to me, can I have an ack from Andy here?
> >>
> >> I prefer to see something like
> >> is_acpi_node() / is_acpi_device_node() / is_acpi_data_node() /
> >> has_acpi_companion()
> >> instead depending on the actual ACPI representation of the device.
> >>
> >> Otherwise no objections.
> >> Please, Cc me (andy@kernel.org) for the next version.
> >
> > Thanks for the feedback, I agree that leveraging the functions about
> > should be better. What about something as:
> >
> > Author: Breno Leitao <leitao@debian.org>
> > Date:   Thu Jun 6 06:27:07 2024 -0700
> >
> >     Do not mark ACPI devices as irq safe
> >
> >     On ACPI machines, the tegra i2c module encounters an issue due to a
> >     mutex being called inside a spinlock. This leads to the following bug:
> >
> >             BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
> >             in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1282, name: kssif0010
> >             preempt_count: 0, expected: 0
> >             RCU nest depth: 0, expected: 0
> >             irq event stamp: 0
> >
> >             Call trace:
> >             __might_sleep
> >             __mutex_lock_common
> >             mutex_lock_nested
> >             acpi_subsys_runtime_resume
> >             rpm_resume
> >             tegra_i2c_xfer
> >
> >     The problem arises because during __pm_runtime_resume(), the spinlock
> >     &dev->power.lock is acquired before rpm_resume() is called. Later,
> >     rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on
> >     mutexes, triggering the error.
> >
> >     To address this issue, devices on ACPI are now marked as not IRQ-safe,
> >     considering the dependency of acpi_subsys_runtime_resume() on mutexes.
> >
> >     Co-developed-by: Michael van der Westhuizen <rmikey@meta.com>
> >     Signed-off-by: Michael van der Westhuizen <rmikey@meta.com>
> >     Signed-off-by: Breno Leitao <leitao@debian.org>
> >
> > diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
> > index 85b31edc558d..1df5b4204142 100644
> > --- a/drivers/i2c/busses/i2c-tegra.c
> > +++ b/drivers/i2c/busses/i2c-tegra.c
> > @@ -1802,9 +1802,9 @@ static int tegra_i2c_probe(struct platform_device *pdev)
> >        * domain.
> >        *
> >        * VI I2C device shouldn't be marked as IRQ-safe because VI I2C won't
> > -      * be used for atomic transfers.
> > +      * be used for atomic transfers. ACPI device is not IRQ safe also.
> >        */
> > -     if (!IS_VI(i2c_dev))
> > +     if (!IS_VI(i2c_dev) && !has_acpi_companion(i2c_dev->dev))
> >               pm_runtime_irq_safe(i2c_dev->dev);
> >
> >       pm_runtime_enable(i2c_dev->dev);
> >
>
> Looks good, thanks
>
> Reviewed-by: Dmitry Osipenko <digetx@gmail.com>

LGTM as well, feel free to add
Reviewed-by: Andy Shevchenko <andy@kernel.org>
to the above when sending it formally.

> > but somewhere in the replies
> > here I would like to hear about roadmap to get rid of the
> > pm_runtime_irq_safe() in all Tegra related code.
>
> What is the problem with pm_runtime_irq_safe()?

It's a hack. It has no reasons to stay in the kernel. It also prevents
PM from working properly (in some cases, not Tegra).

> There were multiple
> problems with RPM for this driver in the past, it wasn't trivial to make
> it work for all Tegra HW generations. Don't expect anyone would want to
> invest time into doing it all over again.

You may always refer to the OMAP case, which used to have 12 (IIRC,
but definitely several) calls to this API and now 0. Taking the OMAP
case into consideration I believe it's quite possible to get rid of
this hack and retire the API completely. Yes, this may take months or
even years. But I would like to have this roadmap be documented.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RESEND] Do not mark ACPI devices as irq safe
  2024-08-13 15:52         ` Andy Shevchenko
@ 2024-08-15  2:48           ` Dmitry Osipenko
  2024-08-19  9:20             ` Andy Shevchenko
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Osipenko @ 2024-08-15  2:48 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Breno Leitao, dmitry.osipenko, Andi Shyti, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, leit, Michael van der Westhuizen,
	open list:I2C SUBSYSTEM HOST DRIVERS,
	open list:TEGRA ARCHITECTURE SUPPORT, open list

13.08.2024 18:52, Andy Shevchenko пишет:
...
>>> but somewhere in the replies
>>> here I would like to hear about roadmap to get rid of the
>>> pm_runtime_irq_safe() in all Tegra related code.
>>
>> What is the problem with pm_runtime_irq_safe()?
> 
> It's a hack. It has no reasons to stay in the kernel. It also prevents
> PM from working properly (in some cases, not Tegra).

Why is it a hack? Why it can't be made to work properly for all cases?

>> There were multiple
>> problems with RPM for this driver in the past, it wasn't trivial to make
>> it work for all Tegra HW generations. Don't expect anyone would want to
>> invest time into doing it all over again.
> 
> You may always refer to the OMAP case, which used to have 12 (IIRC,
> but definitely several) calls to this API and now 0. Taking the OMAP
> case into consideration I believe it's quite possible to get rid of
> this hack and retire the API completely. Yes, this may take months or
> even years. But I would like to have this roadmap be documented.

There should be alternative to the removed API. Otherwise drivers will
have to have own hacks to work around the RPM limitation, re-invent own
PM, or not do RPM at all.

Looking at the i2c-omap.c, I see it's doing pm_runtime_get_sync() in the
atomic transfer, which should cause a lockup without IRQ-safe RPM,
AFAICT. The OMAP example doesn't look great so far.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RESEND] Do not mark ACPI devices as irq safe
  2024-08-15  2:48           ` Dmitry Osipenko
@ 2024-08-19  9:20             ` Andy Shevchenko
  2024-08-20  3:57               ` Tony Lindgren
  0 siblings, 1 reply; 9+ messages in thread
From: Andy Shevchenko @ 2024-08-19  9:20 UTC (permalink / raw)
  To: Dmitry Osipenko, Tony Lindgren
  Cc: Breno Leitao, dmitry.osipenko, Andi Shyti, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, leit, Michael van der Westhuizen,
	open list:I2C SUBSYSTEM HOST DRIVERS,
	open list:TEGRA ARCHITECTURE SUPPORT, open list

+Cc: Tony

On Thu, Aug 15, 2024 at 5:48 AM Dmitry Osipenko <digetx@gmail.com> wrote:
> 13.08.2024 18:52, Andy Shevchenko пишет:
> ...
> >>> but somewhere in the replies
> >>> here I would like to hear about roadmap to get rid of the
> >>> pm_runtime_irq_safe() in all Tegra related code.
> >>
> >> What is the problem with pm_runtime_irq_safe()?
> >
> > It's a hack. It has no reasons to stay in the kernel. It also prevents
> > PM from working properly (in some cases, not Tegra).
>
> Why is it a hack? Why it can't be made to work properly for all cases?

Because it messes up with the proper power transitions of the parent
devices. Refer to the initial commit c7b61de5b7b1 ("PM / Runtime: Add
synchronous runtime interface for interrupt handlers (v3)") that
pretty much explains the constraints of it. Also note, it was added
quite a while after the main PM machinery had been introduced.

What you have to use is device links to make sure the parent (PM
speaking) may not go away.
FWIW, if I am not mistaken the whole reconsideration of
pm_runtime_irq_safe() had been started with this [1] thread.

If you want to dive more into the history of this API, run `git log -S
pm_runtime_irq_safe`. It gives you also interesting facts of how it
was started being used and in many cases reverted or reworked for a
reason.

> >> There were multiple
> >> problems with RPM for this driver in the past, it wasn't trivial to make
> >> it work for all Tegra HW generations. Don't expect anyone would want to
> >> invest time into doing it all over again.
> >
> > You may always refer to the OMAP case, which used to have 12 (IIRC,
> > but definitely several) calls to this API and now 0. Taking the OMAP
> > case into consideration I believe it's quite possible to get rid of
> > this hack and retire the API completely. Yes, this may take months or
> > even years. But I would like to have this roadmap be documented.
>
> There should be alternative to the removed API. Otherwise drivers will
> have to have own hacks to work around the RPM limitation, re-invent own
> PM, or not do RPM at all.
>
> Looking at the i2c-omap.c, I see it's doing pm_runtime_get_sync() in the
> atomic transfer, which should cause a lockup without IRQ-safe RPM,
> AFAICT. The OMAP example doesn't look great so far.

Bugs may still appear, but it's not a point. I can easily find a
better example with a hint why it's bad to call that API [2][3][4] and
so on.

[1]: https://lore.kernel.org/all/20180515183409.78046-1-andriy.shevchenko@linux.intel.com/T/#u
[2]: https://lore.kernel.org/all/20191114101718.20619-1-peter.ujfalusi@ti.com/
[3]: https://lore.kernel.org/all/20180920193532.7714-1-tony@atomide.com/
[4]: https://lore.kernel.org/all/1463014396-4095-1-git-send-email-tony@atomide.com/


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RESEND] Do not mark ACPI devices as irq safe
  2024-08-19  9:20             ` Andy Shevchenko
@ 2024-08-20  3:57               ` Tony Lindgren
  0 siblings, 0 replies; 9+ messages in thread
From: Tony Lindgren @ 2024-08-20  3:57 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Dmitry Osipenko, Breno Leitao, dmitry.osipenko, Andi Shyti,
	Laxman Dewangan, Thierry Reding, Jonathan Hunter, leit,
	Michael van der Westhuizen, open list:I2C SUBSYSTEM HOST DRIVERS,
	open list:TEGRA ARCHITECTURE SUPPORT, open list, Kevin Hilman

* Andy Shevchenko <andy.shevchenko@gmail.com> [240819 09:21]:
> +Cc: Tony
> 
> On Thu, Aug 15, 2024 at 5:48 AM Dmitry Osipenko <digetx@gmail.com> wrote:
> > 13.08.2024 18:52, Andy Shevchenko пишет:
> > ...
> > >>> but somewhere in the replies
> > >>> here I would like to hear about roadmap to get rid of the
> > >>> pm_runtime_irq_safe() in all Tegra related code.
> > >>
> > >> What is the problem with pm_runtime_irq_safe()?
> > >
> > > It's a hack. It has no reasons to stay in the kernel. It also prevents
> > > PM from working properly (in some cases, not Tegra).
> >
> > Why is it a hack? Why it can't be made to work properly for all cases?
> 
> Because it messes up with the proper power transitions of the parent
> devices. Refer to the initial commit c7b61de5b7b1 ("PM / Runtime: Add
> synchronous runtime interface for interrupt handlers (v3)") that
> pretty much explains the constraints of it. Also note, it was added
> quite a while after the main PM machinery had been introduced.
> 
> What you have to use is device links to make sure the parent (PM
> speaking) may not go away.
> FWIW, if I am not mistaken the whole reconsideration of
> pm_runtime_irq_safe() had been started with this [1] thread.
> 
> If you want to dive more into the history of this API, run `git log -S
> pm_runtime_irq_safe`. It gives you also interesting facts of how it
> was started being used and in many cases reverted or reworked for a
> reason.

Yeah we should remove pm_runtime_irq_safe() completely. Fixing the use
of it in a driver afterwards is always a pain. And so far there has
always been a better solution available for the use cases I've seen.

> > >> There were multiple
> > >> problems with RPM for this driver in the past, it wasn't trivial to make
> > >> it work for all Tegra HW generations. Don't expect anyone would want to
> > >> invest time into doing it all over again.
> > >
> > > You may always refer to the OMAP case, which used to have 12 (IIRC,
> > > but definitely several) calls to this API and now 0. Taking the OMAP
> > > case into consideration I believe it's quite possible to get rid of
> > > this hack and retire the API completely. Yes, this may take months or
> > > even years. But I would like to have this roadmap be documented.
> >
> > There should be alternative to the removed API. Otherwise drivers will
> > have to have own hacks to work around the RPM limitation, re-invent own
> > PM, or not do RPM at all.
> >
> > Looking at the i2c-omap.c, I see it's doing pm_runtime_get_sync() in the
> > atomic transfer, which should cause a lockup without IRQ-safe RPM,
> > AFAICT. The OMAP example doesn't look great so far.
> 
> Bugs may still appear, but it's not a point. I can easily find a
> better example with a hint why it's bad to call that API [2][3][4] and
> so on.

Adding Kevin for the i2c-omap.c, sounds like it might depend on the
autosuspend timeout for runtime PM.

For issues where the controller may wake to an interrupt while runtime
idle, there's pm_runtime_get_noresume().

Regards,

Tony

> [1]: https://lore.kernel.org/all/20180515183409.78046-1-andriy.shevchenko@linux.intel.com/T/#u
> [2]: https://lore.kernel.org/all/20191114101718.20619-1-peter.ujfalusi@ti.com/
> [3]: https://lore.kernel.org/all/20180920193532.7714-1-tony@atomide.com/
> [4]: https://lore.kernel.org/all/1463014396-4095-1-git-send-email-tony@atomide.com/

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-08-20  4:06 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-08 12:14 [PATCH RESEND] Do not mark ACPI devices as irq safe Breno Leitao
2024-08-08 23:57 ` Andi Shyti
2024-08-09 11:03   ` Andy Shevchenko
2024-08-13 13:32     ` Breno Leitao
2024-08-13 15:28       ` Dmitry Osipenko
2024-08-13 15:52         ` Andy Shevchenko
2024-08-15  2:48           ` Dmitry Osipenko
2024-08-19  9:20             ` Andy Shevchenko
2024-08-20  3:57               ` Tony Lindgren

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).