All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Jerome Brunet <jbrunet@baylibre.com>
Cc: Jason Cooper <jason@lakedaemon.net>,
	Martin Blumenstingl <martin.blumenstingl@googlemail.com>,
	Kevin Hilman <khilman@baylibre.com>,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] irqchip/meson-gpio: Fix HARDIRQ-safe -> HARDIRQ-unsafe lock order
Date: Thu, 16 Apr 2020 10:40:25 +0100	[thread overview]
Message-ID: <20200416104025.5a22c228@why> (raw)
In-Reply-To: <1jeesqt686.fsf@starbuckisacylon.baylibre.com>

On Tue, 14 Apr 2020 17:52:25 +0200
Jerome Brunet <jbrunet@baylibre.com> wrote:

Hi Jerome,

> On Tue 14 Apr 2020 at 15:20, Marc Zyngier <maz@kernel.org> wrote:
> 
> > On Tue,  7 Apr 2020 15:46:58 +0100
> > Marc Zyngier <maz@kernel.org> wrote:
> >
> > +Jerome, Martin,
> >  
> >> Running a lockedp-enabled kernel on a vim3l board (Amlogic SM1)
> >> leads to the following splat:
> >> 
> >> [   13.557138] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> >> [   13.587485] ip/456 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> >> [   13.625922] ffff000059908cf0 (&irq_desc_lock_class){-.-.}-{2:2}, at: __setup_irq+0xf8/0x8d8
> >> [   13.632273] which would create a new lock dependency:
> >> [   13.637272]  (&irq_desc_lock_class){-.-.}-{2:2} -> (&ctl->lock){+.+.}-{2:2}
> >> [   13.644209]
> >> [   13.644209] but this new dependency connects a HARDIRQ-irq-safe lock:
> >> [   13.654122]  (&irq_desc_lock_class){-.-.}-{2:2}
> >> [   13.654125]
> >> [   13.654125] ... which became HARDIRQ-irq-safe at:
> >> [   13.664759]   lock_acquire+0xec/0x368
> >> [   13.666926]   _raw_spin_lock+0x60/0x88
> >> [   13.669979]   handle_fasteoi_irq+0x30/0x178
> >> [   13.674082]   generic_handle_irq+0x38/0x50
> >> [   13.678098]   __handle_domain_irq+0x6c/0xc8
> >> [   13.682209]   gic_handle_irq+0x5c/0xb0
> >> [   13.685872]   el1_irq+0xd0/0x180
> >> [   13.689010]   arch_cpu_idle+0x40/0x220
> >> [   13.692732]   default_idle_call+0x54/0x60
> >> [   13.696677]   do_idle+0x23c/0x2e8
> >> [   13.699903]   cpu_startup_entry+0x30/0x50
> >> [   13.703852]   rest_init+0x1e0/0x2b4
> >> [   13.707301]   arch_call_rest_init+0x18/0x24
> >> [   13.711449]   start_kernel+0x4ec/0x51c
> >> [   13.715167]
> >> [   13.715167] to a HARDIRQ-irq-unsafe lock:
> >> [   13.722426]  (&ctl->lock){+.+.}-{2:2}
> >> [   13.722430]
> >> [   13.722430] ... which became HARDIRQ-irq-unsafe at:
> >> [   13.732319] ...
> >> [   13.732324]   lock_acquire+0xec/0x368
> >> [   13.735985]   _raw_spin_lock+0x60/0x88
> >> [   13.739452]   meson_gpio_irq_domain_alloc+0xcc/0x290
> >> [   13.744392]   irq_domain_alloc_irqs_hierarchy+0x24/0x60
> >> [   13.749586]   __irq_domain_alloc_irqs+0x160/0x2f0
> >> [   13.754254]   irq_create_fwspec_mapping+0x118/0x320
> >> [   13.759073]   irq_create_of_mapping+0x78/0xa0
> >> [   13.763360]   of_irq_get+0x6c/0x80
> >> [   13.766701]   of_mdiobus_register_phy+0x10c/0x238 [of_mdio]
> >> [   13.772227]   of_mdiobus_register+0x158/0x380 [of_mdio]
> >> [   13.777388]   mdio_mux_init+0x180/0x2e8 [mdio_mux]
> >> [   13.782128]   g12a_mdio_mux_probe+0x290/0x398 [mdio_mux_meson_g12a]
> >> [   13.788349]   platform_drv_probe+0x5c/0xb0
> >> [   13.792379]   really_probe+0xe4/0x448
> >> [   13.795979]   driver_probe_device+0xe8/0x140
> >> [   13.800189]   __device_attach_driver+0x94/0x120
> >> [   13.804639]   bus_for_each_drv+0x84/0xd8
> >> [   13.808474]   __device_attach+0xe4/0x168
> >> [   13.812361]   device_initial_probe+0x1c/0x28
> >> [   13.816592]   bus_probe_device+0xa4/0xb0
> >> [   13.820430]   deferred_probe_work_func+0xa8/0x100
> >> [   13.825064]   process_one_work+0x264/0x688
> >> [   13.829088]   worker_thread+0x4c/0x458
> >> [   13.832768]   kthread+0x154/0x158
> >> [   13.836018]   ret_from_fork+0x10/0x18
> >> [   13.839612]
> >> [   13.839612] other info that might help us debug this:
> >> [   13.839612]
> >> [   13.850354]  Possible interrupt unsafe locking scenario:
> >> [   13.850354]
> >> [   13.855720]        CPU0                    CPU1
> >> [   13.858774]        ----                    ----
> >> [   13.863242]   lock(&ctl->lock);
> >> [   13.866330]                                local_irq_disable();
> >> [   13.872233]                                lock(&irq_desc_lock_class);
> >> [   13.878705]                                lock(&ctl->lock);
> >> [   13.884297]   <Interrupt>
> >> [   13.886857]     lock(&irq_desc_lock_class);
> >> [   13.891014]
> >> [   13.891014]  *** DEADLOCK ***
> >> 
> >> The issue can occur when CPU1 is doing something like irq_set_type()
> >> and CPU0 performing an interrupt allocation, for example. Taking
> >> an interrupt (like the one being reconfigured) would lead to a
> >> deadlock.  
> 
> Just to make sure I understand
> * the 1st trace is a CPU getting interrupted while setting the irq type
> * the 2nd trace is another CPU trying to allocate an irq for network PHY.

The traces are only what lockdep sees as a dangerous behaviour, not
necessarily what actually leads to a deadlock. The deadlock scenario is
the one outlined just before "*** DEADLOCK ***", and a way to get there
is my interpretation just above.

> >> 
> >> A solution to this is:
> >> 
> >> - Reorder the locking so that meson_gpio_irq_update_bits takes the lock
> >>   itself at all times, instead of relying on the caller to lock or not,
> >>   hence making the RMW sequence atomic,
> >> 
> >> - Rework the critical section in meson_gpio_irq_request_channel to only
> >>   cover the allocation itself, and let the gpio_irq_sel_pin callback
> >>   deal with its own locking if required,
> >> 
> >> - Take the private spin-lock with interrupts disabled at all times  
> 
> Looks like the only safe path if I understand correctly.
> The patch below looks good to me.
> 
> >> 
> >> Signed-off-by: Marc Zyngier <maz@kernel.org>  
> 
> Thanks for the fix Marc.
> 
> Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Jerome Brunet <jbrunet@baylibre.com>
Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Jason Cooper <jason@lakedaemon.net>,
	Kevin Hilman <khilman@baylibre.com>,
	Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Subject: Re: [PATCH] irqchip/meson-gpio: Fix HARDIRQ-safe -> HARDIRQ-unsafe lock order
Date: Thu, 16 Apr 2020 10:40:25 +0100	[thread overview]
Message-ID: <20200416104025.5a22c228@why> (raw)
In-Reply-To: <1jeesqt686.fsf@starbuckisacylon.baylibre.com>

On Tue, 14 Apr 2020 17:52:25 +0200
Jerome Brunet <jbrunet@baylibre.com> wrote:

Hi Jerome,

> On Tue 14 Apr 2020 at 15:20, Marc Zyngier <maz@kernel.org> wrote:
> 
> > On Tue,  7 Apr 2020 15:46:58 +0100
> > Marc Zyngier <maz@kernel.org> wrote:
> >
> > +Jerome, Martin,
> >  
> >> Running a lockedp-enabled kernel on a vim3l board (Amlogic SM1)
> >> leads to the following splat:
> >> 
> >> [   13.557138] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> >> [   13.587485] ip/456 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> >> [   13.625922] ffff000059908cf0 (&irq_desc_lock_class){-.-.}-{2:2}, at: __setup_irq+0xf8/0x8d8
> >> [   13.632273] which would create a new lock dependency:
> >> [   13.637272]  (&irq_desc_lock_class){-.-.}-{2:2} -> (&ctl->lock){+.+.}-{2:2}
> >> [   13.644209]
> >> [   13.644209] but this new dependency connects a HARDIRQ-irq-safe lock:
> >> [   13.654122]  (&irq_desc_lock_class){-.-.}-{2:2}
> >> [   13.654125]
> >> [   13.654125] ... which became HARDIRQ-irq-safe at:
> >> [   13.664759]   lock_acquire+0xec/0x368
> >> [   13.666926]   _raw_spin_lock+0x60/0x88
> >> [   13.669979]   handle_fasteoi_irq+0x30/0x178
> >> [   13.674082]   generic_handle_irq+0x38/0x50
> >> [   13.678098]   __handle_domain_irq+0x6c/0xc8
> >> [   13.682209]   gic_handle_irq+0x5c/0xb0
> >> [   13.685872]   el1_irq+0xd0/0x180
> >> [   13.689010]   arch_cpu_idle+0x40/0x220
> >> [   13.692732]   default_idle_call+0x54/0x60
> >> [   13.696677]   do_idle+0x23c/0x2e8
> >> [   13.699903]   cpu_startup_entry+0x30/0x50
> >> [   13.703852]   rest_init+0x1e0/0x2b4
> >> [   13.707301]   arch_call_rest_init+0x18/0x24
> >> [   13.711449]   start_kernel+0x4ec/0x51c
> >> [   13.715167]
> >> [   13.715167] to a HARDIRQ-irq-unsafe lock:
> >> [   13.722426]  (&ctl->lock){+.+.}-{2:2}
> >> [   13.722430]
> >> [   13.722430] ... which became HARDIRQ-irq-unsafe at:
> >> [   13.732319] ...
> >> [   13.732324]   lock_acquire+0xec/0x368
> >> [   13.735985]   _raw_spin_lock+0x60/0x88
> >> [   13.739452]   meson_gpio_irq_domain_alloc+0xcc/0x290
> >> [   13.744392]   irq_domain_alloc_irqs_hierarchy+0x24/0x60
> >> [   13.749586]   __irq_domain_alloc_irqs+0x160/0x2f0
> >> [   13.754254]   irq_create_fwspec_mapping+0x118/0x320
> >> [   13.759073]   irq_create_of_mapping+0x78/0xa0
> >> [   13.763360]   of_irq_get+0x6c/0x80
> >> [   13.766701]   of_mdiobus_register_phy+0x10c/0x238 [of_mdio]
> >> [   13.772227]   of_mdiobus_register+0x158/0x380 [of_mdio]
> >> [   13.777388]   mdio_mux_init+0x180/0x2e8 [mdio_mux]
> >> [   13.782128]   g12a_mdio_mux_probe+0x290/0x398 [mdio_mux_meson_g12a]
> >> [   13.788349]   platform_drv_probe+0x5c/0xb0
> >> [   13.792379]   really_probe+0xe4/0x448
> >> [   13.795979]   driver_probe_device+0xe8/0x140
> >> [   13.800189]   __device_attach_driver+0x94/0x120
> >> [   13.804639]   bus_for_each_drv+0x84/0xd8
> >> [   13.808474]   __device_attach+0xe4/0x168
> >> [   13.812361]   device_initial_probe+0x1c/0x28
> >> [   13.816592]   bus_probe_device+0xa4/0xb0
> >> [   13.820430]   deferred_probe_work_func+0xa8/0x100
> >> [   13.825064]   process_one_work+0x264/0x688
> >> [   13.829088]   worker_thread+0x4c/0x458
> >> [   13.832768]   kthread+0x154/0x158
> >> [   13.836018]   ret_from_fork+0x10/0x18
> >> [   13.839612]
> >> [   13.839612] other info that might help us debug this:
> >> [   13.839612]
> >> [   13.850354]  Possible interrupt unsafe locking scenario:
> >> [   13.850354]
> >> [   13.855720]        CPU0                    CPU1
> >> [   13.858774]        ----                    ----
> >> [   13.863242]   lock(&ctl->lock);
> >> [   13.866330]                                local_irq_disable();
> >> [   13.872233]                                lock(&irq_desc_lock_class);
> >> [   13.878705]                                lock(&ctl->lock);
> >> [   13.884297]   <Interrupt>
> >> [   13.886857]     lock(&irq_desc_lock_class);
> >> [   13.891014]
> >> [   13.891014]  *** DEADLOCK ***
> >> 
> >> The issue can occur when CPU1 is doing something like irq_set_type()
> >> and CPU0 performing an interrupt allocation, for example. Taking
> >> an interrupt (like the one being reconfigured) would lead to a
> >> deadlock.  
> 
> Just to make sure I understand
> * the 1st trace is a CPU getting interrupted while setting the irq type
> * the 2nd trace is another CPU trying to allocate an irq for network PHY.

The traces are only what lockdep sees as a dangerous behaviour, not
necessarily what actually leads to a deadlock. The deadlock scenario is
the one outlined just before "*** DEADLOCK ***", and a way to get there
is my interpretation just above.

> >> 
> >> A solution to this is:
> >> 
> >> - Reorder the locking so that meson_gpio_irq_update_bits takes the lock
> >>   itself at all times, instead of relying on the caller to lock or not,
> >>   hence making the RMW sequence atomic,
> >> 
> >> - Rework the critical section in meson_gpio_irq_request_channel to only
> >>   cover the allocation itself, and let the gpio_irq_sel_pin callback
> >>   deal with its own locking if required,
> >> 
> >> - Take the private spin-lock with interrupts disabled at all times  
> 
> Looks like the only safe path if I understand correctly.
> The patch below looks good to me.
> 
> >> 
> >> Signed-off-by: Marc Zyngier <maz@kernel.org>  
> 
> Thanks for the fix Marc.
> 
> Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

  reply	other threads:[~2020-04-16  9:40 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-07 14:46 [PATCH] irqchip/meson-gpio: Fix HARDIRQ-safe -> HARDIRQ-unsafe lock order Marc Zyngier
2020-04-07 14:46 ` Marc Zyngier
2020-04-14 13:20 ` Marc Zyngier
2020-04-14 13:20   ` Marc Zyngier
2020-04-14 15:52   ` Jerome Brunet
2020-04-14 15:52     ` Jerome Brunet
2020-04-16  9:40     ` Marc Zyngier [this message]
2020-04-16  9:40       ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200416104025.5a22c228@why \
    --to=maz@kernel.org \
    --cc=jason@lakedaemon.net \
    --cc=jbrunet@baylibre.com \
    --cc=khilman@baylibre.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.blumenstingl@googlemail.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.